1, Sound is a wave that can be heard; its vibration frequency lies between 20 and 20 000 Hz.

2, The process of speech production

The formation of speech: air from the lungs reaches the larynx, passes the vocal cords into the vocal tract, and finally radiates from the mouth as sound waves, forming speech.
3, Classification of speech sounds (concepts: understand + memorize)

Voiced sound: the vocal cords are tense, so the airflow passing through makes the glottis open and close periodically, producing a periodically excited airflow, e.g. a, o;

      (a sound produced by vocal cord vibration). Voiced sounds include all vowels and some consonants.

Unvoiced (voiceless) sound: the vocal cords are fully relaxed, and part of the vocal tract contracts to form a narrow passage that generates turbulent airflow, e.g. t, d;

  (a sound not produced by vocal cord vibration).

Plosive sound: the vocal cords are fully relaxed and part of the vocal tract is completely closed; when the closure suddenly opens, the built-up air pressure is released rapidly, e.g. b, p.
4, Two important acoustic characteristics of speech: pitch frequency and formants (memorize)

Pitch frequency (F0): determined by the size, characteristics and tension of the vocal cords. Its value equals the reciprocal of the time the vocal cords take to open and close once (that time is the pitch period). Human pitch frequency ranges roughly from 80 to 500 Hz.
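As a small illustration of the definition above, the sketch below converts a pitch period to F0 and makes a rough F0 estimate by picking the autocorrelation peak inside the 80 to 500 Hz range. The autocorrelation method and the function names are assumptions for illustration, not a method given in these notes; practical pitch trackers add normalization, voicing decisions and smoothing.

```python
import math


def f0_from_period(period_s):
    """F0 is the reciprocal of the pitch period (one open/close cycle, in seconds)."""
    return 1.0 / period_s


def estimate_f0_autocorr(x, fs, f0_min=80.0, f0_max=500.0):
    """Rough F0 estimate (hypothetical helper): choose the lag with the largest
    autocorrelation among lags that correspond to 80-500 Hz."""
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(x) - 1)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag
```

For a 400-sample, 200 Hz sine sampled at 8 kHz, `estimate_f0_autocorr` picks a lag of 40 samples and returns 200.0; `f0_from_period(0.005)` likewise gives 200 Hz.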

Formant (Fn, n = 1, 2, ...): the vocal tract is a resonator that amplifies some frequency components of the airflow and attenuates others; the amplified frequencies are called formants, or formant frequencies.

5, Formant characteristics (understand)

Formants are an important acoustic characteristic of the vocal tract. The vocal tract's response to an excitation signal can be approximated by a linear system with many pairs of poles, each pair corresponding to one formant frequency. The frequency response of this linear system is called the formant characteristic; it determines the overall shape of the signal spectrum, i.e. the spectral envelope.
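To make "each pair of poles corresponds to one formant" concrete, here is a minimal sketch: a formant with frequency F and bandwidth B is modeled as a conjugate pole pair at angle 2πF/fs and radius exp(-πB/fs) (a common approximation, assumed here), and the magnitude response of the resulting all-pole system peaks near F. Function names and parameter values are illustrative assumptions.

```python
import cmath
import math


def resonator_poles(formant_hz, bandwidth_hz, fs):
    """Conjugate pole pair for one formant: angle 2*pi*F/fs, radius exp(-pi*B/fs)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2 * math.pi * formant_hz / fs
    p = cmath.rect(r, theta)
    return p, p.conjugate()


def magnitude_response(poles, freq_hz, fs):
    """|H(e^jw)| of the all-pole system H(z) = 1 / prod(1 - p * z^-1)."""
    z = cmath.exp(1j * 2 * math.pi * freq_hz / fs)
    denom = 1.0 + 0j
    for p in poles:
        denom *= 1 - p / z  # p / z == p * z^-1
    return 1.0 / abs(denom)
```

Evaluating `magnitude_response(resonator_poles(500, 60, 8000), f, 8000)` on a 10 Hz grid puts the peak at about 500 Hz; a smaller bandwidth moves the poles closer to the unit circle and sharpens the peak. Several pole pairs multiplied together give a multi-formant spectral envelope.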

The frequency characteristics of speech are determined mainly by the formants, and the formant characteristic of the vocal tract determines the spectral character of the sound produced, i.e. its timbre.

The timbre and distinguishing features of vowels depend mainly on the formant characteristic of the vocal tract, which can be observed in the amplitude-frequency characteristic obtained by spectral analysis of the speech signal.

6, A complete digital model of speech signal generation (be able to draw the diagram + explain the principle and characteristics of each part of the model)

The speech signal can be regarded as the output of a linear shift-invariant system excited by a quasi-periodic pulse sequence or a random noise sequence. The model has three parts: the excitation model, the vocal tract model and the radiation model.

A complete digital model of the speech signal (key point)

   One, Excitation model

    a. Voiced excitation: when air flows through the tensed vocal cords, it drives them to vibrate, forming periodic pulses at the glottis that excite the vocal tract.

       Because the glottal pulses resemble triangular pulses repeating once per pitch period, the excitation is modeled as a train of unit samples (impulses) whose period is the pitch period.
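A minimal sketch of that voiced excitation, assuming a sampling rate fs and an amplitude playing the role of the gain Av; the function names are hypothetical.

```python
def pitch_period_samples(f0_hz, fs):
    """Pitch period in samples (fs / F0), rounded to the nearest integer."""
    return round(fs / f0_hz)


def voiced_excitation(num_samples, period, amplitude=1.0):
    """Unit-sample train: an impulse of height `amplitude` (Av) every `period` samples."""
    return [amplitude if n % period == 0 else 0.0 for n in range(num_samples)]
```

At fs = 8000 Hz, a 200 Hz pitch gives a period of 40 samples, so `voiced_excitation(100, 40)` has impulses at n = 0, 40 and 80.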

    b. Unvoiced excitation: the vocal cords are relaxed and do not vibrate; air flows through the glottis directly into the vocal tract.

       For unvoiced sounds, a constriction in the vocal tract produces turbulence, so the excitation can be modeled as random white noise.
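A minimal sketch of the unvoiced excitation, assuming zero-mean Gaussian white noise (uniform noise is also common) scaled by an amplitude playing the role of the gain AN; the function name is hypothetical.

```python
import random


def unvoiced_excitation(num_samples, amplitude=1.0, seed=None):
    """Zero-mean, unit-variance Gaussian white noise scaled by `amplitude` (AN)."""
    rng = random.Random(seed)  # private generator, reproducible when seeded
    return [amplitude * rng.gauss(0.0, 1.0) for _ in range(num_samples)]
```

Passing a seed makes the noise reproducible, which is convenient when comparing analysis results across runs.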

   Two, Vocal tract model

    a. Acoustic tube model: the vocal tract is regarded as a series of concatenated tubes with different cross-sectional areas.
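In a concatenated-tube model, each junction between adjacent tubes is characterized by a reflection coefficient computed from the two cross-sectional areas. The sketch below uses r_k = (A[k+1] - A[k]) / (A[k+1] + A[k]); the sign convention differs between textbooks, so treat it as an assumption, and the function name is hypothetical.

```python
def reflection_coefficients(areas):
    """Reflection coefficient at each junction of tube k and tube k+1:
    r_k = (A[k+1] - A[k]) / (A[k+1] + A[k]). Sign convention varies by text."""
    if any(a <= 0 for a in areas):
        raise ValueError("cross-sectional areas must be positive")
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]
```

Because all areas are positive, every coefficient lies strictly between -1 and 1; equal neighboring areas give r = 0 (no reflection at that junction).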

    b. Formant model: the vocal tract is treated as a resonator, and the formants are the resonant frequencies of its cavities.

       Cascade type

         Suitable for ordinary vowels; the vocal tract is treated as a series of second-order resonators, giving an all-pole model.
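A minimal sketch of the cascade (all-pole) formant model: each (F, B) pair becomes one second-order resonator y[n] = x[n] - a1*y[n-1] - a2*y[n-2], and the resonators are applied one after another. The pole-radius formula exp(-πB/fs) and all names are assumptions for illustration.

```python
import math


def resonator_coeffs(formant_hz, bandwidth_hz, fs):
    """Denominator coefficients a1, a2 of 1 / (1 + a1 z^-1 + a2 z^-2)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    a1 = -2 * r * math.cos(2 * math.pi * formant_hz / fs)
    a2 = r * r
    return a1, a2


def resonate(x, a1, a2):
    """One all-pole second-order section: y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(x, formants, fs):
    """Pass x through one resonator per (F, B) pair, in series."""
    for f, b in formants:
        a1, a2 = resonator_coeffs(f, b, fs)
        x = resonate(x, a1, a2)
    return x
```

For example, `cascade([1.0] + [0.0] * 1999, [(730, 90), (1090, 110), (2440, 170)], 8000)` returns the impulse response of a three-formant vocal tract; these (F, B) values are illustrative only.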

       Parallel type

         Suitable for non-ordinary vowels and most consonants. When these sounds are produced, the vocal tract has anti-resonance characteristics, so zeros must be added to the model to weaken the resonance

         strength; a pole-zero model should therefore be used.
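Why a parallel connection behaves as a pole-zero model can be checked algebraically: summing gain-weighted all-pole branches g1/A1(z) + g2/A2(z) gives (g1·A2 + g2·A1) / (A1·A2), whose numerator is a polynomial with zeros. The sketch below combines the branches with polynomial arithmetic in z^-1; the helper names and the example coefficients are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def parallel_transfer(sections):
    """Sum of gain_i / A_i(z) over all branches -> (numerator, denominator)."""
    num, den = [0.0], [1.0]
    for gain, a in sections:
        # num/den + gain/a = (num*a + gain*den) / (den*a)
        num = poly_add(poly_mul(num, a), [gain * c for c in den])
        den = poly_mul(den, a)
    return num, den
```

With two unit-gain branches A1 = [1, -1.2, 0.8] and A2 = [1, 0.5, 0.6], the numerator starts [2, -0.7, 1.4], so the combined system has zeros even though each branch is all-pole; choosing the branch gains (including negative ones) is what places those zeros.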

       Hybrid type

         The model can switch automatically between the cascade and parallel branches as pronunciation requires. In addition, the parallel part has a direct path whose amplitude control factor is AB.

         This direct path is included for phonemes with flat spectra, such as [f], [p], [b], to enhance the anti-resonance characteristics.

   Three, Radiation model

As the airflow leaving the vocal tract radiates from the lips to the listener's ear, the sound signal is attenuated, and the radiation has a high-pass filtering characteristic.

A first-order digital high-pass filter is commonly used to model it.
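A minimal sketch of such a filter, assuming the form R(z) = r0·(1 - a·z^-1): with a = 1 it is a pure first difference, and values slightly below 1 (around 0.98) are also commonly used. The function name and defaults are assumptions.

```python
def radiate(x, a=1.0, r0=1.0):
    """First-order high-pass radiation filter: y[n] = r0 * (x[n] - a * x[n-1])."""
    prev = 0.0
    out = []
    for s in x:
        out.append(r0 * (s - a * prev))
        prev = s
    return out
```

A constant (DC) input is removed after the first sample, while an alternating input at the Nyquist frequency is doubled, which is the high-pass behavior described above.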

Model summary:

1. This model is not the most complete one, because it does not apply to some sounds, such as voiced fricatives. These sounds have two kinds of excitation, voiced and unvoiced, which are not simply superimposed; more accurate models can be used to simulate them.

2. The gain control (Av or AN) in the digital model of speech production represents the intensity of the output speech;

The time-varying linear system mainly models the characteristics of the vocal tract;

3. Both basic problems of digital speech processing, speech analysis and speech synthesis, are based on this model;

4. Features of this digital model:

System parameters are assumed fixed over a short interval: hence short-time analysis;

All-pole: the effect of zeros can be approximated by many poles;

The excitation source and the vocal tract are independent of each other: suitable for most digital speech processing.
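The whole model can be sketched as three independent stages: excitation (impulse train or white noise, scaled by Av or AN), an all-pole vocal tract, and first-order radiation. This is a minimal sketch under assumptions: a cascade of second-order resonators with pole radius exp(-πB/fs) for the vocal tract, R(z) = 1 - a·z^-1 for radiation, and hypothetical function names. The formant values used below are illustrative only.

```python
import math
import random


def excitation(n, fs, voiced, f0=120.0, gain=1.0, seed=0):
    """Voiced: impulse train at F0 scaled by Av; unvoiced: white noise scaled by AN."""
    if voiced:
        period = round(fs / f0)
        return [gain if i % period == 0 else 0.0 for i in range(n)]
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(n)]


def vocal_tract(x, formants, fs):
    """All-pole vocal tract: cascade of second-order resonators, one per (F, B)."""
    for f, b in formants:
        r = math.exp(-math.pi * b / fs)
        a1, a2 = -2 * r * math.cos(2 * math.pi * f / fs), r * r
        y1 = y2 = 0.0
        out = []
        for s in x:
            y = s - a1 * y1 - a2 * y2
            out.append(y)
            y2, y1 = y1, y
        x = out
    return x


def radiation(x, a=1.0):
    """First-order high-pass R(z) = 1 - a z^-1."""
    return [s - a * p for s, p in zip(x, [0.0] + x[:-1])]


def synthesize(n, fs, voiced, formants, **kw):
    """Excitation -> vocal tract -> radiation; each stage is independent of the others."""
    return radiation(vocal_tract(excitation(n, fs, voiced, **kw), formants, fs))
```

Because the stages are independent, the same `formants` list can be driven by either excitation: `synthesize(1600, 8000, True, [(730, 90), (1090, 110), (2440, 170)], f0=125.0)` gives a periodic, vowel-like signal, and setting `voiced=False` gives a noise-excited one through the same vocal tract.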

7, Narrowband and wideband spectrograms: definitions and characteristics (understand thoroughly)

Spectrogram: a display of the speech signal's spectrum over time; the horizontal axis is time and the vertical axis is frequency.

Narrowband spectrogram: spectrograms are computed with the (short-time) Fourier transform. With a longer analysis window (about 20 ms, corresponding to a bandwidth of about 45 Hz), the frequency resolution is high and the individual harmonics can be seen in the spectrum.
They appear on the spectrogram as equally spaced dark and light horizontal lines whose spacing equals the fundamental frequency (F0).

Wideband spectrogram: if fewer samples are used in each transform (analysis window about 3 ms, corresponding bandwidth about 300 Hz), the individual harmonics cannot be resolved in the spectrum,
so the spectrogram shows no equally spaced horizontal lines. Frequency resolution is low, but time resolution is higher, and vertical striations (one per glottal pulse) are visible.
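The difference between the two window lengths can be checked on a synthetic voiced signal. The sketch below assumes fs = 8000 Hz, a 200 Hz impulse train and a Hamming window (the window type is an assumption), and compares the short-time magnitude at a harmonic (200 Hz) with the magnitude halfway between harmonics (100 Hz). Function names are hypothetical.

```python
import cmath
import math


def hamming(n_len):
    """Hamming window of length n_len."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]


def short_time_magnitude(x, start, win_len, freq_hz, fs):
    """|X(f)| of one Hamming-windowed frame x[start:start + win_len] (direct DTFT)."""
    w = hamming(win_len)
    frame = x[start:start + win_len]
    acc = 0j
    for n, (s, wn) in enumerate(zip(frame, w)):
        acc += s * wn * cmath.exp(-2j * math.pi * freq_hz * n / fs)
    return abs(acc)


fs = 8000
x = [1.0 if n % 40 == 0 else 0.0 for n in range(800)]  # impulse train, F0 = 200 Hz
narrow_len = round(0.020 * fs)  # 20 ms -> 160 samples
wide_len = round(0.003 * fs)    # 3 ms  -> 24 samples
```

With the frame starting at sample 28, the 160-sample window spans four pitch periods, so the 200 Hz harmonic stands hundreds of times above the 100 Hz gap (harmonics resolved). The 24-sample window holds a single pulse, so its spectrum is flat and the ratio is 1 (no harmonic lines, but good time resolution).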

Formants:

In the frequency domain, formants are where energy is concentrated; on the spectrogram they appear as darker regions.

For vowels, the sound is strong: the fundamental frequency of the vocal cords and its harmonics are present, the formants are clearly visible, and energy is concentrated at low frequencies.

For a consonant during which the vocal cords do not vibrate, no harmonic structure is visible. Consonants are usually quieter, so they look lighter, and their energy is concentrated at high frequencies.

During a pause with no sound, the spectrogram shows a gap.
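A pause shows up as a gap because almost no energy is present at any frequency. A minimal way to locate such gaps is to compute the short-time energy of each frame and flag frames below a threshold; the frame length, hop and threshold are assumptions to be tuned, and the function names are hypothetical.

```python
def frame_energies(x, frame_len, hop):
    """Short-time energy: sum of squared samples in each frame."""
    return [sum(s * s for s in x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, hop)]


def silent_frames(x, frame_len, hop, threshold):
    """Indices of frames whose energy falls below `threshold` (a pause/gap)."""
    return [k for k, e in enumerate(frame_energies(x, frame_len, hop)) if e < threshold]
```

For a signal made of 100 non-zero samples, 100 zeros and 100 non-zero samples, 50-sample frames with a 50-sample hop flag frames 2 and 3, which correspond to the silent stretch.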
