1, Sound is a wave that can be heard; its vibration frequency lies between 20 Hz and 20 000 Hz.

2, The process of speech production

How speech is formed: air travels from the lungs to the larynx, passes through the vocal cords into the vocal tract, and finally radiates from the mouth as sound waves, forming speech.

3, Classification of sounds (concept: understand and memorize)

Voiced sound: the vocal cords are tensed. As air flows through, the glottis opens and closes periodically, producing a periodic excitation airflow, e.g. a, o;

      (A sound produced by vocal cord vibration.) Includes all vowels and some consonants.

Unvoiced sound: the vocal cords are fully relaxed. Some part of the vocal tract contracts to form a narrow passage, which generates air turbulence, e.g. t, d;

  (A sound not produced by vocal cord vibration.)

Plosive sound: the vocal cords are fully relaxed and some part of the vocal tract is completely closed. When the closure suddenly opens, the built-up air pressure is released rapidly, e.g. b, p.

4, Two important acoustic characteristics of speech: pitch frequency and formants (memorize)

Pitch frequency (F0): determined by the size, properties, and tension of the vocal cords. Its value equals the reciprocal of the time the vocal cords take to open and close once (that time is the definition of the pitch period). Human pitch frequency ranges from roughly 80 Hz to 500 Hz.
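The reciprocal relation between pitch period and pitch frequency can be sketched in a few lines. This is only an illustration: the function names, the 8 ms period, and the 8000 Hz sampling rate are assumptions, not values from the notes.

```python
def pitch_frequency_hz(pitch_period_s):
    """Return F0 in Hz given the duration of one open-close cycle in seconds."""
    if pitch_period_s <= 0:
        raise ValueError("pitch period must be positive")
    return 1.0 / pitch_period_s


def pitch_period_samples(f0_hz, sample_rate_hz):
    """Return the pitch period expressed in samples (rounded to an integer)."""
    return round(sample_rate_hz / f0_hz)


# Hypothetical example: a cycle of 8 ms gives an F0 of about 125 Hz,
# which falls inside the 80 Hz to 500 Hz range quoted above.
f0 = pitch_frequency_hz(0.008)
print(f0)
print(pitch_period_samples(f0, 8000))
```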

Formant (Fn, n = 1, 2, ...): the vocal tract is a resonator. It amplifies some frequency components of the sound and attenuates others; the amplified frequencies are called formants, or formant frequencies.

5, Formant characteristics (understand)

Formants are an important acoustic characteristic of the vocal tract. The response of the vocal tract to an excitation signal can be approximated by a linear system with many pairs of poles, where each pair of poles corresponds to one formant frequency. The frequency response of this linear system is called the formant characteristic; it determines the overall outline of the signal spectrum, i.e. the spectral envelope.
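As a rough illustration of "one pole pair per formant", the sketch below places a complex-conjugate pole pair for a given formant frequency and bandwidth and evaluates the resulting second-order response. The bandwidth-to-radius mapping r = exp(-pi * B / fs) is a common textbook approximation, and the function names and numeric values are assumptions made for this example.

```python
import cmath
import math


def formant_pole_pair(formant_hz, bandwidth_hz, sample_rate_hz):
    """Return the conjugate pole pair modelling one formant."""
    radius = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)  # sets the bandwidth
    angle = 2.0 * math.pi * formant_hz / sample_rate_hz          # sets the centre frequency
    pole = cmath.rect(radius, angle)
    return pole, pole.conjugate()


def resonator_denominator(formant_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients (1, a1, a2) of the denominator 1 + a1*z^-1 + a2*z^-2."""
    pole, _ = formant_pole_pair(formant_hz, bandwidth_hz, sample_rate_hz)
    return (1.0, -2.0 * pole.real, abs(pole) ** 2)


def magnitude_response(denominator, freq_hz, sample_rate_hz):
    """Magnitude of 1 / (1 + a1*z^-1 + a2*z^-2) on the unit circle."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    value = denominator[0] + denominator[1] * z_inv + denominator[2] * z_inv ** 2
    return 1.0 / abs(value)


# Hypothetical formant at 500 Hz with 80 Hz bandwidth, sampled at 8000 Hz.
den = resonator_denominator(500.0, 80.0, 8000.0)
for f in (100.0, 500.0, 1500.0):
    print(f, magnitude_response(den, f, 8000.0))
```

The response is largest near 500 Hz, which is the spectral-envelope peak the pole pair contributes.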

The frequency characteristics of speech are determined mainly by the formants. The formant characteristic of the vocal tract determines the spectral characteristics of the sound produced, i.e. its timbre.

The timbre and distinguishing features of vowels depend mainly on the formant characteristics of the vocal tract. These can be observed in the magnitude-frequency response obtained from spectral analysis of the speech signal.

6, A complete digital model of speech signal generation (be able to draw the diagram and explain the principle and characteristics of each part of the model)

A speech signal can be regarded as the output of a linear shift-invariant system excited by a quasi-periodic sequence or a random noise sequence. The model can be divided into three parts: the excitation model, the vocal tract model, and the radiation model.

A complete digital model of the speech signal (key point)

   One, Excitation model

    a. Voiced excitation: when air flows through the tensed vocal cords, it sets them vibrating, which forms periodic pulses at the glottis; these pulses excite the vocal tract.

       Because the glottal pulse train resembles a train of triangular pulses, a unit-sample (impulse) train whose period is the pitch period is used as the excitation.

    b. Unvoiced excitation: the vocal cords are relaxed and do not vibrate, so air flows through the glottis directly into the vocal tract.

       For unvoiced sounds the vocal tract is constricted and turbulence forms, so the excitation can be modelled as random white noise.
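The two excitation types above can be sketched as follows. This is a minimal illustration using only the standard library; the function names, the 64-sample pitch period, and the use of Gaussian noise for "white noise" are assumptions, and the glottal pulse shaping stage is left out.

```python
import random


def voiced_excitation(num_samples, pitch_period_samples):
    """Unit-sample train: 1.0 at every multiple of the pitch period, else 0.0."""
    return [1.0 if n % pitch_period_samples == 0 else 0.0
            for n in range(num_samples)]


def unvoiced_excitation(num_samples, seed=0):
    """Zero-mean Gaussian white noise (seeded so the example is repeatable)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


# Hypothetical setting: 8000 Hz sampling, F0 = 125 Hz, i.e. 64 samples per period.
voiced = voiced_excitation(200, 64)
unvoiced = unvoiced_excitation(200)
print([n for n, v in enumerate(voiced) if v == 1.0])  # pulse positions
print(unvoiced[:3])
```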

   Two, Vocal tract model

    a. Acoustic tube model: the vocal tract is treated as a system of several tubes with different cross-sectional areas connected in series.

    b. Formant model: the vocal tract is treated as a resonant cavity, and the formants are the resonance frequencies of that cavity.

       Cascade type

         Suitable for ordinary vowels. The vocal tract is regarded as a cascade of second-order resonators, which is an all-pole model.

       Parallel type

         Suitable for non-ordinary vowels and most consonants. When these sounds are produced the cavity shows anti-resonance behaviour, so zeros must be added to the model to weaken the resonance strength; a pole-zero model is therefore needed.

       Mixed type

         The model switches automatically between the cascade and parallel branches according to the sound being produced. The parallel part also has a direct path whose amplitude control factor is AB.

         The direct path is included for phonemes with relatively flat spectra, such as [f], [p], [b], in order to strengthen the anti-resonance characteristics.
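The cascade type can be sketched as a chain of second-order all-pole resonators, one per formant. The code below is a hedged illustration rather than the model from any particular textbook: the formant frequencies and bandwidths are made-up values, each section is normalised to unity gain at 0 Hz by choice, and the spectrum is probed with a direct DTFT sum instead of an FFT library.

```python
import cmath
import math


def resonator_coeffs(formant_hz, bandwidth_hz, sample_rate_hz):
    """Return (gain, a1, a2) for y[n] = gain*x[n] - a1*y[n-1] - a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    theta = 2.0 * math.pi * formant_hz / sample_rate_hz
    a1 = -2.0 * r * math.cos(theta)
    a2 = r * r
    return 1.0 + a1 + a2, a1, a2  # gain chosen so the response is 1 at 0 Hz


def apply_resonator(x, coeffs):
    """Run one second-order all-pole section over the signal x."""
    gain, a1, a2 = coeffs
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = gain * sample - a1 * y1 - a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def cascade_vocal_tract(x, formants, sample_rate_hz):
    """Pass x through one resonator per (frequency, bandwidth) pair, in series."""
    y = list(x)
    for formant_hz, bandwidth_hz in formants:
        y = apply_resonator(y, resonator_coeffs(formant_hz, bandwidth_hz, sample_rate_hz))
    return y


def magnitude_at(signal, freq_hz, sample_rate_hz):
    """Magnitude of the DTFT of signal at a single frequency."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    return abs(sum(s * cmath.exp(-1j * w * n) for n, s in enumerate(signal)))


# Hypothetical three-formant vowel, excited by a single unit impulse.
fs = 8000.0
formants = [(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)]
impulse = [1.0] + [0.0] * 511
response = cascade_vocal_tract(impulse, formants, fs)
for f in (500.0, 1000.0, 1500.0, 2000.0, 2500.0):
    print(f, magnitude_at(response, f, fs))
```

The magnitudes at 500, 1500, and 2500 Hz exceed those at the in-between frequencies, showing the spectral-envelope peaks that the cascade produces.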

   Three, Radiation model

As the airflow leaving the vocal tract radiates from the lips to the listener's ear, the sound signal is attenuated, and the radiation behaves like a high-pass filter.

A first-order digital high-pass filter is commonly used to model it.
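A first-order high-pass filter of the form y[n] = x[n] - c*x[n-1] is one common way to write such a radiation stage. The sketch below assumes that form; the coefficient 0.97 and the function names are illustrative choices, not values given in the notes.

```python
import cmath
import math


def radiation_filter(x, coefficient=0.97):
    """First-order high-pass: y[n] = x[n] - coefficient * x[n-1]."""
    y, previous = [], 0.0
    for sample in x:
        y.append(sample - coefficient * previous)
        previous = sample
    return y


def radiation_gain(freq_hz, sample_rate_hz, coefficient=0.97):
    """Magnitude of 1 - coefficient*z^-1 evaluated on the unit circle."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    return abs(1.0 - coefficient * z_inv)


# Low frequencies are attenuated much more than high frequencies.
for f in (100.0, 1000.0, 3000.0):
    print(f, radiation_gain(f, 8000.0))
```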

Model summary:

1. This model is not the most complete one, because it does not fit certain sounds, such as voiced fricatives. Such sounds involve both voiced and unvoiced excitation, and the two are not simply superimposed. More accurate models can be used to simulate them.

2. The gain control in the digital model of speech production (Av or AN) represents the intensity of the output speech;

The time-varying linear system is mainly used to model the characteristics of the vocal tract;

3. The two basic problems in digital speech processing, speech analysis and speech synthesis, are both based on this model;

4. Features of this digital model:

System parameters are treated as fixed over a short interval: this is the basis of short-time analysis;

All-pole property: a zero can be approximated by many poles;

Excitation source and vocal tract are independent of each other: this assumption suits most digital speech processing.
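The claim that a zero can be approximated by many poles follows from the geometric series: for |a| < 1, 1 - a*z^-1 = 1 / (1 + a*z^-1 + a^2*z^-2 + ...). Truncating the series gives an all-pole approximation. The sketch below checks this numerically; the value a = 0.5, the orders, and the function names are assumptions chosen for illustration.

```python
import cmath
import math


def zero_response(a, w):
    """Frequency response of the single zero 1 - a*z^-1 at angular frequency w."""
    return 1.0 - a * cmath.exp(-1j * w)


def all_pole_approximation(a, w, order):
    """Response of 1 / (1 + a*z^-1 + ... + a^order * z^-order)."""
    z_inv = cmath.exp(-1j * w)
    denominator = sum((a * z_inv) ** k for k in range(order + 1))
    return 1.0 / denominator


def max_error(a, order, num_points=64):
    """Largest deviation between the zero and its all-pole stand-in on [0, pi]."""
    grid = (math.pi * k / (num_points - 1) for k in range(num_points))
    return max(abs(zero_response(a, w) - all_pole_approximation(a, w, order))
               for w in grid)


# The error shrinks as more poles are used.
for order in (2, 5, 10):
    print(order, max_error(0.5, order))
```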

7, Narrow-band and wide-band spectrograms: definitions and characteristics (understand thoroughly)

Spectrogram: a display of the speech signal's spectrum over time. The abscissa is time and the ordinate is frequency.

Narrow-band spectrogram: spectrograms are generated with the Fourier transform. With a longer analysis window (about 20 ms, corresponding to a bandwidth of about 45 Hz), the frequency resolution is high and the individual harmonic components can be seen in the spectrum. They appear on the spectrogram as equally spaced dark and light horizontal lines, and the spacing between them is the fundamental frequency (F0).

Wide-band spectrogram: if fewer samples are used in the transform (analysis window of about 3 ms, corresponding to a bandwidth of about 300 Hz), the harmonic components can no longer be seen in the spectrum, and the equally spaced dark and light horizontal lines do not appear. The frequency resolution is low, but the resolution along the time axis is higher, and vertical lines become visible.
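The trade-off between window length and frequency resolution can be sketched numerically. The code below uses the rough rule of thumb that the analysis bandwidth is about the reciprocal of the window duration; this is an assumption (the exact figure depends on the window shape, which is why it gives about 50 Hz and 333 Hz rather than the 45 Hz and 300 Hz quoted above). The function names, the 8000 Hz sampling rate, and the 120 Hz F0 are also illustrative.

```python
def window_length_samples(window_s, sample_rate_hz):
    """Number of samples covered by an analysis window of the given duration."""
    return round(window_s * sample_rate_hz)


def approximate_bandwidth_hz(window_s):
    """Rule-of-thumb analysis bandwidth: roughly 1 / window duration."""
    return 1.0 / window_s


def resolves_harmonics(window_s, f0_hz):
    """Harmonics spaced F0 apart are separable only if the bandwidth is below F0."""
    return approximate_bandwidth_hz(window_s) < f0_hz


# Hypothetical speaker with F0 = 120 Hz, sampled at 8000 Hz.
for name, window_s in (("narrow-band", 0.020), ("wide-band", 0.003)):
    print(name,
          window_length_samples(window_s, 8000),
          approximate_bandwidth_hz(window_s),
          resolves_harmonics(window_s, 120.0))
```

With the 20 ms window the harmonics are resolved (horizontal lines); with the 3 ms window they are not, but individual glottal pulses are separated in time (vertical lines).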

Formants on the spectrogram:

In the frequency domain, the regions where energy is concentrated are the formants; on the spectrogram they are the darker areas.

For vowels the sound is strong. Both the fundamental frequency of the vocal cords and the resonance frequencies are present, the formants can be seen clearly, and the energy is concentrated at low frequencies.

For a consonant produced without vocal cord vibration, no resonance frequencies can be seen. Consonants are usually weaker, so they look lighter on the spectrogram, and their energy is concentrated at high frequencies.

During silence, when no sound is produced, the spectrogram shows a blank gap.
