<>HHT principle

Hiibert Huang The transformation is made by Huang Et al 1998 A signal analysis method proposed in , It mainly consists of two parts : Empirical model decomposition (Empirical Mode
Decomposition, EMD) And Hilbert transformation （Hilbert Transform,HT）, among EMD Is the core .

Empirical mode decomposition method is adaptive , Efficient data decomposition method . Because this decomposition is based on local time scale , So it is suitable for nonlinearity , Nonstationary process . By decomposition , Original data series available IMF weight
ci(t)c_i(t)ci​(t) And the rest rn(t)r_n(t)rn​(t) express ：
x(t)=∑i=1nci(t)+rn(t)(1) x(t)=\sum_{i=1}^n c_i(t)+r_n(t)\tag{1} x(t)=i=1∑n​ci​(
t)+rn​(t)(1)
EMD Signal x(t)x(t)x(t) Break down into n individual IMF, For each IMF branch amount Namely ci(t)c_i(t)ci​(t)
do HT, Then we can find each IMF Instantaneous frequency and amplitude information of . The instantaneous frequency of this kind of eigenmode function （Instantaneous Frequency, IF） With clear physical significance .
yi(t)=1π∫−∞∞ci(t′)t−t′dt′(2) y_i(t)=\frac{1}{π}\int_{-\infty}^\infty
\frac{c_i(t')}{t-t'}dt'\tag{2}yi​(t)=π1​∫−∞∞​t−t′ci​(t′)​dt′(2)
ci(t)c_i(t)ci​(t) and yi(t)y_i(t)yi​(t) Constitute analytical signal z(t)z(t)z(t)：
zi(t)=ci(t)+jyi(t)=aiejθi(t)(3) z_i(t)=c_i(t)+jy_i(t)=a_i e^{j\theta
_i(t)}\tag{3}zi​(t)=ci​(t)+jyi​(t)=ai​ejθi​(t)(3)
among ai(t)=ci(t)2+yi(t)2a_i(t)=\sqrt{c_i(t)^2+y_i(t)^2}ai​(t)=ci​(t)2+yi​(t)2​
Represents instantaneous amplitude ,θi=arctan(yi(t)/ci(t))\theta _i=arctan(y_i(t)/c_i(t))θi​=arctan(yi​(t)/ci
​(t)) Represents instantaneous phase ,ωi(t)=dθi(t)/dt\omega _i(t)=d\theta _i(t)/dtωi​(t)=dθi​(t)/dt Represents instantaneous frequency .

From instantaneous amplitude ai(t)a_i(t)ai​(t) And instantaneous frequency ωi(t)\omega _i(t)ωi​(t) Signals can be expressed as ：
x(t)=∑i=1nai(t)ej∫ωi(t)dt(4) x(t)=\sum_{i=1}^n
a_i(t)e^{j\int\omega_i(t)dt}\tag{4}x(t)=i=1∑n​ai​(t)ej∫ωi​(t)dt(4)
The remaining items are omitted in the above formula rn(t)r_n(t)rn​(t), because rn(t)r_n(t)rn​(t)
Small amplitude , Either a monotone function or a constant , There is no substantial impact on signal analysis and signal extraction . At time - Draw each on the frequency plane IMF Instantaneous frequency curve weighted by its amplitude , This time - Frequency distribution spectrum
The picture is Hilbert Spectrum , Recorded as H(ω,t)H(ω,t)H(ω,t). From formula (4) It can be seen that ,Hilbert Spectrum is actually Fourier transform
An extension of . Compared with constant amplitude and fixed frequency in Fourier transform , type (4) Amplitude and frequency with time-varying , It can more reflect the nonlinear and nonstationary characteristic information of the signal .

<> be based on HHT Pitch detection method based on

Hilbert-Huang The transformation is applicable to the processing of non-linear and non-stationary signals , be based on HHT The flow chart of the pitch detection method of is as follows ：

（1） Band pass filtering . Because the pitch frequency range of human is 60-500Hz, So first pass the original signal through a 60-900Hz Preliminary processing of bandpass filter based on .

（2） Framing . although HHT The method does not need to make the assumption of short-term stability of speech signal , Therefore, it is not necessary to deal with the speech signal by framing and windowing , But the length of voice data is too long EMD Efficiency of decomposition , So it is necessary to frame the speech signal ,
The purpose of framing is no longer to ensure the short-term stability of the data within the frame .

（3） Determination of voiceless sound , Get the voiced part that needs to be tested for pitch .

（4）EMD decompose .EMD Decomposition is adaptive , High time-frequency resolution Data decomposition method of force . adopt EMD decompose , The local frequency structure of the signal is separated at each time , Namely IMF weight .

（5）Hilbert Transformation . adopt HT It can be concluded that IMF Instantaneous frequency and amplitude of components .

（6） Calculate the instantaneous energy of the signal . You can use the formula (5) calculation :
E(t)=∫H2(ω,t)dω(5) E(t)=\int H^2(\omega,t)d\omega\tag{5} E(t)=∫H2(ω,t)dω(5)

（7） First derivative of instantaneous energy . At the moment of glottal pulse , The instantaneous energy must be increased ,, The first derivative of instantaneous energy must be greater than a certain positive number , therefore , Theoretically, we can determine the time of glottal pulse by determining the local maximum of the first derivative of instantaneous energy , To determine the pitch .

<>MATLAB Programming implementation

Used in the program emd and hht function , This is MATLAB2018a Then it was officially introduced , The version used in this experiment is 2020a

<> Single frame processing

First of all, we select a frame of voiced frame for processing , The frame length is 100ms, use y0 express , Its time-domain waveform is as follows ：

Execute the following statement ：
emd(y0);
Display after execution y0 Of IMF The components are as follows ：

You can do this with the following statement y0 Of Hilbert-Huang Transformation
imf=emd(y0); [hs,f,t,imfinsf,imfinse]=hht(imf,fs); % imfinsf instantaneous frequency imfinse Instantaneous energy
Separate input hht(imf,fs,'FrequencyLimits',[0 1600]), You can display y0 Of HHT Spectrum

Next, for the instantaneous energy imfinse Find the first derivative
deg=[0;diff(imfinse(:,1))]; % Instantaneous energy difference plot(deg);
give the result as follows ：

Continue thresholding , And find the local maximum , The time when these local maximums occur is considered as the time when the glottal pulse occurs
for i=1:length(deg) if(deg(i)<0.1e-3) deg(i)=0; end end deg=medfilt1(deg,3);
TF=islocalmax(deg,'MinSeparation',fs*0.005);
t=linspace(0,dT_ms/1000,length(deg)); plot(t,deg,t(TF),deg(TF),'r*')
give the result as follows ：

Finally, the time difference of these local maximum values can be calculated , This time difference can be considered as the period of the pitch , The pitch frequency of the above frame signal can be calculated as 104.9180Hz. Visible use HHT This method can extract glottal pulse from voiced signal , Next apply HHT Pitch detection of a complete speech

<> pitch detection

The procedure is as follows ：
t=0:1/fs:(n-1)/fs; hf=HF; lf=LF; y=filter(hf.Numerator,1,y); % high pass 60Hz
y=filter(lf.Numerator,1,y); % Lowpass 900Hz figure() subplot(3,1,1) plot(t,y);
xlabel(" time /s"); ylabel(" amplitude "); title(" original signal "); % Framing dT_ms=50; % Duration of one frame signal
w=floor(dT_ms/1000*fs); % Sample number of one frame signal overlap=w/2; % Frameshift frame=floor(n/(overlap));
% Frames Y=zeros(frame,w); % Store the signals after framing and windowing y=[y;zeros((frame+1)*overlap-n,1)]; % Complement zero
for i=1:frame interval=(i-1)*(overlap)+(1:w); Y(i,:)=y(interval).*hamming(w); %
Framing and windowing end % Determination of voiceless sound voice=zeros(frame,1); for i=1:frame y0=Y(i,:); imf=emd(y0);
[hs,f,t]=hht(imf,fs); hs=full(hs); F=zeros(w,1); for m=1:size(hs,2) for
n=1:size(hs,1) F(m)=F(m)+hs(n,m)*f(n); end end voice(i)=mean(F);
if(voice(i)<10e-3) voice(i)=0; else voice(i)=1; end end % pitch detection
pitch=zeros(frame,1); for i=1:frame y0=Y(i,:); imf=emd(y0); % EMD decompose
[hs,f,t,imfinsf,imfinse]=hht(imf,fs); % HT Instantaneous energy obtained by transformation imfinse if(voice(i)==1) %
If it's voiced deg=[0;diff(imfinse(:,1))]; % Instantaneous energy difference for k=1:length(deg) if(deg(k)<1e-6)
deg(k)=0; end end deg=medfilt1(deg,3);
TF=islocalmax(deg,'MinSeparation',fs*0.005); % Find the location of local maximum T=find(TF==1);
T=diff(T); T=mean(T); T=T/fs; ff=1/T; pitch(i)=ff; else pitch(i)=0; end end
t=linspace(0,(length(y)-1)/fs,length(pitch)); subplot(3,1,2) plot(t,pitch);
xlim([min(t),max(t)]); ylim([0,200]); xlabel(" time /s"); ylabel(" frequency /Hz");
title(" Pitch tracking curve "); pitch=medfilt1(pitch,3); % median filtering subplot(3,1,3) plot(t,pitch);
xlim([min(t),max(t)]); ylim([0,200]); xlabel(" time /s"); ylabel(" frequency /Hz");
title(" Pitch tracking curve after median filtering ");
The input voice signal is “ I'm learning speech signal processing ”, The results are as follows ：

Technology
Daily Recommendation