<>HHT principle

Hiibert Huang The transformation is made by Huang Et al 1998 A signal analysis method proposed in , It mainly consists of two parts : Empirical model decomposition (Empirical Mode

Decomposition, EMD) And Hilbert transformation （Hilbert Transform,HT）, among EMD Is the core .

Empirical mode decomposition method is adaptive , Efficient data decomposition method . Because this decomposition is based on local time scale , So it is suitable for nonlinearity , Nonstationary process . By decomposition , Original data series available IMF weight

ci(t)c_i(t)ci(t) And the rest rn(t)r_n(t)rn(t) express ：

x(t)=∑i=1nci(t)+rn(t)(1) x(t)=\sum_{i=1}^n c_i(t)+r_n(t)\tag{1} x(t)=i=1∑nci(

t)+rn(t)(1)

EMD Signal x(t)x(t)x(t) Break down into n individual IMF, For each IMF branch amount Namely ci(t)c_i(t)ci(t)

do HT, Then we can find each IMF Instantaneous frequency and amplitude information of . The instantaneous frequency of this kind of eigenmode function （Instantaneous Frequency, IF） With clear physical significance .

yi(t)=1π∫−∞∞ci(t′)t−t′dt′(2) y_i(t)=\frac{1}{π}\int_{-\infty}^\infty

\frac{c_i(t')}{t-t'}dt'\tag{2}yi(t)=π1∫−∞∞t−t′ci(t′)dt′(2)

ci(t)c_i(t)ci(t) and yi(t)y_i(t)yi(t) Constitute analytical signal z(t)z(t)z(t)：

zi(t)=ci(t)+jyi(t)=aiejθi(t)(3) z_i(t)=c_i(t)+jy_i(t)=a_i e^{j\theta

_i(t)}\tag{3}zi(t)=ci(t)+jyi(t)=aiejθi(t)(3)

among ai(t)=ci(t)2+yi(t)2a_i(t)=\sqrt{c_i(t)^2+y_i(t)^2}ai(t)=ci(t)2+yi(t)2

Represents instantaneous amplitude ,θi=arctan(yi(t)/ci(t))\theta _i=arctan(y_i(t)/c_i(t))θi=arctan(yi(t)/ci

(t)) Represents instantaneous phase ,ωi(t)=dθi(t)/dt\omega _i(t)=d\theta _i(t)/dtωi(t)=dθi(t)/dt Represents instantaneous frequency .

From instantaneous amplitude ai(t)a_i(t)ai(t) And instantaneous frequency ωi(t)\omega _i(t)ωi(t) Signals can be expressed as ：

x(t)=∑i=1nai(t)ej∫ωi(t)dt(4) x(t)=\sum_{i=1}^n

a_i(t)e^{j\int\omega_i(t)dt}\tag{4}x(t)=i=1∑nai(t)ej∫ωi(t)dt(4)

The remaining items are omitted in the above formula rn(t)r_n(t)rn(t), because rn(t)r_n(t)rn(t)

Small amplitude , Either a monotone function or a constant , There is no substantial impact on signal analysis and signal extraction . At time - Draw each on the frequency plane IMF Instantaneous frequency curve weighted by its amplitude , This time - Frequency distribution spectrum

The picture is Hilbert Spectrum , Recorded as H(ω,t)H(ω,t)H(ω,t). From formula (4) It can be seen that ,Hilbert Spectrum is actually Fourier transform

An extension of . Compared with constant amplitude and fixed frequency in Fourier transform , type (4) Amplitude and frequency with time-varying , It can more reflect the nonlinear and nonstationary characteristic information of the signal .

<> be based on HHT Pitch detection method based on

Hilbert-Huang The transformation is applicable to the processing of non-linear and non-stationary signals , be based on HHT The flow chart of the pitch detection method of is as follows ：

（1） Band pass filtering . Because the pitch frequency range of human is 60-500Hz, So first pass the original signal through a 60-900Hz Preliminary processing of bandpass filter based on .

（2） Framing . although HHT The method does not need to make the assumption of short-term stability of speech signal , Therefore, it is not necessary to deal with the speech signal by framing and windowing , But the length of voice data is too long EMD Efficiency of decomposition , So it is necessary to frame the speech signal ,

The purpose of framing is no longer to ensure the short-term stability of the data within the frame .

（3） Determination of voiceless sound , Get the voiced part that needs to be tested for pitch .

（4）EMD decompose .EMD Decomposition is adaptive , High time-frequency resolution Data decomposition method of force . adopt EMD decompose , The local frequency structure of the signal is separated at each time , Namely IMF weight .

（5）Hilbert Transformation . adopt HT It can be concluded that IMF Instantaneous frequency and amplitude of components .

（6） Calculate the instantaneous energy of the signal . You can use the formula (5) calculation :

E(t)=∫H2(ω,t)dω(5) E(t)=\int H^2(\omega,t)d\omega\tag{5} E(t)=∫H2(ω,t)dω(5)

（7） First derivative of instantaneous energy . At the moment of glottal pulse , The instantaneous energy must be increased ,, The first derivative of instantaneous energy must be greater than a certain positive number , therefore , Theoretically, we can determine the time of glottal pulse by determining the local maximum of the first derivative of instantaneous energy , To determine the pitch .

<>MATLAB Programming implementation

Used in the program emd and hht function , This is MATLAB2018a Then it was officially introduced , The version used in this experiment is 2020a

<> Single frame processing

First of all, we select a frame of voiced frame for processing , The frame length is 100ms, use y0 express , Its time-domain waveform is as follows ：

Execute the following statement ：

emd(y0);

Display after execution y0 Of IMF The components are as follows ：

You can do this with the following statement y0 Of Hilbert-Huang Transformation

imf=emd(y0); [hs,f,t,imfinsf,imfinse]=hht(imf,fs); % imfinsf instantaneous frequency imfinse Instantaneous energy

Separate input hht(imf,fs,'FrequencyLimits',[0 1600]), You can display y0 Of HHT Spectrum

Next, for the instantaneous energy imfinse Find the first derivative

deg=[0;diff(imfinse(:,1))]; % Instantaneous energy difference plot(deg);

give the result as follows ：

Continue thresholding , And find the local maximum , The time when these local maximums occur is considered as the time when the glottal pulse occurs

for i=1:length(deg) if(deg(i)<0.1e-3) deg(i)=0; end end deg=medfilt1(deg,3);

TF=islocalmax(deg,'MinSeparation',fs*0.005);

t=linspace(0,dT_ms/1000,length(deg)); plot(t,deg,t(TF),deg(TF),'r*')

give the result as follows ：

Finally, the time difference of these local maximum values can be calculated , This time difference can be considered as the period of the pitch , The pitch frequency of the above frame signal can be calculated as 104.9180Hz. Visible use HHT This method can extract glottal pulse from voiced signal , Next apply HHT Pitch detection of a complete speech

<> pitch detection

The procedure is as follows ：

clear all; clc; [y,fs]=audioread('filename.wav'); n=length(y);

t=0:1/fs:(n-1)/fs; hf=HF; lf=LF; y=filter(hf.Numerator,1,y); % high pass 60Hz

y=filter(lf.Numerator,1,y); % Lowpass 900Hz figure() subplot(3,1,1) plot(t,y);

xlabel(" time /s"); ylabel(" amplitude "); title(" original signal "); % Framing dT_ms=50; % Duration of one frame signal

w=floor(dT_ms/1000*fs); % Sample number of one frame signal overlap=w/2; % Frameshift frame=floor(n/(overlap));

% Frames Y=zeros(frame,w); % Store the signals after framing and windowing y=[y;zeros((frame+1)*overlap-n,1)]; % Complement zero

for i=1:frame interval=(i-1)*(overlap)+(1:w); Y(i,:)=y(interval).*hamming(w); %

Framing and windowing end % Determination of voiceless sound voice=zeros(frame,1); for i=1:frame y0=Y(i,:); imf=emd(y0);

[hs,f,t]=hht(imf,fs); hs=full(hs); F=zeros(w,1); for m=1:size(hs,2) for

n=1:size(hs,1) F(m)=F(m)+hs(n,m)*f(n); end end voice(i)=mean(F);

if(voice(i)<10e-3) voice(i)=0; else voice(i)=1; end end % pitch detection

pitch=zeros(frame,1); for i=1:frame y0=Y(i,:); imf=emd(y0); % EMD decompose

[hs,f,t,imfinsf,imfinse]=hht(imf,fs); % HT Instantaneous energy obtained by transformation imfinse if(voice(i)==1) %

If it's voiced deg=[0;diff(imfinse(:,1))]; % Instantaneous energy difference for k=1:length(deg) if(deg(k)<1e-6)

deg(k)=0; end end deg=medfilt1(deg,3);

TF=islocalmax(deg,'MinSeparation',fs*0.005); % Find the location of local maximum T=find(TF==1);

T=diff(T); T=mean(T); T=T/fs; ff=1/T; pitch(i)=ff; else pitch(i)=0; end end

t=linspace(0,(length(y)-1)/fs,length(pitch)); subplot(3,1,2) plot(t,pitch);

xlim([min(t),max(t)]); ylim([0,200]); xlabel(" time /s"); ylabel(" frequency /Hz");

title(" Pitch tracking curve "); pitch=medfilt1(pitch,3); % median filtering subplot(3,1,3) plot(t,pitch);

xlim([min(t),max(t)]); ylim([0,200]); xlabel(" time /s"); ylabel(" frequency /Hz");

title(" Pitch tracking curve after median filtering ");

The input voice signal is “ I'm learning speech signal processing ”, The results are as follows ：

Technology

- Java426 articles
- Python242 articles
- Vue127 articles
- Linux119 articles
- MySQL100 articles
- javascript77 articles
- SpringBoot72 articles
- C++68 articles
- more...

Daily Recommendation

©2019-2020 Toolsou All rights reserved,

html Writing about cherry trees , Writing about cherry trees It's unexpected Python Cherry tree （turtle The gorgeous style of Library ） Browser kernel （ understand ）HashMap Explain in detail java Four functional interfaces （ a key , simple ）os Simple use of module Some East 14 Pay change 16 salary , Sincerity or routine ?