Detection of Glottal Closing Instant on Electroglottographic Signal by a Threshold Method and on Acoustic Signal by Wavelets Transform

Vu Ngoc Tuan, C. d’Alessandro

Introduction

In voice production, the vocal folds generate the excitation signal of the vocal tract. The vocal folds open and close under the effect of the airflow coming from the lungs. Each closure of the glottis induces a large-amplitude acoustic signal, so after each glottal closing instant (GCI) the acoustic signal has a local maximum; the instant at which this maximum occurs is called the LMI.

The method developed is as follows: first, the GCI are detected on the derivative of the electroglottographic signal (DEGG); second, the LMI are detected by a wavelet transform of the acoustic signal. Both detection methods are applied to a corpus of Vietnamese syllables. The inverse of the duration between two successive GCI (or two successive LMI) is defined as the GCI-fundamental frequency, $F0_{GCI}$, or respectively the LMI-fundamental frequency, $F0_{LMI}$.

The correlation between GCI and LMI is examined by comparing these two fundamental frequencies.

GCI detection by electroglottography (EGG)

We use the derivative of the electroglottographic signal (DEGG) to detect the GCI, each of which is marked by a large positive peak. These peaks are detected by a threshold method [ref 1]. The duration between two successive peaks is the period of the acoustic signal, and F0 is calculated as the inverse of this period.
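The thresholding step can be illustrated with a minimal sketch. It is not the method of [ref 1]: the threshold rule (a fixed fraction of the largest DEGG sample), the function names `detect_gci` and `f0_from_instants`, and the synthetic test signal are all assumptions made for illustration.

```python
def detect_gci(degg, fs, threshold_ratio=0.5):
    """Return GCI times in seconds from a DEGG signal sampled at fs Hz.

    Assumption: the threshold is a fixed fraction of the largest DEGG sample.
    """
    threshold = threshold_ratio * max(degg)
    gci_times = []
    best = None  # index of the largest sample in the current above-threshold run
    for n, value in enumerate(degg):
        if value > threshold:
            if best is None or value > degg[best]:
                best = n
        elif best is not None:
            gci_times.append(best / fs)  # run ended: keep its peak
            best = None
    if best is not None:  # signal ended inside a run
        gci_times.append(best / fs)
    return gci_times


def f0_from_instants(times):
    """F0 in Hz as the inverse of the duration between successive instants."""
    return [1.0 / (t1 - t0) for t0, t1 in zip(times, times[1:])]


# Synthetic DEGG (hypothetical data): one closing peak every 100 samples
# at fs = 10 kHz, i.e. a 100 Hz voice, with a weaker negative opening dip.
fs = 10000
degg = [0.0] * 1000
for k in range(50, 1000, 100):
    degg[k - 1], degg[k], degg[k + 40] = 0.6, 1.0, -0.3

gci = detect_gci(degg, fs)
f0_gci = f0_from_instants(gci)
```

Keeping only the largest sample of each above-threshold run avoids counting one closing peak twice when several neighbouring samples exceed the threshold.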

LMI detection by Wavelet Transform (WT) of the acoustic signal

A filter bank of six filters is applied to the acoustic signal $x(t)$ to calculate the WT of this signal. The transfer functions $H_s(f)$ of these filters are defined by:
$|H(f)| = \frac{\tau}{\sqrt{1 + (2\pi (f-f_o)\tau)^2}}$
$|H_s(f)| = s\,|H(sf)|$
with $f_o = 5000$ Hz, $\tau = \frac{1}{f_o}$ and $s = 1, 2, 4, 8, 16$ and $32$.
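As a quick numerical check of these definitions, the following sketch evaluates the magnitude responses. The names `h_mag`, `hs_mag` and `peak_freqs` are illustrative and not taken from the paper. It follows directly from the formula that each $|H_s(f)|$ is largest where $sf = f_o$, so the six filters peak at $f_o/s$.

```python
import math

F_O = 5000.0          # centre frequency f_o in Hz
TAU = 1.0 / F_O       # time constant tau = 1 / f_o
SCALES = [1, 2, 4, 8, 16, 32]


def h_mag(f):
    """|H(f)| of the prototype filter."""
    return TAU / math.sqrt(1.0 + (2.0 * math.pi * (f - F_O) * TAU) ** 2)


def hs_mag(f, s):
    """|H_s(f)| = s * |H(s f)| for scale s."""
    return s * h_mag(s * f)


# Each scaled filter peaks where s * f = f_o, with peak value s * tau.
peak_freqs = {s: F_O / s for s in SCALES}
for s in SCALES:
    print(s, peak_freqs[s], hs_mag(peak_freqs[s], s))
```

With these values the peak frequencies run from 5000 Hz for $s=1$ down to 156.25 Hz for $s=32$.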

The output signals $y_s(t)$ of the filters are calculated as follows, where $X(f)$ is the Fourier transform of $x(t)$ and $Y_s(f)$ the Fourier transform of $y_s(t)$:
$Y_s(f) = |H_s(f)|\,X(f)$
$y_s(t) = \mathcal{F}^{-1}(Y_s(f))$, where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform.
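The frequency-domain filtering can be sketched as follows. Two assumptions are made that the text does not state: the magnitude response is applied symmetrically to negative frequencies (so that the outputs stay real), and a naive O(N²) DFT is used only to keep the example free of dependencies; an FFT routine would be used on real recordings. The names `dft`, `idft` and `filter_bank` are illustrative.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (illustration only)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n_pts = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def filter_bank(x, fs, scales=(1, 2, 4, 8, 16, 32), f_o=5000.0):
    """Return {s: y_s} with Y_s(f) = |H_s(f)| X(f), using |f| for each bin."""
    tau = 1.0 / f_o
    n_pts = len(x)
    spectrum = dft(x)
    outputs = {}
    for s in scales:
        filtered = []
        for k in range(n_pts):
            # Absolute frequency of bin k (assumed symmetric response).
            f = k * fs / n_pts if k <= n_pts // 2 else (n_pts - k) * fs / n_pts
            gain = s * tau / math.sqrt(1.0 + (2.0 * math.pi * (s * f - f_o) * tau) ** 2)
            filtered.append(gain * spectrum[k])
        outputs[s] = [v.real for v in idft(filtered)]
    return outputs


# Hypothetical test signal: a 1250 Hz sinusoid, which sits at the peak of the s = 4 filter.
fs = 20000
x = [math.sin(2.0 * math.pi * 1250.0 * n / fs) for n in range(160)]
outputs = filter_bank(x, fs)
strongest = max(outputs, key=lambda s: max(outputs[s]))
```

Because only the magnitude $|H_s(f)|$ multiplies $X(f)$, the filtering introduces no phase shift, so maxima of $x(t)$ are not delayed in the $y_s(t)$.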

An example of the result of such a calculation is shown in the next figure:

The signal at the top is $x(t)$ and the others are the $y_s(t)$, from top (filter index 0, $s=1$) to bottom (filter index 5, $s=32$).

The maxima of $x(t)$ are reproduced in the $y_s(t)$. These maxima are tracked across the filter outputs by dynamic programming, which defines lines of maxima. The amplitudes are summed along each line. The LMI are the instants at which the lines with the largest sums reach the output of the first filter (index 0, $s=1$).
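The tracking of lines of maxima can be sketched as a small dynamic program over the filter outputs. The text does not specify the linking rule or the selection rule, so the ones used here are assumptions: a maximum is linked to maxima of the next coarser output lying within `tol` samples, each maximum keeps the best cumulative sum reaching it, and the retained instants are those whose sum is at least `ratio` times the largest sum. All names and the toy data are hypothetical.

```python
def local_maxima(y):
    """Indices of strictly positive local maxima of y."""
    return [n for n in range(1, len(y) - 1) if y[n - 1] < y[n] >= y[n + 1] and y[n] > 0]


def maxima_lines(outputs, tol=2):
    """Accumulate amplitudes along lines of maxima.

    outputs: list of filter outputs, finest scale first.
    Returns [(position, summed amplitude)] for every line reaching outputs[0].
    """
    prev = []  # (position, best cumulative sum) at the next coarser scale
    for y in reversed(outputs):  # from coarsest to finest
        current = []
        for n in local_maxima(y):
            linked = [score for pos, score in prev if abs(pos - n) <= tol]
            current.append((n, y[n] + max(linked, default=0.0)))
        prev = current
    return prev


def select_instants(lines, ratio=0.5):
    """Keep positions whose summed amplitude is at least ratio * the largest sum."""
    best = max(score for _, score in lines)
    return [pos for pos, score in lines if score >= ratio * best]


def spikes(positions, amplitude, length=60):
    """Toy signal: zeros with isolated spikes (hypothetical test data)."""
    y = [0.0] * length
    for p in positions:
        y[p] = amplitude
    return y


# Three toy filter outputs: true maxima drift by one sample across scales,
# and a weak spurious maximum at sample 20 exists only at the finest scale.
finest = spikes([10, 30, 50], 1.0)
finest[20] = 0.4
outputs = [finest, spikes([11, 30, 49], 1.0), spikes([12, 31, 50], 1.0)]

lines = maxima_lines(outputs)
instants = select_instants(lines)
```

In this toy case the spurious maximum forms only a short line with a small sum and is rejected, while the three maxima present at every scale are kept.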

Relation between GCI and LMI

Each GCI is followed by an LMI. To bring out the correlation between these two events, two F0 values are calculated: $F0_{GCI}$, deduced from the GCI, and $F0_{LMI}$, deduced from the LMI. These calculations are done on the signals of Vietnamese syllables, each consisting of a consonant followed by the vowel /a/, with 21 consonants and six tones for each syllable. The effect of the tone is to modulate F0. The syllables were produced by six speakers, three female and three male.

For each speaker and each tone, $F0_{LMI}$ and the differences $F0_{GCI} - F0_{LMI}$ are calculated. Statistical average values are calculated over the 21 consonants:
$\langle F0_{LMI} \rangle$ = average of $F0_{LMI}$;
$\langle \Delta F0 \rangle$ = average of $F0_{GCI} - F0_{LMI}$;
relative error $\epsilon = 100\,\frac{\langle \Delta F0 \rangle}{\langle F0_{LMI} \rangle}$;
standard deviation $\sigma$;
and the ratio $\rho = 100\,\frac{\sigma}{\langle F0_{LMI} \rangle}$.
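The two indicators can be computed as in the sketch below. Several points are assumptions rather than statements of the paper: the relative error is taken as the average difference divided by the average $F0_{LMI}$, $\sigma$ is taken as the population standard deviation of the differences $F0_{GCI} - F0_{LMI}$, and $\rho$ is normalised by the average $F0_{LMI}$. The function name and the sample values are hypothetical.

```python
import statistics


def compare_f0(f0_gci, f0_lmi):
    """Return (epsilon, rho) in percent for paired F0 values in Hz.

    Assumptions: sigma is the population standard deviation of the
    differences, and both indicators are normalised by the mean F0_LMI.
    """
    diffs = [g - l for g, l in zip(f0_gci, f0_lmi)]
    mean_lmi = statistics.mean(f0_lmi)
    mean_diff = statistics.mean(diffs)
    sigma = statistics.pstdev(diffs)
    epsilon = 100.0 * mean_diff / mean_lmi
    rho = 100.0 * sigma / mean_lmi
    return epsilon, rho


# Hypothetical paired values for four syllables (not data from the corpus).
f0_gci = [200.0, 210.0, 190.0, 205.0]
f0_lmi = [198.0, 212.0, 190.0, 200.0]
epsilon, rho = compare_f0(f0_gci, f0_lmi)
```

With these illustrative numbers both indicators fall well under the 5 % and 10 % levels discussed in the text.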

The results show that in most cases $\epsilon$ is less than 5 % and $\rho$ less than 10 %. In the time intervals where such results are obtained, LMI and GCI are therefore correlated. In these intervals the acoustic signal is stationary, i.e. the voice quality is well defined (breathy, whispered or voiced phonation). Between two stationary intervals the acoustic signal evolves, and a single period contains several maxima of nearly equal amplitude; in this case the LMI detection fails.

Conclusion

This article presents a method for detecting the glottal closing instant on the derivative of the electroglottographic signal, and another method, using a wavelet transform, for detecting the local maxima of the acoustic signal that are induced by glottal closure. The results show that these two events are correlated for stable vowels, where the voice quality is well defined. Between two time intervals in which the voice quality is well defined, the detection of the local maxima fails.