Detection of the Glottal Closing Instant on the Electroglottographic Signal by a Threshold Method and on the Acoustic Signal by Wavelet Transform

From 2007 Scientific Report


Vu Ngoc Tuan, C. d’Alessandro

Introduction

In voice production, the vocal folds produce an excitation signal for the vocal tract. The vocal folds open and close under the effect of the airflow coming from the lungs. A glottal closure induces an acoustic response of large amplitude. Hence, after each glottal closing instant (GCI), the acoustic signal has a local maximum; the instant at which this maximum occurs is called the LMI.

The aim of this article is to examine the correlation between GCI and LMI.

The method developed is as follows: first, the GCIs are detected on the derivative of the electroglottographic signal (DEGG); second, the LMIs are detected by a wavelet transform of the acoustic signal. Both detection methods are applied to a corpus of Vietnamese syllables. The inverse of the duration between two consecutive GCIs (or two consecutive LMIs) defines a GCI-based fundamental frequency, $F0_{GCI}$ (respectively, an LMI-based fundamental frequency, $F0_{LMI}$).
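The definition of $F0_{GCI}$ and $F0_{LMI}$ as the inverse of the interval between consecutive events can be sketched as follows. This is a minimal illustration, assuming the detected instants are given as sample indices at a sampling rate `fs`; the function name `f0_from_instants` is hypothetical and not part of the original work.

```python
import numpy as np

def f0_from_instants(instants, fs):
    """F0 (Hz) for each interval between consecutive detected instants.

    instants: strictly increasing sample indices of GCIs (or LMIs).
    fs: sampling rate in Hz.
    The k-th output is fs / (instants[k+1] - instants[k]), i.e. the inverse
    of the k-th period, giving F0_GCI (or F0_LMI) period by period.
    """
    t = np.asarray(instants, dtype=float)
    if t.ndim != 1 or t.size < 2:
        raise ValueError("need at least two instants")
    periods = np.diff(t) / fs          # period durations in seconds
    if np.any(periods <= 0):
        raise ValueError("instants must be strictly increasing")
    return 1.0 / periods

# Example: events 80, 80 and 84 samples apart at 16 kHz.
print(f0_from_instants([0, 80, 160, 244], fs=16000))   # about [200, 200, 190.5] Hz
```

Applied to the GCI sequence this gives $F0_{GCI}$, and applied to the LMI sequence it gives $F0_{LMI}$, one value per period.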

The correlation between GCI and LMI is examined by comparing these two fundamental frequencies.

GCI detection by electroglottography (EGG)

We use the derivative of the electroglottographic signal (DEGG) to detect the GCIs, each of which is marked by a large positive peak. These peaks are detected by a threshold method [ref 1]. The duration between two consecutive peaks is the fundamental period of the acoustic signal, and F0 is calculated as the inverse of this period.
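The article does not give the details of the threshold method, so the sketch below is only one plausible variant, not the authors' algorithm. It assumes that the DEGG is approximated by a first difference, that the threshold is a fixed fraction `ratio` of the largest DEGG value (the value 0.3 is a hypothetical choice), and that each run of samples above the threshold contains exactly one GCI, placed at the DEGG maximum of the run.

```python
import numpy as np

def detect_gci_threshold(egg, fs, ratio=0.3):
    """Sketch of threshold-based GCI detection on the DEGG signal.

    Assumptions (not taken from the article or from [ref 1]):
      * the DEGG is a first difference of the EGG scaled by fs;
      * the threshold is `ratio` times the largest DEGG value;
      * one GCI per supra-threshold run, at the DEGG maximum of the run.
    Returns indices k into the DEGG; the rise occurs between egg[k] and egg[k+1].
    """
    degg = np.diff(np.asarray(egg, dtype=float)) * fs
    peak = degg.max()
    if peak <= 0:
        raise ValueError("no positive DEGG peak found")
    above = degg > ratio * peak
    # +1 marks the start and -1 the end of a run of supra-threshold samples.
    edges = np.diff(above.astype(int))
    starts = np.flatnonzero(edges == 1) + 1
    ends = np.flatnonzero(edges == -1) + 1
    if above[0]:
        starts = np.r_[0, starts]
    if above[-1]:
        ends = np.r_[ends, above.size]
    return np.array([s + int(np.argmax(degg[s:e])) for s, e in zip(starts, ends)],
                    dtype=int)

# Synthetic EGG-like signal: slow decrease with abrupt rises at samples 100, 200, 300.
n = np.arange(400)
egg = (n >= 100).astype(float) + (n >= 200) + (n >= 300) - 0.005 * n
print(detect_gci_threshold(egg, fs=16000))   # expected DEGG indices 99, 199 and 299
```

On real recordings the fixed ratio would likely need tuning per speaker, and the polarity of the DEGG peak depends on the EGG convention; here the GCI is the large positive peak, as stated in the text.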

LMI detection by Wavelet Transform (WT) of the acoustic signal

A bank of six filters is applied to the acoustic signal $x(t)$ to compute its WT. The magnitude responses $|H_s(f)|$ of these filters are defined by:

$|H(f)|= \frac{\tau}{\sqrt{1 + (2\pi (f-f_o)\tau)^2}}$

$|H_s(f)| = s\,|H(sf)|$

with $f_o = 5000$ Hz, $\tau = \frac{1}{f_o}$, and $s = 1, 2, 4, 8, 16, 32$.
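The following short derivation, which uses only the two definitions above, makes the properties of each filter explicit (centre frequency, peak gain, and bandwidth):

$$|H_s(f)| = s\,|H(sf)| = \frac{s\,\tau}{\sqrt{1 + \left(2\pi (sf - f_o)\tau\right)^2}}$$

The maximum is reached when $sf = f_o$, i.e. at the centre frequency $f_c(s) = f_o / s$, with peak gain $s\tau = s/f_o$. The half-power points satisfy $2\pi (sf - f_o)\tau = \pm 1$; with $\tau = 1/f_o$ this gives

$$f = \frac{f_o}{s}\left(1 \pm \frac{1}{2\pi}\right), \qquad \Delta f = \frac{f_o}{\pi s}, \qquad Q = \frac{f_c(s)}{\Delta f} = \pi .$$

So for $s = 1, 2, 4, 8, 16, 32$ the centre frequencies are 5000, 2500, 1250, 625, 312.5 and 156.25 Hz, and every filter has the same quality factor $Q = \pi$, i.e. a constant relative bandwidth, as expected of a wavelet filter bank.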

The output signals $y_s(t)$ of the filters are calculated as follows:

$X(f)$ = Fourier transform of $x(t)$

$Y_s(f)$ = Fourier transform of $y_s(t)$

$Y_s(f) = |H_s(f)|\,X(f)$

$y_s(t) = \mathcal{F}^{-1}\{Y_s(f)\}$ (inverse Fourier transform)
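The frequency-domain filtering above can be sketched with NumPy's FFT routines. This is a minimal illustration, not the authors' implementation: it assumes that the real gain $|H_s(f)|$ is applied at $|f|$ (which `rfft`/`irfft` do implicitly, since they only handle non-negative frequencies), so that each $y_s(t)$ is real and zero-phase filtered. The names `filter_gain` and `wavelet_outputs` are hypothetical.

```python
import numpy as np

F_O = 5000.0              # f_o in Hz (value from the text)
TAU = 1.0 / F_O           # tau = 1 / f_o
SCALES = (1, 2, 4, 8, 16, 32)

def filter_gain(f, s, f_o=F_O, tau=TAU):
    """|H_s(f)| = s |H(s f)|, with |H(f)| = tau / sqrt(1 + (2 pi (f - f_o) tau)^2)."""
    return s * tau / np.sqrt(1.0 + (2.0 * np.pi * (s * f - f_o) * tau) ** 2)

def wavelet_outputs(x, fs, scales=SCALES):
    """Return [y_s(t) for s in scales], each computed as IFFT(|H_s(f)| X(f)).

    Assumption: the real gain is applied at |f| (rfft/irfft only handle
    non-negative frequencies), so every y_s is real and zero-phase filtered.
    The FFT makes the filtering circular; zero-pad x to avoid wrap-around.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(n, d=1.0 / fs)   # bin frequencies in Hz
    return [np.fft.irfft(filter_gain(f, s) * X, n=n) for s in scales]

# Example: a 2500 Hz cosine lies at the centre frequency f_o / 2 of the s = 2
# filter, so it is passed with the peak gain 2 * tau = 4e-4.
fs = 16000
t = np.arange(1600) / fs
x = np.cos(2 * np.pi * 2500 * t)
y = wavelet_outputs(x, fs)
print(len(y), np.max(np.abs(y[1])))
```

Because the peak gain $s\tau$ grows with $s$, the coarse-scale outputs have larger amplitudes than the fine-scale ones for the same input level; any normalisation between scales would be an additional design choice not described in the text.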

An example of the result of such a calculation is shown in the next figure:

The signal at the top is $x(t)$; the others are the filter outputs $y_s(t)$, from top ($s=1$) to bottom ($s=32$).

The maxima of $x(t)$ are reproduced in the outputs $y_s(t)$. These maxima are tracked across scales by dynamic programming, which defines lines of maxima, and the amplitudes along each line are summed. The LMIs are the instants at which the lines with the greatest sums point at the output of the finest-scale filter ($s=1$).
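The text does not specify the dynamic-programming recursion, so the sketch below is one possible interpretation, labelled as such. Starting from the coarsest scale, each local maximum extends the best-scoring line found among the maxima of the next coarser scale within `max_shift` samples (a max-sum path recursion); at the finest scale, a maximum is kept as an LMI only if no other line within `min_distance` samples has a larger sum. Both parameters and the function names are hypothetical.

```python
import numpy as np

def local_maxima(y):
    """Indices k with y[k-1] < y[k] >= y[k+1] and y[k] > 0 (first sample of a plateau)."""
    y = np.asarray(y, dtype=float)
    k = np.arange(1, y.size - 1)
    mask = (y[k] > y[k - 1]) & (y[k] >= y[k + 1]) & (y[k] > 0)
    return k[mask]

def track_maxima_lines(outputs, max_shift, min_distance):
    """Link maxima across scales by dynamic programming and pick LMIs.

    outputs: y_s arrays ordered from finest (s=1) to coarsest (s=32) scale.
    max_shift: largest time shift (samples) allowed between linked maxima
               of adjacent scales (hypothetical parameter).
    min_distance: only the best-scoring line within +/- min_distance samples
                  is kept (hypothetical parameter).
    Returns (lmi_indices, line_scores) for the maxima of the finest scale.
    """
    levels = [local_maxima(y) for y in outputs]
    # Coarsest scale: each line starts with the amplitude of its maximum.
    score = np.asarray(outputs[-1], dtype=float)[levels[-1]]
    # Towards finer scales: add the best reachable line score from the scale above.
    for j in range(len(outputs) - 2, -1, -1):
        pos, prev_pos = levels[j], levels[j + 1]
        new_score = np.asarray(outputs[j], dtype=float)[pos].copy()
        for i, p in enumerate(pos):
            near = np.abs(prev_pos - p) <= max_shift
            if near.any():
                new_score[i] += score[near].max()
        score = new_score
    finest = levels[0]
    # Keep a maximum only if no other line within min_distance has a larger sum.
    keep = [i for i, p in enumerate(finest)
            if score[i] >= score[np.abs(finest - p) <= min_distance].max()]
    return finest[keep], score

# Synthetic outputs: three strong maxima drifting by one sample per scale,
# plus a weak spurious maximum present only at the finest scale.
outputs = []
for j in range(6):
    y = np.zeros(600)
    for c in (100, 300, 500):
        y[c + j] = 1.0
    outputs.append(y)
outputs[0][200] = 0.1
lmis, scores = track_maxima_lines(outputs, max_shift=3, min_distance=150)
print(lmis)   # the spurious maximum at sample 200 is rejected
```

The `min_distance` rule is a simple stand-in for "the lines with the greatest summation"; in practice it would presumably be tied to the expected period, for instance a fraction of a period derived from $F0_{GCI}$.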

Relation between GCI and LMI

Each GCI is followed by an LMI. To assess the correlation between these two events, two F0 values are calculated: $F0_{GCI}$, deduced from the GCIs, and $F0_{LMI}$, deduced from the LMIs. These calculations are carried out on Vietnamese syllables, each consisting of a consonant followed by the vowel /a/, with 21 consonants and six tones for each syllable. The tone modulates F0. The syllables are produced by six speakers, three female and three male.

For each speaker and each tone, $F0_{LMI}$ and the differences $F0_{GCI} - F0_{LMI}$ are calculated. Average values are then computed over the 21 consonants:

$\langle f \rangle$ = average of $F0_{LMI}$; $\langle df \rangle$ = average of $F0_{GCI} - F0_{LMI}$;

relative error $\epsilon = 100\,\frac{\langle df \rangle}{\langle f \rangle}$ (in %); standard deviation $\sigma$;

and the ratio $\rho = 100\,\frac{\sigma}{\langle f \rangle}$ (in %).
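These statistics can be sketched as follows. The text does not say which quantity $\sigma$ is the standard deviation of, nor how values are pooled over the consonants; this sketch assumes paired per-period values pooled over the 21 consonants and takes $\sigma$ as the (population) standard deviation of the differences $F0_{GCI} - F0_{LMI}$. The check against 5 % and 10 % uses the thresholds quoted in the results; taking $|\epsilon|$ is a further assumption. Function names are hypothetical.

```python
import numpy as np

def f0_statistics(f0_gci, f0_lmi):
    """<f>, <df>, epsilon, sigma and rho for one speaker and one tone.

    f0_gci, f0_lmi: paired per-period F0 values in Hz, pooled over the
    21 consonants (pairing and pooling are assumptions).
    sigma is taken as the population standard deviation of F0_GCI - F0_LMI
    (an assumption; the text only says "standard deviation").
    """
    f0_gci = np.asarray(f0_gci, dtype=float)
    f0_lmi = np.asarray(f0_lmi, dtype=float)
    df = f0_gci - f0_lmi
    f_mean = f0_lmi.mean()        # <f>
    df_mean = df.mean()           # <df>
    sigma = df.std()              # population standard deviation of df
    return {
        "f": f_mean,
        "df": df_mean,
        "epsilon": 100.0 * df_mean / f_mean,   # relative error, %
        "sigma": sigma,
        "rho": 100.0 * sigma / f_mean,         # %
    }

def gci_lmi_consistent(stats, eps_max=5.0, rho_max=10.0):
    """Thresholds from the results: |epsilon| < 5 % and rho < 10 %."""
    return abs(stats["epsilon"]) < eps_max and stats["rho"] < rho_max

stats = f0_statistics([200, 202, 198, 200], [200, 200, 200, 200])
print(stats["epsilon"], stats["rho"], gci_lmi_consistent(stats))
```

In this toy example $\epsilon = 0$ % and $\rho$ is about 0.71 %, well within both thresholds.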

The results show that in most cases $\epsilon$ is less than 5 % and $\rho$ less than 10 %. Hence, in the time intervals where such results are obtained, LMI and GCI are correlated. In these intervals the acoustic signal is stationary, i.e. the voice quality is well defined (breathy, whispered, or voiced phonation). Between two stationary intervals, the acoustic signal evolves and several maxima of nearly equal amplitude can occur within the same period; in this case the LMI detection fails.

Conclusion

This article presents a method for detecting the glottal closing instant on the derivative of the electroglottographic signal, and another method, based on the wavelet transform, for detecting the local maxima of the acoustic signal induced by glottal closure. The results show that these two events are correlated for stable vowels, where the voice quality is well defined. In the transitions between two intervals where the voice quality is well defined, the detection of the local maxima fails.

References

- Vu Ngoc Tuan & Christophe d'Alessandro, "Robust Glottal Closure Detection using the Wavelet Transform", Proceedings of Eurospeech 99, Budapest, Hungary (1999), vol. 6, pages 2805-2808.

- Vu Ngoc Tuan & Christophe d'Alessandro, "Glottal Closure Detection using EGG and the Wavelet Transform", Advances in Quantitative Laryngoscopy, Voice and Speech Research, Proceedings of the 4th International Workshop, Jena, Germany (2000), pages 147-154.

- Vu Ngoc Tuan, Christophe d'Alessandro & Alexis Michaud, "Using Open Quotient for Characterisation of Vietnamese Glottalised Tones", Proceedings of Interspeech, Lisbon, Portugal (2005), pages 2885-2889.
