Identification and characterisation of accents in French
This research is dedicated to the identification and characterisation of accents in French. For both foreign and regional accents, we started with perceptual identification experiments, then used automatic phoneme alignment to measure phonetic features that may characterise these accents, and finally ranked the most discriminating features using classification techniques.
Perceptual experiments were designed to determine to what extent naïve French listeners can identify foreign and regional accents in French. Each experiment involved speech excerpts of about 10 seconds from 36 speakers (scripted and spontaneous speech) and at least 25 listeners. Six foreign accents were studied: Arabic, English, German, Italian, Portuguese and Spanish. Among other things, the listeners’ task consisted of rating the speakers’ degree of accentedness and identifying the origin of the accent. A similar experiment was run with listeners from the Paris and Marseilles areas, who were asked to locate the geographical origin of native French regional accents [2,3]. It was based on recordings of speakers from six Francophone regions in the PFC corpus: Normandy, Vendée, Romand Switzerland, Provence, Languedoc and the Basque Country.
Phonetic analyses were conducted to investigate the cues that differentiate the six foreign accents in French. We first measured vowel formants, consonant duration and voicing, as well as prosodic cues. Second, we introduced pronunciation variants combining French and foreign acoustic units (e.g. a “rolled r”). Likewise, we examined phonetic characteristics such as the pronunciation of oral and nasal vowels on larger corpora of northern (i.e. standard) and southern French varieties. Dozens of features were retained for training classifiers to discriminate speakers’ foreign accents. Formant frequency measurements provided by two widely used tools (Praat and Snack) were also compared on PFC and on another large corpus of conversational telephone speech, which also includes northern and southern French. Finally, data analysis techniques were applied to identify the most influential factors.
Results and prospects
In both the foreign and regional accent experiments, the speakers’ degree of accentedness was rated as average. In the first experiment, the speakers’ mother tongue was correctly recognised in 52% of cases. The Spanish/Italian and English/German accent pairs were the most frequently confused. The second experiment, with 43% correct identification, suggests that fine-grained discrimination among regional accents is more difficult (see Fig. 1) [2,3]. Unlike the speech type (scripted or spontaneous speech) and the listeners’ region of origin (Paris or Marseilles), the speakers’ degree of accentedness has a major effect and interacts with the speakers’ age: the origin of the oldest speakers is better identified than that of the youngest ones. However, confusions are frequent among the southern varieties. On the whole, two or three accents can be distinguished: northern French (including Romand Swiss) vs. southern French (by clustering techniques); northern French, southern French and Romand Swiss (by multidimensional scaling). We then focussed on northern and southern French accents.
Fig. 1: results of the perceptual experiments for foreign-accented French (left) and regional-accented French (right).
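The identification rates and confused accent pairs reported above can be derived from a table of listener responses. The following is a minimal sketch of that computation, not the authors' procedure; the function names and the toy responses are invented for illustration and are not data from the experiments.

```python
from collections import Counter

def confusion_counts(responses):
    """Count (true accent, perceived accent) pairs from listener responses."""
    return Counter(responses)

def identification_rate(counts):
    """Share of responses where the perceived accent matches the true one."""
    total = sum(counts.values())
    correct = sum(n for (true, heard), n in counts.items() if true == heard)
    return correct / total

def most_confused_pairs(counts):
    """Rank unordered accent pairs by how often one was taken for the other."""
    pairs = Counter()
    for (true, heard), n in counts.items():
        if true != heard:
            pairs[frozenset((true, heard))] += n
    return [(tuple(sorted(pair)), n) for pair, n in pairs.most_common()]

# Hypothetical toy responses: (speaker's true accent, listener's answer).
toy_responses = [
    ("Spanish", "Spanish"), ("Spanish", "Italian"), ("Italian", "Spanish"),
    ("Italian", "Italian"), ("English", "German"), ("German", "German"),
    ("Arabic", "Arabic"), ("Portuguese", "Portuguese"),
]
counts = confusion_counts(toy_responses)
rate = identification_rate(counts)
confused = most_confused_pairs(counts)
```

Pairs are treated as unordered here, so a Spanish speaker heard as Italian and an Italian speaker heard as Spanish count towards the same confusion.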
Results of phonetic analyses reveal discriminating accent-specific pronunciations which, to a large extent, confirm both linguistic predictions and human listeners’ judgments. They support the idea that differences in the realisation of vowels and consonants outweigh prosodic cues. Relevant cues for foreign accents are the Arabic /e/~/i/ merger (see Fig. 2), the English and German aspirated or devoiced stops, the Italian rolled [r], the Portuguese schwa pronunciation, and the Spanish [s] (possibly substituting for the French /z/) and /b/~/v/ confusions. Concerning native French accents, we quantified both well-known and lesser-known tendencies. In particular, a sharp contrast exists between the fronting of the open /O/ towards [œ] in the North (see Fig. 2) and the denasalisation of nasal vowels in the South. Linguistic changes in progress below the threshold of conscious awareness may affect these vowels.
Fig. 2: vocalic triangles normalised by Nearey’s log-mean procedure for 36 speakers who read a 400-word text in foreign-accented French (left) and in regional-accented French (right).
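As an illustration of the normalisation named in the caption, here is a minimal sketch of a log-mean procedure. It assumes the formant-specific variant, in which each formant is expressed relative to the speaker's own mean log value for that formant; a variant with a single log-mean shared across formants also exists, and the source does not say which one was used. The function name and the formant values are hypothetical.

```python
import math

def nearey_log_mean(tokens):
    """Normalise one speaker's vowel tokens.

    tokens: list of (vowel_label, f1_hz, f2_hz) for a single speaker.
    Returns (vowel_label, f1_norm, f2_norm), where each value is the log
    formant minus that speaker's mean log value for the same formant.
    """
    if not tokens:
        return []
    mean_log_f1 = sum(math.log(f1) for _, f1, _ in tokens) / len(tokens)
    mean_log_f2 = sum(math.log(f2) for _, _, f2 in tokens) / len(tokens)
    return [
        (vowel, math.log(f1) - mean_log_f1, math.log(f2) - mean_log_f2)
        for vowel, f1, f2 in tokens
    ]

# Hypothetical formant values in Hz, for illustration only.
speaker_tokens = [("i", 300.0, 2200.0), ("a", 750.0, 1400.0), ("u", 320.0, 800.0)]
normalised = nearey_log_mean(speaker_tokens)
```

Because the speaker's mean is subtracted in the log domain, multiplying all of a speaker's formants by a constant factor (a rough stand-in for vocal tract length differences) leaves the normalised values unchanged, which is what makes vowel triangles comparable across speakers.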
Finally, data mining techniques were used to select the most discriminating features and classify speakers according to their accents. Identification scores were computed using a cross-validation method that included unseen data: a few sentences from another 36 speakers of French. With a dozen features and considering the six foreign accents under investigation, we obtained an average of over 50% correct identification. The best selected features (e.g. the /b/~/v/ confusion for Spanish speakers of French) are robust to corpus change and make sense with regard to linguistic knowledge (see Fig. 3).
Fig. 3: decision trees obtained for foreign-accented French (left) and regional-accented French on the basis of vowel formants (right). F1@ and F2@ stand for the first two formants of the schwa. Formant values are normalised for foreign-accented French but not normalised for northern/southern French because the corpus is larger.
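One way to see how a decision tree picks out discriminating features is to score each feature by the best single threshold split it allows. The sketch below uses Gini impurity, as in CART-style trees; this is an assumption, since the source does not name the tree algorithm or toolkit. The feature names and values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Best threshold on one feature: returns (weighted_gini, threshold)."""
    pairs = sorted(zip(values, labels))
    best = (gini(labels), None)  # baseline: no split at all
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[0]:
            best = (score, threshold)
    return best

def rank_features(rows, labels):
    """Sort features from most to least discriminating (lowest impurity first)."""
    scored = []
    for name in rows[0]:
        score, threshold = best_split([row[name] for row in rows], labels)
        scored.append((score, name, threshold))
    return sorted(scored, key=lambda item: item[0])

# Hypothetical per-speaker mean formants in Hz, for illustration only.
rows = [
    {"F2_O": 1250.0, "F1_schwa": 410.0}, {"F2_O": 1180.0, "F1_schwa": 390.0},
    {"F2_O": 1210.0, "F1_schwa": 430.0}, {"F2_O": 1020.0, "F1_schwa": 400.0},
    {"F2_O": 980.0, "F1_schwa": 420.0}, {"F2_O": 1050.0, "F1_schwa": 440.0},
]
labels = ["North", "North", "North", "South", "South", "South"]
ranking = rank_features(rows, labels)
```

In this toy data the hypothetical `F2_O` feature separates the two groups perfectly with one threshold, so it is ranked first; a full decision tree would apply the same scoring recursively within each branch.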
Northern and southern French varieties mainly differ in the second formant of the open /O/, despite differences between the values extracted by Praat and Snack (especially on telephone speech). /O/ fronting in northern French (with F2 values greater than 1100 Hz for males and 1200 Hz for females) is by far the most discriminating feature among oral vowel formants, according to decision tree classification. This feature alone yields 73–97% correct North/South identification on corpora of more than 100 speakers. As a North/South marker, it seems to rival the better-known pronunciation of schwa and nasal vowels in southern French. More work is needed to confirm or refute this sound change. A diachronic study of /O/ fronting is also planned, using audio archives from the second half of the twentieth century. More generally, accent and speaking style modelling through spoken language processing is envisaged, as well as an application to automatic speech recognition.
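The single-feature rule described above can be written down directly. This is a minimal sketch that assumes one mean F2 value per speaker for the open /O/ and uses the sex-specific thresholds quoted in the text; the function names and the toy speakers are hypothetical and do not reproduce the reported scores.

```python
# Sex-specific F2 thresholds (Hz) for the open /O/, as quoted in the text.
F2_THRESHOLD_HZ = {"male": 1100.0, "female": 1200.0}

def classify_variety(mean_f2_open_o_hz, sex):
    """Return 'North' if the speaker's mean /O/ F2 exceeds the threshold, else 'South'."""
    return "North" if mean_f2_open_o_hz > F2_THRESHOLD_HZ[sex] else "South"

def accuracy(speakers):
    """speakers: list of (mean_f2_hz, sex, true_variety) tuples."""
    hits = sum(classify_variety(f2, sex) == truth for f2, sex, truth in speakers)
    return hits / len(speakers)

# Hypothetical speakers, for illustration only.
toy_speakers = [
    (1180.0, "male", "North"),
    (1050.0, "male", "South"),
    (1260.0, "female", "North"),
    (1150.0, "female", "South"),
    (1120.0, "male", "South"),  # a southern speaker above the male threshold
]
toy_accuracy = accuracy(toy_speakers)
```

The last toy speaker shows how such a rule fails: a southern male speaker with a fronted /O/ is labelled northern.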
[1] B. Vieru-Dimulescu & P. Boula de Mareüil (2006), “Perceptual identification of 6 foreign accents in French”, 9th International Conference on Spoken Language Processing, Pittsburgh (pp. 411–414).
[2] C. Woehrling & P. Boula de Mareüil (2006), “Identification d’accents régionaux en français : perception et analyse”, Revue PArole 37 (pp. 25–65).
[3] C. Woehrling & P. Boula de Mareüil (2006), “Identification of regional accents in French: perception and categorization”, 9th International Conference on Spoken Language Processing, Pittsburgh (pp. 1511–1514).
[4] B. Vieru-Dimulescu, P. Boula de Mareüil & M. Adda-Decker (2007), “Characterizing non-native French accents using automatic alignment”, 16th International Congress of Phonetic Sciences, Saarbrücken (pp. 2217–2220).
[5] P. Boula de Mareüil, M. Adda-Decker & C. Woehrling (2007), “Analysis of oral and nasal vowel realisation in northern and southern French varieties”, 16th International Congress of Phonetic Sciences, Saarbrücken (pp. 2221–2224).
[6] B. Vieru-Dimulescu, P. Boula de Mareüil & M. Adda-Decker (2007), “Identification of foreign-accented French using data-mining techniques”, International Workshop on Paralinguistic Speech, Saarbrücken (pp. 47–52).
[7] C. Woehrling & P. Boula de Mareüil (2007), “Comparing Praat and Snack formant measurements on two large corpora of northern and southern French”, 8th Annual Meeting of the International Speech Communication Association, Antwerp (pp. 1006–1009).