Perceptual evaluation of HRTF notches versus peaks for vertical localization
Brian FG Katz, Raphaël Greff
Head-Related Transfer Functions (HRTFs) vary with frequency, azimuth, and elevation. Binaural cues such as interaural time and level differences are the primary cues for estimating the azimuth and distance of a sound source. Monaural information such as the spectral content of the HRTF is known to be an important cue for estimating elevation, localizing in the median plane, and resolving front-back confusion, among other tasks. HRTF spectral cues can be represented by an ensemble of peaks and notches in the frequency spectrum. Some previous studies indicate that spectral notches are responsible for the localization of elevated sound sources in the median plane, while others discuss the importance of spectral peaks. This study presents the results of an experiment addressing this question. A listening test using modified HRTF spectral cues was carried out to evaluate the localization bias resulting from spectral peak and notch modifications. This is a summary of the conference paper “Perceptual Evaluation of HRTF Notches versus Peaks for Vertical Localisation” [ICA2007].
Humans have the ability to localize sound sources, that is, to determine their range, azimuth, and elevation. Interaural Time Differences (ITD) and Interaural Level Differences (ILD) are known to provide the primary cues for localization in the horizontal plane, i.e. for determining the azimuth of a sound source [1, 2]. However, these differences are not sufficient to uniquely locate sounds in 3D space, because they cannot resolve positions on the so-called “cones of confusion” defined by constant ITD and ILD cues. For a source located in the median plane, for example, ITD and ILD are both zero, and spectral cues are the only information available for source localization. These spectral features are due to diffraction and reflection of the incident sound on the body, shoulders, head, and pinna. The prominent features contributed by the pinna are the sharp spectral notches and peaks at higher frequencies. There is evidence to support the hypothesis that the spectral notches due to the pinna are important cues for vertical localization [4, 5, 6]. At the same time, the physics that creates a spectral notch at one frequency also creates a spectral peak at another. This study presents an experiment that investigates the influence of spectral notches versus that of spectral peaks on vertical localization. The experiment consists of an auditory localization test in the median plane with three HRTF sets: a reference set (A) and two sets derived from this reference, one in which only the pattern of notches is kept (B) and one in which only the pattern of peaks is kept (C). For each subject, reference set A is chosen from an HRTF database based on the subjectively rated spatial impression. An identical signal is presented to the left and right ears using only the left-ear component of the HRTF set, forcing the ITD and ILD to be zero.
The aim of the processing applied to HRTF set A is to determine the boundary spectrum used to distinguish spectral notches from spectral peaks. The “notches only” set B is obtained by collapsing the peaks to this boundary spectrum. In a similar manner, the “peaks only” set C is obtained by collapsing the notches. The boundary spectrum is calculated using a smoothing algorithm. To better account for the frequency resolution of the human ear, the data were warped onto the Bark frequency scale and converted to decibels before smoothing.
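The processing described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the source does not specify the smoothing algorithm, so a moving average over a window of ±1 Bark is assumed here, the Bark warping uses the Zwicker-Terhardt approximation, and “collapsing” is interpreted as clipping the dB spectrum to the boundary from above (set B) or from below (set C). All function names and the `half_width_bark` parameter are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed warping)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def to_db(magnitudes, floor=1e-12):
    """Convert linear magnitudes to decibels, guarding against log(0)."""
    return [20.0 * math.log10(max(m, floor)) for m in magnitudes]


def boundary_spectrum(freqs_hz, mags_db, half_width_bark=1.0):
    """Smooth a dB spectrum with a moving average whose width is constant
    on the Bark scale (the window width is an assumption, not from the source)."""
    barks = [hz_to_bark(f) for f in freqs_hz]
    boundary = []
    for b_i in barks:
        window = [m for b_j, m in zip(barks, mags_db)
                  if abs(b_j - b_i) <= half_width_bark]
        boundary.append(sum(window) / len(window))
    return boundary


def split_notches_peaks(mags_db, boundary_db):
    """Return (notches_only, peaks_only) dB spectra.

    notches_only: peaks collapsed onto the boundary (set B).
    peaks_only:   notches collapsed onto the boundary (set C).
    """
    notches_only = [min(m, b) for m, b in zip(mags_db, boundary_db)]
    peaks_only = [max(m, b) for m, b in zip(mags_db, boundary_db)]
    return notches_only, peaks_only
```

For a magnitude response sampled at frequencies `freqs_hz`, one would call `boundary_spectrum(freqs_hz, to_db(mags))` and pass the result to `split_notches_peaks`. A different smoothing kernel or window width would shift the boundary and therefore change which features count as peaks or notches.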
An auditory localization test was carried out by 8 subjects and repeated 3 times. For each subject, the best matching HRTF set from the database was chosen. The reference set A was processed to obtain the “notches only” set B and the “peaks only” set C. A total of 19 positions in the median plane were chosen, equally spaced from -45° to 225°, where 0° corresponds to the front direction in the horizontal plane, 90° to directly above, and 180° to the rear. Each position was presented with each set (A, B, and C), resulting in a total of 171 sounds for the subjects to localize (3 sets × 19 positions × 3 repetitions). The sound stimulus was a 200 ms pink noise burst modulated at 20 Hz. Because individual HRTFs were not available, and despite the HRTF selection procedure, a novel approach was adopted in the hope of improving the subjects’ performance in the localization task. This approach consisted of coupling the stimulus to localize (SL) with a spatially referenced stimulus (SR) corresponding to the frontal direction (0°) for the HRTF set in use. The test stimulus S was constructed by coupling SR and SL with a 200 ms silence between them, with SR always preceding SL. The stimulus couple S was repeated 5 times. Subjects were told the position of the reference stimulus SR. It was hoped that they could use this reference position to calibrate to the HRTF and thereby correctly determine the position of the sound to localize, SL. Subjects indicated the position of the target sound SL on a visual interface.
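The test design above can be checked with a short sketch that enumerates the target angles, builds the list of 171 trials, and lays out the SR/silence/SL sequence for one trial. This is a minimal illustration, not the authors' test software: the randomized presentation order, the assumption that SR uses the same 200 ms burst as SL, and all names below are hypothetical.

```python
import itertools
import random

HRTF_SETS = ("A", "B", "C")
N_POSITIONS = 19
N_REPETITIONS = 3


def median_plane_angles(start_deg=-45.0, stop_deg=225.0, count=N_POSITIONS):
    """Equally spaced median-plane angles (0 = front, 90 = above, 180 = rear)."""
    step = (stop_deg - start_deg) / (count - 1)
    return [start_deg + i * step for i in range(count)]


def build_trials(seed=0):
    """One trial per (set, position, repetition); shuffling is an assumption."""
    trials = [
        {"hrtf_set": s, "target_deg": a, "repetition": r}
        for s, a, r in itertools.product(
            HRTF_SETS, median_plane_angles(), range(N_REPETITIONS))
    ]
    random.Random(seed).shuffle(trials)
    return trials


def stimulus_timeline(hrtf_set, target_deg, burst_ms=200, gap_ms=200, n_pairs=5):
    """Segments of one test stimulus S as (label, set, angle_deg, duration_ms).

    SR is the frontal reference (0 degrees), followed by silence, then SL.
    The SR burst duration is assumed equal to that of SL.
    """
    pair = [
        ("SR", hrtf_set, 0.0, burst_ms),
        ("silence", None, None, gap_ms),
        ("SL", hrtf_set, target_deg, burst_ms),
    ]
    return pair * n_pairs
```

With these definitions the angles fall on a 15° grid, and `len(build_trials())` gives the 171 trials stated in the text.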
Results and Discussion of Absolute localization error
An evaluation of the median plane localization bias between the reported angle and the target angle was carried out. Analysis of the error between reported angle and target angle for each subject shows that half of the 8 subjects were not able to properly localize sounds even with the reference set A, supposedly the best match. It was decided to reject the data of these “poorly localizing” subjects and to concentrate further analysis on the 4 subjects who were able to localize with set A. It is understood that 4 subjects is a small sample, but it permits the extraction of some basic characteristic patterns. Another phenomenon appeared in the remaining data, one that subjects also reported after the test. Subjects explained that even when they perceived an elevation, they were often unable to decide between the front and the back hemisphere. It is uncommon to deal with front-back confusion in the median plane, but this subjective report indicated that the confusion needed to be corrected in order to exploit the data. A recent study shows that there is a frontal plane symmetry in the magnitude spectrum of HRTF data that can explain front-back confusions even in the median plane. It was therefore decided to take the symmetric position of the reported data when the target and reported positions were not in the same front-back hemisphere. As subjects performed the localization tasks with different reference sets, statistics are computed per position only, for both analyses. Figure 1 shows the mean reported angles by position for each of the three HRTF sets A, B, and C. Table 1 indicates the mean unsigned error for each set.
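The front-back correction and the unsigned error can be sketched as follows, assuming the angle convention of the test (0° front, 90° above, 180° rear), so that the frontal plane sits at 90° and the symmetric position of an angle θ is 180° − θ. The treatment of 90° as belonging to both hemispheres and the function names are assumptions made for illustration; no experimental data are reproduced here.

```python
def correct_front_back(target_deg, reported_deg):
    """Mirror the reported angle about the frontal plane (90 degrees) when it
    lies in the opposite front-back hemisphere from the target.

    An angle of exactly 90 degrees is treated as lying in both hemispheres
    (an assumption), so it never triggers a correction.
    """
    if (target_deg - 90.0) * (reported_deg - 90.0) < 0:
        return 180.0 - reported_deg
    return reported_deg


def mean_unsigned_error(pairs):
    """Mean absolute error in degrees over (target_deg, reported_deg) pairs,
    computed after front-back correction."""
    errors = [abs(correct_front_back(t, r) - t) for t, r in pairs]
    return sum(errors) / len(errors)
```

For example, a response at 150° to a target at 30° is mirrored to 30° before the error is computed, whereas a response at 45° to the same target is left unchanged.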
There appears to be a good correlation between the “notches only” set and the reference set, with similar elevation localization accuracy. In contrast, vertical localization performance with the “peaks only” set seems to be rather poor, with reported positions collapsed onto the horizontal plane, concentrated directly in front or directly to the rear.
Table 1. Mean unsigned error relative to the target angle.
It is understood that the experiment reported in this article included only 4 reliable subjects. Half of the test subjects were rejected due to poor median plane localization performance using non-individual HRTFs (even though the subjects were able to choose the HRTF based on perceived spatial quality). Despite this limitation, significant patterns have been observed.
Results of the absolute error analysis are consistent with those of the relative error analysis. Localization performance in the median plane with the “notches only” HRTF set is comparable to that with the reference set. A bias in localization was observed for elevated sound sources between the “peaks only” HRTF set and the reference set. These results appear to indicate that peaks alone are not sufficient cues for vertical localization, and that notches provide the prominent cues necessary for median plane localization.
[ICA2007] Greff, Raphaël & Katz, Brian F.G., “Perceptual Evaluation of HRTF Notches versus Peaks for Vertical Localisation.” Proceedings of the 19th International Congress on Acoustics, Madrid, 2-7 September 2007.
[1] J. Blauert: Spatial Hearing: The Psychophysics of Human Sound Localization. The MIT Press, Cambridge, Massachusetts (1996)
[2] D. R. Begault: 3-D Sound for Virtual Reality and Multimedia. Academic Press, Cambridge, Massachusetts (1994)
[3] J. Hebrank, D. Wright: Spectral cues used in the localisation of sound sources on the median plane. Journal of the Acoustical Society of America 56 (1974) 1829–1834
[4] V. C. Raykar, R. Duraiswami, B. Yegnanarayana: Extracting frequencies of the pinna spectral notches in measured head related impulse response. Technical report CS-TR-4609, Perceptual Interfaces and Reality Laboratory, University of Maryland (2004)
[5] S. G. Rodriguez, M. A. Ramirez: Extracting and modelling approximated pinna-related transfer functions from HRTF data. Proceedings of the 2005 International Conference on Auditory Display (2005)
[6] M. Morimoto, M. Itoh, K. Iida: 3-D sound image localization by interaural differences and the median plane HRTF. Proceedings of the 2002 International Conference on Auditory Display (2002)
[7] P. Vovor: Application de techniques d’apprentissages statistiques à la prédiction d’HRTF. Mémoire pour le Master Science et Technologie, Université Pierre et Marie Curie, France Telecom Recherche et Développement (2005)
Round Robin Comparison of HRTF Measurement and Simulations Systems
(Omitted from RS2007, to be presented in the next Scientific Report)
[ICA2007b] Katz, Brian F.G. & Begault, Durand R., “Round Robin Comparison of HRTF Measurement Systems: Preliminary Results.” Proceedings of the 19th International Congress on Acoustics, Madrid, 2-7 September 2007.
[AES2007] Raphaël Greff & Brian F.G. Katz, “Round Robin Comparison of HRTF Simulation Results: Preliminary Results.” Proceedings of the 123rd Convention of the Audio Engineering Society, New York, 5-8 October, 2007.