Abstracts - Comprehension, Memory, Emotion


Intelligibility of Emotional Speech

Kate Dupuis and Kathy Pichora-Fuller
University of Toronto


Speech intelligibility tests are used by clinicians and researchers to measure how spoken language understanding is influenced by stimulus properties in different listening conditions. While these tests control for certain acoustical and linguistic stimulus properties, the emotional characteristics of the stimuli have been overlooked. Unlike natural speech, test stimuli are typically recorded by a trained talker using little emotional prosody. In the current project, intelligibility was tested for words spoken with different emotions. Stimuli from a commonly-used speech intelligibility test were previously re-recorded by two actresses (aged 26 and 64 years) in seven emotions (anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral). The stimuli were equated for audibility and presented to younger listeners in the presence of background noise in order to examine whether emotional prosody would influence intelligibility. Fearful and pleasantly surprised voices were the most intelligible, while sad and angry voices were the least intelligible. This is a first attempt to create an ecologically-relevant emotional intelligibility test that will inform future research on emotional speech production and perception.
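
One common way to equate stimuli for audibility is root-mean-square (RMS) level normalization, sketched below in Python; the abstract does not state the exact method used, so the function and target level are illustrative assumptions only.

    import numpy as np

    def equate_rms(signals, target_rms=0.05):
        """Scale each waveform to a common RMS level (illustrative;
        the abstract does not specify the equalization method used)."""
        equated = []
        for x in signals:
            rms = np.sqrt(np.mean(x ** 2))          # current RMS level
            equated.append(x * (target_rms / rms))  # rescale to target
        return equated

    # Two tones at very different levels end up at the same RMS level.
    tones = [g * np.sin(np.linspace(0, 100, 8000)) for g in (0.1, 0.8)]
    print([round(float(np.sqrt(np.mean(s ** 2))), 3) for s in equate_rms(tones)])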

 


 


Understanding Stories in a Distracting Background

Older adults often complain that they have difficulty understanding conversations when someone else is talking in the background. In fact, the distracting effect of background speech (the so-called “irrelevant speech effect”) is not unique to seniors, although there are debates about whether seniors are more susceptible to irrelevant speech than are younger adults. In a previous study (Schneider et al., 2000), younger and older adults listened to stories in quiet and in babble noise before answering questions concerning the material. When the listening situation was adjusted for individual differences in hearing, younger and older adults were equally adept at remembering the contents of the passages, both in quiet and in two levels of noise. However, when no adjustments were made to compensate for the poorer hearing of older adults (all participants tested under identical listening conditions), older adults could not recall as much detail as younger adults, either in quiet or in noise. Those results indicated that the speech-comprehension difficulties of older adults primarily reflect declines in hearing rather than in cognitive ability. What, then, would happen if the background noise were not meaningless babble but meaningful stories? Previous studies found that older adults were not as good as younger people at inhibiting irrelevant information. In this study, we presented stories either in quiet or with another, unrelated story in the background to determine whether older adults can inhibit background speech as well as their younger counterparts once the listening situations are adjusted. Both normal-hearing and hearing-impaired older adults performed more poorly than younger adults when everyone was tested in identical listening situations. However, when the listening situation was individually adjusted to compensate for age differences in the ability to recognize individual words in noise, the age difference in comprehension disappeared. The results indicate that older adults' speech-comprehension difficulties with maskers are mainly due to declines in their hearing capacities rather than in their cognitive functions.
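
To make the adjustment concrete: one way to equate listening situations, along the lines described above, is to fix the speech level and set each listener's babble level relative to that listener's own speech reception threshold (SRT), so that everyone starts from the same word-recognition performance. The sketch below is a hypothetical illustration; the function name, offset, and values are not taken from the study.

    def babble_level_for_listener(speech_level_db, srt_db, offset_db=4.0):
        """Choose a babble level that places each listener the same number
        of dB above their own speech reception threshold (the SNR at ~50%
        word recognition). All values here are hypothetical placeholders."""
        target_snr_db = srt_db + offset_db   # this listener's target SNR
        return speech_level_db - target_snr_db

    # A listener with poorer hearing in noise (higher SRT) gets less babble.
    print(babble_level_for_listener(70.0, srt_db=-2.0))  # 68.0 dB babble
    print(babble_level_for_listener(70.0, srt_db=3.0))   # 63.0 dB babble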


 


 

How Perceived Spatial Position Affects Comprehension and Memory for Dialogue

Being able to communicate efficiently in a noisy, multi-talker environment is an essential requirement in our everyday lives. A situation in which more than two people are talking increases the complexity of both the perceptual and cognitive processes required for understanding the conversation. In this study, we asked our younger and older participants to listen to dialogues (plays with two characters) presented against a babble background. The plays were presented either with both talkers' voices coming from the same loudspeaker or with each voice coming from a different loudspeaker. Participants were asked to answer multiple-choice questions regarding the content of the conversation they heard. The overall results of this study show that once the signal and noise levels were adjusted to compensate for individual differences in hearing sensitivity, no significant differences in performance were found between younger and older participants.


 


 

Speech Disfluencies and the Elderly:  Effects on Online Interpretation

Daniel DeSantis, Craig Chambers, & Elizabeth Johnson

     Past eye-tracking experiments have revealed that fluent instructions lead listeners to anticipate items that have already been mentioned, whereas disfluencies lead to anticipation of an as-yet-unmentioned entity. Furthermore, listeners' perception of the speaker can influence their reactions: the belief that the speaker had object agnosia eliminated the anticipatory bias. The current study compares younger and older listeners' reactions to speech from older and younger adults. It is possible that listeners will attribute the cause of a disfluency to the speaker's old age and increased tendency to hesitate when speaking, rather than to the difficulty of the task (i.e., introducing a new object into the discourse). Consistent with previous research, disfluent instructions led listeners to anticipate unmentioned objects. However, there was no significant difference between participants' reactions to young and elderly speech. Furthermore, implicit measures of age attitudes were correlated with the results of the eye-tracking study, and no significant correlation was found, indicating that attitudes toward aging do not predict differences in reactions to older and younger speakers' disfluencies. Even though older speakers are known to be more disfluent, people do not react differently to younger and older adults' disfluencies in terms of the subtle cues they provide. Moreover, listeners of both ages reacted similarly to speakers' disfluencies, showing that older adults were able to use these cues in the same way as younger adults.


 


 

Test for Rating of Emotions in Speech (T-RES): Interaction of Lexical Content and Prosody in Spoken Language – Possible Age-Related Effects

Multani, N., Ben-David, B. M., & Van Lieshout, P. H. H. M.

Much of our everyday conversation consists of emotional material, and while conversing we try to understand other people's feelings and emotions. For example, suppose you are out with a friend for coffee and he starts talking about how happy he is. From the tone of his voice, you can tell he is really happy. But what if your friend told you he was really happy in an angry tone? You would not be sure whether he is happy or angry. Every day we come across these kinds of complex situations, in which the emotion conveyed by the content of the words (e.g., anger, happiness, or sadness) does not match the emotional tone of the speaker's voice. How do we interpret what others are feeling in these complex situations? Do we rely on the words and ignore the tone, or rely on the tone and ignore the words? Perhaps the interaction of the two plays the greater role. Does everyone process this information in a similar manner, or do people with different backgrounds or from different age groups process it differently? Some previous studies show that healthy seniors and young adults differ from one another in understanding the emotional state of others through spoken language, which could be due to differences in their hearing. Other studies suggest that these differences may instead relate to how emotional information is processed in older and younger adults. Whether such differences are robust, and how to interpret them, remains unclear, so it is important to examine emotion processing in healthy older adults. The goal of this study is therefore to examine the role of words, tone, and their interaction in spoken language. This will help us understand the possible source(s) of these differences and inform future studies on how to improve the understanding of emotion in individuals who show problems in this respect.


 


 

Pitch Flattening – Effects of reduced pitch fluctuations on the intelligibility of emotional speech

Principal researcher: Prof. K. Pichora-Fuller, Department of Psychology, University of Toronto Mississauga, Mississauga, Ontario, Canada.

Other researchers: J. Besser and Prof. J.M. Festen, Department of ENT/audiology and The EMGO+ Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands.

 

The intelligibility of speech has been found to be affected by the presence or absence of emotion in the speaker's voice and by the type of emotion expressed. Whereas absolute word recognition scores differ for older and younger adults with normal hearing, the pattern of how intelligibility is influenced by emotion is similar for both age groups. The acoustic properties of emotional speech differ from those of neutral speech, for example in terms of mean F0, range of F0 fluctuations, duration, and intensity. In this research, we will specifically investigate the effect of F0 (pitch) fluctuations on speech comprehension by manipulating the amplitude of natural pitch fluctuations in a word recognition task. Furthermore, we will examine how cognitive functioning and supra-threshold auditory temporal-processing abilities might be related to the ability to make use of F0 information in both younger and older adults.

The effect of reducing pitch fluctuations on speech intelligibility will be examined by assessing participants' word recognition abilities with stimuli that have been processed to contain a certain percentage of the stimuli's original pitch fluctuations. Each participant will be tested at seven different pitch levels ranging from 0.1% to 100% of the original pitch fluctuations. The stimuli were recorded in an emotional voice representing fear, which is known to increase speech intelligibility compared to speech presented in a neutral voice. Each stimulus consists of a carrier phrase, which is the same for all stimuli, and a target word. Target-word recognition scores will be calculated in two different background maskers at each of the seven pitch levels. The maskers are multi-talker babble and a two-talker background. As a point of reference, word recognition scores will also be assessed in both maskers for stimuli spoken in a neutral voice and presented with full pitch fluctuations.
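
To illustrate the core manipulation, the sketch below compresses an already-extracted F0 contour toward its mean by a given proportion (0.001 to 1.0, corresponding to the 0.1% to 100% levels above). This is a minimal conceptual sketch in Python, not the actual processing pipeline; in practice the modified contour would be imposed back on the waveform with a dedicated resynthesis tool (e.g., PSOLA-style resynthesis).

    import numpy as np

    def compress_f0_contour(f0, proportion):
        """Shrink F0 fluctuations around the mean F0 to a proportion of
        their original size (1.0 = unchanged, near 0 = almost monotone).
        f0: F0 estimates in Hz per frame, NaN for unvoiced frames."""
        voiced = ~np.isnan(f0)
        mean_f0 = f0[voiced].mean()        # anchor point for the fluctuations
        out = f0.copy()
        out[voiced] = mean_f0 + proportion * (f0[voiced] - mean_f0)
        return out

    # Example: keep only 10% of the natural pitch fluctuations.
    f0 = np.array([180.0, 210.0, np.nan, 160.0, 195.0])
    print(compress_f0_contour(f0, 0.10))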

Psychometric functions will be obtained relating intelligibility to the percentage of natural pitch fluctuations for both participant groups and both maskers separately, i.e., resulting in four functions. Group comparisons will be performed for these outcomes. Furthermore, the groups will be compared on measures of general cognitive ability, language skills, subjective hearing qualities, and auditory temporal processing. Testing of auditory temporal processing will include measures of the ability to make use of the signal's fine-structure and envelope cues.
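
For illustration, a psychometric function of this kind could be fitted with a logistic curve, as sketched below; the functional form, starting values, and data points are hypothetical and not taken from the study.

    import numpy as np
    from scipy.optimize import curve_fit

    def psychometric(x, midpoint, slope, floor, ceiling):
        """Logistic function: proportion correct vs. log10 pitch proportion."""
        return floor + (ceiling - floor) / (1.0 + np.exp(-slope * (x - midpoint)))

    # Hypothetical proportion-correct scores at the seven pitch levels,
    # placed on a log scale because the levels span three orders of magnitude.
    levels = np.array([0.001, 0.005, 0.02, 0.08, 0.25, 0.5, 1.0])
    pcorr = np.array([0.35, 0.40, 0.52, 0.68, 0.80, 0.84, 0.86])
    x = np.log10(levels)

    params, _ = curve_fit(psychometric, x, pcorr, p0=[x.mean(), 2.0, 0.3, 0.9])
    print(dict(zip(["midpoint", "slope", "floor", "ceiling"], np.round(params, 3))))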


 


 

The effect of vocal emotion on word recognition and memory

Harman Sawhney, Kathy Pichora-Fuller, & Kate Dupuis

University of Toronto

Previous research in vision has shown that emotional information, such as pictures and written words, is better remembered than neutral information. In the auditory domain, little work has examined whether similar benefits for emotional compared to neutral stimuli exist for vocal displays of emotion, and whether patterns of recall differ across emotions. Moreover, it is unknown whether younger and older listeners would show similar patterns of responding to different vocal emotions.

In earlier research, we developed a corpus of stimuli based on an existing standardized speech test. Each of the 200 test items (e.g., Say the word bean) was re-recorded by a younger (aged 26 years) and an older (aged 64 years) actor in seven different emotional voices: anger, disgust, fear, happiness, neutral, pleasant surprise and sadness. In the current experiment, stimuli portraying fear, pleasant surprise, sadness and neutral will be presented to both younger and older listeners in sets of 2, 4, 6, and 8 stimuli. Participants will be instructed to repeat the final word from each stimulus. At the end of each set, they will be asked to recall all of the final words from that set (i.e., either 2, 4, 6 or 8 words). We will also determine whether participants’ ability to recall neutral visual information presented in different set sizes corresponds with their performance on this auditory task.

This experiment is part of a broader programme of research designed to understand how listeners understand emotion in speech. In addition to appreciating how normal listeners combine linguistic and emotional information during communication, this approach is highly relevant to special populations, including older adults with cognitive impairment and individuals with mental health conditions, who have been shown to have difficulty understanding emotion in speech.


 


 

Keeping Beat:  Does Musical Rhythm Affect Vowel Intelligibility?

The purpose of this study is to discover the effects of musical rhythm on vowel recognition in older and younger adults. Participants will be asked to identify 10 different words in varying conditions and in noise. The results of the experiment will be analyzed to see whether words sung "off" the beat are harder to identify than words sung "on" the beat. The results of older and younger adults will be compared to see whether age is a factor in understanding vowels that are off the beat.

The results from this experiment may inform future studies on improving the quality of life for older adults. A separate study by Leek and Molis (2008) found that one of the reasons elderly people lose enjoyment in music is that they cannot understand the words. If the results from this study demonstrate better understanding of sung words when they are heard "on" the beat versus "off" the beat, it is possible that future music can be designed to cater to the needs of an elderly audience.

 

Keeping the Beat:  bVt Psychometric Study

Dario Coletta, Veronica Marchuk, Huiwen Goy, Kathy Pichora-Fuller, Frank Russo

     Rhythm is an important part of both speech and music, and speech and music may co-occur in the form of lyrics set to an instrumental background. In music, there are both strong and weak beats in the meter, and important “events” tend to occur during strong beats. When listeners hear a regular rhythm containing both strong and weak beats, it is possible that they form expectations about when and where they should focus their attention based on the ongoing beat pattern. In this study, listeners heard a 4/4 metronome rhythm before one of 10 possible /bVt/ words (e.g., bat, bet, bit) was sung by a female voice, and the metronome beat and babble noise were present throughout the trial. The timing of the word presentation was manipulated so that the word could occur on a strong beat or on a weak beat. The word could also occur right on the beat, just before the beat, or just after the beat. We hypothesized that listeners would be more accurate at identifying words when the words coincided with a strong beat and on the beat than when words coincided with a weak beat and off the beat. The results of this study showed that younger listeners were slightly better than older listeners at word identification, but that both age groups were similar in their behaviour in different conditions of the experiment. In general, listeners were better at identifying words when words occurred on strong beats rather than on weak beats, and when words occurred right on the beat or just before the beat rather than just after the beat. These results supported our hypothesis that listeners’ attention changes according to their expectation of events, and these changes in attention affect accuracy in vowel perception.
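
A minimal sketch of the timing manipulation described above, assuming an illustrative tempo and offset size (the abstract specifies neither): in 4/4 meter, beats 1 and 3 are treated as strong, and the sung word is shifted slightly before, onto, or after a chosen beat.

    # Hypothetical values for illustration: 120 beats per minute, +/-100 ms offsets.
    BEAT_S = 60.0 / 120.0                 # inter-beat interval in seconds
    STRONG_BEATS = {1, 3}                 # strong beats in 4/4 meter
    OFFSETS = {"early": -0.100, "on": 0.0, "late": 0.100}

    def word_onset(bar, beat, alignment):
        """Onset time (s) of the sung word for a given bar, beat, and alignment."""
        beat_time = ((bar - 1) * 4 + (beat - 1)) * BEAT_S
        return beat_time + OFFSETS[alignment]

    for beat in (1, 2):                   # beat 1 is strong, beat 2 is weak
        for alignment in ("early", "on", "late"):
            strength = "strong" if beat in STRONG_BEATS else "weak"
            print(f"bar 2, beat {beat} ({strength}), {alignment}: "
                  f"{word_onset(2, beat, alignment):.3f} s")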


 


Emotional Recognition Sequel Experiment

Kate Dupuis & Kathy Pichora-Fuller

University of Toronto

Standardized speech tests have been used extensively by both clinicians and researchers to measure speech perception and spoken language understanding in different listening conditions and in people of all ages. Stimuli are typically recorded in an artificially neutral way, devoid of any emotional cues, whereas most everyday speech is produced with emotional inflection that is interpreted differently depending on age. Although previous research in vision has shown that emotional information, such as pictures and written words, is better attended to and remembered, the emotional implications of stimuli in commonly-used speech tests have not previously been controlled for or investigated.

We recently developed a novel corpus of stimuli based on an existing standardized speech test. Each of the 200 test items was re-recorded by a younger (aged 26 years) and an older (aged 64 years) actor in seven different emotional voices: anger, disgust, fear, happiness, neutral, pleasant surprise and sadness. In the current experiment, both younger and older adults will listen to these novel stimuli and will indicate which emotion they believe is being portrayed, thereby establishing whether each stimulus is perceived as accurately representing the emotion the actresses were instructed to convey. We will also determine how well people perceive pitch, duration, and loudness by having them listen to specific sounds under headphones, and how their ability to use these cues influences how well they can understand emotion in speech.

This experiment is part of a broader programme of research designed to create a more real-world measure of spoken language understanding. In addition to appreciating how normal listeners combine linguistic and emotional information during communication, this approach is highly relevant to special populations, including older adults and people with mental health conditions, such as schizophrenia and Alzheimer's disease, who have been shown to have difficulty understanding emotion in speech.


 


Speech-in-speech listening on the LiSN-S test by older adults with good audiograms depends on cognition and hearing acuity at high frequencies

Jana Besser, Joost Festen, S. Theo Goverts, Sophia Kramer, Kathy Pichora-Fuller

     We tested younger and older adults with good audiograms in most of the speech range to determine whether there were age-related differences on the LiSN-S test, which assesses the ability to use voice difference cues and spatial separation cues in speech-in-speech listening. For both age groups, we also investigated whether LiSN-S outcomes were influenced by auditory, cognitive, or linguistic measures.

     The LiSN-S test yields a speech reception threshold (SRT) in each of four speech-in-speech listening conditions depending on the availability of voice difference cues and/or spatial separation cues. Based on these four SRTs, scores are calculated for the talker advantage, the spatial advantage, and the total advantage due to both types of cues. Younger listeners outperformed older listeners on all four LiSN-S SRTs and all three LiSN-S advantage measures. Age-related differences were especially large for conditions involving the use of spatial cues. Additionally, participants completed four auditory temporal-processing tests, a cognitive screening, a vocabulary test, and tests of linguistic closure for high- and low-context sentences.  Linguistic abilities determined LiSN-S outcomes for the younger group, whereas in the older group, LiSN-S outcomes were predominantly predicted by cognitive ability and high-frequency hearing acuity.
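
For concreteness, the three advantage scores are differences between the four SRTs. The sketch below follows the standard LiSN-S layout (an assumption, since the abstract does not spell it out): the baseline condition gives the distracters the same voice and location as the target, and each advantage is the improvement gained when a voice cue, a spatial cue, or both become available. The SRT values are made-up placeholders.

    # SRTs in dB for the four conditions; lower (more negative) is better.
    srt = {
        ("same_voice", "0deg"):  -12.0,   # baseline: no cues available
        ("diff_voice", "0deg"):  -16.0,   # voice-difference cue only
        ("same_voice", "90deg"): -20.0,   # spatial-separation cue only
        ("diff_voice", "90deg"): -23.0,   # both cues available
    }

    talker_advantage  = srt[("same_voice", "0deg")] - srt[("diff_voice", "0deg")]
    spatial_advantage = srt[("same_voice", "0deg")] - srt[("same_voice", "90deg")]
    total_advantage   = srt[("same_voice", "0deg")] - srt[("diff_voice", "90deg")]

    print(talker_advantage, spatial_advantage, total_advantage)  # 4.0 8.0 11.0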



Differences in conversation comprehension between younger and older listeners

Meital Avivi, Bruce Schneider, Meredyth Daneman

Purpose: We investigated how age and linguistic status affected listeners’ ability to follow and comprehend three-talker conversations, and the extent to which individual differences in language proficiency predict speech comprehension under difficult listening conditions. 

Method: Younger and older native-English listeners, as well as young nonnative-English listeners, listened to three-talker conversations in which the talkers were presented either together via a single loudspeaker from the front or each from a different loudspeaker (left, right, and middle), in quiet or against a moderate or high level of background babble, and were asked to answer questions regarding the contents of the conversations.

Results:  After compensating for individual differences in speech recognition, no significant differences in conversation comprehension were found among the groups. As expected, conversation comprehension decreased as babble level increased. Individual differences in reading comprehension skill contributed positively to performance in younger listeners, but not in older listeners. In contrast, individual differences in vocabulary knowledge contributed positively to performance in older native-English and young nonnative-English listeners, but not in young native-English listeners.  Reading comprehension skill was positively related to performance in all babble conditions, whereas vocabulary knowledge was significantly and positively related to performance only at the intermediate babble level.

Conclusion: The results indicate that the manner in which spoken language comprehension is achieved is modulated by the listeners’ age and linguistic status.
