Prosody research with large speech databases
Several of our recent and ongoing studies originated in a collaborative project on Landmark-based robust speech recognition using prosody-guided models of speech variability (NSF IIS 07-03624). [Investigators: Carol Espy-Wilson (PI, U Maryland), Abeer Alwan (UCLA), Jennifer Cole (UIUC), Louis Goldstein (USC, Haskins Laboratories), Mary Harper (U Maryland), Mark Hasegawa-Johnson (UIUC), Elliot Saltzman (Boston U)]
Our contributions relate to the broad project goals of (i) developing acoustic landmark detectors and pattern classifiers for prosodic features, (ii) developing a model of the mapping from articulatory gestures that implement prosody to the acoustic output, and (iii) developing structured language models that combine prosody and syntax to handle disfluencies.
Prosodic category structure
[Investigators: Cole (PI), Hasegawa-Johnson (co-PI), Mahrt; Publications: 57, 59, 61]
Acoustic correlates of perceived prominence in American English are found in measures of pitch, duration, and intensity. This project investigates whether these acoustic correlates cue a binary prominence distinction (prominent vs. non-prominent) or a gradient one (low-to-medium-to-high prominence). We further investigate whether all acoustic correlates cue the same pattern of prominence distinctions for a given speaker, and the extent to which individual speakers differ in how prominence distinctions are cued in the acoustic signal.
Everyday speech and ordinary listeners: Prosody production and perception in spontaneous speech
[Investigators: Cole (PI), Hasegawa-Johnson (co-PI), Mo, Baek; Publications: 37, 40, 41, 42, 43, 44, 45]
This project examines prosody from the dual perspectives of the speaker and listener, and asks two questions:
- What are the prosodic features ordinary listeners perceive in everyday, conversational speech?
- How does the listener’s perception of prosody relate to information in the acoustic signal, and to the lexical, syntactic and discourse context of the utterance?
[Investigators: Cole (PI), Mo, Yoon, Lee; Publications: 10, 30, 35, 38]
Our first project developed a method of minimal prosody transcription based on the ToBI system, in which trained linguists annotate prosody and disfluency simultaneously. This method was used to transcribe a portion of the Switchboard corpus of conversational speech.
Our second and ongoing project in prosody annotation uses untrained transcribers, naïve to our research goals, to label prosodic prominences and phrase boundaries in conversational speech. This prosody annotation is based only on auditory impression of the speech signal, and is done in real time. Groups of 15-20 transcribers independently label the same speech files and their transcriptions are pooled to generate continuous-valued, probabilistic prosody labels for each word in an utterance. We are using this method to create prosody transcriptions for the Buckeye corpus of conversational speech, and will have approximately 5 hours of prosody-annotated speech, representing 40 speakers, completed by Summer 2009.
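The pooling step can be illustrated with a short sketch. It assumes that each transcriber produces a binary mark per word (1 if the word was heard as prominent, or as followed by a boundary, else 0) and that the pooled label for a word is simply the proportion of transcribers who marked it; the function name `pool_labels` and the toy data are hypothetical, and the project's actual pooling procedure may differ.

```python
from typing import List


def pool_labels(marks: List[List[int]]) -> List[float]:
    """Pool binary per-word marks from several transcribers.

    marks[t][w] is 1 if transcriber t marked word w, else 0 (assumed format).
    Returns, for each word, the fraction of transcribers who marked it,
    i.e. a continuous-valued label between 0.0 and 1.0.
    """
    if not marks:
        raise ValueError("need at least one transcriber")
    n_words = len(marks[0])
    if any(len(row) != n_words for row in marks):
        raise ValueError("all transcribers must label the same words")
    n_transcribers = len(marks)
    return [sum(row[w] for row in marks) / n_transcribers for w in range(n_words)]


if __name__ == "__main__":
    # Hypothetical toy data: four transcribers marking prominence on four words.
    words = ["i", "really", "liked", "it"]
    prominence_marks = [
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 1, 1, 0],
        [1, 1, 0, 0],
    ]
    for word, score in zip(words, pool_labels(prominence_marks)):
        print(f"{word}\t{score:.2f}")
```

With 15 to 20 transcribers per file rather than four, the same computation yields finer-grained scores; prominence and boundary marks would be pooled separately in this scheme.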
Prosody-based approaches to Automatic Speech Recognition
[Investigators: Cole (PI), Hasegawa-Johnson (co-PI), Mo, Hu, Huang]
In a collaboration between linguists and electrical & computer engineers we are developing approaches to Automatic Speech Recognition (ASR) that model acoustic variation due to prosodic context as a way to improve ASR accuracy. Our current work focuses on developing a prosody detector that can be used to perform automatic prosody annotation, with portability across databases and speaking styles.
Prosody and phonetic reduction in spontaneous speech
Variation in plosive production: rate and prosodic factors
[Investigators: Khasanova, Cole, Hasegawa-Johnson]
This project examines variability in the acoustic realization of plosive consonants in American English through analysis of data from the Buckeye Corpus of conversational speech. Preliminary work focuses on the implementation of a burst detector, used for the precise localization of bursts in phone-labeled speech.
Prosodic factors influencing reduction in spontaneous speech
[Investigators: Cole, Shattuck-Hufnagel; Publications: 57, 59]
Prosodic context influences speech production and conditions strengthening and weakening effects on segments and syllables. This project investigates the relationship between prosodic context and patterns of phonetic strengthening and reduction, using a laboratory speech-imitation task. A pilot study is underway in Summer 2009.
Prosody and dialect variation
Prosodic characteristics of dialect
[Investigators: Cole (PI), Thomas (co-PI), Britt; Publications: 36]
This project examines the speech of African-American and European-American speakers from the Sociolinguistic Archive and Analysis Project at North Carolina State University. The goal of this work is to identify prosodic features that contribute to ethnicity- and gender-based dialect distinctions. Our current work uses phonological and phonetic analysis to investigate the rhythmic and intonational properties that mark discourse function in conversational speech, and dialectal differences in the prosodic marking of discourse function. This research is funded by the University of Illinois Research Board.
Prosody in second language acquisition
Linguistics doctoral candidate Lisa Pierce is investigating the effect of perceptual training on the acquisition of prosody in a second language. This work draws on exemplar models of phonological encoding and their application to the study of language acquisition.
Prosody in bilingualism
Linguistics doctoral candidate Vandana Puri is investigating the intonation and prosody systems of Indian English and Hindi as spoken by late and simultaneous bilinguals in Delhi, India. The study explores whether simultaneous bilinguals have a separate intonation system for each of their two L1s, and whether their intonation system differs from that of late bilinguals. To investigate these questions, the study examines pre-boundary lengthening, pitch accents, and focus in both Hindi and Indian English.
Acoustic correlates of prosody in broadcast news speech vs. spontaneous conversational speech
(NSF IIS-0414117; University of Illinois Critical Research Initiative) [Investigators: Cole (co-PI), Hasegawa-Johnson (PI), Chavarría, H. Choi, J. Choi, Kim, Mo, Yoon; Publications: 1, 2, 6, 9, 12, 16, 17, 21, 22, 23, 24, 25, 27, 32, 34]
This project compares prosodic structures and the acoustic features that cue prosody in read and spontaneous speech based on two corpora: the Boston University Radio News corpus of read speech and the Switchboard corpus of spontaneous telephone conversational speech.
Disfluency and prosody
(NSF IIS-0414117) [Investigators: Hasegawa-Johnson (PI), Cole (co-PI), Shih (co-PI), Borys, Kim, Mo, Yoon; Publications: 13, 15, 30, 31, 35, 39]
Prosody research that uses spontaneous speech data, such as the Switchboard corpus, must deal with the effect of disfluency on prosodic structure. This project explores the acoustic features that cue disfluency regions in running speech, and how disfluency and prosodic structure interact in shaping segmental and suprasegmental acoustic features. Work on this project includes the analysis of acoustic cues to glottalization, which often marks prosodic structure.
Prosody-dependent automatic speech recognition
(University of Illinois Critical Research Initiative) [Investigators: Hasegawa-Johnson (PI), Cole (co-PI), Borys, Chavarría, Chen, J. Choi, H. Choi, Cohen, Kim, Yoon; Publications: 3, 4, 5, 7, 8, 11, 14, 19, 20, 26, 28, 29, 33]
This project seeks to describe how prosodic structure and phoneme structure interact in conditioning acoustic variation in natural continuous speech. Our approach combines linguistic phonetic analysis and probabilistic speech recognition models to identify prosodic effects. This research has demonstrated that the use of prosody can improve word recognition accuracy in a large-vocabulary speech recognition experiment. A second goal of this research is the creation of techniques for the automatic labeling of prosodic structure and disfluency in speech.