We are extracting NPs of the form (DT NN) from Switchboard in NXT and exploring how the predictability of the DT and NN, among other factors, affects speech rate.

Corpus

All NPs consisting of a determiner (DT) and a noun (NN) were extracted from Switchboard in NXT [1], provided that the NP did not occur in an unfinished utterance (e.g. "...the bicy-"), did not contain any coded disfluencies (e.g. "...the uh bicycle...") and was not a direct projection of another NP.

The following data were also extracted for both the DT and the NN: spoken duration, number of syllables (citation form), number of phonemes (citation form), number of syllables in the current speech window [2], frequency (in Switchboard), probability (unigram), log forward and backward probability (bigram), forward and backward joint frequency (bigram), givenness, animacy, speaker ID, speaker gender and speaker age. In addition, the word immediately before the DT and the word following the NN were extracted.

The following information was imported from external sources: phonological typicality of the NN [3], the stressed and unstressed log-frequency-weighted neighborhood density of the DT and NN [4], and the average stressed and unstressed log-frequency-weighted biphone probability of the DT and NN [4].

The following measures were calculated from the above: phones per second for the DT and NN, syllables per second for the DT and NN, and syllables per second for the speech window of the DT and NN. The speech-window rate is adjusted to exclude the word itself, i.e. (syllables in speech window minus syllables in word)/(seconds in speech window minus seconds in word).
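The calculated rate measures can be sketched roughly as follows. This is only an illustration of the arithmetic described above, not the extraction code actually used: the `Token` class and all of its field names are hypothetical, and it assumes that the duration of the speech window is available alongside its syllable count.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Measurements for one word (DT or NN). All field names are hypothetical."""
    duration_s: float         # spoken duration of the word, in seconds
    n_syllables: int          # syllables in the citation form
    n_phones: int             # phonemes in the citation form
    window_syllables: int     # syllables in the current speech window
    window_duration_s: float  # assumed: duration of that window, in seconds


def phones_per_second(tok: Token) -> float:
    """Phones per second for the word itself."""
    return tok.n_phones / tok.duration_s


def syllables_per_second(tok: Token) -> float:
    """Syllables per second for the word itself."""
    return tok.n_syllables / tok.duration_s


def adjusted_window_rate(tok: Token) -> float:
    """Syllables per second of the speech window with the word removed."""
    remaining_syllables = tok.window_syllables - tok.n_syllables
    remaining_seconds = tok.window_duration_s - tok.duration_s
    if remaining_seconds <= 0:
        # A window no longer than the word leaves nothing to measure.
        raise ValueError("speech window must be longer than the word")
    return remaining_syllables / remaining_seconds


# Made-up example values, not corpus data.
noun = Token(duration_s=0.5, n_syllables=3, n_phones=7,
             window_syllables=13, window_duration_s=2.5)
print(phones_per_second(noun))     # 14.0
print(syllables_per_second(noun))  # 6.0
print(adjusted_window_rate(noun))  # 5.0
```

Subtracting the word from its own window keeps the contextual speech rate independent of the word's duration, which matters when the window rate is used to predict or normalize that duration.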

[wiki:/Readings Directed Reading]

  1. http://groups.inf.ed.ac.uk/switchboard/index.html

  2. It is not yet clear where the details of this measure are documented.

  3. Provided by Thomas Farmer.

  4. From IPhOD v2, http://www.iphod.com/
