We extract NPs of the form (DT NN) from Switchboard in NXT and explore how the predictability of the DT and NN (among other factors) affects speech rate.

Corpus

All NPs consisting of a determiner (DT) and a noun (NN) were extracted from Switchboard in NXT¹, provided that the NP did not occur in an unfinished utterance (e.g. "...the bicy-"), did not contain any coded disfluencies (e.g. "...the uh bicycle...") and was not a direct projection of another NP. This yielded 16313 items.

The following data were also extracted for both the DT and NN: spoken duration, number of syllables (citation form), number of phonemes (citation form), number of syllables in the current speech window², frequency (in Switchboard), probability (unigram), log forward and backward probability (bigram), forward and backward joint frequency (bigram), givenness, animacy³, speaker ID, speaker gender and speaker age. In addition, the word immediately before the DT and the word following the NN were extracted.

The following information was imported from external sources: phonological typicality of the NN⁴, the stressed and unstressed log frequency weighted neighborhood density⁵ of the DT and NN, and the average stressed and unstressed log frequency weighted biphone probability of the DT and NN⁵.

The following measures were calculated from the above: phonemes per second for the DT and NN, syllables per second for the DT and NN, and adjusted syllables per second for the speech window for the DT and NN, i.e. (syllables in speech window minus syllables in word) / (seconds in speech window minus seconds in word).
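The adjusted speech-window rate described above can be sketched in Python as follows. The function and parameter names are illustrative and do not come from the extraction scripts, and returning None for a window that contains only the target word is an assumption about how such cases might be handled.

```python
def adjusted_window_rate(window_syllables, window_seconds,
                         word_syllables, word_seconds):
    """Syllables per second in the speech window, excluding the target word.

    Implements (syllables in window - syllables in word) /
               (seconds in window - seconds in word).
    """
    remaining_seconds = window_seconds - word_seconds
    if remaining_seconds <= 0:
        # Assumption: a window holding only the target word has no defined rate.
        return None
    return (window_syllables - word_syllables) / remaining_seconds


# A 12-syllable, 3.0 s window containing a 2-syllable, 0.5 s noun
print(adjusted_window_rate(12, 3.0, 2, 0.5))  # prints 4.0
```

Subtracting the word from its own window keeps the control variable from being partly determined by the duration it is meant to predict.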

Exclusions

Cases with missing or nonsensical values (such as zero syllables) were removed for the following measures: number of phonemes (DT & NN), number of syllables (DT & NN), duration (DT & NN), syllables per second (DT & NN), phonemes per second (DT & NN), speech window syllables per second (DT & NN), both neighborhood density estimates and average biphone probabilities. This resulted in a loss of about 7.76% of the data (15047 cases left). In addition, to remove outliers, the following variables were log10 transformed and normalized, and cases with values above 2.5 or below -2.5 were removed: NN & DT frequency, NN & DT probability, NN & DT duration, NN & DT syllables per second, NN & DT phonemes per second and NN & DT speech window syllables per second. This resulted in a loss of a further 17.8% of the data (12364 cases left). After removing cases with missing values for all other variables, the final set is 5362 cases.
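A minimal Python sketch of the outlier step, assuming that "normalized" means z-scoring the log10 values (mean 0, sample standard deviation 1) and that the cutoff applies in both directions. The function names are illustrative, not taken from the original scripts.

```python
import math
import statistics


def trim_outliers(values, cutoff=2.5):
    """Return the indices of values whose z-scored log10 lies within +/- cutoff."""
    logs = [math.log10(v) for v in values]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation (assumption)
    if sd == 0:
        return list(range(len(values)))  # no spread, so nothing to trim
    return [i for i, x in enumerate(logs) if abs((x - mean) / sd) <= cutoff]


def percent_lost(cases_before, cases_after):
    """Percentage of cases removed by an exclusion step."""
    return 100.0 * (cases_before - cases_after) / cases_before


print(round(percent_lost(16313, 15047), 2))  # prints 7.76
print(round(percent_lost(15047, 12364), 1))  # prints 17.8
```

In practice the index lists for each trimmed variable would be intersected, so that a case is kept only if it survives the cutoff on every variable.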

Direct Control Variables

After the above exclusions, the following measures were calculated for both DTs and NNs: mean duration by word (i.e. the mean duration of all tokens of 'the'), mean log10 duration, mean phonemes per second, mean log phonemes per second, mean syllables per second and mean log syllables per second. Control variables were assessed for the following DVs (both DT & NN): log duration, log phonemes per second and log syllables per second. Speaker ID was included as a random effect in all analyses. The results below list which factors were significant on their own and which remained significant after accounting for the others. All DTs were monosyllabic, so syllable count was not included for them. For each predictor, coefficients are shown first for a model containing only that predictor (not controlling for others), then for a model that also includes all other significant predictors (most of the models end up with just mean log DV and log speech window syllables per second as IVs).
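The by-word means can be sketched as below, using the mean log10 measure as the example. The input format (pairs of word form and measurement) and the function name are assumptions made for illustration.

```python
import math
from collections import defaultdict


def mean_log_by_word(tokens):
    """Map each word form to the mean log10 of its measurements.

    tokens: iterable of (word, value) pairs, e.g. ("the", 0.09) for a
    token of 'the' lasting 0.09 s. Values must be positive.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for word, value in tokens:
        sums[word] += math.log10(value)
        counts[word] += 1
    return {word: sums[word] / counts[word] for word in sums}


tokens = [("the", 100.0), ("the", 1.0), ("a", 10.0)]
means = mean_log_by_word(tokens)
# Each token is then paired with the mean for its own word form.
control = [means[word] for word, _ in tokens]
```

Note that the mean of the logs is not the log of the mean, which is why the mean and mean log measures are listed separately.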

DT

Log Duration

mean log duration was significant (B= 1.000, SE=.00777, t= 128.69) over and above all other factors (B= 1.0046, SE= .007689, t= 130.7)

log phonemes was significant (B= .8402, SE= .009529, t= 88.2), but not after controlling for mean log duration

log speech window syllables per second was significant (B= -.19357, SE= .0221, t= -8.76) over and above all other factors (B= -.2482, SE= .01434, t= -17.3)

log frequency was significant (B= -.3477, SE= .005607, t= -62.01), but not after controlling for mean log duration

neighborhood density was significant (B= -.0052152, SE= .0001312, t= -39.74), but not after controlling for mean log duration

average biphone probabilities was significant (B= 304.65523, SE= 4.135123, t= 73.7), but not after controlling for mean log duration

speaker age and speaker gender were not significant (B= .0002692, SE= .0002075, t= 1.297), (B= -.006844, SE= .004317, t= -1.6)

Phonemes per Second

mean log phonemes per second was significant (B= 1.001259, SE= .013154, t= 76.12) over and above all other factors (B= 1.004089, SE= .01299, t= 77.26)

log phonemes was significant (B= .15978, SE= .009529, t= 16.8), but not after controlling for mean log phonemes per second

log speech window syllables per second was significant (B= .23276, SE= .01745, t= 13.34) over and above all other factors (B= .2475, SE= .01433, t= 17.27)

log frequency was significant (B= .197135, SE= .004768, t= 41.35), but not after controlling for mean log phonemes per second

neighborhood density was significant (B= -.0024817, SE= .0001148, t= -21.61), but not after controlling for mean log phonemes per second

average biphone probabilities was significant (B= -78.104935, SE= 4.581018, t= -17.0), but not after controlling for mean log phonemes per second

speaker age and speaker gender were not significant (B= -.0001016, SE= .00017101, t= -.597), (B= .006754, SE= .004512, t= 1.5)

Syllables per Second

mean log syllables per second was significant (B= 1.007822, SE= .0077768, t= 128.69) over and above all other factors (B= 1.0046, SE= .007689, t= 130.7)

log phonemes was significant (B= -.840219, SE= .009529, t= -88.2), but not after controlling for mean log syllables per second

log speech window syllables per second was significant (B= .19357, SE= .02211, t= 8.76) over and above all other factors (B= .248253, SE= .014348, t= 17.3)

log frequency was significant (B= .347702, SE= .005607, t= 62.01), but not after controlling for mean log syllables per second

neighborhood density was significant (B= .0052152, SE= .0001312, t= 39.74), but not after controlling for mean log syllables per second

average biphone probabilities was significant (B= -304.7, SE= 4.135, t= -73.7), but not after controlling for mean log syllables per second

speaker age and speaker gender were not significant (B= -.0002692, SE= .0002075, t= -1.297), (B= .006844, SE= .004317, t= 1.6)

NN

Log Duration

mean log duration was significant (B= 1.0001419, SE= .0080291, t= 124.57) over and above all other factors (B= .999342, SE= .007985, t= 125.16)

log phonemes was significant (B= .560979, SE= .006736, t= 83.28), but not after controlling for mean log duration

log syllables was significant (B= .397644, SE= .006090, t= 65.29), but not after controlling for mean log duration

log speech window syllables per second was significant (B= -.13299, SE= .01552, t= -8.571) over and above all other factors (B= -.122494, SE= .010306, t= -11.89)

log frequency was significant (B= .079339, SE= .001531, t= -51.82), but not after controlling for mean log duration

neighborhood density was significant (B= -.003934, SE= .00008307, t= -47.36), but not after controlling for mean log duration

average biphone probabilities was significant (B= 19.679, SE= .884171, t= 22.26), but not after controlling for mean log duration

speaker age was not significant (B= .000013, SE= .000149, t= .873)

speaker gender was significant (B= .013347, SE= .004127, t= 3.23) (including over log speech window syllables per second, log phonemes and log syllables), but not after controlling for log syllables

Phonemes per Second

mean log phonemes per second was significant (B= 1.000, SE= .009271, t= 107.86) over and above all other factors (B= 1.000481, SE= .009219, t= 108.52)

log phonemes was significant (B= .439021, SE= .006736, t= 65.17), but not after controlling for mean log phonemes per second

log syllables was significant (B= .290845, SE= .006007, t= 48.4), but not after controlling for mean log phonemes per second

log speech window syllables per second was significant (B= .11724, SE= .01440, t= 8.14) over and above all other factors (B= .122489, SE= .010306, t= 11.89)

log frequency was significant (B= -.009748, SE= .001564, t= -6.23), but not after controlling for mean log phonemes per second

neighborhood density was significant (B= -.002103, SE= .00008241, t= -25.5), but not after controlling for mean log phonemes per second

average biphone probabilities was significant (B= 22.192279, SE= .757286, t= 29.3), but not after controlling for mean log phonemes per second

speaker age and speaker gender were not significant (B=.0002032, SE= .0001416, t= 1.435), (B= .003802, SE= .002952, t= 1.3)

Syllables per Second

mean log syllables per second was significant (B= 1.0002135, SE= .0063544, t= 157.40) over and above all other factors (B= 1.001011, SE= .006318, t= 158.44)

log phonemes was significant (B= .416698, SE= .008961, t= 46.5), but not after controlling for mean log syllables per second

log syllables was significant (B= .602356, SE= .006090, t= 98.91), but not after controlling for mean log syllables per second

log speech window syllables per second was significant (B= .10298, SE= .01795, t= 5.74) over and above all other factors (B= .122944, SE= .010305, t= 11.93)

log frequency was significant (B= -.009846, SE= .001947, t= -5.06), but not after controlling for mean log syllables per second

neighborhood density was significant (B= .001762, SE= .0001081, t= -16.3), but not after controlling for mean log syllables per second

average biphone probabilities was significant (B= 19.479199, SE= 1.000570, t= 19.47), but not after controlling for mean log syllables per second

speaker age and speaker gender were not significant (B= .0001635, SE= .0001819, t= .899), (B= .005065, SE= .003791, t= 1.34)
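The t values in the lists above appear to be each coefficient divided by its standard error, which is the usual convention for mixed-model output. The notes do not state this, so treat it as an assumption. A minimal Python sketch (the function name is illustrative):

```python
def t_value(coefficient, standard_error):
    """t statistic as coefficient / standard error (assumed convention)."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return coefficient / standard_error


# DT log duration: mean log duration after accounting for other factors
print(round(t_value(1.0046, 0.007689), 1))  # prints 130.7
# DT log duration: log speech window syllables per second, same model
print(round(t_value(-0.2482, 0.01434), 1))  # prints -17.3
```

Recomputing t in this way is a quick consistency check on transcribed coefficients and standard errors.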

Conclusions

For DTs, the mean log DV was highly correlated with neighborhood density, average biphone probability, log frequency and log phonemes, which makes sense if all of those factors affect the word's duration, syllables-per-second rate and phonemes-per-second rate. However, adding the mean log DV always significantly improved a model that already contained the above factors. The best control for each log DV is therefore the mean log DV. The same was true for NNs, with log syllables added to the list of factors.

Preceding Context Control Variables

With the above control variables included, we will now control for contextual effects.

DTs

As outlined above, the best control variables for all the DT DVs were the mean log DV and the log speech window syllables per second. The next step is to examine the extent to which the previous word influences DTs.

Duration

Log frequency of the previous word is not significant (B= .001364, SE= .001887, t= .72), chisq= .5232, p= .4695

Log joint frequency is significant (B= -.003672, SE= .001846, t=-1.99), chisq= 3.9554, p= .04672

Log forward probability is significant (B= -.008668, SE= .001602, t= -5.41), chisq= 29.229, p< .001

Log joint frequency does not significantly improve the model over log forward probability (chisq= .1867, p= .6656), while the reverse does (chisq = 28.893, p<.001)

Phonemes per Second

Log frequency of the previous word is not significant (B= -.001456, SE= .001895, t= -.77), chisq= .5915, p= .4418

Log joint frequency is marginally significant (B= .003637, SE= .001884, t= 1.93), chisq= 3.7276, p= .05352

Log forward probability is significant (B= .007969, SE= .001552, t= 5.13), chisq= 26.314, p< .001

Log joint frequency does not significantly improve the model over log forward probability (chisq= .3074, p= .5793), while the reverse does (chisq = 22.893, p<.001)

Syllables per Second

Log frequency of the previous word is not significant (B= -.001364, SE= .001887, t= -.72), chisq= .5232, p= .4695

Log joint frequency is significant (B= .003672, SE= .001846, t= 1.99), chisq= 3.9554, p= .04672

Log forward probability is significant (B= .008668, SE= .001602, t= 5.41), chisq= 29.229, p< .001

Log joint frequency does not significantly improve the model over log forward probability (chisq= .1867, p= .6656), while the reverse does (chisq = 25.461, p<.001)
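The chisq and p values in the three blocks above are consistent with likelihood-ratio tests between nested models that differ by one parameter (1 degree of freedom). The notes do not state this, so the sketch below treats it as an assumption; the function names are illustrative.

```python
import math


def likelihood_ratio_chisq(loglik_reduced, loglik_full):
    """Test statistic: twice the gain in log-likelihood from the added term."""
    return 2.0 * (loglik_full - loglik_reduced)


def chisq_p_value_df1(chisq):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)), so no external library is needed.
    return math.erfc(math.sqrt(chisq / 2.0))


# Log joint frequency on DT duration: reported chisq= 3.9554, p= .04672
print(chisq_p_value_df1(3.9554))
# Log frequency of the previous word: reported chisq= .5232, p= .4695
print(chisq_p_value_df1(0.5232))
```

Both calls return values close to the reported p values, which supports the 1 degree of freedom reading.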

Conclusions

The log forward probability (the log ratio of the joint frequency of the DT and the preceding word to the frequency of the determiner) is a significant factor in predicting all three speech measurements of the DT. It will be included as a control factor in the following analyses.

NN Effects on DT

This section contains two separate analyses. The first looks only at the effects of the probability and frequency of the NN on the DT. The second looks only at the effects of some measures implicated in the articulation of the NN. Many of the measures in both analyses are correlated, so a final analysis will determine the extent to which each contributes uniquely to the DT. To get a more direct measure of the effects in question, the above DT measures will be residualized based on the results⁶ above, using the following models:

log DT duration ~ (1 | Speaker) + log speech window syllables per second + log forward probability + log mean by DT duration

log DT syllables per second ~ (1 | Speaker) + log speech window syllables per second + log forward probability + log mean by DT syllables per second

log DT phonemes per second ~ (1 | Speaker) + log speech window syllables per second + log forward probability + log mean by DT phonemes per second

Probability of the NN

The following factors were chosen to explore the predictability of the NN: the log frequency of the NN; the log backward probability (ratio of the joint frequency of the DT and NN over the frequency of the NN); the log forward probability (ratio of the joint frequency of the DT and NN over the frequency of the DT); and the givenness of the NN (new referent, implied, mentioned).
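The two bigram measures defined above can be sketched from raw counts as follows. The function name and the example counts are hypothetical, and the use of log10 is an assumption carried over from the log10 transforms used elsewhere in these notes.

```python
import math


def bigram_log_probabilities(joint_count, dt_count, nn_count):
    """Return (log forward, log backward) probability for a DT NN bigram.

    forward  = joint frequency of DT and NN / frequency of the DT
    backward = joint frequency of DT and NN / frequency of the NN
    """
    forward = math.log10(joint_count / dt_count)
    backward = math.log10(joint_count / nn_count)
    return forward, backward


# Hypothetical counts: the bigram occurs 10 times, the DT 1000 times, the NN 100 times
forward, backward = bigram_log_probabilities(10, 1000, 100)
```

With these hypothetical counts the noun is far more predictable from being paired with this determiner (backward) than the determiner's context predicts the noun (forward), which is why the two measures are kept as separate factors.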

Directed Reading

http://groups.inf.ed.ac.uk/switchboard/index.html (1)
not quite sure where the details of this are located (2)
following http://npcorpus.bu.edu/documentation/index.html the distinction is human > organizations and animals > other (3)
provided by Thomas Farmer (4)
from iphod v2 http://www.iphod.com/ (5)
Alternatively one could include all factors, regardless of significance (6)