Computational Accounts of Production


We will start with a quick refresher (written for language researchers) on probability theory and information theory and then read a number of papers on how information content, entropy, etc. affect language production. The goal of the class is to provide a thorough introduction to these topics, but also to discuss the shortcomings of these types of accounts and their relation to other mechanistic accounts of language production.


The seminar is intended for graduate students, though I may consider advanced undergraduate students with a psycholinguistics background and strong interest. A very basic background in probability theory is assumed, but we'll go through the basics at the beginning of the class.


This will be a reading/discussion seminar (not a lecture). So, even if you plan to audit, I would appreciate it if you did the readings (see webpage for more detail on requirements, etc.).

Students who are taking the class for credit will have to prepare for every discussion. I plan to use the BlackBoard forum feature, and students taking the class for credit will have to post two questions or comments about the readings at least one day before each class. Additionally, they will have to lead some discussions. There will also be a final project, which can be a discussion paper or a proposal for an experiment (or grant ;). The final write-up should be about 4-10 pages.


There will be a lot of readings for each day, but the goal is not for everyone to read all of them. Instead, we will have a short obligatory reading and then distribute additional readings across people in the class. Discussion leaders have to have read all of the papers.

In addition to the readings given below, you may find the following books helpful, although they are in no way required reading since they are pretty technical:

  • Bishop, C.M. Pattern Recognition and Machine Learning. Springer.
  • MacKay, D.J.C. Information Theory, Inference, and Learning Algorithms. Cambridge University Press.

At least the first chapter of Bishop is more readable than MacKay, which is pretty technical if you haven't yet had any background in information theory.


This is a very rough draft of a syllabus. I am also blatantly stealing parts of a great class taught by Dan Jurafsky and Michael Ramscar at Stanford (Fall 2009). Feedback welcome.

NB 1: Check out the HLP lab calendar to see [ whether we are meeting this week].

NB 2: Please use the [ BlackBoard forum] I set up to add questions and comments on the readings before each class (log in before clicking the link).

The Basics of Efficiency

  1. Class: Overview and Early Approaches to Efficiency: The Principle of Least Effort
    • Zipf 1949 (1-22) and Zipf 1935 (20-39, 172-176) on the inverse frequency-form link pdf

      • Zipf, G.K. (1935). The psycho-biology of language: An introduction to dynamic philology. Houghton Mifflin.
      • Zipf, G.K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. New York: Hafner.
  2. Class: Basics of Probability Theory and Information Theory, as well as early applications to language
    • Zipf 1949 (23-31)
    • John A. Goldsmith. 2007. Probability for linguists. pdf

      • NB: This starts off very nicely and gives some great intuitions about probability theory and the basics of information theory. It does, however, contain several typos, at least two wrong formulas, and some terminological inconsistencies that can be rather confusing. Check the forum posts for details.

    • Sheldon Ross. 2010. A First Course in Probability. Eighth Edition. Section 9.3 "Surprise, Uncertainty, and Entropy", pages 425-429. see []

    • You may find Manning and Schuetze's introduction useful, too.
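
For those who want to preview the two central quantities before Class 2, here is a minimal sketch (not from any of the readings; the toy word distribution is entirely made up) of how surprisal and entropy are computed:

```python
import math

def surprisal(p):
    """Surprisal (information content) of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy(dist):
    """Shannon entropy of a distribution: the expected surprisal, in bits."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

# A toy unigram distribution over four word types (invented probabilities).
dist = {"the": 0.5, "dog": 0.25, "barked": 0.125, "loudly": 0.125}

print(surprisal(dist["the"]))  # frequent word -> low surprisal: 1.0 bit
print(entropy(dist))           # expected surprisal over the distribution: 1.75 bits
```

Note how the frequent word carries little information while the rare ones carry more; this inverse frequency-information link is the quantitative core behind Zipf's least-effort arguments.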

More than a Curious Phenomenon? Constant Entropy Rate

  1. Class: The Entropy Rate of English
    • Shannon, C.E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30, 50-64.
    • Thomas M. Cover and Roger C. King. 1978. A Convergent Gambling Estimate of the Entropy of English. IEEE Transactions on Information Theory 24:4, 413-421.
      • NB: Not for the faint-hearted. Focus on the conceptual point. Skip parts that get too dense. We will return to some of the less transparent parts later in the semester.

    • Manin, D. 2006. Experiments on predictability of word in context and information rate in natural language.
  2. Class: Constant Entropy Rate Across Discourses [ Ting ]

    • Genzel, D. and Charniak, E. (2002). Entropy rate constancy in text. In Proceedings of ACL-02.
    • Keller, F. (2004). The entropy rate principle as a predictor of processing effort: An evaluation against eye-tracking data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Barcelona, pp. 317-324.
    • Qian, T. and Jaeger, T.F. (submitted). Entropy profiles in Language: A cross-linguistic investigation. Entropy. -- feedback very much appreciated (you can use the forum). Thanks

    • also covered:
      • Genzel, D. and Charniak, E. (2003). Variation of Entropy and Parse Trees of Sentences as a Function of the Sentence Number. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 65-72.
      • Qian, T. and Jaeger, T.F. (2009). Evidence for Efficient Language Production in Chinese. In CogSci Proceedings.

      • Qian, T. (2009). Efficiency of Language Production in Native and Non-native Speakers
  3. Class: The Noisy Channel Theorem and Language [ Ting ]

    • MacKay (2008): pp. 3-8, 66-73, 137-155, and, if you still have it in you, 161-175 link -- skip the exercises, but be aware that there are examples and explanations interspersed between the exercises. Don't focus so much on the formulas but more on the conceptual points. The book also has excellent illustrations of some of the main points. Make sure you make it at least through chapter 9.

    • Ahammad, Daskalakis, Etesami, and Frome (2004) Claude Shannon and "A Mathematical Theory of Communication". pdf

    • also covered:
      • Shannon, C.E. (1948). A Mathematical Theory of Communication. Mobile Computing and Communications Review, Volume 5, Number 1. Reprinted from Bell System Technical Journal with corrections.
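
The readings above all estimate how much information each word carries under some language model. As a toy illustration (not from any of the readings; the "corpus" and the unigram model are deliberately trivial), here is the per-word information content that work like Genzel & Charniak averages across sentence positions, computed there with far richer n-gram models on large corpora:

```python
import math
from collections import Counter

def unigram_model(corpus_tokens):
    """Maximum-likelihood unigram probabilities from a list of tokens."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def per_word_information(sentence, model):
    """Per-word information content (negative log probability), in bits.

    Assumes every word in the sentence occurred in the training corpus.
    """
    return [-math.log2(model[w]) for w in sentence]

# Invented mini-corpus; real estimates require large corpora and n-gram models.
corpus = "the cat sat on the mat the dog sat on the rug".split()
model = unigram_model(corpus)

sentence = "the dog sat".split()
info = per_word_information(sentence, model)
print(sum(info) / len(info))  # average bits per word for this sentence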

Probability and Information in Online Language Production

  1. Class: Frequency, Predictability and Word Duration [ Meredith ]

    • Zipf 1935 (283-287) on speech rate (velocity of speech)
    • Pluymaekers, M., Ernestus, M., and Baayen, R. (2005). Lexical frequency and acoustic reduction in spoken Dutch. The Journal of the Acoustical Society of America, 118, 2561-2569.
    • Alan Bell, Jason Brenier, Michelle Gregory, Cynthia Girand, and Dan Jurafsky. (2009) Predictability Effects on Durations of Content and Function Words in Conversational English. Journal of Memory and Language 60:1, 92-111.
    • Also covered:
      • Aylett, M. and Turk, A. (2004). The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1), 31-56.
      • Gahl, S. and Garnsey, S.M. (2004). Knowledge of grammar, knowledge of usage: Syntactic probabilities affect pronunciation variation. Language, 80 (4), 748-775.
      • Gahl, S., Garnsey, S. M., Fisher, C., & Matzen, L. (2006). "That sounds unlikely": Syntactic probabilities affect pronunciation. Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 1334-1339).

  2. Class: Word predictability and Word Pronunciation
    • Van Son, R., and Pols, L. (2003). How efficient is speech? Proceedings of the Institute of Phonetic Sciences, 25, 171-184.
    • Aylett, M.P. and Turk, A. (2006) Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei. The Journal of the Acoustical Society of America 119, 30-48.
    • Also covered:
      • van Son, R. and van Santen, J. (2005) Duration and spectral balance of intervocalic consonants: A case for efficient communication. Speech Communication 47(1), 100-123.
  3. Class: Frequency, Predictability and Disfluency and Gesture [ Andrew ]

    • Shriberg, E., & Stolcke, A. (1996). Word predictability after hesitations: A corpus-based study. In Proceedings of ICSLP '96.

    • Cook, S. W., Jaeger, T. F., & Tanenhaus, M. K. (2009). Producing less preferred structures: More gestures, less fluency. In Proceedings of the 31st conference of the Cognitive Science Society. Vancouver, BC.

    • also covered:
      • Clark, H. H., & Fox Tree, J. E. (2002). Using "uh" and "um" in spontaneous speech. Cognition, 84, 73-111.

      • Tily, H., Gahl, S., Arnon, I., Kothari, A., Snider, N., and Bresnan, J. (2009). Pronunciation reflects syntactic probabilities: Evidence from spontaneous speech. Language and Cognition, 1, XX-XX.
  4. Class: Phonological and morphological reduction
    • Frank, A., & Jaeger, T.F. (2008, July). Speaking rationally: Uniform information density as an optimal strategy for language production. In The 30th annual meeting of the Cognitive Science Society (CogSci08) (p. 933-938). Washington, D.C.

    • Uriel Cohen Priva. 2008. Using Information Content to Predict Phone Deletion. Proceedings of the 27th West Coast Conference on Formal Linguistics, 90--98.
    • also covered:
      • Johnson, K. (2004). Massive reduction in conversational American English. In Spontaneous speech: Data and analysis, Proceedings of the 1st session of the 10th international symposium (The National International Institute for Japanese Language, Tokyo, Japan), 29-54.
  5. Class: Uniform Information Density and Morpho-Syntactic Production
    • Wasow, T., Jaeger, T.F., & Orr, D. (in press). Lexical variation in relativizer frequency. In H. Wiese & H. Simon (Eds.), Proceedings of the workshop on expecting the unexpected: Exceptions in grammar at the 27th annual meeting of the German Linguistic Association. Berlin and New York: Mouton de Gruyter.

    • parts of Jaeger, T.F. (submitted). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology.
    • also covered:
      • Jaeger, T.F. (submitted). Corpus-based research on language production: Information density affects the syntactic reduction of subject relatives.
      • Jaeger, T.F., Levy, R., and Ferreira, V. (in progress)
      • Norcliffe, E. and Jaeger, T.F. (in progress)
  6. Class: Relative Entropy and Phrase omission [ Judith ]

    • pp 127-138 and 145-151 (Experiment 3 and 4) in Resnik, P. (1996). Selectional constraints: An information-theoretic model and its computational realization. Cognition, 61 , 127-159.
    • Brown, P., & Dell, G.S. (1987). Adapting production to comprehension: The explicit mention of instruments. Cognitive Psychology, 19 (4), 441-472.

    • also covered:
      • Koenig, J.P., Mauner, G., and Bienvenue, B. (2003). Arguments for adjuncts. Cognition 89, 67-103.
      • Ellipsis
  7. Class: Information Density and Planning Beyond the Clause, Inference, Differences across Languages [ Benjamin ]

    • Gomez Gallo, C., Jaeger, T. F., & Smyth, R. (2008, July). Incremental syntactic planning across clauses. In The 30th annual meeting of the Cognitive Science Society (CogSci08) (p. 845-850). Washington, D.C.

    • Hagoort, P., and Van Berkum, J. J. A. (2007). Beyond the sentence given. Philosophical Transactions of the Royal Society. Series B: Biological Sciences, 362, 801-811
    • also covered: Gomez Gallo, C. 2010. PhD Thesis, University of Rochester.
  8. Class: Uniform Information Density and Processing [ Andrew ]

    • Hale, J. (2001). A probabilistic earley parser as a psycholinguistic model. In Proceedings of the North American Association of Computational Linguistics.
    • Levy, R., & Jaeger, T.F. (2007). Speakers optimize information density through syntactic reduction. In B. Schölkopf, J. Platt, & T. Hofmann (Eds.), Advances in neural information processing systems (NIPS) 19 (p. 849-856). Cambridge, MA: MIT Press.

    • Smith, N., & Levy, R. (2008, July). Optimal processing times in reading: A formal model and empirical investigation. In The 30th annual meeting of the Cognitive Science Society (CogSci08). Washington, D.C.

    • could also be covered:
      • Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106 (3), 1126-1177.
  9. Class: Information Theory and Information Structure [ Judith ]

    • Prince, Ellen F. 1992. The ZPG letter: Subjects, definiteness, and information-status. In S. Thompson and W. Mann, eds., Discourse Description: Diverse Analyses of a Fundraising Text.
    • pp 1-7 of Rosenfeld, R. (1996). A maximum entropy approach to adaptive statistical language modelling. Computer Speech and Language, 10(3), 187-
    • Kahn & Arnold (submitted, so please don't circulate). When predictability is not enough: The additional contribution of givenness to durational reduction.

    • Tily & Piantadosi (2009). Refer efficiently: Use less informative expressions for more predictable meanings. Proceedings of the Workshop on Production of Referring Expressions, Cogsci 2009.

  10. Class: Game-Theoretic Pragmatics [ Chris & Judith]
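
Much of this section turns on the Uniform Information Density idea from the Levy & Jaeger reading: of two variants of the same message, the one with the flatter per-word surprisal profile is predicted to be preferred. A minimal sketch of that comparison (the surprisal values below are invented purely for illustration, not measured data):

```python
import statistics

def uid_nonuniformity(surprisals):
    """Variance of per-word surprisal: lower means more uniform information density."""
    return statistics.pvariance(surprisals)

# Hypothetical per-word surprisals (bits) for two variants of the same message,
# e.g. a relative clause produced with vs. without the optional relativizer.
with_that = [2.0, 3.0, 4.0, 3.0]    # "that" spreads the information out
without_that = [2.0, 7.0, 3.0]      # omission piles information onto the next word

print(uid_nonuniformity(with_that))     # flatter profile -> lower variance
print(uid_nonuniformity(without_that))  # peakier profile -> higher variance
```

Under UID, the variant with lower variance at (roughly) equal total information is the one speakers should prefer, which is the pattern the syntactic-reduction papers in this section test against corpus data.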

Computational Accounts and Mechanisms

  1. Class: Connectionist Accounts of Production [ Alex ]

    • Dell, G. S., Chang, F., & Griffin, Z. M. (1999). Connectionist models of language production: Lexical access and grammatical encoding. Cognitive Science: A Multidisciplinary Journal, 23 (4), 517-542. pdf

    • parts of Jaeger, T.F. (submitted). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology.
    • also covered:
      • Chang, F., Dell, G. S., & Bock, J. K. (2006). Becoming syntactic. Psychological Review, 113 (2), 234-272. pdf

      • focusing on the discussion of competition accounts, have another look at: Cook, S. W., Jaeger, T. F., & Tanenhaus, M. K. (2009). Producing less preferred structures: More gestures, less fluency. In Proceedings of the 31st conference of the Cognitive Science Society. Vancouver, BC.

Language Change: The Link between Processing and Grammar

  1. Class: Zipf continued, early evidence from phonology and speech
    • Zipf, G.K. 1935 (73-81, 109-121) and Zipf 1949 (98-108) on phonological change pdf

    • Kuperman, V. (2008). Frequency distributions of uniphones, diphones, and triphones in spontaneous speech. JASA 124(6), 3897-3908. pdf

    • also covered:
      • Schuchardt, H. (1885) On sound laws: Against Neogrammarians. Translated by T. Vennemann and T.H. Wilbur. pdf

  2. Class: Functionalist Theories of Language Change [ Masha ]

    • Kirby, S., Cornish, H., and Smith, K. (2008). Cumulative Cultural Evolution in the Laboratory: an experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences, 105(31):10681-10686. pdf

    • Arnon, I. & Snider, N. (2009). More than words: Frequency effects for multi-word phrases. Journal of Memory and Language, 62(1), 67-82. pdf

    • also covered:
      • Bannard, C. & Matthews, D. (2008). Stored word sequences in language learning: The effect of familiarity on children's repetition of four-word combinations. Psychological Science, 19, 241-248. pdf

  3. Class: Adaptation [ Alex ]

    • Wells, Christiansen, Race, Acheson, and MacDonald. (2009). Experience and sentence processing: Statistical learning and relative clause comprehension. wellsetal09

    • Toscano and McMurray 2010. pdf

    • also covered
      • Sauermann et al. pdf

  4. Class: More on Optimal Lexica and Multiple Functional Pressures [ Neal ]

    • Plotkin, J.B. and Nowak, M.A. (2000). Language evolution and information theory. Journal of Theoretical Biology, 205(1), 147-159.
    • Ferrer i Cancho, R. (2005). Zipf’s law from a communicative phase transition. The European Physical Journal B, 47, 449-457.
    • Gasser, M. (2004). The origins of arbitrariness in language. Annual Conference of the Cognitive Science Society, 26.
    • Piantadosi, Tily, Gibson (2009). The communicative lexicon hypothesis. Proceedings of Cogsci 2009.
    • also covered:
      • Graff, P. and Jaeger, T.F. (submitted). Locality and Feature Specificity in OCP Effects: Evidence from Aymara, Dutch, and Javanese. CLS.
      • Excerpts from Givon, T. (1995). Functionalism and grammar. Amsterdam: John Benjamins.
      • Wasow, Perfors, and Beaver
      • Bi-directional OT approaches
  5. Class: Entropy, Neighborhood, and Paradigms [ Alex ]

    • Milin, P., Kuperman, V., Kostic, A. & Baayen, R.H. (in press). Paradigms bit by bit: An information-theoretic approach to the processing of inflection and derivation. In Blevins, James P. and Juliette Blevins (eds.), Analogy in Grammar: Form and Acquisition. Oxford: Oxford University Press. Milinetal09.pdf

    • Moscoso del Prado Martin et al. 2006. Putting the bits together: an information-theoretical perspective on morphological processing. Cognition. pdf
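
The paradigm readings above quantify inflectional paradigms with entropy measures. As a toy sketch (the paradigms, case labels, and counts below are all made up for illustration), the entropy of the frequency distribution over a word's inflected forms can be computed like this:

```python
import math

def paradigm_entropy(form_counts):
    """Entropy (bits) of the frequency distribution over a word's inflected forms."""
    total = sum(form_counts.values())
    probs = [c / total for c in form_counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Invented corpus counts for the forms of two hypothetical noun paradigms.
balanced = {"sg.nom": 25, "sg.gen": 25, "pl.nom": 25, "pl.gen": 25}
skewed   = {"sg.nom": 85, "sg.gen": 5,  "pl.nom": 5,  "pl.gen": 5}

print(paradigm_entropy(balanced))  # maximal for four forms: 2.0 bits
print(paradigm_entropy(skewed))    # lower: usage concentrated on one form
```

Measures of this kind (and relative entropy between a word's paradigm and the class-wide paradigm distribution) are what Milin et al. relate to processing cost.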

The End

  1. Final Discussion and Wonders

Additional Topics

Computational Approaches to Production

Mozer, M. C., Kinoshita, S., & Shettel, M. (2007). Sequential dependencies offer insight into cognitive control. In W. Gray (Ed.), Integrated Models of Cognitive Systems (pp. 180-193). Oxford University Press. pdf

Snider, N. & Jaeger, T.F. (submitted). ACL-2010. pdf

HlpLab: ComputationalAccountsOfProduction (last edited 2011-06-27 17:13:58 by 192)
