Computational Accounts of Production

Synopsis:

We will start with a quick refresher (written for language researchers) on probability theory and information theory, and then read a range of papers on how information content, entropy, and related measures affect language production. The goal of the class is to provide a thorough introduction to these topics, but also to discuss the shortcomings of these types of accounts and their relation to other mechanistic accounts of language production.

Prerequisites

The seminar is intended for graduate students, though I may consider advanced undergraduate students with a psycholinguistics background and strong interest. Only a very basic background in probability theory is assumed; we'll go through the basics at the beginning of the class.

Requirements

This will be a reading/discussion seminar (not a lecture), so even if you plan to audit, I would appreciate it if you did the readings (see the webpage for more detail on requirements, etc.).

Students who are taking the class for credit will have to prepare for every discussion. I plan to use the Blackboard forum feature, and students taking the class for credit will have to post at least two questions or comments about the readings at least one day before each class. Additionally, they will have to lead some discussions. There will also be a final project, which can be a discussion paper or a proposal for an experiment (or grant ;). The final write-up should be about 4-10 pages.

Readings

There will be a lot of readings for each day, but the goal is not for everyone to read all of them. Instead, we will have a short obligatory reading and then distribute additional readings across people in the class. Discussion leaders have to have read all of the papers.

Syllabus

This is a very rough draft of a syllabus. I am also blatantly stealing parts of a great class taught by Dan Jurafsky and Michael Ramscar at Stanford (Fall 2009). The list below is meant as a superset of suggestions (covering all of these topics would take more than a semester). Please feel free to suggest additional topics or to tell me your favorites.

The Basics of Efficiency

  1. Class: Overview and Early approaches to efficiency: The Principle of Least Effort
    • Zipf 1949 (1-22) and Zipf 1935 (20-39, 172-176) on the inverse frequency-form link Zipf35-49.pdf

      • Zipf, G.K. (1935). The psycho-biology of language: An introduction to dynamic philology. Houghton Mifflin.
      • Zipf, G.K. (1949). Human behaviour and the principle of least effort: An introduction to human ecology. New York: Hafner.
  2. Class: Basics of Probability Theory and Information Theory, as well as early applications to language
    • Background reading:
      • John A. Goldsmith. 2007. Probability for linguists. Goldsmith07.pdf

      • Sheldon Ross. 2010. A First Course in Probability. Eighth Edition. Section 9.3 "Surprise, Uncertainty, and Entropy", pages 425-429. see [http://onlinestatbook.com/]
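To preview the quantities these background readings introduce, here is a minimal Python sketch of surprisal and entropy under a unigram model. The toy corpus is invented for illustration; the studies we will read estimate these quantities from large corpora.

```python
import math
from collections import Counter

# Toy corpus (invented); real studies use large text collections.
words = "the cat sat on the mat the cat ran".split()
counts = Counter(words)
total = sum(counts.values())

def probability(w):
    """Relative-frequency (maximum likelihood) estimate of P(w)."""
    return counts[w] / total

def surprisal(w):
    """Surprisal in bits: -log2 P(w). Rarer words carry more information."""
    return -math.log2(probability(w))

def entropy():
    """Entropy of the unigram distribution: the expected surprisal."""
    return sum(probability(w) * surprisal(w) for w in counts)

print(surprisal("the"))  # frequent word -> low surprisal
print(surprisal("ran"))  # rare word -> high surprisal
print(entropy())
```

Base-2 logarithms give values in bits, the unit used throughout the information-theoretic readings.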

More than a Curious Phenomenon? Constant Entropy Rate

  1. Class: The Entropy Rate of English
    • Shannon, C.E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30, 50-64. shannon51.pdf

    • Thomas M. Cover and Roger C. King. 1978. A Convergent Gambling Estimate of the Entropy of English. IEEE Transactions on Information Theory 24:4, 413-421. coverking78.pdf

    • Manin, D. 2006. Experiments on predictability of word in context and information rate in natural language. manin06.pdf

  2. Class: the Noisy Channel Theorem and Language
    • Genzel, D. and Charniak, E. (2002). Entropy rate constancy in text. In Proceedings of ACL-02. genzelcharniak02.pdf

    • Shannon, C.E. (1948). A Mathematical Theory of Communication. Mobile Computing and Communications Review, Volume 5, Number 1. Reprinted from Bell System Technical Journal with corrections. shannon48.pdf

  3. Class: Constant Entropy Rate Across Discourses
    • Keller, F. (2004). The entropy rate principle as a predictor of processing effort: An evaluation against eye-tracking data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Barcelona, pp. 317-324. keller04.pdf

    • Qian, T. and Jaeger, T.F. (submitted). Entropy profiles in Language: A cross-linguistic investigation. Entropy. qianjaeger10.pdf

    • also covered:
      • Genzel, D. and Charniak, E. (2003). Variation of Entropy and Parse Trees of Sentences as a Function of the Sentence Number. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 65-72.
      • Qian, T. and Jaeger, T.F. (2009). Evidence for Efficient Language Production in Chinese. In CogSci Proceedings. qianjaeger09.pdf

      • Qian, T. (2009). Efficiency of Language Production in Native and Non-native Speakers
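The central quantity in this section is the per-word entropy of a sentence as estimated by a language model, which Genzel & Charniak track across sentence positions. A minimal Python sketch follows; the training text and the smoothed unigram model are invented stand-ins for the large corpora and n-gram models used in the actual papers.

```python
import math
from collections import Counter

def unigram_model(corpus_words):
    """Laplace-smoothed unigram probabilities estimated from a word list."""
    counts = Counter(corpus_words)
    vocab = len(counts) + 1  # +1 reserves mass for unseen words
    total = sum(counts.values())
    return lambda w: (counts[w] + 1) / (total + vocab)

def per_word_entropy(sentence, prob):
    """Average negative log-probability (bits/word) of a sentence:
    an estimate of its entropy under the model."""
    return sum(-math.log2(prob(w)) for w in sentence) / len(sentence)

# Hypothetical training text (invented for illustration).
train = "the cat sat on the mat and the dog sat on the rug".split()
model = unigram_model(train)
print(per_word_entropy("the cat sat".split(), model))
```

A context-free model like this one assigns later sentences in a discourse higher per-word entropy than a context-sensitive model would, which is the asymmetry the entropy-rate-constancy argument exploits.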

Probability and Information in Online Language Production

  1. Class: Frequency, Predictability and Word Duration
    • Zipf 1935 (283-287) on speech rate (velocity of speech) Zipf35-49_sound.pdf

    • Pluymaekers, M., Ernestus, M., and Baayen, R. (2005). Lexical frequency and acoustic reduction in spoken Dutch. The Journal of the Acoustical Society of America, 118, 2561-2569. pluymaekersetal05.pdf

    • Alan Bell, Jason Brenier, Michelle Gregory, Cynthia Girand, and Dan Jurafsky. (2009) Predictability Effects on Durations of Content and Function Words in Conversational English. Journal of Memory and Language 60:1, 92-111. belletal09.pdf

    • Also covered:
      • Aylett, M. and Turk, A. (2004). The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1), 31-56. aylettturk04.pdf

      • Gahl, S. and Garnsey, S.M. (2004). Knowledge of grammar, knowledge of usage: Syntactic probabilities affect pronunciation variation. Language, 80 (4), 748-775.
      • Gahl, S., Garnsey, S. M., Fisher, C., & Matzen, L. (2006). "That sounds unlikely": Syntactic probabilities affect pronunciation. Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 1334-1339).

  2. Class: Word predictability and Word Pronunciation
    • Van Son, R., and Pols, L. (2003). How efficient is speech? Proceedings of the Institute of Phonetic Sciences, 25, 171-184. vansonpols03.pdf

    • Aylett, M.P. and Turk, A. (2006) Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei. The Journal of the Acoustical Society of America 119, 3048-3058. aylettturk06.pdf

    • Also covered:
      • van Son, R. and van Santen, J. (2005) Duration and spectral balance of intervocalic consonants: A case for efficient communication. Speech Communication 47(1), 100-123. vansonvansanten05.pdf

  3. Class: Frequency, Predictability and Disfluency and Gesture
    • Shriberg, E., & Stolcke, A. (1996). Word predictability after hesitations: A corpus-based study. In Proceedings of ICSLP '96.

    • Cook, S. W., Jaeger, T. F., & Tanenhaus, M. K. (2009). Producing less preferred structures: More gestures, less fluency. In Proceedings of the 31st conference of the Cognitive Science Society. Vancouver, BC.

    • also covered:
      • Clark, H. H., & Fox Tree, J. E. (2002). Using "uh" and "um" in spontaneous speech. Cognition, 84, 73-111.

      • Tily, H., Gahl, S., Arnon, I., Kothari, A., Snider, N., and Bresnan, J. (2009). Pronunciation reflects syntactic probabilities: Evidence from spontaneous speech. Language and Cognition, 1, XX-XX.
  4. Class: Phonological and morphological reduction
    • Frank, A., & Jaeger, T.F. (2008, July). Speaking rationally: Uniform information density as an optimal strategy for language production. In The 30th annual meeting of the Cognitive Science Society (CogSci08) (p. 933-938). Washington, D.C.

    • Uriel Cohen Priva. 2008. Using Information Content to Predict Phone Deletion. Proceedings of the 27th West Coast Conference on Formal Linguistics, 90--98.
    • also covered:
      • Johnson, K. (2004). Massive reduction in conversational American English. In Spontaneous speech: Data and analysis, Proceedings of the 1st session of the 10th international symposium (The National International Institute for Japanese Language, Tokyo, Japan), 29-54.
  5. Class: Uniform Information Density and Morpho-Syntactic Production
    • Wasow, T., Jaeger, T.F., & Orr, D. (in press). Lexical variation in relativizer frequency. In H. Wiese & H. Simon (Eds.), Proceedings of the workshop on expecting the unexpected: Exceptions in grammar at the 27th annual meeting of the German Linguistic Association. Berlin and New York: Mouton de Gruyter.

    • parts of Jaeger, T.F. (submitted). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology.
    • also covered:
      • Jaeger, T.F. (submitted). Corpus-based research on language production: Information density affects the syntactic reduction of subject relatives.
      • Jaeger, T.F., Levy, R., and Ferreira, V. (in progress)
      • Norcliffe, E. and Jaeger, T.F. (in progress)
  6. Class: Relative Entropy and Phrase omission
    • pp. 127-138 and 145-151 (Experiments 3 and 4) in Resnik, P. (1996). Selectional constraints: An information-theoretic model and its computational realization. Cognition, 61, 127-159.
    • Brown, P., & Dell, G.S. (1987). Adapting production to comprehension: The explicit mention of instruments. Cognitive Psychology, 19 (4), 441-472.

    • also covered:
      • Koenig, J.P., Mauner, G., and Bienvenue, B. (2003). Arguments for adjuncts. Cognition 89, 67-103.
      • Ellipsis
  7. Class: Information Density and Planning Beyond the Clause, Inference, Differences across Languages
    • Gomez Gallo, C., Jaeger, T. F., & Smyth, R. (2008, July). Incremental syntactic planning across clauses. In The 30th annual meeting of the Cognitive Science Society (CogSci08) (p. 845-850). Washington, D.C.

    • Hagoort, P., and Van Berkum, J. J. A. (2007). Beyond the sentence given. Philosophical Transactions of the Royal Society. Series B: Biological Sciences, 362, 801-811
    • also covered: Gomez Gallo, C. 2010. PhD Thesis, University of Rochester.
  8. Class: Uniform Information Density and Processing
    • Hale, J. (2001). A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the North American Chapter of the Association for Computational Linguistics.
    • Levy, R., & Jaeger, T.F. (2007). Speakers optimize information density through syntactic reduction. In B. Schölkopf, J. Platt, & T. Hofmann (Eds.), Advances in neural information processing systems (NIPS) 19 (p. 849-856). Cambridge, MA: MIT Press.

    • Smith, N., & Levy, R. (2008, July). Optimal processing times in reading: A formal model and empirical investigation. In The 30th annual meeting of the Cognitive Science Society (CogSci08). Washington, D.C.

    • could also be covered:
      • Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106 (3), 1126-1177.
  9. Class: Information Theory and Information Structure
    • Prince, Ellen F. 1992. The ZPG letter: Subjects, definiteness, and information-status. In S. Thompson and W. Mann, eds., Discourse Description: Diverse Analyses of a Fundraising Text.
    • pp. 1-7 of Rosenfeld, R. (1996). A maximum entropy approach to adaptive statistical language modelling. Computer Speech and Language, 10(3), 187-
    • Arnold, J. (I assume) CUNY talk I just reviewed that argues that givenness cannot be reduced to predictability/information density.
    • Tily & Piantadosi (2009). Refer efficiently: Use less informative expressions for more predictable meanings. Proceedings of the Workshop on Production of Referring Expressions, Cogsci 2009.

    • Also covered:
      • Excerpts from Givon, T. (1995). Functionalism and grammar. Amsterdam: John Benjamins.
      • Wasow, Perfors, and Beaver
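The Uniform Information Density idea running through these classes can be made concrete with a toy computation: given per-word surprisal profiles for two variants of the same message (e.g., a complement clause with and without the optional "that", cf. Levy & Jaeger 2007), UID predicts a preference for the variant that spreads information more evenly over the signal. The surprisal values below are invented for illustration, not taken from any of the papers.

```python
import statistics

# Hypothetical per-word surprisal profiles (bits) for two variants
# of the same message; the numbers are made up for illustration.
with_that = [2.0, 1.5, 3.0, 2.5, 2.0]
without_that = [2.0, 5.5, 2.5, 2.0]

def uniformity(surprisals):
    """Variance of per-word information; UID favors lower variance,
    i.e., information spread evenly across the utterance."""
    return statistics.pvariance(surprisals)

# Under UID, speakers should prefer the variant whose information
# profile is more uniform:
preferred = min([with_that, without_that], key=uniformity)
print(preferred is with_that)  # -> True for these made-up numbers
```

Variance of per-word surprisal is only one possible operationalization of uniformity; the readings discuss others (e.g., avoiding peaks that exceed channel capacity).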

Computational Accounts and Mechanisms

  1. Class: Connectionist Accounts of Production
    • Dell, G. S., Chang, F., & Griffin, Z. M. (1999). Connectionist models of language production: Lexical access and grammatical encoding. Cognitive Science: A Multidisciplinary Journal, 23 (4), 517-542.

    • parts of Jaeger, T.F. (submitted). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology.
    • also covered:
      • Chang, F., Dell, G. S., & Bock, J. K. (2006). Becoming syntactic. Psychological Review, 113 (2), 234-272.

      • focusing on discussion of competition accounts, have another look at: Cook, S. W., Jaeger, T. F., & Tanenhaus, M. K. (2009). Producing less preferred structures: More gestures, less fluency. In Proceedings of the 31st conference of the Cognitive Science Society. Vancouver, BC.

Language Change: The Link between Processing and Grammar

  1. Class: Zipf continued, early evidence from phonology and speech
    • Zipf 1935 (73-81, 109-121) and Zipf 1949 (98-108) on phonological change Zipf35-49_sound.pdf

    • also covered:
      • Schuchardt, H. (1885) On sound laws: Against the Neogrammarians. Translated by T. Vennemann and T.H. Wilbur. schuchardt1885.pdf

  2. Class: Functionalist Theories of Language Change
    • Bybee, J. (2002). Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change, 14 (3), 261-290.
    • Bates, E. and MacWhinney, B. (1982)

    • also covered:
      • Bannard, M.
      • Arnon, I. and Snider, N.
  3. Class: More on Optimal Lexica and Multiple Functional Pressures
    • Plotkin, J.B. and Nowak, M.A. (2000). Language evolution and information theory. Journal of Theoretical Biology 205(1), 147-159.
    • Gasser, M. (2004). The origins of arbitrariness in language. Annual Conference of the Cognitive Science Society, 26.
    • Piantadosi, Tily, Gibson (2009). The communicative lexicon hypothesis. Proceedings of Cogsci 2009.
    • also covered:
      • Graff, P. and Jaeger, T.F. (submitted). Locality and Feature Specificity in OCP Effects: Evidence from Aymara, Dutch, and Javanese. CLS.
      • Bi-directional OT approaches
  4. Class: Entropy, Neighborhood, and Paradigms
    • Milin, P., Kuperman, V., Kostic, A. & Baayen, R.H. Paradigms bit by bit: an information-theoretic approach to the processing of inflection and derivation. In press in Blevins, James P. and Juliette Blevins (eds.), Analogy in Grammar: Form and Acquisition. Oxford: Oxford University Press.
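The paradigm entropy that Milin et al. relate to processing is just the Shannon entropy of an inflectional paradigm's relative frequency distribution, and can be sketched in a few lines of Python. The frequencies below are invented; the paper estimates them from corpora.

```python
import math

# Hypothetical token frequencies for the inflected forms of a verb
# paradigm (numbers invented for illustration).
paradigm = {"walk": 60, "walks": 25, "walked": 10, "walking": 5}

def paradigm_entropy(freqs):
    """Shannon entropy (bits) of the paradigm's relative frequencies."""
    total = sum(freqs.values())
    return -sum((f / total) * math.log2(f / total) for f in freqs.values())

print(paradigm_entropy(paradigm))
```

A paradigm whose forms are used equally often has maximal entropy (here, 2 bits for four forms); a paradigm dominated by one form has entropy near zero.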

The End

  1. Final Discussion and Wonders

Additional Topics

Computational Approaches to Production

  • Connectionist Models of Lexical Production

    • Speech errors (Dell, 1986)
  • Information theoretic approaches to Morphological Paradigms

    • Baayen
      • Moscoso del Prado Martín
  • Priming and Implicit Learning

    • Computational Models of Skill Maintenance

      • Huber et al
    • Connectionist Models of Syntactic Priming

      • Chang et al
    • ACT-R Models of Syntactic Priming

    • Surprisal and Surprisal-based Models of Syntactic Priming

      • Snider & Jaeger

  • Phonetic Adaptation

    • Clayards et al 09; Kraljic and Samuel
  • Syntactic Adaptation

    • Wells et al 09; Sauerland et al., 09
  • Ideal Observer Approaches to Adaptation