
Computational Accounts of Production

Synopsis:

Connectionist and spreading-activation models of language production (lexical and syntactic production, but with a focus on speech errors)
Information theoretic models of incremental language production (phonetic, morphological, syntactic, and extra-syntactic preferences)
Computational models of adaptation in language processing (implicit learning, ideal observer models)

We will start with a quick refresher (written for language researchers) on probability theory and information theory and then read many papers on how information content, entropy, and related measures affect language production. The goal of the class is to provide a thorough introduction to these topics, but also to discuss the shortcomings of these types of accounts and their relation to other mechanistic accounts of language production.

Prerequisites

The seminar is intended for graduate students though I may consider advanced undergraduate students with a psycholinguistics background and strong interest. A very basic background in probability theory is assumed, but we'll go through the basics at the beginning of the class.

Requirements

This will be a reading/discussion seminar (not a lecture). So even if you plan to audit, I would appreciate it if you did the readings (see webpage for more detail on requirements etc.).

Students who are taking the class for credit will have to prepare for every discussion. I plan to use the BlackBoard forum feature, and students taking the class for credit will have to post 2 questions or comments about the readings at least 1 day before each class. Additionally, they will have to lead some discussions. There will also be a final project, which can be a discussion paper or a proposal for an experiment (or grant ;). The final write-up should be about 4-10 pages.

Readings

There will be a lot of readings for each day, but the goal is not for everyone to read all of them. Instead, we will have a short obligatory reading and then distribute additional readings across people in the class. Discussion leaders have to have read all of the papers.

Syllabus

This is a very rough draft of a syllabus. I am also blatantly stealing parts of a great class taught by Dan Jurafsky and Michael Ramscar at Stanford (Fall 2009). The list below is meant as a superset suggestion (covering all topics would take more than a semester). Please feel free to suggest additional topics or to tell me your favorites.

The Basics of Efficiency

Class: Overview and Early approaches to efficiency: The Principle of Least Effort
Zipf 1949 (1-22) and Zipf 1935 (20-39, 172-176) on the inverse frequency-form link. Zipf35-49.pdf
Zipf, G.K. (1935). The psycho-biology of language: An introduction to dynamic philology. Houghton Mifflin.
Zipf, G.K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. New York: Hafner.
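Zipf's inverse frequency-form link is the observation that more frequent words tend to be shorter. A minimal sketch of one way to check this on a text, assuming whitespace tokenization and a made-up toy sentence: count word frequencies and correlate them with word length. The function names are hypothetical, and serious tests of the claim use large corpora and usually log frequency rather than raw counts.

```python
from collections import Counter


def frequency_length_table(text):
    """Return (word, frequency, length) triples, most frequent first."""
    counts = Counter(text.lower().split())
    return sorted(((w, c, len(w)) for w, c in counts.items()),
                  key=lambda row: (-row[1], row[0]))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy text; it only illustrates the computation.
toy = "the cat and the dog sat on the mat and the elephant watched"
table = frequency_length_table(toy)
r = pearson([freq for _, freq, _ in table], [length for _, _, length in table])
print(table[0], round(r, 2))
```

A negative r means that, in this text, the more frequent words are the shorter ones, which is the direction Zipf's account predicts.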
Class: Basics of Probability Theory and Information Theory, as well as early applications to language
Background reading:
John A. Goldsmith. 2007. Probability for linguists. Goldsmith07.pdf
Sheldon Ross. 2010. A First Course in Probability. Eighth Edition. Section 9.3 "Surprise, Uncertainty, and Entropy", pages 425-429. See [http://onlinestatbook.com/]
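As a quick reference for the background readings, here is a minimal Python sketch of the two central quantities in Ross's section, assuming a discrete distribution given as a dictionary: the surprise (surprisal) of an outcome with probability p is -log2 p bits, and entropy is the expected surprisal. The function names are chosen for illustration only.

```python
import math


def surprisal(p):
    """Surprise of an outcome with probability p, in bits: -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


def entropy(dist):
    """Entropy in bits of a discrete distribution {outcome: probability}.

    Entropy is the expected surprisal: H(X) = sum_x p(x) * -log2 p(x).
    """
    if not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * surprisal(p) for p in dist.values() if p > 0)


print(surprisal(0.125))                       # a 1-in-8 outcome
print(entropy({"heads": 0.5, "tails": 0.5}))  # fair coin
print(entropy({"heads": 0.9, "tails": 0.1}))  # biased coin: less uncertainty
```

Applied to language, a word that has probability 1/8 in its context carries 3 bits, and a context in which the next word is hard to predict has high entropy.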

More than a Curious Phenomenon? Constant Entropy Rate

Class: The Entropy Rate of English
Shannon, C.E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30, 50-64. shannon51.pdf
Thomas M. Cover and Roger C. King. 1978. A Convergent Gambling Estimate of the Entropy of English. IEEE Transactions on Information Theory 24:4, 413-421. coverking78.pdf
Manin, D. 2006. Experiments on predictability of word in context and information rate in natural language. manin06.pdf
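The entropy-rate readings ask how many bits per character English carries once the preceding context is taken into account. A minimal sketch of the simplest such estimate, assuming a plain string and a plug-in (maximum-likelihood) estimator, is the conditional entropy of the next character given the previous k characters. This is in the spirit of Shannon's N-gram approximations, not a reproduction of his guessing-game method or of Cover and King's gambling estimate; with little data and large k, plug-in estimates are biased downward.

```python
import math
from collections import Counter


def conditional_entropy(text, k):
    """Plug-in estimate, in bits per character, of H(next char | previous k chars).

    Counts every (k-character context, next character) pair in `text` and
    computes -sum p(context, char) * log2 p(char | context).
    """
    if k < 0 or len(text) <= k:
        raise ValueError("need 0 <= k < len(text)")
    joint = Counter()
    context_totals = Counter()
    for i in range(k, len(text)):
        context = text[i - k:i]
        joint[(context, text[i])] += 1
        context_totals[context] += 1
    n = sum(joint.values())
    h = 0.0
    for (context, _), count in joint.items():
        p_joint = count / n
        p_cond = count / context_totals[context]
        h -= p_joint * math.log2(p_cond)
    return h


sample = "the cat sat on the mat and the dog sat on the log"
for k in range(4):
    print(k, round(conditional_entropy(sample, k), 3))
```

On any text the estimate falls as k grows, because longer contexts make the next character easier to predict; on a short sample part of that drop is the downward bias mentioned above.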
Class: The Noisy Channel Theorem and Language
Genzel, D. and Charniak, E. (2002). Entropy rate constancy in text. In Proceedings of ACL-02. genzelcharniak02.pdf
Shannon, C.E. (1948). A Mathematical Theory of Communication. Mobile Computing and Communications Review, Volume 5, Number 1. Reprinted from Bell System Technical Journal with corrections. shannon48.pdf
Class: Constant Entropy Rate Across Discourses
Keller, F. (2004). The entropy rate principle as a predictor of processing effort: An evaluation against eye-tracking data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Barcelona, pp. 317-324. keller04.pdf
Qian, T. and Jaeger, T.F. (submitted). Entropy profiles in Language: A cross-linguistic investigation. Entropy. qianjaeger10.pdf
also covered:
Genzel, D. and Charniak, E. (2003). Variation of Entropy and Parse Trees of Sentences as a Function of the Sentence Number. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 65-72.
Qian, T. and Jaeger, T.F. (2009). Evidence for Efficient Language Production in Chinese. In CogSci Proceedings. qianjaeger09.pdf
Qian, T. (2009). Efficiency of Language Production in Native and Non-native Speakers

Probability and Information in Online Language Production

Class: Frequency, Predictability and Word Duration
Zipf 1935 (283-287) on speech rate (velocity of speech). Zipf35-49_sound.pdf
Pluymaekers, M., Ernestus, M., and Baayen, R.H. (2005). Lexical frequency and acoustic reduction in spoken Dutch. The Journal of the Acoustical Society of America, 118, 2561-2569. pluymaekersetal05.pdf
Alan Bell, Jason Brenier, Michelle Gregory, Cynthia Girand, and Dan Jurafsky. (2009) Predictability Effects on Durations of Content and Function Words in Conversational English. Journal of Memory and Language 60:1, 92-111. belletal09.pdf
Also covered:
Aylett, M. and Turk, A. (2004). The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1), 31-56. aylettturk04.pdf
Gahl, S. and Garnsey, S.M. (2004). Knowledge of grammar, knowledge of usage: Syntactic probabilities affect pronunciation variation. Language, 80 (4), 748-775.
Gahl, S., Garnsey, S. M., Fisher, C., & Matzen, L. (2006). "That sounds unlikely": Syntactic probabilities affect pronunciation. Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 1334-1339).
Class: Word predictability and Word Pronunciation
Van Son, R., and Pols, L. (2003). How efficient is speech? Proceedings of the Institute of Phonetic Sciences, 25, 171-184. vansonpols03.pdf
Aylett, M.P. and Turk, A. (2006) Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei. The Journal of the Acoustical Society of America 119, 30-48. aylettturk06.pdf
Also covered:
van Son, R. and van Santen, J. (2005) Duration and spectral balance of intervocalic consonants: A case for efficient communication. Speech Communication 47(1), 100-123. vansonvansanten05.pdf
Class: Frequency, Predictability, Disfluency, and Gesture
Shriberg, E., & Stolcke, A. (1996). Word predictability after hesitations: A corpus-based study. In Proceedings of ICSLP '96.
Cook, S. W., Jaeger, T. F., & Tanenhaus, M. K. (2009). Producing less preferred structures: More gestures, less Fluency. In Proceedings of the 31st conference of the Cognitive Science Society. Vancouver, BC.
also covered:
Clark, H. H., & Fox Tree, J. E. (2002). Using "uh" and "um" in spontaneous speech. Cognition, 84, 73-111.
Tily, H., Gahl, S., Arnon, I., Kothari, A., Snider, N., and Bresnan, J. (2009). Pronunciation reflects syntactic probabilities: Evidence from spontaneous speech. Language and Cognition, 1, XX-XX.
Class: Phonological and morphological reduction
Frank, A., & Jaeger, T.F. (2008, July). Speaking rationally: Uniform information density as an optimal strategy for language production. In The 30th annual meeting of the Cognitive Science Society (CogSci08) (pp. 933-938). Washington, D.C.
Uriel Cohen Priva. 2008. Using Information Content to Predict Phone Deletion. Proceedings of the 27th West Coast Conference on Formal Linguistics, 90-98.
Class: Uniform Information Density and Morpho-Syntactic Production
Wasow, T., Jaeger, T.F., & Orr, D. (in press). Lexical variation in relativizer frequency. In H. Wiese & H. Simon (Eds.), Proceedings of the workshop on expecting the unexpected: Exceptions in grammar at the 27th annual meeting of the German Linguistic Association. Berlin and New York: Mouton de Gruyter.
parts of Jaeger, T.F. (submitted). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology.
also covered:
Jaeger, T.F. (submitted). Corpus-based research on language production: Information density affects the syntactic reduction of subject relatives.
Jaeger, T.F., Levy, R., and Ferreira, V. (in progress)
Norcliffe, E. and Jaeger, T.F. (in progress)
Class: Relative Entropy and Phrase omission
pp. 127-138 and 145-151 (Experiments 3 and 4) in Resnik, P. (1996). Selectional constraints: An information-theoretic model and its computational realization. Cognition, 61, 127-159.
Brown, P., & Dell, G.S. (1987). Adapting production to comprehension: The explicit mention of instruments. Cognitive Psychology, 19 (4), 441-472.
also covered:
Koenig, J.P., Mauner, G., and Bienvenue, B. (2003). Arguments for adjuncts. Cognition 89, 67-103.
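Resnik's selectional preference strength of a verb is the relative entropy (Kullback-Leibler divergence) between the distribution of argument classes given the verb, P(c|v), and the prior distribution of classes, P(c). A minimal sketch follows; the class labels and probabilities are hypothetical toy values, whereas Resnik estimated these distributions from corpus counts over WordNet classes.

```python
import math


def relative_entropy(p, q):
    """D(p || q) in bits, for dicts mapping outcomes to probabilities."""
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0:
            continue
        q_x = q.get(outcome, 0.0)
        if q_x == 0:
            raise ValueError(f"q assigns zero probability to {outcome!r}")
        total += p_x * math.log2(p_x / q_x)
    return total


# Hypothetical prior over argument classes and two hypothetical verbs.
prior = {"food": 0.25, "artifact": 0.25, "person": 0.25, "event": 0.25}
eat = {"food": 1.0}   # P(c | v) concentrated on one class
see = dict(prior)     # P(c | v) identical to the prior

print(relative_entropy(eat, prior))  # strong selectional preference
print(relative_entropy(see, prior))  # no preference beyond the prior
```

The larger the divergence, the more the verb alone tells a listener about its object, which is why this measure is a natural candidate predictor of when an object can be left implicit.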
Class: Information Density and Planning Beyond the Clause, Inference, Differences across Languages
Gomez Gallo, C., Jaeger, T. F., & Smyth, R. (2008, July). Incremental syntactic planning across clauses. In The 30th annual meeting of the Cognitive Science Society (CogSci08) (pp. 845-850). Washington, D.C.
Hagoort, P., and Van Berkum, J. J. A. (2007). Beyond the sentence given. Philosophical Transactions of the Royal Society. Series B: Biological Sciences, 362, 801-811.
also covered: Gomez Gallo, C. 2010. PhD Thesis, University of Rochester.
Class: Uniform Information Density and Processing
Hale, J. (2001). A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the North American Chapter of the Association for Computational Linguistics.
Levy, R., & Jaeger, T.F. (2007). Speakers optimize information density through syntactic reduction. In B. Schölkopf, J. Platt, & T. Hoffman (Eds.), Advances in neural information processing systems (NIPS) 19 (pp. 849-856). Cambridge, MA: MIT Press.
Smith, N., & Levy, R. (2008, July). Optimal processing times in reading: A formal model and empirical investigation. In The 30th annual meeting of the Cognitive Science Society (CogSci08). Washington, D.C.
could also be covered:
Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106 (3), 1126-1177.
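Hale (2001) and Levy (2008) link processing difficulty to a word's surprisal, -log2 P(w_i | w_1 ... w_{i-1}). Hale derived these probabilities from a probabilistic Earley parser over a PCFG; the sketch below swaps in an unsmoothed bigram model purely to keep the example short. The toy corpus and function names are hypothetical. The resulting per-word surprisal profile is also the kind of quantity that uniform information density accounts expect speakers to keep relatively even.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams (with a start symbol) for a maximum-likelihood model."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts


def surprisal_profile(sentence, context_counts, bigram_counts):
    """Per-word surprisal log2(1 / P(w_i | w_{i-1})) under the unsmoothed bigram model."""
    tokens = ["<s>"] + sentence.split()
    profile = []
    for prev, cur in zip(tokens, tokens[1:]):
        count = bigram_counts[(prev, cur)]
        if count == 0:
            raise ValueError(f"unseen bigram {(prev, cur)}; this sketch has no smoothing")
        profile.append((cur, math.log2(context_counts[prev] / count)))
    return profile


corpus = ["the dog barked", "the dog slept", "the cat slept", "a dog barked"]
contexts, bigrams = train_bigram(corpus)
for word, bits in surprisal_profile("a dog slept", contexts, bigrams):
    print(f"{word:>6}  {bits:.2f} bits")
```

By the chain rule, summing the per-word values gives -log2 of the probability the model assigns to the word sequence after the start symbol (no end-of-sentence symbol is modeled here).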
Class: Information Theory and Information Structure
Prince, Ellen F. 1992. The ZPG letter: Subjects, definiteness, and information-status. In S. Thompson and W. Mann, eds., Discourse Description: Diverse Analyses of a Fundraising Text.
pp. 1-7 of Rosenfeld, R. (1996). A maximum entropy approach to adaptive statistical language modelling. Computer Speech and Language 10(3), 187-
Arnold, J. (I assume) CUNY talk I just reviewed that argues that givenness cannot be reduced to predictability/information density.
Tily & Piantadosi (2009). Refer efficiently: Use less informative expressions for more predictable meanings. Proceedings of the Workshop on Production of Referring Expressions, Cogsci 2009.
Also covered:
Excerpts from Givon, T. (1995). Functionalism and grammar. Amsterdam: John Benjamins.
Wasow, Perfors, and Beaver

Computational Accounts and Mechanisms

Class: Connectionist Accounts of Production
Dell, G. S., Chang, F., & Griffin, Z. M. (1999). Connectionist models of language production: Lexical access and grammatical encoding. Cognitive Science: A Multidisciplinary Journal, 23 (4), 517-542.

parts of Jaeger, T.F. (submitted). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology.
also covered:
Chang, F., Dell, G. S., & Bock, J. K. (2006). Becoming syntactic. Psychological Review, 113 (2), 234-272.
focusing on discussion of competition accounts, have another look at: Cook, S. W., Jaeger, T. F., & Tanenhaus, M. K. (2009). Producing less preferred structures: More gestures, less Fluency. In Proceedings of the 31st conference of the Cognitive Science Society. Vancouver, BC.

Language Change: The Link between Processing and Grammar

Class: Zipf continued, early evidence from phonology and speech
Zipf 1935 (73-81, 109-121) and Zipf 1949 (98-108) on phonological change. Zipf35-49_sound.pdf
also covered:
Schuchardt, H. (1885) On sound laws: Against Neogrammarians. Translated by T. Vennemann and T.H. Wilbur. schuchardt1885.pdf
Class: Functionalist Theories of Language Change
Bybee, J. (2002). Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change, 14 (3), 261-290.
Bates, E. and MacWhinney, B. (1982)
also covered:
Bannard, M.
Arnon, I. and Snider, N.
Class: More on Optimal Lexica and Multiple Functional Pressures
Plotkin, J.B. and Nowak, M.A. (2000). Language evolution and information theory. Journal of Theoretical Biology 205(1), 147-159.
Gasser, M. (2004). The origins of arbitrariness in language. Annual Conference of the Cognitive Science Society, 26.
Piantadosi, Tily, Gibson (2009). The communicative lexicon hypothesis. Proceedings of Cogsci 2009.
also covered:
Graff, P. and Jaeger, T.F. (submitted). Locality and Feature Specificity in OCP Effects: Evidence from Aymara, Dutch, and Javanese. CLS.
Bi-directional OT approaches
Class: Entropy, Neighborhood, and Paradigms
Milin, P., Kuperman, V., Kostic, A. & Baayen, R.H. Paradigms bit by bit: an information-theoretic approach to the processing of inflection and derivation. In press in Blevins, James P. and Juliette Blevins (eds.), Analogy in Grammar: Form and Acquisition. Oxford: Oxford University Press.

The End

Final Discussion and Wonders

Topics

Computational Approaches to Production

Background in probability theory and information theory
Early applications of information theory to natural language: The entropy of English
Least Effort
Zipf (1929/49)
Manin (2006, 2007)
Shannon Information and Sub-Phonemic/Phonemic Reduction
Duration reduction (Bell et al. 03, 09; Aylett and Turk 04; Pluymaekers et al. 05)
Vowel weakening (Van Son and Van Santen, 05)
Shannon Information and Sub-Phonemic/Phonemic Reduction
Phone deletion (Cohen Priva, 08)
Fluency (Shriberg and Stolcke 96)
Shannon Information and Morpho-syntactic Reduction
Auxiliary reduction and omission (Frank and Jaeger 08)
Prefix deletion (Norcliffe and Jaeger 10)
Case-marker omission
Connectionist Models of Lexical Production
Speech errors (Dell, 86)
Connectionist Models of Syntactic Production
Chang et al
Shannon Information and Syntactic Reduction
Wasow et al 07; Jaeger 10a,b
Relative Entropy and Argument Omission
Argument drop (Resnik 96)
Ellipsis
Uncertainty Reduction and Referring Expressions
Wasow, Perfors, and Beaver
Tily and Piantadosi
Shannon Information and Neighborhood Entropy across the Discourse
Genzel and Charniak (2002, 2003)
Piantadosi and Gibson (2008)
Qian and Jaeger (2009, 2010a,b)
Optimal Lexica
Information density, Neighborhood density, Ambiguity (Piantadosi et al 09; Plotkin and Nowak 00; Gasser 04)
Phonological optimality (Graff and Jaeger 09)
Information theoretic approaches to Morphological Paradigms
Baayen
Moscoso del Prado Martín

Computational Models of Priming, Implicit Learning, Adaptation

Priming and Implicit Learning
Computational Models of Skill Maintenance
Huber et al
Connectionist Models of Syntactic Priming
Chang et al
ACT-R Models of Syntactic Priming
Surprisal and Surprisal-based Models of Syntactic Priming
Hale 01; Levy 08
Snider & Jaeger
Phonetic Adaptation
Clayards et al 09; Kraljic and Samuel
Syntactic Adaptation
Wells et al 09; Sauerland et al., 09
Ideal Observer Approaches to Adaptation
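One way ideal observer approaches frame adaptation is as Bayesian belief updating: the comprehender or speaker keeps a belief about how often a structure occurs and revises it with each exposure. The sketch below uses a conjugate Beta-Binomial model as a minimal stand-in; it is not the model of any specific paper listed here, and the prior values and the "structure A vs. B" framing are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about how often structure A (vs. B) is used.

    alpha and beta act as pseudo-counts of previously experienced A and B.
    """
    alpha: float
    beta: float

    def update(self, saw_a: bool) -> "BetaBelief":
        """Conjugate Bayesian update after observing one more A or B."""
        if saw_a:
            return BetaBelief(self.alpha + 1, self.beta)
        return BetaBelief(self.alpha, self.beta + 1)

    def p_a(self) -> float:
        """Posterior predictive probability that the next structure is A."""
        return self.alpha / (self.alpha + self.beta)

    def surprisal_a(self) -> float:
        """Surprisal (bits) of A under the posterior predictive."""
        return -math.log2(self.p_a())


belief = BetaBelief(alpha=1.0, beta=9.0)  # hypothetical prior: A is rare
for _ in range(5):                        # five exposures to A
    belief = belief.update(saw_a=True)
print(round(belief.p_a(), 2), round(belief.surprisal_a(), 2))
```

Repeated exposure raises the expected probability of A and so lowers its surprisal, which is one way to connect adaptation to the surprisal-based accounts of priming listed above.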
