Computational Accounts of Production
Synopsis:
- Connectionist and spreading-activation models of language production (lexical and syntactic production, but with a focus on speech errors)
- Information theoretic models of incremental language production (phonetic, morphological, syntactic, and extra-syntactic preferences)
- Computational models of adaptation in language processing (implicit learning, ideal observer models)
We will start with a quick refresher (written for language researchers) on probability theory and information theory and then read a range of papers on how information content, entropy, and related quantities affect language production. The goal of the class is to provide a thorough introduction to these topics, but also to discuss the shortcomings of these types of accounts and their relation to other mechanistic accounts of language production.
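For readers new to these notions, the two quantities at the heart of most of the readings are surprisal (the information content of an event, -log2 p) and entropy (expected surprisal). Below is a minimal sketch in Python using the standard definitions; the toy corpus and the unigram relative-frequency estimator are placeholders for illustration, not anything from the readings:

    import math
    from collections import Counter

    def unigram_probs(tokens):
        """Relative-frequency estimate of word probabilities."""
        counts = Counter(tokens)
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def surprisal(p):
        """Information content of an outcome with probability p, in bits."""
        return -math.log2(p)

    def entropy(probs):
        """Shannon entropy: the expected surprisal of a distribution, in bits."""
        return sum(p * surprisal(p) for p in probs.values())

    tokens = "the cat sat on the mat and the dog sat too".split()
    probs = unigram_probs(tokens)
    print(f"p(the) = {probs['the']:.2f}; surprisal(the) = {surprisal(probs['the']):.2f} bits")
    print(f"entropy of the unigram distribution = {entropy(probs):.2f} bits")

Frequent words carry little information (low surprisal) and rare words carry a lot; much of the course asks how speakers exploit this when producing language.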
Prerequisites
The seminar is intended for graduate students, though I may consider advanced undergraduate students with a psycholinguistics background and a strong interest. Only a very basic background in probability theory is assumed; we'll go through the basics at the beginning of the class.
Requirements
This will be a reading/discussion seminar (not a lecture), so even if you plan to audit, I would appreciate it if you do the readings (see the webpage for more detail on requirements, etc.).
Students taking the class for credit will have to prepare for every discussion. I plan to use the BlackBoard forum feature, and students taking the class for credit will have to post two questions or comments about the readings at least one day before each class. Additionally, they will have to lead some discussions. There will also be a final project, which can be a discussion paper or a proposal for an experiment (or a grant ;). The final write-up should be about 4-10 pages.
Readings
There will be a lot of readings for each day, but the goal is not for everyone to read all of them. Instead, we will have a short obligatory reading and then distribute the additional readings across people in the class. Discussion leaders have to have read all of the papers.
Syllabus
This is a very rough draft of a syllabus. I am also blatantly stealing parts of a great class taught by Dan Jurafsky and Michael Ramscar at Stanford (Fall 2009). The list below is meant as a superset of suggestions; covering all of the topics would take more than a semester. Please feel free to suggest additional topics or to tell me your favorites.
1. Class: Overview and early approaches to efficiency: The Principle of Least Effort
   - Zipf 1949 (1-22) and Zipf 1935 (20-39, 172-176) on the inverse frequency-form link (pdf: Zipf35-49.pdf)
   - Zipf, G.K. (1935). The psycho-biology of language: An introduction to dynamic philology. Houghton Mifflin.
   - Zipf, G.K. (1949). Human behaviour and the principle of least effort: An introduction to human ecology. New York: Hafner.
2. Class: Basics of probability theory and information theory, as well as early applications to language
   - Background reading:
     - John A. Goldsmith. 2007. Probability for linguists. (pdf: Goldsmith07.pdf)
     - Sheldon Ross. 2010. A First Course in Probability. Eighth Edition. Section 9.3, "Surprise, Uncertainty, and Entropy", pages 425-429. (pdf: Ross10.pdf); see also http://onlinestatbook.com/
3. Class: The Entropy Rate of English (a toy version of this kind of estimate is sketched after the syllabus)
   - Shannon, C.E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30, 50-64. (pdf: shannon51.pdf)
   - Thomas M. Cover and Roger C. King. 1978. A Convergent Gambling Estimate of the Entropy of English. IEEE Transactions on Information Theory 24:4, 413-421. (pdf: coverking78.pdf)
   - Manin, D. 2006. Experiments on predictability of word in context and information rate in natural language. (pdf: manin06.pdf)
4. Class: Constant Entropy Rate Across Discourses
   - Genzel, D. and Charniak, E. (2002). Entropy rate constancy in text. In Proceedings of ACL-02. (pdf: genzelcharniak02.pdf)
   - Qian, T. and Jaeger, T.F. (submitted). Entropy profiles in language: A cross-linguistic investigation. Entropy. (pdf: qianjaeger10.pdf)
   - Also covered:
     - Keller, F. (2004). The entropy rate principle as a predictor of processing effort: An evaluation against eye-tracking data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Barcelona, pp. 317-324. (pdf: keller04.pdf)
     - Genzel, D. and Charniak, E. (2003). Variation of Entropy and Parse Trees of Sentences as a Function of the Sentence Number. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 65-72.
     - Qian, T. and Jaeger, T.F. (2009). Evidence for Efficient Language Production in Chinese. In CogSci Proceedings. (pdf: qianjaeger09.pdf)
     - Qian, T. (2009). Efficiency of Language Production in Native and Non-native Speakers.
5. Class: The Noisy Channel Theorem
   - Shannon, C.E. (1948). A Mathematical Theory of Communication. Mobile Computing and Communications Review, Volume 5, Number 1. Reprinted from the Bell System Technical Journal with corrections. (pdf: shannon48.pdf)
6. Class: Zipf continued, early evidence from phonology and speech
   - Zipf 1935 (73-81, 109-121) and Zipf 1949 (98-108) on phonological change (pdf: Zipf35-49_sound.pdf)
   - Also covered:
     - Schuchardt, H. (1885). On sound laws: Against the Neogrammarians. Translated by T. Vennemann and T.H. Wilbur. (pdf: schuchardt1885.pdf)
7. Class: Functionalist Theories of Language Change
   - Bybee, J.
   - Bates, E. and MacWhinney, B. (1982)
8. Class: Frequency, Predictability, and Word Duration
   - Zipf 1935 (283-287) on speech rate (velocity of speech) (pdf: Zipf35-49_sound.pdf)
   - Pluymaekers, M., Ernestus, M., and Baayen, R. (2005). Lexical frequency and acoustic reduction in spoken Dutch. The Journal of the Acoustical Society of America, 118(4), 2561-2569. (pdf: pluymaekersetal05.pdf)
   - Alan Bell, Jason Brenier, Michelle Gregory, Cynthia Girand, and Dan Jurafsky. (2009). Predictability Effects on Durations of Content and Function Words in Conversational English. Journal of Memory and Language 60:1, 92-111. (pdf: belletal09.pdf)
   - Also covered:
     - Aylett, M. and Turk, A. (2004). The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1), 31-56. (pdf: aylettturk04.pdf)
9. Class: Word Predictability and Word Pronunciation
   - Van Son, R., and Pols, L. (2003). How efficient is speech? Proceedings of the Institute of Phonetic Sciences, 25, 171-184. (pdf: vansonpols03.pdf)
   - Aylett, M.P. and Turk, A. (2006). Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei. The Journal of the Acoustical Society of America, 119(5), 3048-3058. (pdf: aylettturk06.pdf)
   - Also covered:
     - van Son, R. and van Santen, J. (2005). Duration and spectral balance of intervocalic consonants: A case for efficient communication. Speech Communication, 47(1), 100-123. (pdf: vansonvansanten05.pdf)
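For concreteness, here is a toy version of the kind of entropy-rate estimate discussed in Class 3. It trains a character n-gram model by relative frequency and then scores the very text it was trained on, so it substantially underestimates the true entropy rate (Shannon 1951 and Cover and King 1978 use far more careful methods); the sample string is a placeholder:

    import math
    from collections import Counter, defaultdict

    def entropy_rate(text, n=3):
        """Average per-character surprisal (bits) under an n-gram model:
        H = -(1/N) * sum_i log2 p(c_i | previous n-1 characters)."""
        context_counts = defaultdict(Counter)
        for i in range(len(text) - n + 1):
            ctx, ch = text[i:i + n - 1], text[i + n - 1]
            context_counts[ctx][ch] += 1
        total_bits, count = 0.0, 0
        for i in range(len(text) - n + 1):
            ctx, ch = text[i:i + n - 1], text[i + n - 1]
            p = context_counts[ctx][ch] / sum(context_counts[ctx].values())
            total_bits -= math.log2(p)
            count += 1
        return total_bits / count

    sample = "the quick brown fox jumps over the lazy dog " * 50
    for n in (1, 2, 3):
        print(f"{n}-gram estimate: {entropy_rate(sample, n):.2f} bits/char")

Longer contexts soak up more of the predictability, so the estimate drops as n grows; Shannon's point was that human predictions (or a gambler's bets, in Cover and King's version) bound the entropy rate of English well below what simple character counts suggest.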
Topics
Computational Approaches to Production
Background in probability theory and information theory
Early applications of information theory to natural language: The entropy of English
Least Effort
- Zipf (1929/49)
- Manin (2006, 2007)
Shannon Information and Sub-Phonemic/Phonemic Reduction
- Duration reduction (Bell et al. 03, 09; Aylett and Turk 04; Pluymaekers et al. 05)
- Vowel weakening (Van Son and Van Santen, 05)
- Phone deletion (Cohen Priva, 08)
- Fluency (Shriberg and Stolcke 96)
Shannon Information and Morpho-syntactic Reduction
- Auxiliary reduction and omission (Frank and Jaeger 08)
- Prefix deletion (Norcliffe and Jaeger 10)
- Case-marker omission
Connectionist Models of Lexical Production
- Speech errors (Dell, 86)
Connectionist Models of Syntactic Production
- Chang et al
Shannon Information and Syntactic Reduction
- Wasow et al 07; Jaeger 10a,b
Relative Entropy and Argument Omission
- Argument drop (Resnik 96)
- Ellipsis
Uncertainty Reduction and Referring Expressions
- Wasow, Perfors, and Beaver
- Tily and Piantadosi
Shannon Information and Neighborhood Entropy across the Discourse
- Genzel and Charniak (2002, 2003)
- Piantadosi and Gibson (2008)
- Qian and Jaeger (2009, 2010a,b)
Optimal Lexica
- Information density, Neighborhood density, Ambiguity (Piantadosi et al 09; Plotkin and Nowak; Gassner 04)
- Phonological optimality (Graff and Jaeger 09)
Information theoretic approaches to Morphological Paradigms
- Baayen
- Moscoso del Prado Martín
Computational Models of Priming, Implicit Learning, and Adaptation
Priming and Implicit Learning
Computational Models of Skill Maintenance
- Huber et al
Connectionist Models of Syntactic Priming
- Chang et al
ACT-R Models of Syntactic Priming
Surprisal and Surprisal-based Models of Syntactic Priming
- Hale 01; Levy 08
- Snider & Jaeger
Phonetic Adaptation
- Clayards et al 09; Kraljic and Samuel
Syntactic Adaptation
- Wells et al 09; Sauerland et al. 09
Ideal Observer Approaches to Adaptation
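To make the last topic concrete: in an ideal observer account of adaptation (the general direction of work like Clayards et al. 09), the listener updates beliefs about a category's cue distribution by Bayesian inference. The sketch below uses a standard normal-normal conjugate update for the mean VOT of a stop category; all numbers (the 60 ms prior mean, the variances, the exposure tokens) are invented for illustration and are not taken from the readings:

    def update_gaussian_mean(prior_mu, prior_var, obs, obs_var):
        """Conjugate normal-normal update of beliefs about a category mean,
        assuming a known observation variance; returns (posterior mean, variance)."""
        post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
        post_mu = post_var * (prior_mu / prior_var + obs / obs_var)
        return post_mu, post_var

    # Prior belief about the mean VOT (ms) of a /p/-like category.
    mu, var = 60.0, 100.0
    for vot in [45.0, 48.0, 44.0, 47.0, 46.0]:  # hypothetical shifted exposure tokens
        mu, var = update_gaussian_mean(mu, var, vot, obs_var=25.0)
        print(f"after hearing VOT = {vot:.0f} ms: believed category mean = {mu:.1f} ms")

Each observation pulls the believed category mean toward the exposure distribution, and the belief tightens (variance shrinks) as evidence accumulates; this is the basic mechanism that ideal observer models of phonetic and syntactic adaptation build on.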