Computational Accounts of Production
Synopsis:
- Connectionist and spreading-activation models of language production (lexical and syntactic production, but with a focus on speech errors)
- Information theoretic models of incremental language production (phonetic, morphological, syntactic, and extra-syntactic preferences)
- Computational models of adaptation in language processing (implicit learning, ideal observer models)
We will start with a quick refresher (written for language researchers) on probability theory and information theory, and then read a range of papers on how information content, entropy, etc. affect language production. The goal of the class is to provide a thorough introduction to these topics, but also to discuss the shortcomings of these types of accounts and their relation to other mechanistic accounts of language production.
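To make these quantities concrete before the readings, here is a minimal sketch (my own toy example, not taken from any of the readings; the probabilities are invented) of the two notions the seminar returns to again and again: the surprisal (Shannon information) of a word in context and the entropy of a distribution over possible next words.

```python
import math

# Hypothetical next-word distribution after some context, e.g. "I take my coffee ...".
# The probabilities are made up purely for illustration.
next_word_probs = {"black": 0.5, "with": 0.3, "seriously": 0.15, "umbrella": 0.05}

def surprisal(p):
    """Shannon information of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy(dist):
    """Expected surprisal (entropy) of a probability distribution, in bits."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

for word, p in next_word_probs.items():
    print(f"{word:>10}: p = {p:.2f}, surprisal = {surprisal(p):.2f} bits")
print(f"entropy of the distribution: {entropy(next_word_probs):.2f} bits")
```

Low-probability continuations carry more bits of information, and the entropy is the average surprisal a comprehender faces at that point in the utterance. Much of the reduction literature below asks whether speakers' choices are sensitive to exactly these quantities.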
Prerequisites
The seminar is intended for graduate students though I may consider advanced undergraduate students with a psycholinguistics background and strong interest. A very basic background in probability theory is assumed, but we'll go through the basics at the beginning of the class.
Requirements
This will be a reading/discussion seminar (not a lecture). So, even if you plan to audit, I would appreciate it if you did the readings (see the webpage for more detail on requirements, etc.).
Students taking the class for credit will have to prepare for every discussion. I plan to use the BlackBoard forum feature, and students taking the class for credit will have to post 2 questions or comments about the readings at least 1 day before each class. Additionally, they will have to lead some discussions. There will also be a final project, which can be a discussion paper or a proposal for an experiment (or grant ;)). The final write-up should be about 4-10 pages.
Readings
There will be a lot of readings for each day, but the goal is not for everyone to read all of them. Instead, we will have a short obligatory reading and then distribute additional readings across people in the class. Discussion leaders will have to have read all of the papers.
Syllabus
This is a very rough draft of a syllabus. I am also blatantly stealing parts of a great class taught by Dan Jurafsky and Michael Ramscar at Stanford (Fall 2009). The list below is meant as a superset suggestion (covering all topics would take more than a semester). Please feel free to suggest additional topics or to tell me your favorites.
- Class: Overview and Early approaches to efficiency: The Principle of Least Effort
- Zipf 1949 (1-22) and Zipf 1935 (20-39, 172-176) on the inverse frequency-form link [attachment:Zipf35-49.pdf pdf]
- Zipf, G.K. 1935. The psycho-biology of language: An introduction to dynamic philology. Houghton Mifflin.
- Zipf, G.K. 1949. Human behavior and the principle of least effort: An introduction to human ecology. New York: Hafner.
- Class: Zipf continued, early evidence from phonology and speech
- Zipf 1935 (73-81, 109-121) and Zipf 1949 (98-108) on phonological change [attachment:Zipf35-49_sound.pdf pdf]
- Zipf 1935 (283-287) on speech rate (velocity of speech) in same file
- also covered:
- Schuchardt, H. 1885. On sound laws: Against the Neogrammarians. Translated by T. Vennemann and T.H. Wilbur.
- Class: Basics of Probability Theory and Information Theory, as well as early applications to language (a toy entropy-estimation sketch follows this schedule)
- Background reading:
- John A. Goldsmith. 2007. Probability for linguists. [attachment:Goldsmith07.pdf pdf]
- Sheldon Ross. 2010. A First Course in Probability. Eighth Edition. Section 9.3 "Surprise, Uncertainty, and Entropy", pages 425-429. [attachment:Ross10.pdf pdf]
- Shannon, C. 1951. Prediction and entropy of printed English. Bell System Technical Journal, 30, 50-64. [attachment:shannon51.pdf pdf]
- Thomas M. Cover and Roger C. King. 1978. A Convergent Gambling Estimate of the Entropy of English. IEEE Transactions on Information Theory 24:4, 413-421. [attachment:coverking78.pdf pdf]
- Class
- Manin, D. 2006. Experiments on predictability of word in context and information rate in natural language.
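As a pointer to what Shannon (1951) and Cover and King (1978) are estimating, here is a toy sketch of my own (not Shannon's guessing-game procedure, and the sample text is far too short for meaningful numbers): it computes the n-gram conditional entropies F_n that Shannon uses as upper bounds on the per-character entropy of printed English.

```python
from collections import Counter
import math

# A deliberately tiny sample; real estimates require a large corpus and smoothing.
text = ("the principle of least effort states that speakers tend to minimize "
        "articulatory effort while preserving the information the hearer needs")

def conditional_entropy(text, n):
    """Estimate F_n = H(next char | preceding n-1 chars) in bits per character,
    using raw (unsmoothed) n-gram counts."""
    positions = range(len(text) - n + 1)
    ngrams = Counter(text[i:i + n] for i in positions)
    contexts = Counter(text[i:i + n - 1] for i in positions)
    total = sum(ngrams.values())
    h = 0.0
    for gram, count in ngrams.items():
        p_joint = count / total                 # p(context, next char)
        p_cond = count / contexts[gram[:-1]]    # p(next char | context)
        h -= p_joint * math.log2(p_cond)
    return h

for n in range(1, 5):
    print(f"F_{n} = {conditional_entropy(text, n):.2f} bits per character")
```

With enough text, the F_n estimates decrease as the conditioning context grows; Shannon's guessing game and Cover and King's gambling method then push the estimated entropy of English well below what small-n counts alone suggest.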
Topics
Computational Approaches to Production
Background in probability theory and information theory
- Robert A. Rescorla. 1988. Pavlovian Conditioning: It's Not What You Think It Is. American Psychologist, 43(3), 151-160.
Early applications of information theory to natural language: The entropy of English
- Shannon, C. Prediction and entropy of printed English. Bell System Technical Journal, 30, 50-64.
- Thomas M. Cover and Roger C. King. 1978. A Convergent Gambling Estimate of the Entropy of English. IEEE Transactions on Information Theory 24:4, 413-421.
Least Effort
- Zipf (1929/49)
- Manin (2006, 2007)
Shannon Information and Sub-Phonemic/Phonemic Reduction
- Duration reduction (Bell et al. 03, 09; Aylett and Turk 04; Pluymaekers et al. 05)
- Vowel weakening (Van Son and Van Santen, 05)
Shannon Information and Sub-Phonemic/Phonemic Reduction (continued)
- Phone deletion (Cohen Priva, 08)
- Fluency (Shriberg and Stolcke 96)
Shannon Information and Morpho-syntactic Reduction
- Auxiliary reduction and omission (Frank and Jaeger 08)
- Prefix deletion (Norcliffe and Jaeger 10)
- Case-marker omission
Connectionist Models of Lexical Production
- Speech errors (Dell, 86)
Connectionist Models of Syntactic Production
- Chang et al
Shannon Information and Syntactic Reduction
- Wasow et al 07; Jaeger 10a,b
Relative Entropy and Argument Omission
- Argument drop (Resnik 96)
- Ellipsis
Uncertainty Reduction and Referring Expressions
- Wasow, Perfors, and Beaver
- Tily and Piantadosi
Shannon Information and Neighborhood Entropy across the Discourse
- Genzel and Charniak (2002, 2003)
- Piantadosi and Gibson (2008)
- Qian and Jaeger (2009, 2010a,b)
Optimal Lexica
- Information density, Neighborhood density, Ambiguity (Piantadosi et al 09; Plotkin and Nowak; Gassner 04)
- Phonological optimality (Graff and Jaeger 09)
Information-Theoretic Approaches to Morphological Paradigms
- Baayen
- Moscoso del Prado Martin
Computational Models of Priming, Implicit Learning, Adaptation
Priming and Implicit Learning
Computational Models of Skill Maintenance
- Huber et al
Connectionist Models of Syntactic Priming
- Chang et al
ACT-R Models of Syntactic Priming
Surprisal and Surprisal-based Models of Syntactic Priming
- Hale 01; Levy 08
- Snider & Jaeger
Phonetic Adaptation
- Clayards et al 09; Kraljic and Samuel
Syntactic Adaptation
- Wells et al. 09; Sauerland et al. 09
Ideal Observer Approaches to Adaptation