
Reading and References

In addition to the specific readings mentioned on the syllabus, we've put together a few general reading suggestions for corpus-based research in psycholinguistics. They are listed below the references.

1. References

AttachList

2. Reading Themes

Each section below summarizes a few papers on a particular issue that will be covered in class. We don't expect you to read all of these papers; they are meant as pointers for further reading. At the end of each section we name what we consider a good entry reading on that topic.

2.1. Accessibility: Availability and Alignment in Sentence Production

Syntactic variation has been attributed to accessibility. For the purpose of this class, accessibility refers to ease of retrieval. Accessibility-based accounts of, for example, word order alternations hold that the relative accessibility of the referents described by the different constituents affects speakers' word order preferences.

Two specific proposals have been discussed and tested in detail in the literature. Psycholinguistic alignment accounts (e.g. Bock and Warren, 1985) state that speakers prefer to align conceptually accessible referents with higher grammatical functions (these accounts resemble linguistic accounts of alignment, e.g. Aissen, 2003; Bresnan et al., 2001). Availability accounts, on the other hand, state that speakers prefer to mention accessible referents early in the sentence (Levelt & Maassen, 1981; Ferreira, 1996; Ferreira and Dell, 2000). For English these two accounts make very similar predictions, but for other languages they can diverge. We recommend Branigan et al. (2007) for a direct comparison and summary of previous work. See also Jaeger and Norcliffe (in press) for a summary of the relevant cross-linguistic work.

2.2. Length and Word Order in Sentence Production

John Hawkins' books from 1994 and 2004 are both classics. We may scan some portions of these, but definitely read Hawkins (2007).

And please read one of the following (whichever ones you don't read are optional):

Yamashita and Chang (2001): Gives a Hawkins-style account of ordering preferences in a production experiment with native speakers of Japanese.
Choi (1997): This could also go in the Availability and Alignment section, but the paper deals with ordering preferences driven by discourse and other factors in an LFG framework.
Arnold et al. (2000): This study tries to tease apart the contributions of syntactic weight and discourse status in post-verbal constituent ordering.
Gildea and Temperley (2008): This is an interesting attempt to test the hypothesis that dependency length minimization constrains grammar, insofar as grammars are optimal systems for simultaneously minimizing numerous dependency lengths. The paper is very technical (from a computational linguistics journal), but you can get the gist from the introduction if you don't want to wade through all the details.

2.3. Ambiguity Avoidance in Sentence Production

Please read Haywood et al. (2005) and Arnold et al. (2004).

Optional reading: Kraljic and Brennan (2005)

2.4. Uniform Information Density

Uniform Information Density is a recently proposed account of language production (Jaeger, 2006; Levy & Jaeger, 2007; Jaeger, submitted, in prep), according to which speakers' choices in production are driven by a preference to distribute information uniformly across the linguistic signal. Information is defined information-theoretically (Shannon, 1948) in terms of probability: the less probable an event is, the more information its occurrence carries.
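
As a minimal sketch of the information-theoretic definition (Shannon, 1948): an event with probability p carries -log2(p) bits of information, so less probable events carry more. The function name `surprisal` and the probabilities used here are our own illustrative choices; they are not taken from any of the papers cited.

```python
import math


def surprisal(probability: float) -> float:
    """Return the Shannon information, in bits, of an event with this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


# Hypothetical probabilities, chosen only for illustration.
likely_event = surprisal(0.5)      # 1 bit
unlikely_event = surprisal(0.125)  # 3 bits
```

On this definition, a signal has uniform information density to the extent that the per-unit values returned by a function like this stay close to one another across the utterance.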

Uniform Information Density has been tested against corpus data on phonetic reduction (Jaeger & Kidd, 2008; building on Bell et al., 2003, 2009), morpho-syntactic reduction (Frank & Jaeger, 2008), syntactic reduction (Jaeger, 2006, in press, in prep; Levy & Jaeger, 2007), and inter-clausal planning (Gomez Gallo et al., 2008). Data on the distribution of disfluencies and gestures have also been argued to support the principle of Uniform Information Density (Cook et al., 2009).

Short introductions can be found in Levy and Jaeger (2007, rather technical) and Frank and Jaeger (2008). A more in-depth discussion in journal format is found in Jaeger (submitted).

2.5. Psycholinguistic Corpus-based work on Syntactic Variation

For some examples of corpus-based psycholinguistic research on syntactic production, see:

Arnold et al. (2000) on various word order variations;
Bresnan et al. (2007) on the ditransitive alternation;
Wasow (1997) on heavy NP shift;
Lohse et al. (2003) on particle shift;
Roland et al. (2005) and Jaeger (submitted) on complementizer-mentioning;
Wasow et al. (in press) on relativizer-mentioning;
Jaeger (in prep) on passive subject-extracted relative clause reduction;
Frank and Jaeger (2008) on auxiliary contraction.

Examples of a corpus-based approach using mixed logit models are given in Bresnan et al. (2007) and Jaeger (in press).

3. Sociolinguistic Corpus-based work on Syntactic Variation

For some examples of corpus-based sociolinguistic research on syntactic production, see:

Tagliamonte and Smith (2005) on complementizer-mentioning
Tagliamonte et al. (2005) on relativizer-mentioning

Both are very nice papers that are easy to understand.

3.1. Grammaticization and Gradient Grammaticality in Syntactic Variation

Please read Bresnan and Hay (2007) and Torres Cacoullos and Walker (2009).

Optional: Bresnan et al. (2007)

3.2. Statistics for Corpus-based Research

By far the most useful resource is Harald Baayen's book, which you can download for free from his website. The book is R-based, as you should be if you want to fit in.

For a nice article about mixed-effects regression models, see Baayen et al. (2008). In the same issue of JML, you can find Florian's article (Jaeger, 2008) about mixed logit models (which are especially useful in psycholinguistics). Please read both articles.

For a discussion of statistics specifically for sociolinguistic corpus research, have a look at Johnson (2009). This is optional if your project doesn't deal with sociolinguistics. If it does, or if it falls into some gray area that may or may not count as sociolinguistic, read it.

It would also be good to read some papers which actually demonstrate how sophisticated statistical techniques can be applied to corpus-based research. Please read at least one of the following: Roland et al. (2005), Bell et al. (2003), Jaeger (submitted), or Bresnan et al. (2007).