Project maintained by: TingQian

Project-related news

Motivation

What underlies the difference in linguistic performance between native and non-native speakers? Most of us have found speech or writing by non-native speakers hard to understand even when it is perfectly grammatical. At the behavioral level, this is likely to occur when non-native speakers choose unconventional terms to convey ideas, or when they ignore contextual information that is specific to the new language and thereby add redundancy to their expression. However, the sources of this added difficulty in comprehension (especially from the perspective of native speakers) have not been clearly identified. That is, we want to know which changes in the language production of non-native speakers cause the difficulty. The goal of this project is to look for a possible account of this unnaturalness of non-native English by examining both the Chinese and the English language production of native Chinese speakers.

Background

Recent studies have shown that humans tend to optimize the information density of a sentence during speech production in order to make it easy to understand (e.g. Jaeger, 2006). In studies of written English text, Genzel and Charniak (2002) proposed the entropy rate constancy principle, which states that if humans try to communicate efficiently, the information content of each individual sentence, measured without reference to the preceding context, will increase with its position in a paragraph. Intuitively, this means that a sentence picked at random from a paragraph will be difficult to understand without knowledge of the prior context.
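The reasoning behind this prediction can be written as a standard information-theoretic identity. The notation here is an assumption made for illustration and is not taken from this page: X_i stands for the words of the i-th sentence, C_i for the preceding sentences, and L_i for the local context inside the sentence (for example, an n-gram history).

```latex
\begin{align}
H(X_i \mid C_i, L_i) &= H(X_i \mid L_i) - I(X_i ; C_i \mid L_i)
\end{align}
```

If the left-hand side stays roughly constant across i, and the mutual information with the preceding context grows as more context accumulates, then the out-of-context term H(X_i | L_i) has to grow with i as well. That out-of-context term is the quantity a sentence-level n-gram model can estimate.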

The current study uses computational methods to examine whether the information density of language produced by one group of speakers differs between their first language (Chinese in this study) and their second language (English). There are three main tasks:

  1. exploring the distribution of information content in written Chinese, for which the entropy rate constancy principle has not yet been confirmed;
  2. comparing the obtained result with the information content of English written by Chinese speakers;
  3. based on 1 and 2, looking for the reason why different levels of linguistic performance are often observed between first- and second-language use.

The entire study will be based on three corpora: Chinese, English, and English used as a second language. N-gram models will be used to compute the information content for each type of language. The prospective results can shed light on what might cause the utterances of non-native speakers to appear unnatural. Is it an imperfect command of grammatical rules, or some other constraint in the brain that interferes with second language processing? The long-term goal of this project is to find an answer to this question.
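As a rough illustration of how an n-gram model can yield per-sentence information content, the following is a minimal sketch and not the project's actual implementation. It assumes a bigram model with add-one smoothing and input that is already tokenized (for Chinese, tokens could be characters or segmented words, which is left open here). The names BOS, train_bigram, bits_per_word, and entropy_by_position are hypothetical.

```python
import math
from collections import Counter, defaultdict

BOS = "<s>"  # hypothetical sentence-start marker (an assumption)


def train_bigram(sentences):
    """Count bigram histories and bigrams over tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    vocab = set()
    for tokens in sentences:
        prev = BOS
        for tok in tokens:
            histories[prev] += 1
            bigrams[(prev, tok)] += 1
            vocab.add(tok)
            prev = tok
    return histories, bigrams, vocab


def bits_per_word(tokens, model):
    """Average surprisal in bits of one sentence, add-one smoothed."""
    histories, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot reserved for unseen words
    total, prev = 0.0, BOS
    for tok in tokens:
        p = (bigrams[(prev, tok)] + 1) / (histories[prev] + v)
        total -= math.log2(p)
        prev = tok
    return total / len(tokens)


def entropy_by_position(paragraphs, model):
    """Mean bits per word for the 1st, 2nd, ... sentence of each paragraph."""
    by_pos = defaultdict(list)
    for paragraph in paragraphs:
        for i, tokens in enumerate(paragraph, start=1):
            if tokens:  # skip empty sentences
                by_pos[i].append(bits_per_word(tokens, model))
    return {i: sum(v) / len(v) for i, v in sorted(by_pos.items())}
```

Because the model sees each sentence without its preceding sentences, the values it returns are out-of-context estimates. Under the principle described in the Background, they would be expected to rise with sentence position; running the same procedure on each of the three corpora would allow the trends to be compared.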

Method

Current Progress

Data

ProjectsConstantEntropyChinese (last edited 2008-04-07 03:15:41 by cpe-66-67-32-171)
