Differences between revisions 8 and 9
Revision 8 as of 2008-03-21 02:08:38
Size: 534
Editor: cpe-66-67-32-171
Comment:
Revision 9 as of 2008-03-21 02:11:55
Size: 2424
Editor: cpe-66-67-32-171
Comment:
Project maintained by: ["TingQian"]

Project-related news

Background & Motivation

This study uses computational methods to examine whether text written by speakers of one language group differs in information density between their first language (Chinese in this study) and their second language (English). Recent studies suggest that speakers tend to optimize the information density of a sentence during speech production so that it is easier to understand (e.g. Jaeger, 2006). For written English text, Genzel and Charniak (2002) proposed the entropy rate constancy principle: if writers communicate efficiently, the information content of each sentence given its preceding context stays roughly constant, so the information content of a sentence measured without that context increases with its position in a paragraph. Intuitively, this means that a sentence picked at random from a paragraph is hard to understand without knowledge of the prior context.
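
One way to make this argument explicit, using notation that is an assumption here rather than taken from this page: let X_i be the i-th sentence of a paragraph and C_i the sentences that precede it. A standard information-theoretic identity then relates the in-context and out-of-context entropies:

```latex
H(X_i \mid C_i) = H(X_i) - I(X_i ; C_i)
```

If the left-hand side stays roughly constant while the mutual information I(X_i ; C_i) grows with i (later sentences have more context to draw on), the out-of-context entropy H(X_i) must grow with i as well.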

This project has three main tasks: 1) exploring the distribution of information content in written Chinese, for which the entropy rate constancy principle has not yet been confirmed; 2) comparing the result with the information content of English written by Chinese speakers; and 3) based on 1 and 2, looking for reasons why different levels of linguistic performance are often observed between first- and second-language use.

The study will be based on three corpora: Chinese, English, and English written as a second language. N-gram models and lexicalized probabilistic context-free grammars will be used to compute information content for each type of language. The results may shed light on why utterances of non-native speakers can appear unnatural. Is it an imperfect command of grammatical rules, or some other constraint in the brain that interferes with second-language processing? The long-term goal of this project is to find an answer to this question.
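
As a rough illustration of the n-gram side of this plan, the sketch below estimates per-sentence information content with an add-one smoothed bigram model and averages it by sentence position within a paragraph. It is a minimal sketch and not the project's actual implementation: the function names, the `<s>`/`</s>` boundary markers, the choice of bigrams, and add-one smoothing are all assumptions, and sentences are assumed to be already tokenized (for Chinese, already word-segmented).

```python
import math
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Collect bigram statistics from tokenized sentences (hypothetical helper)."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
            vocab.add(word)
    return context_counts, bigram_counts, vocab


def sentence_information(tokens, model):
    """Mean surprisal in bits per predicted token, with add-one smoothing."""
    context_counts, bigram_counts, vocab = model
    vocab_size = len(vocab) + 1  # one extra slot for unseen words (assumption)
    padded = ["<s>"] + list(tokens) + ["</s>"]
    total_bits = 0.0
    for prev, word in zip(padded, padded[1:]):
        p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
        total_bits += -math.log2(p)
    return total_bits / (len(padded) - 1)


def mean_information_by_position(paragraphs, model):
    """Average sentence information grouped by 1-based position in the paragraph."""
    by_position = defaultdict(list)
    for paragraph in paragraphs:
        for position, tokens in enumerate(paragraph, start=1):
            by_position[position].append(sentence_information(tokens, model))
    return {pos: sum(vals) / len(vals) for pos, vals in sorted(by_position.items())}
```

Each sentence is scored on its own, without the preceding sentences, so this is the out-of-context measure. Training the model on one part of a corpus and calling `mean_information_by_position` on held-out paragraphs would show whether that measure rises with sentence position in each of the three corpora.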

Method

Current Progress

ProjectsConstantEntropyChinese (last edited 2008-04-07 03:15:41 by cpe-66-67-32-171)
