Differences between revisions 9 and 10
Revision 9 as of 2008-03-21 02:11:55
Size: 2424
Editor: cpe-66-67-32-171
Comment:
Revision 10 as of 2008-03-21 03:01:19
Size: 2932
Editor: cpe-66-67-32-171
Comment:
Line 8: Line 8:
'''''Project-related news'''''[[BR]]
 * ''Mar 20'' -- Ting Qian is going to present the current progress of this project at the [http://www.rochester.edu/College/ugresearch/expo.html Undergraduate Research Exposition 2008]. Specific date and time will be announced.
'''Project-related news'''[[BR]]
 * ''Mar 20'' -- I am going to present the current progress of this project at the [http://www.rochester.edu/College/ugresearch/expo.html Undergraduate Research Exposition 2008]. Specific date and time will be announced here.
Line 11: Line 11:
== Background & Motivation ==
== Motivation ==
Line 13: Line 13:
The current study uses computational methods to examine whether written languages by people of a certain language group will vary in information density when one is their first language (Chinese is used in this study) and the other the second language (English). Recent studies have shown humans tend to optimize information density of a sentence during speech production in order to make it easy to understand (e.g. Jaeger, 2006). In studies on written English text, Genzel and Charniak (2002) proposed a constancy rate principle, which states if humans try to communicate in an efficient way, the information content of each individual sentence will increase with respect to its order in a paragraph. Intuitively, this means that without the knowledge of prior context, it will be difficult to understand a sentence randomly picked out from a paragraph.
What underlies the difference in linguistic performance between native speakers and non-native speakers? We may all have the experience of finding even perfectly grammatical speech or writings produced by non-native speakers sometimes hard to understand. On the behaviorist level, this is likely to occur if a non-native speaker chooses unconventional terms to express ideas, or he or she ignores contextual information that is specific to the new language and thus adds redundancy to expression.
Line 15: Line 15:
In this project, there are three main tasks: 1) exploring the distribution of information content of written Chinese, for which the constancy rate principle has not yet been confirmed, 2) comparing the obtained result with information content of written English by Chinese speakers, and 3) based on 1 and 2, looking for the reason why different levels of linguistic performance are often observed between first- and second-language uses.
The current study uses computational methods to examine whether languages spoken by people of a certain language group will vary in information density when one is their first language (Chinese is used in this study) and the other the second language (English).

== Background ==

Recent studies have shown humans tend to optimize information density of a sentence during speech production in order to make it easy to understand (e.g. Jaeger, 2006). In studies on written English text, Genzel and Charniak (2002) proposed a constancy rate principle, which states if humans try to communicate in an efficient way, the information content of each individual sentence will increase with respect to its order in a paragraph. Intuitively, this means that without the knowledge of prior context, it will be difficult to understand a sentence randomly picked out from a paragraph.

In my project, there are three main tasks:
 1. exploring the distribution of information content of written Chinese, for which the constancy rate principle has not yet been confirmed;
 2. comparing the obtained result with information content of written English by Chinese speakers;
 3. based on 1 and 2, looking for the reason why different levels of linguistic performance are often observed between first- and second-language uses.

Project maintained by: ["TingQian"]

Project-related news

Motivation

What underlies the difference in linguistic performance between native and non-native speakers? Most of us have at times found even perfectly grammatical speech or writing by non-native speakers hard to understand. At the behavioral level, this is likely to occur if a non-native speaker chooses unconventional terms to express ideas, or ignores contextual information that is specific to the new language and thus adds redundancy to the expression.

The current study uses computational methods to examine whether the information density of language produced by speakers of a given language group differs between their first language (Chinese in this study) and their second language (English).

Background

Recent studies have shown that humans tend to optimize the information density of a sentence during speech production in order to make it easy to understand (e.g. Jaeger, 2006). In studies of written English text, Genzel and Charniak (2002) proposed a constancy rate principle, which states that if humans try to communicate efficiently, the information content of each individual sentence, measured without its preceding context, will increase with its position in a paragraph. Intuitively, this means that a sentence picked at random from a paragraph will be difficult to understand without knowledge of the prior context.
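
The reasoning behind the principle can be written out as a short decomposition. This is only a sketch of the argument in Genzel and Charniak (2002); the symbols are notation introduced here for illustration: X_i is the i-th sentence of a paragraph, C_i is the context preceding it, H is entropy, and I is mutual information.

```latex
H(X_i \mid C_i) = H(X_i) - I(X_i ; C_i)
```

If the left-hand side (information content given the full context) stays roughly constant across a paragraph, while the amount of information the context supplies, I(X_i ; C_i), grows with i, then the out-of-context term H(X_i) has to grow with i as well. That out-of-context term is the quantity a language model without access to prior sentences can estimate.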

In my project, there are three main tasks:

  1. exploring the distribution of information content of written Chinese, for which the constancy rate principle has not yet been confirmed;
  2. comparing the obtained result with information content of written English by Chinese speakers;
  3. based on 1 and 2, looking for the reason why different levels of linguistic performance are often observed between first- and second-language uses.

The entire study will be based on three corpora: Chinese, English, and English used as a second language. N-gram models and lexicalized probabilistic context-free grammars will be used to compute the information content of each type of language. The prospective results can shed light on what might cause the utterances of non-native speakers to appear unnatural. Is it imperfect command of grammatical rules, or some other constraint in the brain that interferes with second-language processing? The long-term goal of this project is to answer this question.
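
As a rough illustration of the n-gram part of this plan, the sketch below estimates per-word information content with a bigram model and averages it by sentence position within a paragraph. It is not the project's actual implementation: the function names, the add-one smoothing, and the input layout (each paragraph is a list of sentences, each sentence a non-empty list of tokens, with Chinese text assumed to be word-segmented beforehand) are all assumptions made for the example.

```python
import math
from collections import Counter

BOS = "<s>"  # hypothetical sentence-start marker


def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = set()
    for tokens in sentences:
        padded = [BOS] + tokens
        vocab.update(tokens)
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    # One extra vocabulary slot reserves probability mass for unseen words.
    return contexts, bigrams, len(vocab) + 1


def sentence_information(tokens, model):
    """Average per-word information content in bits (add-one smoothing)."""
    contexts, bigrams, vocab_size = model
    padded = [BOS] + tokens
    bits = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        bits -= math.log2(p)
    return bits / len(tokens)


def information_by_position(paragraphs, model):
    """Mean per-word information for the 1st, 2nd, ... sentence of a paragraph."""
    totals, counts = Counter(), Counter()
    for paragraph in paragraphs:
        for position, tokens in enumerate(paragraph, start=1):
            totals[position] += sentence_information(tokens, model)
            counts[position] += 1
    return {pos: totals[pos] / counts[pos] for pos in sorted(totals)}
```

Under the constancy rate principle, the values returned by `information_by_position` would be expected to rise with sentence position when the model is trained on one part of a corpus and applied to held-out paragraphs. The same procedure could then be run on each of the three corpora and the resulting curves compared.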

Method

Current Progress

ProjectsConstantEntropyChinese (last edited 2008-04-07 03:15:41 by cpe-66-67-32-171)
