Diff for "LabmeetingAu08w5"

Differences between revisions 13 and 14

This week we'll be talking about computational approaches to morpheme segmentation, the problem of finding the morphemes in a string of words. The most common approach is MDL (minimum description length), though plenty of other methods have been used as well. The readings will mostly focus on MDL. The things I am concerned with are:

1) understanding the math behind why these approaches work

2) how to incorporate these methods into a psychologically realistic model of language acquisition, specifically how kids learn which part of a word is the stem, which is the suffix, etc.

3) how to incorporate finding the morphemes into a model of finding categories (N, V, etc.)
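To make point 1 a bit more concrete before the readings, here is a toy sketch of the two-part MDL idea: total cost = bits to write down the model (a stem list and a suffix list) + bits to write down the corpus given that model. This is not the coding scheme from any of the papers below. The uniform 27-symbol letter code, the pointer cost, the hand-picked suffix candidates, and the function names are all assumptions made just for illustration.

```python
import math

# Assumption: a uniform code over 26 letters plus one end-of-entry symbol.
BITS_PER_CHAR = math.log2(27)

def lexicon_bits(entries):
    """Bits to spell out each distinct entry once, including a terminator."""
    return sum((len(e) + 1) * BITS_PER_CHAR for e in set(entries))

def description_length(analyses):
    """Two-part cost of a corpus given as (stem, suffix) pairs, one per word."""
    stems = {stem for stem, _ in analyses}
    suffixes = {suffix for _, suffix in analyses}
    model_bits = lexicon_bits(stems) + lexicon_bits(suffixes)
    # Each word is a pointer into the stem list plus a pointer into the suffix list.
    pointer_bits = math.log2(len(stems)) + math.log2(len(suffixes))
    return model_bits + len(analyses) * pointer_bits

def split(word, candidate_suffixes):
    """Strip the longest matching candidate suffix, keeping a non-empty stem."""
    for suffix in sorted(candidate_suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""

words = ["walk", "walks", "walked", "walking",
         "jump", "jumps", "jumped", "jumping",
         "talk", "talks", "talked", "talking"]

# Hypothesis A: every word is an unanalyzed stem with an empty suffix.
whole = [(w, "") for w in words]
# Hypothesis B: words are split using a hand-picked (hypothetical) suffix set.
segmented = [split(w, ["s", "ed", "ing"]) for w in words]

print(round(description_length(whole)), round(description_length(segmented)))
```

On this tiny word list the segmented hypothesis comes out cheaper, because three stems and four suffixes take far fewer letters to spell out than twelve whole words. A real learner would also have to search over candidate segmentations rather than being handed the suffix list.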

The readings that I'd like to talk about are:

i (for everyone)) Goldsmith, J. 2005. An algorithm for the unsupervised learning of morphology.

attachment:goldsmith2005.pdf

This is an application of MDL that is written pretty cleanly, and I think it could easily be expanded to do category learning.

If everyone read this paper, that would be super. The other ones below are suggestions for extra credit.

ii (for CS-types)) van den Bosch & Daelemans. 1999. Memory-based morphological analysis. attachment:vandenbosch1999.pdf

This paper uses supervised learning to do both morpheme segmentation and categorization. It's short, but either the math is too dense for me to follow exactly what they did, or the details just aren't there. Computer-types would have fun deciding.

iii (for CS-types)) Brent, Murthy, & Lundberg. 1995. Discovering morphemic suffixes: A case in MDL induction. attachment:brent1995.pdf

This one is short and mathematically a bit dense. It uses MDL to find suffixes and tests whether already knowing the syntactic category helps.

iv (for newbies)) Optional tutorial on MDL by Peter Grünwald. It's a 60-or-so-page MDL tutorial that you can flip through. Chapter 1 is the fluffy intro, and Chapter 2 is the mathematical intro. attachment:grunwald-MDL.pdf


Revision 13 as of 2008-10-02 14:01:18 (size: 1923, editor: colossus)
Revision 14 as of 2008-10-02 14:02:44 (size: 1984, editor: colossus)