This week we'll be talking about computational approaches to morpheme segmentation: the problem of finding the morphemes in a string of words. The most common approach is MDL (minimum description length), though lots of other methods have been used as well, and the readings will mostly focus on MDL. The things I'm concerned with are: 1) understanding the math behind why these approaches work; 2) how to incorporate these methods into a psychologically realistic model of language acquisition, specifically how kids learn which part of a word is the stem, which is the suffix, etc.; 3) how to incorporate morpheme finding into a model of finding categories (N, V, etc.).

The readings that I'd like to talk about are: i) Goldsmith, J. 2005. An algorithm for the unsupervised learning of morphology. This is an application of MDL that is written pretty cleanly, and I think it could easily be expanded to do category learning. ii) van den Bosch & Daelemans. 1999. Memory-based morphological analysis.
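
To make the MDL idea a bit more concrete before we meet, here is a minimal sketch of a two-part description length: the bits needed to spell out a lexicon of stems and suffixes, plus the bits needed to encode each word as a (stem, suffix) choice. This is a toy simplification and not Goldsmith's actual encoding. The function name, the fixed per-letter cost, and the tiny word list are all assumptions made up for illustration.

```python
import math
from collections import Counter

# Assumption: 26 letters plus one end-of-morpheme marker, all equally costly.
BITS_PER_SYMBOL = math.log2(27)


def description_length(analyses):
    """Toy two-part MDL cost, in bits, of a list of (stem, suffix) analyses.

    `analyses` has one (stem, suffix) pair per word token in the corpus.
    """
    stems = Counter(stem for stem, _ in analyses)
    suffixes = Counter(suffix for _, suffix in analyses)
    total = len(analyses)

    # Model cost: spell out each distinct stem and suffix once,
    # letter by letter, plus one end-of-morpheme marker each.
    morphemes = list(stems) + list(suffixes)
    model_bits = sum((len(m) + 1) * BITS_PER_SYMBOL for m in morphemes)

    # Data cost: encode each token as a stem choice plus a suffix choice,
    # each costing -log2 of its relative frequency.
    data_bits = sum(
        -math.log2(stems[stem] / total) - math.log2(suffixes[suffix] / total)
        for stem, suffix in analyses
    )
    return model_bits + data_bits


# Hypothetical toy corpus: three stems, each seen with four suffixes.
STEMS = ["walk", "jump", "talk"]
SUFFIXES = ["", "s", "ed", "ing"]

# Analysis A: every word is split into stem + suffix.
segmented = [(stem, suffix) for stem in STEMS for suffix in SUFFIXES]
# Analysis B: every word is an unanalyzed stem with an empty suffix.
unsegmented = [(stem + suffix, "") for stem, suffix in segmented]

segmented_bits = description_length(segmented)
unsegmented_bits = description_length(unsegmented)

print(f"unsegmented: {unsegmented_bits:.1f} bits")
print(f"segmented:   {segmented_bits:.1f} bits")
```

On this toy corpus the segmented analysis comes out much shorter. The cost of encoding the words is the same under both analyses here, so the whole saving comes from the smaller lexicon: three stems and four suffixes instead of twelve separately listed words. That is the basic intuition for why MDL favors analyses that reuse morphemes.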
