Diff against revision 1 as of 2007-08-29 15:37:10 (size 935 before, 1278 after).
Corpora
1. Gigaword
- Chinese
2. Parsed Switchboard
You must be a member of the pswbd Unix group to access this corpus.
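Before trying to open the files, you can check whether your account already belongs to that group. The following is a minimal Python sketch using the standard grp and os modules (Unix only); the helper name in_unix_group is hypothetical. Note that os.getgroups() reports the groups of the current login session, so after being added to pswbd you may need to log out and back in before both this check and the file access succeed.

```python
import grp
import os


def in_unix_group(group_name: str) -> bool:
    """Return True if the current process belongs to the named Unix group.

    Hypothetical helper: checks the supplementary group list and the
    primary group id. Returns False if the group is not defined here.
    """
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this machine.
        return False
    return gid in os.getgroups() or gid == os.getgid()


if __name__ == "__main__":
    if in_unix_group("pswbd"):
        print("Your account is in pswbd; the Parsed Switchboard files should be readable.")
    else:
        print("Your account is not in pswbd (or the group is not defined on this host).")
```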
3. TGrep2able
These corpora have been preprocessed so that they can be searched with the TGrep2 tool. See [wiki:/HlpLab/CorpusTools/ Corpus Tools] for more information on TGrep2.
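As a minimal sketch of querying one of these corpora from a Python script, the code below shells out to the tgrep2 binary. It assumes that tgrep2 is installed and on PATH, and that it takes the preprocessed corpus file via -c and the search pattern as a positional argument; check the TGrep2 manual linked from the Corpus Tools page before relying on this. The helper names tgrep2_command and run_tgrep2 are hypothetical.

```python
import shutil
import subprocess


def tgrep2_command(corpus_file: str, pattern: str) -> list[str]:
    """Build a tgrep2 command line for one preprocessed corpus file.

    Hypothetical helper; assumes tgrep2 reads the corpus via -c and the
    pattern as a positional argument.
    """
    return ["tgrep2", "-c", corpus_file, pattern]


def run_tgrep2(corpus_file: str, pattern: str) -> str:
    """Run the search and return tgrep2's standard output as text."""
    if shutil.which("tgrep2") is None:
        raise FileNotFoundError("tgrep2 is not on PATH")
    result = subprocess.run(
        tgrep2_command(corpus_file, pattern),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For example, run_tgrep2("swbd.t2c.gz", "NP < PP") would ask for NP nodes that immediately dominate a PP; the file name is a placeholder for whichever TGrep2able corpus file you actually use. Passing the arguments as a list avoids shell quoting problems with patterns that contain < or |.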
4. TIGER Corpora
5. Tiger2 Corpus
6. Treebanks
| Title | Directory | LDC catalog number / original archive name | Language | Words | Sentences | Stories | Original format |
| Arabic Treebank Part 1 V3 | ATB1_V3/ | LDC2005T02 | Arabic | 145,386 |  | 734 |  |
| Arabic Treebank Part 2 V2 | ATB2_V2/ | LDC2004T02 | Arabic | 144,199 |  | 501 |  |
| Arabic Treebank Part 3 V1 | ATB3_V1/ | LDC2004T11 | Arabic | 340,281 |  | 600 |  |
| Chinese Treebank V5.1 | ChineseTreebank5.1/ | LDC2005T01U01 | Chinese | 507,222 | 18,782 |  |  |
| Prague Dependency Treebank 2.0 | pdt_2/ | LDC2006T01 | Czech | 2,000,000 |  |  |  |
| Danish Dependency Treebank V1.0 | ddt1.0/ | ddt-1.0.tar | Danish |  | 5,540 |  |  |
| NEGRA corpus V2.0 | Negra2.0/ | negra-corpus.tar.gz | German |  | 20,602 |  | export / Penn Treebank |
| Merge files | mrg/ |  |  |  |  |  |  |
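Word and sentence counts like those in the table can be approximated for Penn Treebank-style bracketed files (presumably what the mrg/ directory holds) with a small counter. The Python sketch below treats each top-level bracketed tree as one sentence and each terminal as one word, skipping -NONE- empty elements and any text outside a tree (such as file headers). These counting conventions are assumptions, so the results need not match the LDC figures above, and other formats in the table, such as NEGRA's export format, need a different reader. The function names are hypothetical.

```python
import re
from typing import Iterator

# A token is "(", ")", or a run of characters that are neither space nor parens.
_TOKEN = re.compile(r"\(|\)|[^\s()]+")


def iter_trees(text: str) -> Iterator[list[str]]:
    """Yield each top-level bracketed tree in text as a flat token list."""
    depth = 0
    current: list[str] = []
    for tok in _TOKEN.findall(text):
        if tok == "(":
            depth += 1
        elif depth == 0:
            # Skip material outside any tree (e.g. file headers).
            continue
        current.append(tok)
        if tok == ")":
            depth -= 1
            if depth == 0:
                yield current
                current = []


def leaves(tree: list[str]) -> list[tuple[str, str]]:
    """Return (tag, word) pairs for the terminals of one tree.

    A non-bracket token directly after "(" is a node label; any other
    non-bracket token is a terminal whose tag is the token before it.
    """
    pairs = []
    for i, tok in enumerate(tree):
        if tok in ("(", ")") or tree[i - 1] == "(":
            continue
        pairs.append((tree[i - 1], tok))
    return pairs


def count_corpus(text: str) -> tuple[int, int]:
    """Return (sentences, words), ignoring -NONE- empty elements."""
    sentences = 0
    words = 0
    for tree in iter_trees(text):
        sentences += 1
        words += sum(1 for tag, _ in leaves(tree) if tag != "-NONE-")
    return sentences, words


SAMPLE = (
    "( (S (NP-SBJ (DT The) (NN dog)) (VP (VBD barked)) (. .)) )\n"
    "( (S (NP-SBJ (-NONE- *)) (VP (VB Run)) (. !)) )\n"
)

if __name__ == "__main__":
    print(count_corpus(SAMPLE))  # → (2, 6)
```

On the sample, the trace * is skipped while punctuation is counted as a word, giving two sentences and six words; whether punctuation should count is exactly the kind of convention that makes such figures differ between sources.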
