#pragma section-numbers 2
Corpora
1. Gigaword
- Chinese
2. Parsed Switchboard
You must be a member of the pswbd Unix group to access this corpus.
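A quick way to check the group requirement above before trying to read the corpus is sketched below. This is only an illustration: the helper name `current_user_in_group` is hypothetical, and the check looks at the groups of the current process, so a membership granted after login may not show up until you log in again.

```python
import grp
import os


def current_user_in_group(group_name):
    """Return True if the current process belongs to the named Unix group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group does not exist on this machine.
        return False
    # Compare against the effective group and the supplementary groups.
    return gid == os.getegid() or gid in os.getgroups()


# "pswbd" is the group named above; the result depends on your account.
print(current_user_in_group("pswbd"))
```

If this prints `False`, the account running the script is not in the group (or the group is not defined on that machine).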
3. TGrep2able
These corpora have been processed for use with the TGrep2 tool. See [wiki:/HlpLab/CorpusTools/ Corpus Tools] for more information on TGrep2.
4. TIGER Corpora
5. Tiger2 Corpus
6. Treebanks
||Title||File||LDC catalog number / original name||Language||# words||# sentences||# stories||Original format||
||Arabic Treebank Part 1 V3||ATB1_V3/||LDC2005T02||Arabic||145386|| ||734|| ||
||Arabic Treebank Part 2 V2||ATB2_V2/||LDC2004T02||Arabic||144199|| ||501|| ||
||Arabic Treebank Part 3 V1||ATB3_V1/||LDC2004T11||Arabic||340281|| ||600|| ||
||Chinese Treebank V5.1||ChineseTreebank5.1/||LDC2005T01U01||Chinese||507222|| ||18782|| ||
||Prague Dependency Treebank 2.0||pdt_2/||LDC2006T01||Czech||2000000|| || || ||
||Danish Dependency Treebank V1.0||ddt1.0/||ddt-1.0.tar||Danish|| || ||5540|| ||
||NEGRA corpus V2.0||Negra2.0/||negra-corpus.tar.gz||German|| || ||20602||export/Penn Treebank||
||Merge files||mrg/|| || || || || || ||