Differences between revisions 20 and 30 (spanning 10 versions)
Revision 20 as of 2009-07-09 17:35:32
Size: 4409
Editor: adsl-63-197-16-149
Comment:
Revision 30 as of 2011-08-10 19:29:30
Size: 5190
Editor: echidna
Comment:


Syllabus | Assignments | People | Corpora & Tutorials | Readings | Official LSA course page (http://lsa2009.berkeley.edu/courses/lsa125.html)


Corpora and Tutorials

1. Logging onto the corpus server

To do your corpus work, you'll have to log onto the corpus server via SSH. Mac users and those of you using the Windows computers in the computer facilities on campus are all set. For those of you bringing private laptops that run Windows and don't already have an ssh/scp program, please download and install one, e.g. PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/) or OpenSSH (http://sourceforge.net/projects/sshwindows/files/OpenSSH%20for%20Windows%20-%20Release/setupssh381-20040709.zip).

If your username is in the range lsa1 to lsa30, log onto the corpus server with:

ssh <username>@174.129.5.193

If your username is in the range lsa31 to lsa99, log onto the corpus server with:

ssh <username>@174.129.205.212

Usernames and passwords will be distributed in the first class meeting.

...and if you don't know what any of this means, don't panic - we'll have a tutorial during the first week of class where we'll explain how to log on to the server and use basic Unix commands.
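
The username-to-server split above can be captured in a small shell helper. This is only a sketch, not part of the course materials: the function name corpus_host is hypothetical, and it assumes usernames always have the form lsa followed by a number from 1 to 99. Only the two server addresses and the username ranges come from this page.

```shell
#!/bin/sh
# corpus_host: print the server address for a course username.
# Hypothetical helper; assumes usernames look like lsa1 ... lsa99.
corpus_host() {
    case "$1" in
        lsa[0-9]|lsa[0-9][0-9]) n="${1#lsa}" ;;   # strip the "lsa" prefix, keep the number
        *) echo "unknown username: $1" >&2; return 1 ;;
    esac
    if [ "$n" -ge 1 ] && [ "$n" -le 30 ]; then
        echo "174.129.5.193"                      # server for lsa1 to lsa30
    elif [ "$n" -ge 31 ] && [ "$n" -le 99 ]; then
        echo "174.129.205.212"                    # server for lsa31 to lsa99
    else
        echo "unknown username: $1" >&2
        return 1
    fi
}
```

With the function defined, a login for (say) lsa12 could be written as: ssh "lsa12@$(corpus_host lsa12)"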

2. Unix Tutorial

If you don't have any experience with Unix (or if you want to refresh your memory), there will be a tutorial on Tuesday, 7/7, at 6:30pm in 212 Wheeler Hall. We'll go through basic commands that you need to navigate a Unix system. All corpus work will be done on a Unix system, so if you don't feel comfortable with Unix, please come to the tutorial! If you have a laptop, bring it along. If you don't, there will also be computers available in the room.

You can download today's Unix/vi/tgrep2 cheatsheet here: cheatsheet.txt. Note that it includes a Part III on regular expressions that we didn't cover in the tutorial; don't worry about it.
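
For a first taste of the kind of commands the tutorial covers, here is a short sketch of basic navigation and file handling. It is not an excerpt from the cheatsheet; the directory and file names (lsa-demo, notes.txt, old-notes.txt) are made up for illustration, and everything happens in a temporary directory.

```shell
#!/bin/sh
# Work inside a throwaway directory so nothing in your home directory is touched.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

pwd                                        # print the current (working) directory
mkdir lsa-demo                             # make a new directory
cd lsa-demo || exit 1                      # move into it
echo "my first corpus notes" > notes.txt   # write one line of text to a new file
ls -l                                      # list the directory contents in long format
cat notes.txt                              # print the file's contents
cp notes.txt backup.txt                    # copy a file
mv backup.txt old-notes.txt                # rename (move) the copy
wc -l notes.txt                            # count the lines in a file
cd ..                                      # go back up one directory
```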

3. Available Corpora

The following corpora are available on the server for TGrep2 searches:

  • Arabic Treebank (/corpora/TGrep2able/arabic-collapsed.t2c.gz)
  • British National Corpus (full) (/corpora/TGrep2able/BNC.parsed.t2c.gz)
  • British National Corpus (spoken) (/corpora/TGrep2able/BNC_spoken.parsed.t2c.gz)
  • British National Corpus (written) (/corpora/TGrep2able/BNC_written.parsed.t2c.gz)
  • Brown Corpus (/corpora/TGrep2able/brown.t2c.gz)
  • Chinese Treebank (/corpora/TGrep2able/chtb6.t2c.gz)
  • International Corpus of English (/corpora/TGrep2able/icegb.t2c.gz)
  • NEGRA (/corpora/TGrep2able/negra.t2c.gz) -- written German
  • Switchboard Corpus (/corpora/TGrep2able/sw.backtrans.convid_020607.t2c.gz)
  • TIGER (/corpora/TGrep2able/tiger.t2c.gz) -- written German
  • Wall Street Journal (/corpora/TGrep2able/wsj_mrg.t2c.gz)
  • York-Toronto-Helsinki Parsed Corpus of Old English Prose (/corpora/TGrep2able/ycoe.t2c.gz)
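
To search one of these corpora, TGrep2 needs the path of its .t2c.gz file. The sketch below is an illustration under stated assumptions rather than course material: the helper corpus_path and its short names (wsj, brown, swbd, bnc) are hypothetical, and only the paths are taken from the list above. It uses TGrep2's -c option to name the corpus file. The pattern NP < PP (an NP immediately dominating a PP) is an arbitrary example; consult the TGrep2 manual for the options available on the server.

```shell
#!/bin/sh
# corpus_path: map a short corpus name to its file on the server.
# Hypothetical helper; the short names are invented, the paths are from the list above.
corpus_dir="/corpora/TGrep2able"
corpus_path() {
    case "$1" in
        wsj)   echo "$corpus_dir/wsj_mrg.t2c.gz" ;;
        brown) echo "$corpus_dir/brown.t2c.gz" ;;
        swbd)  echo "$corpus_dir/sw.backtrans.convid_020607.t2c.gz" ;;
        bnc)   echo "$corpus_dir/BNC.parsed.t2c.gz" ;;
        *)     echo "unknown corpus: $1" >&2; return 1 ;;
    esac
}

# Run the search only where tgrep2 and the corpus file are available
# (i.e. on the corpus server); elsewhere just show the command.
corpus="$(corpus_path wsj)"
if command -v tgrep2 >/dev/null 2>&1 && [ -r "$corpus" ]; then
    tgrep2 -c "$corpus" 'NP < PP' | head -n 5    # show only the first five lines of output
else
    echo "would run: tgrep2 -c $corpus 'NP < PP'"
fi
```

Keeping the paths in one function means a query can be pointed at a different corpus by changing a single argument, e.g. corpus_path swbd for Switchboard.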

4. TGrep2 and the TGrep2 Database Tools (TDT)

The TGrep2 User Manual explains how to run TGrep2, its options and pattern syntax, how to create macro files, etc.: TGrep2Manual.pdf

For tutorial-style introductions to TGrep2, try these: http://www.bcs.rochester.edu/people/fjaeger/teaching/tutorials/TGrep2/LabSyntax-Tutorial.html and http://www.stanford.edu/dept/linguistics/corpora/cas-tut-tgrep.html

For preliminary documentation on the TDT, see the TDT page.

For the Penn Treebank Bracketing Conventions: http://bulba.sdsu.edu/jeanette/thesis/PennTags.html

For the Switchboard-specific bracketing conventions: swbd_bracketing.pdf

5. Other search software

The Corpus Query Processor (CQP) - a tutorial: cqp_tutorial.pdf

6. R Tutorials

This guide is a brief schematic introduction to computing in the R language. Basic statistical concepts such as variables, descriptive statistics, scales, reasoning, hypothesis testing and power analysis are defined and explained. StatsNotes1.pdf, StatsNotes2.pdf.

Also, check out our lab-internal stats tutorials with R scripts, reading suggestions, etc. There are also a couple of posts on our HLP lab blog (http://hlplab.wordpress.com) about visualization, model fitting, common issues, snippets of R code, etc. The easiest way to find what you're looking for is to enter it as a search term on the blog.

Finally, if you decide to make R your choice for data analysis, we recommend that you subscribe to the R-lang email list (https://ling.ucsd.edu/mailman/listinfo.cgi/r-lang). It's low-traffic and aimed at language researchers, which makes it the perfect place to ask questions.

There are 7 attachments stored for this page:

  • StatsNotes1.pdf (2021-04-22 12:55:37, 192.3 KB)
  • StatsNotes2.pdf (2021-04-22 12:55:37, 147.8 KB)
  • TGrep2Manual.pdf (2021-04-22 12:55:37, 162.3 KB)
  • cheatsheet.txt (2021-04-22 12:55:37, 4.1 KB)
  • cqp_tutorial.pdf (2021-04-22 12:55:37, 248.9 KB)
  • swbd_bracketing.pdf (2021-04-22 12:55:37, 196.5 KB)
  • tdt_manual.pdf (2021-04-22 12:55:37, 364.4 KB)




LSA09CorporaTutorials (last edited 2011-08-10 19:29:30 by echidna)
