####################################################
# UNIX, TGREP2, & TDT - THE MOST IMPORTANT COMMANDS
# -- created by judith degen on 07/06/2009
####################################################

************************************************
* PART I: navigating the directory structure
************************************************

# Log onto the LSA server:
# ssh USERNAME@174.129.205.212
ssh lsa1@174.129.205.212

# Change your password!
passwd

# Show the contents of your current directory. If you just logged on, this will be your home directory.
ls

# Show the contents of the directory /corpora/TDTlite.
ls /corpora/TDTlite

# Show the contents of the current directory in long format (permissions, owner, size, modification date).
ls -l

# Move to the directory /corpora/TDTlite
cd /corpora/TDTlite

# Show the contents of the directory /corpora/TDTlite/sample_project
ls sample_project

# Move back to your home directory.
cd

# Move one directory up.
cd ..

# Check where you are.
pwd

# Figure out how to use a command:
# man COMMANDNAME
man cp

# Copy the sample_project directory to your home directory and rename it.
# (Run cd first, so that "." - the current directory - is your home directory again.)
cd
cp -r /corpora/TDTlite/sample_project .
mv sample_project myproject

# Create a directory.
mkdir mydir

# Remove the project directory from your home directory. BE VERY CAREFUL WITH THE rm COMMAND: there is no undo!
rm -r myproject

# To copy the file myfile.txt from your home directory on the server to your current directory on your own computer (for Mac people: run this in a Terminal on your computer, not on the server; lsa1 is an example, insert your own username to download your files):
scp lsa1@174.129.205.212:./myfile.txt .

# To copy the directory myproject from your home directory on the server to your current directory on your computer (for Mac people, again run on your own computer):
scp -r lsa1@174.129.205.212:./myproject .

************************************************
* PART II: the basics of tgrep2
************************************************

# Run tgrep2 (without -c, it searches the default corpus named in the TGREP2_CORPUS environment variable)
tgrep2 "ADJP"

# Run tgrep2 on the Wall Street Journal.
# Search for all ADJPs
tgrep2 -c /corpora/TGrep2able/wsj_mrg.t2c.gz "ADJP"

# The same, but print only terminals
tgrep2 -c /corpora/TGrep2able/wsj_mrg.t2c.gz -t "ADJP"

# Print the entire sentence containing each match
tgrep2 -c /corpora/TGrep2able/wsj_mrg.t2c.gz -tw "ADJP"

# Save the output to a file
tgrep2 -c /corpora/TGrep2able/wsj_mrg.t2c.gz -tw "ADJP" > adjp.txt

# View the contents of adjp.txt (press q to quit less)
less adjp.txt

# Search inside adjp.txt for "awesome" (type this while viewing the file in less; press n to jump to the next hit):
/awesome

# Count the lines in adjp.txt
wc -l adjp.txt

# Output the match ID in front of the match itself. \t is a special character that inserts a tab; similarly, \n inserts a newline.
tgrep2 -c /corpora/TGrep2able/wsj_mrg.t2c.gz -m '%xm\t%tm\n' "ADJP"

# Always use the -af options: they make sure all your matches are found, for example when there are multiple matches within one sentence.

# Two ADJPs that are sisters: print the first one, a tab, then the second one.
# (Single quotes keep the shell from interpreting the $ in the pattern.)
tgrep2 -c /corpora/TGrep2able/wsj_mrg.t2c.gz -af -m '%t=a1= \t %t=a2=\n' 'ADJP=a1 $ ADJP=a2'

# Create a MACRO file
vi

# In vi, there are a number of commands you can use (type them in command mode, i.e. after pressing <Esc>):
# :w FILENAME - save file as FILENAME
# :q - quit
# :wq - save and quit
# 0 - move to start of line
# $ - move to end of line
# 1G - move to first line
# G - move to last line
# i - insert text before cursor, until <Esc> is hit
# a - insert text after cursor, until <Esc> is hit
# r - replace single character under cursor
# R - replace characters until <Esc> is hit
# x - delete character under cursor
# dd - delete entire current line
# yy - copy the current line
# p - paste the line(s) in the buffer into the text after the current line

# Create a macro @AA that contains the ADJP pattern from above (press i, type the macro, press <Esc>, then save and quit):
i
@ AA ADJP=a1 $ ADJP=a2;
<Esc>
:w MACRO.ptn
:q

************************************************
* PART III: regular expressions
************************************************

# Regular expressions in tgrep2 go between slashes: /.../
# Probably the most useful one will be /^NODENAME_START/, which finds all nodes whose labels begin with NODENAME_START:
tgrep2 -c /corpora/TGrep2able/wsj_mrg.t2c.gz "/^ADJP/"

# Special characters:
# ^ - start of string
# $ - end of string
# . - any single character
# * - on its own as a node name, any node; inside a regular expression, zero or more repetitions of the preceding character
# | - any of the strings separated by |
# Without ^ or $, a pattern matches anywhere in the node label:
# /^AD/ matches "ADJP", "ADJP-PRD", "ADVP", "ADVP-LOC", "ADVP-MNR"...
# /VP$/ matches "ADVP", "VP", "WHADVP"...
# /AD.P/ matches "ADVP", "ADJP", "WHADVP", "ADJP-PRD", "ADVP-LOC"...
# /ADVP|ADJP/ matches "ADVP", "ADJP", and also every label containing either string, e.g. "ADJP-PRD", "WHADVP"
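Before running a regular expression over the whole corpus, it can help to check what it matches on a handful of labels. The sketch below does this with grep -E on a small, hand-picked list of node labels (the list and the helper name `matches` are made up for this example). It assumes that tgrep2 treats these simple patterns the same way as POSIX extended regular expressions, which is what grep -E uses.

```shell
#!/bin/sh
# Hypothetical sample of node labels, taken from the examples above plus NP.
labels='ADJP
ADJP-PRD
ADVP
ADVP-LOC
ADVP-MNR
WHADVP
VP
NP'

# matches REGEX: print every label that REGEX matches, joined by single spaces.
# grep -E uses POSIX extended regular expressions (an assumption: tgrep2
# interprets ^, $, . and | the same way for these simple cases).
matches() {
  printf '%s\n' "$labels" | grep -E -- "$1" | paste -s -d ' ' -
}

# Try each pattern from the list above, plus an anchored alternation.
for re in '^AD' 'VP$' 'AD.P' 'ADVP|ADJP' '^(ADVP|ADJP)$'; do
  printf '%-16s %s\n' "/$re/" "$(matches "$re")"
done
```

Running it shows that unanchored patterns match anywhere in a label: /AD.P/ and /ADVP|ADJP/ both also pick up ADJP-PRD, ADVP-LOC, ADVP-MNR and WHADVP. In grep -E, wrapping the alternation as ^(ADVP|ADJP)$ restricts it to the two bare labels; whether tgrep2 accepts the same grouping syntax is worth checking on a small query first.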