How to make an ExAnalysis database ready for analysis in R

Once you've finished building a database in the ExBuilder analysis software, and before you start analyzing the data in R, you will want to reorganize and reformat your data a little bit. I wrote this page, and the accompanying R script, to facilitate this process.

1. Exporting your database from the analysis software

I typically want to export all the data (that is, not just data from a particular condition or only trials with correct responses) and include relative timing information, such as time relative to the onset of the target word. To do this, I create a filter called "_EXPORT". I'm lazy, so I leave all the information that is checked when the filter is created as is, and select my variables of interest at a later stage (see below). To make sure that the "filtered" database includes all the data, I create a filter expression that is always true, such as 'r_condition != "poes"'. This filter selects all the data, assuming that there is a variable called "r_condition" and its value is never "poes" (the Dutch word for cat). You can click "Preview" to make sure that the filter works, and then click "Export" to export the database to a .csv file. The export process can take a while.

2. Reformatting your (huge) database in R

The .csv file that is created when you export your data from the analysis software is very large (typically, several hundred MBs). It contains a lot of information that is irrelevant for your analysis, and it's a good idea to reorganize and reformat the .csv file and save it as an R data file (.rda). This annotated R script may help you to do so.

exanalysis-to-rda.R (May 16, 2014)

You will have to make changes to the script in order to get it to work for your database. (Note that most of the following steps are optional.)

2.1. Specify the filename of your database (.csv file)

If necessary, include the full directory path, e.g. "~/Desktop/myexperiment.csv".
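A minimal sketch of this step (not the accompanying script itself). So that it runs on its own, it first writes a tiny stand-in file to a temporary location; the column names "subject", "r_condition" and "time" are made up for illustration.

```r
# Hypothetical stand-in for an exported database. In practice you would
# set csv_file to your own export, e.g. "~/Desktop/myexperiment.csv".
csv_file <- tempfile(fileext = ".csv")
writeLines(c("subject,r_condition,time",
             "s01,related,-200",
             "s01,related,0"),
           csv_file)

# Read the export. stringsAsFactors = FALSE keeps text columns as
# character vectors, so you can decide later which ones become factors.
d <- read.csv(csv_file, stringsAsFactors = FALSE)

# Quick look at what was read in
str(d)
```

For a real export of several hundred MB, expect read.csv to take a while.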

2.2. Specify which variables from your database you want to keep

Select only those variables that are relevant for your analysis. Getting rid of redundant or irrelevant variables makes it much easier to inspect your dataframe in R.
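One way this selection could look, as a sketch on a toy dataframe. All column names here (including "junk_column") are hypothetical; substitute the names from your own export.

```r
# Toy dataframe standing in for the imported database
d <- data.frame(subject     = c("s01", "s01"),
                r_condition = c("related", "related"),
                time        = c(-200, 0),
                junk_column = c(NA, NA))

# Variables to keep (hypothetical names)
keep <- c("subject", "r_condition", "time")

# Fail early if a requested variable is not in the database,
# e.g. because of a typo in the name
missing_vars <- setdiff(keep, names(d))
if (length(missing_vars) > 0) {
  stop("Not found in database: ", paste(missing_vars, collapse = ", "))
}

# Keep only the selected columns, in the order given
d <- d[, keep, drop = FALSE]
```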

2.3. Specify which of those variables you want to rename, and what name(s) to use

Some of the standard names that the analysis software assigns to variables are a bit quirky or cryptic, so it's a good idea to rename those variables.
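Renaming can be sketched with a named vector that maps old names to new ones. The names below ("RESP_PIC" in particular) are invented for the example and are not taken from the analysis software.

```r
# Toy dataframe with made-up column names
d <- data.frame(subject     = c("s01", "s02"),
                r_condition = c("related", "unrelated"),
                RESP_PIC    = c("T", "C"))

# Mapping from old name (left) to new name (right); both are assumptions
new_names <- c(r_condition = "condition",
               RESP_PIC    = "clickedon")

# Locate the old names and stop if any of them is absent
idx <- match(names(new_names), names(d))
stopifnot(!anyNA(idx))

# Replace the old names with the new ones
names(d)[idx] <- unname(new_names)
names(d)
```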

2.4. Specify the time window of interest

You may want to discard data that you would never analyze, e.g. anything that happens more than a second prior to target-word onset.
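A sketch of such a time-window filter, assuming a column called "time" that holds time in milliseconds relative to target-word onset. The column name, the unit, and the upper bound of 2000 ms are all assumptions for the example.

```r
# Toy data: time in ms relative to target-word onset (assumed unit)
d <- data.frame(time = c(-1500, -1000, -200, 0, 800, 2500))

# Window of interest: from 1 s before onset to 2 s after (hypothetical)
window_start <- -1000
window_end   <-  2000

# which() drops rows where time is NA instead of returning NA rows
in_window <- which(d$time >= window_start & d$time <= window_end)
d <- d[in_window, , drop = FALSE]
d$time
```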

2.5. Change the formatting of some variable values

For instance, a variable may correspond to the filename of the picture that the participant clicked, including the extension ".png". You will probably want to discard this extension.
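For the ".png" case, a sketch using base R's sub() with a regular expression. The column name "clicked_picture" and the picture names are made up.

```r
# Toy column of picture filenames (hypothetical values)
d <- data.frame(clicked_picture = c("candle.png", "candy.png", "sofa.png"),
                stringsAsFactors = FALSE)

# Remove a trailing ".png"; the dot is escaped so that it matches a
# literal dot, and "$" anchors the match to the end of the string
d$clicked_picture <- sub("\\.png$", "", d$clicked_picture)
d$clicked_picture
```

If your filenames use several different extensions, tools::file_path_sans_ext() is an alternative that strips whichever extension is present.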

2.6. Reorder factor levels

Suppose you have a variable called "clickedon" which corresponds to the type of picture that the participant clicked on. The variable can assume the values "T", "C", and "D" (target, competitor, or distractor). When this variable is coded as a factor (which read.csv did automatically before R 4.0.0), the associated factor levels are sorted alphabetically. As a result, the order of factor levels will be "C" -> "D" -> "T". If you later plot your proportion-of-fixation data in a graph, the legend would show "C", then "D", then "T". You probably don't want this to be the case, so it's a good idea to choose an order for the factor levels of a variable that makes sense to you.
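A sketch of the reordering for the "clickedon" example, using factor() with an explicit levels argument. The toy values are invented.

```r
# Toy data with the three picture types from the example above
d <- data.frame(clickedon = c("C", "T", "D", "T"))

# Set the level order explicitly: target, competitor, distractor.
# Any value not listed in levels would become NA, so list them all.
d$clickedon <- factor(d$clickedon, levels = c("T", "C", "D"))
levels(d$clickedon)
```

Plot legends and table() output then follow this order rather than the alphabetical one.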

3. Running the script

After you specify all the information above as relevant for your database, the R script will reorganize and reformat the data, and save the results to an .rda file. This file can be loaded (e.g. at the start of your analysis script) using the command "load(filename)", where "filename" corresponds to the name of the .rda file.
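Saving and reloading can be sketched as follows. The object name "d" and the file name are assumptions, and the file is written to a temporary directory so that the example does not touch your own files.

```r
# Toy dataframe standing in for the reformatted database
d <- data.frame(subject = c("s01", "s02"), time = c(0, 50))

# Hypothetical output file; in practice, choose your own path
rda_file <- file.path(tempdir(), "myexperiment.rda")

# Save the dataframe to an .rda file
save(d, file = rda_file)

# Later, e.g. at the start of an analysis script: remove d to mimic a
# fresh session, then restore it from the file
rm(d)
loaded <- load(rda_file)  # load() returns the names of the restored objects
loaded
```

Note that load() restores objects under the names they had when saved, so the dataframe comes back as "d" regardless of the file name.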

4. Questions, comments, suggestions

TanenhausLab: ExAnalysisToR (last edited 2014-05-19 20:36:19 by AnnePierSalverda)
