BCS152SPRExperiment

Creating Stimuli for Web-Based Self-Paced Reading

To run an experiment, we need to create stimuli (made up of critical items, which differ in line with our experimental manipulation, and fillers), order them into trials, and balance the order and type of stimuli that participants see, both within and across participants. This guide introduces some terminology, gives you a short course in experimental list design, and shows you how to create your stimuli for use with our web-based self-paced reading software.

In creating stimuli for your experiment, you might also find the following page useful, with links to tools that let you obtain word frequencies, predictability, neighborhood density, and so on.

Terminology

Participant: a participant in our experiments (sometimes called a subject).
Session: a single visit by a participant. A session may contain one experiment, or it may combine several experiments. For our purposes, each participant will have only one session.
Experiment: a set of lists of stimuli, of which each participant sees one. Each experiment may contain several embedded experiments. That is, we often run stimuli from several experiments mixed together, so that the item stimuli from one experiment can serve as fillers for the other and vice versa.
Lists: define which stimuli a participant sees (i.e., which condition of each item) and in which order. An experiment typically has several lists, but each participant sees only one list. Lists are used to balance conditions across lists (and hence across participants).
Design: Experiments are determined by their design. We typically fully cross manipulations, leading to factorial designs, e.g. a 2 x 2 (manipulating two binary factors) leading to 4 conditions.
Stimuli: anything that constitutes one presentation (one trial), e.g. a picture to describe, a sentence to read, a word to remember, etc.
Item: an item is usually a set of stimuli that differ only along the dimensions that define the conditions of the experiment (the manipulations). So, in a 2 x 2 design, each item has 4 conditions (= four different versions). There are between-item designs where this logic breaks down, but let's ignore those for now. The stimuli within an item should differ only as defined by the design; everything else about them should be the same. For example, here is an item for a 2 x 2 design crossing word frequency and predictability, where the target word is the final noun (pizza or kimchi):
- a) High predictability, high frequency: He ate some pizza.
- b) High predictability, low frequency: He ate some kimchi.
- c) Low predictability, high frequency: He saw some pizza.
- d) Low predictability, low frequency: He saw some kimchi.
Note that the different stimuli hold the context (He ate/saw some) as constant as possible. We also generally aim to hold constant other factors that might matter. For example, in the item above, both target words are bisyllabic (have two syllables) with stress on the first syllable. Additionally, both ate and saw are monosyllabic, so the target word is the same number of syllables from the start of the sentence (and from its end) in every condition.
Filler: a stimulus that is not of primary interest for the question we want to address with the experiment. Fillers are held constant across lists. For more detail, see the Section on Fillers below.

Latin square design

You will use a Latin square design in your studies. We have scripts that automatically create a Latin square from a specially formatted list of stimuli (described later), but you should still understand what a Latin square design is and how to make one. First, when you sit down to analyze your data, you will need to know exactly how participants saw your stimuli. Second, it will help you relate your findings to those of other experiments, whether they use this design or an alternative.

The idea behind a Latin square design is that each participant sees each item exactly once AND sees all conditions equally often, AND that across lists each item is seen equally often in each of its conditions. (Conditions are thus manipulated within participants, while any given participant sees each item in only one of its conditions.) Since each participant sees exactly one list of the experiment, each list should contain each item only once (in one of its conditions) and all conditions equally often. It follows that the number of items in an experiment should be a multiple of the number of its conditions. For power considerations, we often use 6 to 8 times as many items as there are conditions, and an equal number of participants, but that's another matter. For more subtle effects, you may need even more items.
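The arithmetic in the paragraph above can be made concrete with a small sketch. The helper names plan_items and is_balanced are hypothetical and only illustrate the rule that the item count must be a multiple of the condition count; they are not part of our scripts.

```python
def plan_items(n_conditions: int, items_per_condition: int) -> int:
    """Total number of critical items for a Latin square design.

    Multiplying by the number of conditions guarantees the total is a
    multiple of it, so every list can show each condition equally often.
    """
    if n_conditions < 1 or items_per_condition < 1:
        raise ValueError("both arguments must be positive")
    return n_conditions * items_per_condition


def is_balanced(n_items: int, n_conditions: int) -> bool:
    """True if n_items can be split evenly across n_conditions in every list."""
    return n_items % n_conditions == 0


# A 2 x 2 design has 4 conditions; 6 to 8 items per condition gives 24 to 32 items.
print([plan_items(4, k) for k in (6, 7, 8)])  # [24, 28, 32]
print(is_balanced(30, 4))  # False: 30 items cannot be spread evenly over 4 conditions
```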

Items and lists

How should the items and conditions be distributed across lists? Consider a 2 x 2 design with the four conditions a1, b1, a2, and b2. We will need at least four lists. Let's say that we use eight times as many items as conditions (i.e., 8 items per condition, 32 items in total) and that List1 looks like this (before sorting and before fillers are added; more on that later):

List1:
- Item1 in condition a1
- Item2 in condition b1
- Item3 in condition a2
- Item4 in condition b2
- Item5 in condition a1
- Item6 in condition b1
- Item7 in condition a2
- Item8 in condition b2
- ...
- Item29 in condition a1
- Item30 in condition b1
- Item31 in condition a2
- Item32 in condition b2

So, List1 contains each item only once, and each condition (a1, b1, a2, b2) occurs equally often (8 times). Now we want a second list that fulfills these constraints and brings us closer to the third constraint stated above: that, across lists, each item should be seen equally often in each of its conditions. To achieve this, we construct List2 by simply shifting the conditions up by one. So the condition of Item1 in List2 is the condition of Item2 in List1, and so on. The last item of List2 occurs in the same condition as the first item of List1.

List2:
- Item1 in condition b1
- Item2 in condition a2
- Item3 in condition b2
- Item4 in condition a1
- Item5 in condition b1
- Item6 in condition a2
- Item7 in condition b2
- Item8 in condition a1
- ...
- Item29 in condition b1
- Item30 in condition a2
- Item31 in condition b2
- Item32 in condition a1

If we repeat this for List3 and List4 (see below), you will see that, across lists, each item occurs exactly once in each of its conditions, and, within each list, each item occurs once and all conditions occur equally often. That's what we want. (Latin square designs are most powerful if each item is seen equally often in each condition across all participants. That is, we want each list to be seen by equally many participants, but that is something to keep track of later, when we run the experiment.)

List3:
- Item1 in condition a2
- Item2 in condition b2
- Item3 in condition a1
- Item4 in condition b1
- ...
- Item29 in condition a2
- Item30 in condition b2
- Item31 in condition a1
- Item32 in condition b1

etc.
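The shifting scheme described above amounts to a rotation: in list k, item i gets condition number (i + k) modulo the number of conditions (counting from 0). Here is a minimal sketch of that rotation in Python; latin_square_lists is a hypothetical helper for illustration, not the lab's pre-written script.

```python
from collections import Counter


def latin_square_lists(n_items, conditions):
    """Build one list per condition by rotating condition assignments.

    In list k (0-based), item i (0-based) is shown in
    conditions[(i + k) % len(conditions)], which reproduces List1, List2, ...
    as described in the text.
    """
    n_cond = len(conditions)
    if n_items % n_cond != 0:
        raise ValueError("number of items must be a multiple of the number of conditions")
    return [
        [(f"Item{i + 1}", conditions[(i + k) % n_cond]) for i in range(n_items)]
        for k in range(n_cond)
    ]


conditions = ["a1", "b1", "a2", "b2"]
lists = latin_square_lists(32, conditions)
print(lists[1][:4])  # [('Item1', 'b1'), ('Item2', 'a2'), ('Item3', 'b2'), ('Item4', 'a1')]
print(lists[1][-1])  # ('Item32', 'a1')
# Within each list, every condition occurs 8 times.
print([Counter(cond for _, cond in lst)["a1"] for lst in lists])  # [8, 8, 8, 8]
```

Because each list is a shifted copy of the previous one, every item visits each of its conditions exactly once across the four lists, which is the third constraint above.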

Fillers and lists

All lists have the same fillers. Lists typically have at least one filler between any two items. Since we usually have at least twice as many fillers as items, this is easy to satisfy. What should be avoided, however, is distributing fillers and items according to a regular pattern, e.g. filler, filler, item, filler, filler, item, filler, filler, item, ... Participants may pick up on such patterns. One way to avoid this is to fully randomize lists; however, that might produce long chains of items (even worse if they are all in the same condition). An alternative is a pseudo-random order, in which items and fillers are intermixed more or less randomly while avoiding excessively long chains of similar trials. For this experiment, we will pseudo-randomize stimuli using pre-written scripts, so you don't need to worry about this. Be aware, however, that this means you cannot specify that your items or fillers appear in a particular order.
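One simple way to get a pseudo-random order with at least one filler between any two items is to shuffle the fillers, then drop each item into a different randomly chosen gap between them. The sketch below assumes that approach; it is an illustration only, and the lab's pre-written scripts may use a different algorithm. The function name interleave is hypothetical.

```python
import random


def interleave(items, fillers, seed=None):
    """Pseudo-randomly mix items and fillers so that no two items are adjacent.

    There are len(fillers) + 1 gaps (before, between, and after the fillers).
    Each item goes into a different randomly chosen gap, so at least one filler
    always separates two items, while the overall pattern stays irregular.
    """
    if len(items) > len(fillers) + 1:
        raise ValueError("need at least len(items) - 1 fillers to separate every pair of items")
    rng = random.Random(seed)
    fillers = list(fillers)  # copy so the caller's list is not reordered
    items = list(items)
    rng.shuffle(fillers)
    rng.shuffle(items)
    gaps = set(rng.sample(range(len(fillers) + 1), len(items)))
    item_iter = iter(items)
    order = []
    for gap in range(len(fillers) + 1):
        if gap in gaps:
            order.append(next(item_iter))
        if gap < len(fillers):
            order.append(fillers[gap])
    return order


items = [f"item{i}" for i in range(32)]
fillers = [f"filler{i}" for i in range(64)]
order = interleave(items, fillers, seed=1)
```

With twice as many fillers as items, there are far more gaps than items, so the resulting spacing between items varies from list position to list position rather than following a fixed filler, filler, item pattern.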

Making stimuli

Here is an example of stimuli from a 2x2 design in which the experimenters manipulate the verb ('saw', 'ate') and the final noun ('apple', 'kiwi'). This is one item with its four conditions:

I saw the apple
I saw the kiwi
I ate the apple
I ate the kiwi

In a within-participant Latin square design (which is what you will be running), participants see this item once, in one and only one of its conditions. The first manipulation makes the verb either uninformative, i.e. unconstraining (e.g. 'saw': you can see many things, so it does not lead you to generate any constraining expectations), or informative, i.e. constraining ('ate': there is a much more restricted set of edible things in the world, so it generates an expectation of an edible noun). The second manipulation makes the final noun either high frequency (e.g. 'apple') or low frequency (e.g. 'kiwi'). In this design, the experimenters are interested in reading times (RTs) on the final noun; you can have a different region of interest in your own study.

Some notes on critical items

These sentences differ only with respect to the manipulations. That is, everything is held constant except for the parts that belong to the experimental design: I __ the __. Design your items like this too (for another example, see the stimuli below). Notice that the manipulations themselves are also tightly controlled. In the example above, the verbs are both one syllable long and equally frequent, so the difference between them is restricted, as far as possible, to how constraining they are. The final nouns are also very similar: both have two syllables. The region of interest is held constant too: the final noun is the same distance from the start and from the end of the sentence in every condition.

Some notes on filler items

Fillers are often considered just stimuli that aren't of further interest to the experimenter. It is dangerous, though, to underestimate their importance; there are many cases where fillers matter a lot. For one, fillers may determine what participants take the task to be. Consider, for example, the difference between having only grammatical fillers and also having ungrammatical fillers in some task paradigms.

Generally, fillers have two main functions:

- distract from the items, which might otherwise make it too obvious what we are interested in;
- counterbalance some aspects of the items, to prevent participants from (subconsciously) learning about the distribution of stimuli in the experiment and then acting strategically, based on their understanding of the experiment rather than on the question of interest.

Several consequences follow from these two central purposes of fillers:

In order to distract from items:
- fillers need to be similar to the items (e.g. in terms of complexity and structure), though of course they also need to be different.
- fillers need to form groups/clusters, just like items. Imagine that 10 of 40 items share a certain lexical property (e.g. a certain verb) because one of four conditions of an experiment requires that property. In that case, groups of fillers of similar size (10 or so) should also be created, to distract from the repeated property of the items, which would otherwise stand out.

Self-paced reading and questions

After reading each sentence, participants will have to answer a Yes/No question. Questions are primarily there to make sure that participants are actually reading your sentences (you can also use questions to see how participants interpret your sentences, but that requires very careful design, so it is not recommended). Just like fillers, questions need to be designed with care. Here are some important things to do:

To avoid answer bias, you should have an equal number of 'yes' and 'no' answers; doing so also makes it more likely that participants will pay attention.
- Since fillers will be the same for all participants, give half of the filler questions a 'yes' answer and half a 'no' answer.
- For each item, keep the question and answer the same across conditions; this makes it easy to have an equal number of 'yes' and 'no' answers across items. If it's not possible to keep the question exactly the same, keep the questions as similar as possible while keeping the answer the same.
Make sure your questions ask about different parts of the sentences. Otherwise, participants might ignore everything in your sentences except the part needed to answer your questions.

What your stimuli file should look like

Creating lists for a Latin square design, with a good mix of fillers between items, can be done automatically. However, the scripts that do this require their input in a very specific format. Here is an example from a real experiment run in Professor Jaeger's lab, showing what your sheet should look like: ExampleLists.xlsx
It's in the current Excel format (XLSX), although we can also handle the older Excel format (XLS) and the comma-separated values (CSV) format. Below are the first few rows of the example file:

Experiment	ItemName	Condition	Sentence	Question	Answer
control	1	AMV	The excited fans moved through the crowd and sauntered to front stage.	Was the place empty?	N
control	1	ARC	The excited fans moved through the crowd broke apart from Jane accidentally.	Was the place empty?	N
control	1	UMV	The excited fans flew through the crowd and sauntered to front stage.	Was the place empty?	N
control	1	URC	The excited fans that were moved through the crowd broke apart from Jane accidentally.	Was the place empty?	N
control	2	AMV	The aging professors phoned about the midterm and were surprised by the workload.	Was somebody concerned about a homework set?	N
control	2	ARC	The aging professors phoned about the midterm were surprised by all the questions.	Was somebody concerned about a homework set?	N
...
filler	41	-	The parcel arrived too late to be of any use.	Did the package arrive on time?	N
filler	42	-	The espresso machine was broken for months before finally being fixed.	Was the espresso machine fixed?	Y
...

There are six (6) required columns: Experiment, ItemName, Condition, Sentence, Question, and Answer. See the previous sections for important notes about each.
- Experiment distinguishes between items from different sub-experiments run in the same script. Strictly speaking, we won't be doing that, but you will use filler for your fillers and whatever name you choose for your experiment for your critical items (in the example, it is called control).
- ItemName is to identify an individual item within an experiment. This should be a number. For items this will be the same across all conditions (see the example file).
- Condition identifies which condition of the experiment the item is in. Condition names should not contain any spaces or special symbols, except for ".". It's common to use capitalized names. For example, for the example item given at the beginning of this page, we might call our conditions HighPred.HighFreq, HighPred.LowFreq, LowPred.HighFreq, and LowPred.LowFreq. For fillers, this should be "-". We will randomize the lists, and each item will appear once in each list (in a different condition in each list), as specified by the Latin square design above.
- Sentence is the stimulus that participants will read in the self-paced reading task.
- Question is the comprehension question that ensures subjects are actually paying attention and not just holding down the space bar. Within an item, you want the same question, or minimally different questions, and they should lead to the same answer. For items, you often want to probe a specific part of the stimulus to make sure that this part was indeed processed correctly. Across all stimuli, however, it is advisable to probe different parts of the sentences to roughly equal extents. For example, if you only asked questions about the first word of all sentences, participants might pick up on that and stop paying attention to the remainder of the sentences.
- Answer is the correct answer to the comprehension question (Y or N). Within an item, correct answers should be held constant. Across items and fillers, make sure that equally many questions are answered Y as are answered N. This reduces answer bias and increases the likelihood that participants pay attention.
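Since each participant sees each critical item in only one condition, the Y/N balance that matters is counted once per item plus once per filler. The sketch below assumes rows shaped like the stimuli sheet (dictionaries keyed by the column names, with fillers marked by the Experiment value filler); answer_balance is a hypothetical helper, not part of our scripts.

```python
from collections import Counter


def answer_balance(rows):
    """Count Y/N answers as a participant would encounter them.

    rows: dicts with at least the keys Experiment, ItemName, and Answer.
    Critical items count once per (Experiment, ItemName), because a participant
    sees each item in only one condition; fillers count once per row.
    Raises ValueError if an item's answer differs between its conditions.
    """
    item_answers = {}
    counts = Counter()
    for row in rows:
        answer = row["Answer"].strip().upper()
        if row["Experiment"] == "filler":
            counts[answer] += 1
            continue
        key = (row["Experiment"], row["ItemName"])
        if key in item_answers and item_answers[key] != answer:
            raise ValueError(f"item {key} has different answers across conditions")
        item_answers[key] = answer
    counts.update(item_answers.values())
    return counts


rows = [
    {"Experiment": "control", "ItemName": "1", "Answer": "N"},
    {"Experiment": "control", "ItemName": "1", "Answer": "N"},
    {"Experiment": "filler", "ItemName": "41", "Answer": "N"},
    {"Experiment": "filler", "ItemName": "42", "Answer": "Y"},
]
print(answer_balance(rows))  # Counter({'N': 2, 'Y': 1})
```

For a balanced sheet, the Y and N counts returned should be equal (or differ by at most one if the total is odd).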
Important points about the format:
- Condition names should only be upper and lowercase letters and periods (e.g. UNINF.HIGH, UNINF.LOW, INF.HIGH, INF.LOW). Do not include spaces, numbers or other characters.
- Remove blank/empty rows between stimuli.
- The sentences and questions will be displayed as-is, so do not include extra characters unless you want them to be displayed. For example, if your design uses a comma to disambiguate an ambiguous sentence, make sure the comma is included; otherwise, leave out commas your design does not need.
- Regions are indicated with an @ followed by the region name; for multi-word regions, put the words in curly brackets, e.g.:
  - word@regionname
  - {several words here}@anotherregionname
  - {This sentence}@subject has@verb {multiple regions}@object
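To check that your region annotations split a sentence the way you intend, you could parse them with a regular expression. This is a sketch under assumptions: it treats region names as runs of letters, digits, or underscores and lets unannotated words through with no region, which may not match exactly how our scripts read the Sentence column. The names TOKEN and parse_regions are hypothetical.

```python
import re

# Either {several words}@name, or word@name, or a plain unannotated word.
TOKEN = re.compile(r"\{([^}]*)\}@(\w+)|(\S+?)@(\w+)|(\S+)")


def parse_regions(sentence):
    """Split an annotated sentence into (text, region) pairs.

    The region is None for words that carry no @ annotation.
    """
    pieces = []
    for m in TOKEN.finditer(sentence):
        if m.group(1) is not None:    # {multi word}@region
            pieces.append((m.group(1), m.group(2)))
        elif m.group(3) is not None:  # word@region
            pieces.append((m.group(3), m.group(4)))
        else:                         # plain word
            pieces.append((m.group(5), None))
    return pieces


print(parse_regions("{This sentence}@subject has@verb {multiple regions}@object"))
# [('This sentence', 'subject'), ('has', 'verb'), ('multiple regions', 'object')]
```

A missing closing bracket or a stray space before the @ will show up in the output as an unexpected plain word, which makes such typos easy to spot.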

Entering Items

Create a new Excel document (or clear out the example provided above)
Put the required and any optional headers you need in the first row of the sheet
Enter each of your items (make sure each item has a different ItemName and the same number of conditions, one per row)
Enter your fillers on the same sheet
n.b. the program that generates the lists for the applet from your Excel sheet ignores any column other than the ones described above, so feel free to include extra columns if they help you make your stimuli. Keep in mind, though, that if you misspell a required column header, that column will be ignored too.
Save the file and send it to us.
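Before sending the file, you might want to catch formatting mistakes yourself. The sketch below assumes you have saved your sheet as CSV; it checks the required headers, condition names (letters and periods only), numeric ItemName values, Y/N answers, "-" as the filler condition, blank rows, and whether all items of an experiment have the same number of conditions. check_stimuli is a hypothetical helper written for this guide; it does not reproduce every check our scripts perform.

```python
import csv
import re
from collections import defaultdict

REQUIRED = ["Experiment", "ItemName", "Condition", "Sentence", "Question", "Answer"]
# Letters and periods only, e.g. HighPred.LowFreq.
CONDITION_NAME = re.compile(r"[A-Za-z.]+")


def check_stimuli(lines):
    """Return a list of problems found in CSV stimuli (empty list = none found).

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        return ["missing or misspelled column(s): " + ", ".join(missing)]

    problems = []
    conditions = defaultdict(set)  # (experiment, item) -> condition names seen
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        cell = {col: (row[col] or "").strip() for col in REQUIRED}
        if not any(cell.values()):
            problems.append(f"row {row_no}: blank row")
            continue
        if cell["Answer"] not in ("Y", "N"):
            problems.append(f"row {row_no}: answer should be Y or N")
        if not cell["ItemName"].isdigit():
            problems.append(f"row {row_no}: ItemName should be a number")
        if cell["Experiment"] == "filler":
            if cell["Condition"] != "-":
                problems.append(f"row {row_no}: filler condition should be '-'")
            continue
        if not CONDITION_NAME.fullmatch(cell["Condition"]):
            problems.append(f"row {row_no}: bad condition name {cell['Condition']!r}")
        key = (cell["Experiment"], cell["ItemName"])
        if cell["Condition"] in conditions[key]:
            problems.append(f"row {row_no}: duplicate condition for item {key}")
        conditions[key].add(cell["Condition"])

    # Within each experiment, every item should have the same number of conditions.
    sizes = defaultdict(set)
    for (experiment, _), conds in conditions.items():
        sizes[experiment].add(len(conds))
    for experiment, counts in sizes.items():
        if len(counts) > 1:
            problems.append(
                f"experiment {experiment!r}: items have differing numbers of conditions {sorted(counts)}"
            )
    return problems
```

To run it on a saved file, open it with `open(path, newline="", encoding="utf-8-sig")` and pass the file object; the `utf-8-sig` encoding strips the byte-order mark that Excel's "CSV UTF-8" export writes, which would otherwise make the first header look misspelled.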

What the web experiment applet will do

Your stimuli file will be converted into lists that follow a Latin square design (as many lists as there are conditions)
Items and fillers will be pseudo-randomly ordered (so don't expect Item 1 to be presented before Item 2)
n.b. All participants will see the fillers and items in the same pseudo-random order
Each participant will only see one list
Each Sentence will be presented word by word, using moving-window self-paced reading
Each Question will appear after participants finish reading the Sentence
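In a moving-window display, each key press reveals the next word and masks the previous one again, so only one word is visible at a time while the length of the other words stays visible. The sketch below shows the successive screens for a plain (unannotated) sentence. The masking character, the initial all-masked screen, and the name moving_window_frames are assumptions for illustration; our applet's exact rendering may differ.

```python
def moving_window_frames(sentence, mask_char="-"):
    """Yield the successive displays of a word-by-word moving-window presentation.

    The first display masks every word; each following display reveals exactly
    one word and masks all others. Spaces are kept, so word lengths stay visible.
    """
    words = sentence.split()
    masked = [mask_char * len(w) for w in words]
    yield " ".join(masked)
    for i, word in enumerate(words):
        yield " ".join(masked[:i] + [word] + masked[i + 1:])


for frame in moving_window_frames("He ate some pizza."):
    print(frame)
# -- --- ---- ------
# He --- ---- ------
# -- ate ---- ------
# -- --- some ------
# -- --- ---- pizza.
```

The time between two consecutive key presses is the reading time for the word shown in that frame, which is why a region of interest such as the final noun can be analyzed on its own.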