TIME TO THINK?
So ... what new questions come to your mind? You might wonder, for example, whether good baseball players can hit a ball better than other people because they have faster simple reaction times. Could you use your new telescope/microscope to try to answer that question? Of course. Just find some good hitters and determine their simple reaction times, and compare those to the reaction times of some other people who are not good hitters. What about "thinking times"? Are there some kinds of people who you think might have faster thinking times? You could find out whether it's so by again comparing measured times in two groups of people. Or maybe you're just curious about two groups of people who seem to act differently in some way you don't fully understand. Is it possible that one or another of the times you can measure with your new telescope/microscope is different in the two groups? What about students and teachers? Or younger kids and older kids (do one or more of the measured times change with age? are times more different as people get older, or are they already different when young?). Or .... ?
See? It's easy to get involved in doing research. All you need is a tool and some curiosity (which everyone is born with). Try it out, make some observations, think about what they mean, and let us know what you find (email us at Serendip). If you like, we'll post your findings as part of our Collected Observations.
Concerns about the observing equipment
Good observers (scientists or otherwise) need to appreciate both the strengths and the limitations of their tools. How good is the Serendip tool? Pretty good, but there are some reservations that should be kept in mind. Serendip reports reaction times in terms of milliseconds, and these are based on a clock in your computer which (at least in most cases) does in fact have that level of accuracy. But ... there is a small and somewhat unpredictable time delay between clicking the mouse button and your computer noticing that event and defining a time for it. Hence, the reaction times reported by Serendip should probably be regarded as accurate not to a millisecond but rather to within five or ten milliseconds, and differences smaller than this should probably not be regarded as meaningful. In addition, there may be some systematic variation in times measured on different computers, so differences between subjects might actually reflect computer differences rather than subject differences. The best way to deal with this is to have all subjects use the same computer, or to estimate differences by collecting data from the same subject on different computers and then correcting for any observed differences. Our guess is that these would be quite small (a few milliseconds) in general, but might involve tens of milliseconds if there were large differences between the computers being used. (10 August: Actually, we've been checking ourselves (see Collected Observations), and it's looking like there may be as much as an 80 millisecond difference between older and newer (faster processing time?) computers. So we (and others) would be interested in hearing about your own experiences and seeing your observations along these lines).
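The correction procedure suggested above (same subject, different computers) can be sketched in a few lines. This is just one illustrative way to do it; the function names and the numbers are made up for the example, not part of the Serendip exhibit itself:

```python
# Hypothetical sketch: estimating a per-computer timing offset by having
# the SAME subject collect reaction times (in milliseconds) on two
# different computers, then correcting one machine's times to match.

def mean(times):
    """Average of a list of reaction times."""
    return sum(times) / len(times)

def computer_offset(times_on_a, times_on_b):
    """Estimate how much slower computer B reports times than computer A."""
    return mean(times_on_b) - mean(times_on_a)

def corrected(times_on_b, offset):
    """Subtract the estimated offset so B's times are comparable to A's."""
    return [t - offset for t in times_on_b]

# Same subject, two machines (illustrative numbers only):
on_a = [310, 295, 305, 300, 290]   # times reported on computer A
on_b = [385, 370, 380, 375, 365]   # times reported on computer B
off = computer_offset(on_a, on_b)  # estimated systematic difference, in ms
```

With these made-up numbers the estimated offset is 75 ms, in the same neighborhood as the roughly 80 millisecond computer-to-computer difference mentioned in the note above; with real data you would collect more than five trials per machine before trusting the estimate.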
The issue of "controlling variables"
Let's imagine that we are in fact interested in whether good hitters have faster reaction times than poor hitters, and so we identify two groups of people, collect reaction times from each, and observe a difference. Is that difference there because one group are good hitters and the other group is not? Or is it instead because of something else that's different between the two groups that we haven't thought about? Maybe, for example, just because of our other obligations, we collected the observations on one group in the morning and on the other late at night. Or maybe we happened to show one group the Serendip program before deciding to collect observations on them, and the other group had never tried out the program. Either difference might result in observing differences between the groups which have nothing in fact to do with being or not being a good hitter. "Controlling variables" means trying to assure that the ONLY difference between the two groups is the one you are interested in.
How important is "controlling variables"? Well, that depends on what one is trying to do. If one is simply curious about whether two groups differ in terms of the measurements one is making, without necessarily being able to say WHY they differ, then one doesn't have to worry at all about controlling variables. Lots of interesting research actually gets started this way, with an observation that a difference exists, and then speculation about possible reasons for the difference which in turn leads to new research. On the other hand, if one wants to try and establish that there is a particular reason for a difference, then one has to worry a lot about controlling variables. This means anticipating all of the possible reasons for a difference other than the one that one is interested in, and trying to make the two groups identical in all respects except that one. There is lots of interesting research of this kind as well. Some people, in fact, think this is the only kind that deserves the name "research"; what they fail to appreciate is not only the demonstrated usefulness of the first kind but also the reality that it is, for almost all interesting questions, actually impossible to fully control the variables. How could one be sure one knows all possible reasons for differences, to say nothing of actually eliminating them in practice? In actuality, most research is neither "uncontrolled" nor "fully controlled" but somewhere in between. The important thing isn't, in the abstract, to be sure one has done a "controlled" set of observations, but rather to understand, and be explicit about, what one's question is, and to what degree one's question can be answered given the amount of control over the variables that one has achieved.
The question of variation
Finding out whether one thing is or is not different from another might seem straightforward: you measure one thing, then you measure the other, and then you compare the two measurements. In practice, it's not quite so simple. One issue, which we talked about a bit above under "controlling variables", is whether or not you are measuring the two things under the same conditions. For example, you're likely to find that, at least at the beginning, your reaction time measurements get shorter as you do more trials. This kind of "systematic" variation can make for interesting studies in its own right (how much better can you get? how long does it take? do you stay better if you stop for a while?), but it also needs to be taken into account and controlled for if you're trying to compare reaction times in, for example, two different people. So, in general, before trying to compare between two different subjects one first checks for systematic variations, and then tries to "control" for them. In the case of "practice" effects, one might, for example, observe that people get better with trials for a while but then the times stop shortening ("plateau"), and then be sure that the different people whose times one wants to compare have reached plateau values before collecting the observations one is going to analyze for the comparison. If you don't do this, your comparison might be misleading because the different people were at different points along their "practice" curves.
In most interesting situations, even when there is no "systematic" variation there remains some trial-to-trial variation in measured values: there is no "average" change over time, but the measured values on each trial are different, in one direction or another, from those on the previous trials. This kind of variation too raises some interesting questions in its own right (what causes it? can it be reduced? increased? is it itself significant for behavior?). And it too creates problems for comparisons. Which value does one use for a comparison? Obviously, if one makes only one observation on each of two people, one may by chance get a lower than typical value for one person and a higher than typical value for the other, or vice versa, and the comparison might come out oppositely in the two cases. So, rather than basing the comparison on the value for one trial, one typically uses an average value obtained over a number of trials. How many trials? Well, again, that depends on what question one is trying to answer, and depends as well on how much trial-to-trial variation there is. If you want to make a case that small differences in the average (or "mean") value are important, then you need either lots of trials or very small trial-to-trial variations. If you're only interested in large differences, you can get by with fewer trials (but enough to be sure you've seen how much variation there is and that it is small relative to the mean differences). Common sense, together with a good measure of skepticism about your own results, is a pretty good guide to deciding how many trials you need, and whether particular mean differences are or are not meaningful. At least ten trials, and differences of at least twenty or thirty milliseconds, are probably not bad rough guides for the kind of data collected here. To be more rigorous (though not absolutely sure) you need to learn and use some statistics (cf. 
the Rice Virtual Lab in Statistics or VassarStats, both of which provide background information and on-line statistical analyses of data sets).
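The rule of thumb above (at least ten trials, differences of at least twenty or thirty milliseconds, and variation small relative to the mean difference) can be sketched as a crude check. The 25 ms cutoff and the use of the standard deviation as a spread measure are our illustrative choices; for real rigor you would use the statistical tools linked above rather than this sketch:

```python
# A minimal sketch of the rule of thumb in the text: trust a mean
# difference between two subjects only if each has enough trials, the
# difference is at least ~25 ms, and it exceeds the trial-to-trial spread.

import statistics

def looks_meaningful(times_1, times_2, min_trials=10, min_diff_ms=25):
    """Crude check that a mean difference outruns trial-to-trial noise."""
    if len(times_1) < min_trials or len(times_2) < min_trials:
        return False  # too few trials to trust either mean
    diff = abs(statistics.mean(times_1) - statistics.mean(times_2))
    spread = max(statistics.stdev(times_1), statistics.stdev(times_2))
    return diff >= min_diff_ms and diff > spread

# Made-up data for three people, ten trials each (milliseconds):
person_a = [295, 305, 300, 310, 290, 300, 305, 295, 300, 300]  # mean 300
person_b = [345, 355, 350, 360, 340, 350, 355, 345, 350, 350]  # mean 350
person_c = [305, 315, 310, 320, 300, 310, 315, 305, 310, 310]  # mean 310
```

With these made-up numbers, the 50 ms difference between person_a and person_b passes the check, while the 10 ms difference between person_a and person_c does not; the latter is within the range that computer and mouse-click delays alone could produce.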
Writing it up: Interpreting observations. Where's the hypothesis? And what's the conclusion?
It is both an oddity and a profundity of modern science (or at least the teaching of it) that most people believe that research begins with an hypothesis and ends with a conclusion. One CAN do research starting with an explicit hypothesis ("good hitters have faster reaction times than poor hitters"), but much good research starts instead with a curiosity or uncertainty ("I wonder if good hitters have faster reaction times than poor hitters"). Either is a perfectly good motivation for the real purpose of research, which is to collect some new observations. And it is the reporting of observations, ideally in so complete and clear a form that people not there can have as much confidence in them as if they were themselves there, that is the primary purpose of scientific publication: the interpretation and conclusion, along with the "hypothesis", are, to a large extent, just a way to give people reason to be interested in the observations.
With that recognized, the ideas of "hypothesis", "interpretation", and "conclusion" take on different and perhaps even more profound meaning than is usually ascribed to them. An "hypothesis" is an expectation of how things should come out which is falsifiable, in the sense that the new observations might not be consistent with the expectations. It is important that it be falsifiable because, with exceptions only in quite special cases, new observations can never prove an hypothesis; they can only disprove it (to prove that in general good hitters have faster reaction times requires an infinite number of observations, on all people past, present, and future ... whereas observing the opposite in a small number of cases disproves the general hypothesis). To put it differently, observations that fit an expectation (hypothesis) simply add to the body of observations consistent with it, whereas exceptions invalidate it. In so doing, they require one to change how one thinks about things, for the better, since one has to account not only for the observations which led to the original hypothesis but for the new ones as well. It is this kind of "becoming less wrong" which is the real point of science. So, whether you actually had a hypothesis or not when you began your research, it's a good idea to generate one before you write up your research. It will help others see why your findings are significant (which is the point, so there's nothing dishonest or improper about creating your hypothesis after you get your findings and think about what they are good for; many good scientists do). Remember that a hypothesis has to be falsifiable by the observations you make (or made), and that, if you're lucky, it will in fact have been falsified by your findings. By the way, even people who are just curious or uncertain actually have an hypothesis: "my way of making sense of the world is good enough so nothing will surprise me".
And, usually, their hypothesis is, as it should be, falsified. The hypothesis, along with some explanation of why one has it and a sketch of how one is going to test it, normally goes in the introduction of a scientific paper.
If you've got the hypothesis straight, the "conclusion" is pretty obvious. Either your findings are consistent with your hypothesis, i.e. pretty much what you expected from your earlier understandings (in which case you haven't proven anything), or they're not (in which case you have, and you've got something new and interesting). In the "consistent" case, good scientists will consider in what ways their findings are limited, and give pointers for new research that would test the limits (e.g. "these findings hold for good middle-school hitters; it remains to be determined whether they are applicable to professional baseball players"). In the "disproof" case, one has more fun. Ideally, you come up with a new hypothesis, which in turn (obviously) suggests new observations that have to be made to test it (e.g. "there must be something other than reaction time which makes for good hitters; in the future, I will explore the possibility that ... they have clearer eyesight"). The conclusion, and associated considerations, typically goes in the discussion section of a scientific paper.
"Interpreting observations" is what comes between the hypothesis/observations and the conclusion (and is usually included in the discussion section of a scientific paper). Its not usually as straightforward as it seems, for lots of reasons. One we've talked about already: depending on how good one's controls are, there may or may not be a strong relation between the hypothesis to be tested and the actual observations one has made ... so interpretation means, among other things, being clear about the degree of certainty one has that the observations actually relate to the hypothesis. There are usually subtler issues as well, having to do with assumptions one makes without thinking about it. For example, in this "Time to Think?" exhibit, we've assumed that responding to a visual stimulus involves a series of sequential brain activities, so that the responses times should get progressively longer as we had successively more tasks to the "thinking" process. And we are implicitly "interpreting observations" in relation to this model to get "thinking" time, "reading time", and "negating" time. But is this presumption really so? Might the brain actually be organized in some other way, in which case the times we measure actually have some different set of interpretations altogether? The observations themselves, if looked at carefully, frequently give one hints that one needs to check one's interpretations (notice that, in our collected observations some of the "negate" times are negative). Like disproving an hypothesis, this inconsistency between expectations and observations is not a bad thing but a good one, since it encourages a rethinking from which a "less wrong" understanding can emerge.
So ... you can get better at doing research by thinking about it a bit. And different people have thought about it more or less, and have more or less experience doing it. But everyone knows the basics of how to do research, and everyone can get better at it by doing it/thinking about it. And everybody doing it makes everyone else doing it better, because at its core research is a cooperative and collective activity. No one can make all the observations, and think of all the possible controls and reinterpretations, themselves. So one of the most important parts of doing research is finding out what others have done, and telling others what you have done. That way, the number of observations everyone can think about is much greater, and the issues of controls and interpretations can be looked at from a much larger array of perspectives. Moreover, any "mistakes" that are made in one piece of research (as they inevitably are) will be offset by other pieces of research, with everything available and out in the open so everyone can evaluate and re-evaluate what is going on. It isn't "Truth", not in any part or in the whole, but it's a very good way to collectively get "less wrong".
We'd love to have you join in on Serendip's "Time to Think?" research project, as a student, as a class of students, as an individual, or in any other way, and whatever your research background. Send us your observations, along with an introduction and a discussion/conclusion, and we'll add them to our evolving exhibit.