Taming big data: Applying the experimental method to naturalistic data sets
|Publication Type||Journal Article|
|Year of Publication||In Press|
|Journal||Behavior Research Methods|
|Keywords||Big data, Corpus statistics, Phonesthemes, Representation, Semantic change|
Psychological researchers have traditionally focused on lab-based experiments to test their theories and hypotheses. Although the lab provides excellent facilities for controlled testing, some questions are best explored by collecting information that is difficult to obtain in the lab. The vast amounts of data now available to researchers can be a valuable resource in this respect. By incorporating this new realm of data and translating it into traditional laboratory methods, we can expand the reach of the lab into the wilderness of human society. This study demonstrates how the troves of linguistic data generated by humans can be used to test theories about cognition and representation. It also suggests how similar interpretations can be made of other research in cognition. The first case tests a long-standing prediction of Gentner’s natural partition hypothesis: that verb meaning is more subject to change due to the textual context in which it appears than is the meaning of nouns. Within a diachronic corpus, verbs and other relational words indeed showed more evidence of semantic change than did concrete nouns. In the second case, corpus statistics were employed to empirically support the existence of phonesthemes—nonmorphemic units of sound that are associated with aspects of meaning. A third study also supported this measure, by demonstrating that it corresponds with performance in a lab experiment. Neither of these questions can be adequately explored without the use of big data in the form of linguistic corpora.