This post is based on the author’s paper for the PSAI Annual Conference 2017, titled ‘Reliable text-analysis coding of emotive campaign rhetoric in referendums’, due to be presented on Sunday 15 October.
‘There never was a good knife made of bad steel,’ according to Benjamin Franklin in his Poor Richard’s Almanack of 1755. The intuition is simple: for anything worthwhile to be produced from raw materials, those materials must be sound. This applies as much to the analysis of data as it does to knife-making. If the data we have collected are not valid and reliable, then any time spent attempting to analyse and derive meaning from them is, ultimately, time wasted.
However, despite its importance, studies that rely on coded material do not always report or discuss the intercoder reliability of their data. One meta-study of communications research (Lombard, Snyder-Duch, & Bracken, 2002), which examined Communication Abstracts, a bimonthly index of communication literature covering over 75 journals, for the period 1994–1998, found that fewer than 70% of the reports considered gave any indication of intercoder agreement or reliability; of those that did, fewer than half specified which measure of intercoder reliability was being presented. Almost 40% of these articles omitted the size of the sample used in reliability testing, while almost 60% failed to provide intercoder reliability figures for individual variables (ibid.).
Such omissions are hardly unusual in social scientific research. Another meta-study, in this case taking in 25 years’ worth of publications in the journal Journalism & Mass Communication Quarterly, covering the period 1971–1995, found that only 56% of articles presented a reliability assessment (Riffe & Freitag, 1997). The rate of reporting reliability assessments in that journal has improved since then, but has not come close to full compliance; an unpublished study of articles published in Journalism & Mass Communication Quarterly since 1998 found that fewer than three-quarters published reliability assessments (Riffe, Lacy, & Fico, 2014, p. 122). More worryingly, fewer than a third conducted their assessments on randomly selected content, and fewer than half accounted for chance agreement (ibid.).
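Why does accounting for chance agreement matter so much? When one category dominates, two coders can agree on most items purely by chance, so raw percent agreement flatters the data. A chance-corrected statistic such as Cohen’s kappa exposes this. The sketch below is purely illustrative (the coder labels and data are invented, not drawn from any of the studies cited here), but it shows how 80% raw agreement can mask agreement that is actually worse than chance:

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which two coders assign the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance, given each coder's marginal category frequencies."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical codings of ten statements as emotive (1) or neutral (0).
# Both coders mark nearly everything emotive, so chance agreement is high.
coder_1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
coder_2 = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1]

print(percent_agreement(coder_1, coder_2))  # 0.8
print(cohens_kappa(coder_1, coder_2))       # negative: worse than chance,
                                            # despite 80% raw agreement
```

Reporting only the 80% figure here would badly overstate the reliability of the coding, which is exactly the problem the meta-studies above identify.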
So, while it’s easy to make the argument in favour of a rigorous approach to social science data––and particularly political science data––that doesn’t seem to mean social scientists are taking it as seriously as it deserves. There seems to be a tendency to take data at face value, which may be perfectly safe when using popular datasets with a wide and responsive user base, but much less so when using a novel dataset.
The example provided in this paper works its way through two attempts at creating a novel dataset of values for emotional rhetoric during the 2015 Irish Marriage Equality Referendum. Discussing two different phases of coding, and reporting the intercoder reliability scores as a proxy for data reliability, the paper shows just how unreliable novel data can be (phase 1) and suggests some ways to improve reliability in future attempts (phase 2).
While the paper sadly does not have any hard and fast answers for guaranteeing reliable social science data collection, it will hopefully get people thinking critically about the data we use.
This article is part of a series by authors presenting at this year’s PSAI Annual Conference. If you are a participant at this year’s conference and would like to have your work featured on the blog, please contact us and let us know.
Lombard, M., Snyder-Duch, J., & Bracken, C. C. (2002). Content analysis in mass communication: Assessment and reporting of intercoder reliability. Human Communication Research, 28(4), 587–604.
Riffe, D., & Freitag, A. (1997). A content analysis of content analyses: Twenty-five years of Journalism Quarterly. Journalism & Mass Communication Quarterly, 74(4), 873–882.
Riffe, D., Lacy, S., & Fico, F. (2014). Analyzing Media Messages: Using Quantitative Content Analysis in Research (3rd ed.). New York, NY: Routledge.