Fifty Shades of Twitter Research (It Ain’t Black and White)

In the age of social media, Twitter provides unprecedented opportunities for social researchers to listen to millions of voices, observe millions of interactions and gain new insights into our social world. This has implications for research practice, policy decisions and everyday life. But as with all research, methods matter. Luke Sloan, co-editor of the recently released SAGE Handbook of Social Media Research Methods, shares his insights into the methodological challenges of Twitter research, demonstrating how it isn't all 'black and white'.

On the face of it, it sounds too good to be true… and to an extent it is. Easy access to a wealth of naturally occurring data that is fast-moving and instant, allowing us to see real-time changes in behaviour and attitudes as events in the real world unfold. Outstanding geographical granularity, down to latitude and longitude. A voice for disenfranchised and hard-to-reach populations that are typically difficult to engage through traditional modes of social research. High-level aggregated quantitative data alongside qualitative micro-interactions. Monitoring the pulse of the world.

Wow, Twitter data really does sound like the holy grail of data sources for academics, government and private companies alike. Except that it isn't. As with most things, it's complicated, and we should be wary of anyone who presents a black and white case for what Twitter is telling us about human and social behaviour without reflecting on the methodological challenges we face.

Yes, the data is naturally occurring and, as it is not generated or elicited as part of an explicit research project, it is not subject to the Hawthorne effect (the tendency of those being studied to alter their behaviour because they know they are being observed). But Twitter data is not produced in a vacuum, and there is nothing 'natural' about how behaviour is path-dependent on technology, how users construct virtual identities, or the myriad ways in which Twitter is used: as a news feed, as a professional network, as a friendship network, for self-promotion, for lurking and so on. The data is not untainted. It's like buying organic potatoes in a plastic bag: the product looks wholesome but the packaging is artificial (yet you'll still enjoy eating those chips).

As for the speed at which the data is produced, this is certainly something new. A vast network of human sensors producing data in real time means that responses to emergency events can be coordinated efficiently. It even works for earthquake detection [1]. But there is a lot of noise on Twitter, so how do you know what to focus on and what to filter out in order to monitor the things that matter? Perhaps more to the point, how can you monitor for something that hasn't happened yet?

The ability to pinpoint the exact location of a user when they produced a tweet is incredibly powerful. It allows us to build a context around the data and to locate that individual within existing geographies relating to Census data, crime rates and voting patterns, to name but a few. If someone tweets that they feel unsafe, we can look at deprivation measures for the area they are in. If someone shows political support for a particular party, we can link this to a constituency and model voting patterns. The ability to link data sources through geography means that we can investigate the most fundamental question: what is the relationship between what people tweet and the real world? Indeed, does Twitter tell us anything at all? Yet bear in mind that the proportion of tweets that carry this information (i.e. that are geotagged) is estimated to be as low as 0.85% [2], and those who geotag are certainly not a random sample of the Twitter population [3].

And whilst we're talking about samples, Twitter users themselves are not a random sample of any population, certainly not in the UK [4], where tweeters are disproportionately young and certain class groups are over- or under-represented [5]. Virtual identities are constructed, and the demographics of most Twitter users are largely unknown. When you're presented with some Twitter data and someone is making a knowledge claim, or when you're about to make an important policy decision based on the analysis, ask yourself: do you actually know who was in the sample? Can you even tell which country they're tweeting from? If you're interested in the disenfranchised, how do you know they're present in your data?

Finally, Twitter provides a wealth of quantitative and qualitative information in the form of sentiment, networks, interactions, retweets, frequencies, topics, hashtags, URL sharing and all the rest of the metadata that surrounds a tweet. But what does it mean? Is a retweet an endorsement or simply an act of passing on information, or can it even be ironic? When you're limited to 140 characters, how can meaning be ascribed to words?

The point is that Twitter does provide unprecedented opportunities to explore attitudes and behaviour, but as with all research, methods matter. Careful consideration of research design and sampling is key, as is a detailed understanding of the technological parameters of the platform and how it's used. These observations apply to any research that sources data from social media.

It would be naive (at best) and foolhardy (at worst) to take Twitter at face value, but that doesn't mean it's not useful.

As I said, it’s complicated.

Dr Luke Sloan (@drlukesloan) is a senior lecturer in quantitative methods and deputy director of the Social Data Science Lab at the School of Social Sciences, Cardiff University, UK. He has worked on a range of projects investigating the use of Twitter data for understanding social phenomena, covering topics such as election prediction, tracking (mis)information propagation during food scares and 'crime-sensing'. His research focuses on the development of demographic proxies for Twitter data to further understand who uses the platform and to increase the utility of such data for the social sciences. He sits as an expert member on the Social Media Analytics Review and Information Group, which brings together academics and government agencies. He is also co-editor of The SAGE Handbook of Social Media Research Methods.

Find out more about The SAGE Handbook of Social Media Research Methods here and read a free chapter here.

The growing availability of vast and varied datasets is changing the way research is conducted and taught. In response, SAGE has launched a new monthly newsletter to keep researchers up to date with what is going on in the world of big data and computational social science. Sign up here.

5: [Ch 7]

This entry was posted in SAGE Connection, Tips for You.
