7 key issues for digital methods in social science: DMI18 takeaways

What happens when you add one varied toolkit for digital methods, a research question, an enthusiastic team, the equivalent of a collective IV drip of caffeine, and pile them into a room for a week?

Last week I had the pleasure of co-facilitating a project at the Digital Methods Initiative Winter School 2018 at the University of Amsterdam. The idea is that rather than trying to take analogue methods and apply them to the internet, we take tools that are only possible because of the unique characteristics of the web.

It’s an intense week of experimentation, research, and learning, all rolled into one; if you’ve worked in hackerspaces and startups, the general vibe of the place will be familiar. It’s like an explosion of several tangential whirlwinds that kind of finds a way to coagulate and settle within a week.

While my organisation/control-freak tendencies were a little overwhelmed, with that one-week sprint we probably saved ourselves a good three months of work where we would have been faffing around. I would like to perhaps save you some of the same headaches, by making some of the key methodological points & assumptions very clear. (You know how I like to clarify assumptions.)

1. Follow the question, not the tools

With such a glorious collection of tools for digital methods, it is tempting to just throw the tools at the data to see what happens. And, truth be told, this is what is needed a lot of the time. Yet the ‘throw everything at the wall and see what sticks’ approach can only be exploration, and does not the foundations of a sound methodology make. Once that’s done, there needs to be a process of reflecting in light of the questions, to be led by what is analytically interesting, and not to be led by what is technically possible.

2. Be open to an iterative evolution of your methodology

There is a phase where it feels like you’re floating helplessly in space, unsure of your footing or where you’re going. After our first day, which I felt had been a complete, chaotic mess, I asked our facilitator how he felt our day went, and his first word was ‘structured – because you have a clear research question’. Just to give you an idea.

The long and short of it is that the experimental approach to new tools means you try things out, things break, you fix them, try again, and again. It is an incredibly iterative process: there is no clearly linear project plan; instead, the plan morphs in response to what happens, and the deviations from the original line of thinking are themselves insightful.

3. You will still need local knowledge to do internet research


Quantitative digital methods are not the be-all and end-all; we need humans, insight, and local knowledge to make meaning of it all, much as a statistical test just spits out a number and you have to make sense of it.

There are several examples of picking a topic, running data scraping on it, and finding absolutely nothing – only to be later told by somebody with more local knowledge that that particular issue concerned vulnerable people who wanted to hide their views on the internet rather than expose them, for fear of persecution. Just an example of how context, once again, is everything.

4. Mapping silences requires some serious methodological creativity

In the same vein as above, there are usually good reasons why you aren’t finding what you want to find. The trick, then, becomes whether the tools can *show* those silences, or that noise. The tools and representation have to then be inverted, triangulated, and brought into dialogue with one another – and mapping what isn’t there requires some lateral thinking.

5. You need large datasets, so think global

It’s not just that a small N is not a statistically valid sample. It’s that many of the tools rely on machine learning, which only reaches a smidgen of accuracy when there is an opportunity for many iterations over a large data set. For instance, topic modelling on a small sample will produce absolutely useless nonsense (we tried, we really tried). In this sense, trying to adapt these digital methods to a more traditional ethnographic mindset is less helpful because your sample size is dramatically narrowed from the get-go; for example, searching for opinions on one particular policy during one election year in one particular country is very limited. Instead, think of issues and questions that could, theoretically, span and sweep the entire web.
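As a toy illustration of the sample-size problem (not the topic-modelling pipeline we actually ran – the vocabulary and frequencies below are entirely made up), even simple word-frequency estimates bounce around wildly on tiny samples, while large samples converge on the true ordering:

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical vocabulary with a known "true" frequency ordering:
# "policy" is genuinely the most-discussed word in this fake corpus.
vocab = ["policy", "election", "economy", "climate", "health"]
weights = [5, 4, 3, 2, 1]

def top_word(n_tokens):
    """Draw n_tokens words from the fake corpus and return the most frequent."""
    sample = random.choices(vocab, weights=weights, k=n_tokens)
    return Counter(sample).most_common(1)[0][0]

# With large samples, the estimate is stable: every run agrees.
large = [top_word(10_000) for _ in range(20)]

# With tiny samples, the "top topic" changes from run to run.
small = [top_word(10) for _ in range(20)]

print(set(large))  # one stable winner
print(set(small))  # several different "winners"
```

A real topic model has far more parameters to estimate than a single frequency ranking, so the instability on small samples is correspondingly worse.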

6. Cleaning your data is vital, but long and laborious


In statistics, the assumptions embedded in your data set are made explicit, but that discipline is still missing as our methods evolve into the digital. Especially in large datasets there will be a lot of things to clean out. Outliers, for one, can be interesting, but do need to be taken out. When doing text mining, a lot of words or phrases that are specific to your source will need to be cleaned out. This means you’ll need to look at the data itself, not just the tool’s interface, and keep going back and forth between one and the other. For instance, if you are scraping blog posts, in all likelihood you will have copyright notices, ‘brought to you by WordPress’ footers, and menu items or advertisement blocks caught in your text.
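A minimal sketch of what such a cleaning pass can look like for scraped blog posts – the pattern list here is hypothetical and necessarily corpus-specific, built up by eyeballing your own data:

```python
import re

# Hypothetical boilerplate patterns scraped blog text often carries;
# you grow this list as you go back and forth between data and tool.
BOILERPLATE = [
    r"brought to you by wordpress\.?",
    r"all rights reserved\.?",
    r"©\s*\d{4}",
]

def clean_post(text):
    """Lowercase the text and strip known boilerplate phrases."""
    text = text.lower()
    for pattern in BOILERPLATE:
        text = re.sub(pattern, " ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

raw = "Great analysis of the election. © 2018 All rights reserved. Brought to you by WordPress"
print(clean_post(raw))  # prints "great analysis of the election."
```

The back-and-forth the text describes lives in that `BOILERPLATE` list: each pass over the raw data reveals another phrase to add.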

As you get through the last iterations, there is a certain joy in staring at the screen hoping that this time it’ll pop out something usable.

7. To use or not to use a clean research browser/smartphone/laptop

The vast majority of browsers and internet search engines track and store your behaviour on the internet to build a profile of you over time, even if you have privacy settings in place. These profiles influence what you are shown – so if you are using that browser for your research, your results will be affected.

In some cases, it is recommended to use a ‘clean’ research browser: one that has been unused, has no user profiles, and has no prior ‘life’ on it, so as not to skew results. However, depending on the research question, this may prove unhelpful – for instance, one group searching for queer narratives could not find them using a ‘clean’ browser, but only when using a browser that had been ‘trained’ (i.e. used over the last year) by a feminist. As always, either is fine as long as you’re aware and explicit about it.

With thanks to the wonderful team of folks who worked with us for a week – I couldn’t have asked for a more creative, reflective and dedicated group! Also thanks to the Digital Methods Initiative for organising the week. More to follow, probably.