7 key issues for digital methods in social science: dmi18 takeaways

What happens when you add one varied toolkit for digital methods, a research question, an enthusiastic team, the equivalent of a collective IV drip of caffeine, and pile them into a room for a week?

Last week I had the pleasure of co-facilitating a project at the Digital Methods Initiative Winter School 2018 at the University of Amsterdam. The idea is that rather than trying to take analogue methods and apply them them to the internet, we take tools that are only possible because of the unique characteristics of the web.

It’s an intense week of experimentation, research, and learning, all rolled into one; if you’ve worked in hackerspaces and startups, the general vibe of the place will be familiar. It’s like an explosion of several tangential whirlwinds that kind of finds a way to coagulate and settle within a week.

While my organisation/control-freak tendencies were a little overwhelmed, with that one week sprint we probably saved ourselves a good three months of work where we would have been faffing around. I would like to perhaps save you some of the save headaches, by making it very clear some of the key methodological points & assumptions. (You know how I like to clarify assumptions).

1. Follow the question, not the tools

With such a glorious collection of tools for digital methods, it is tempting to just throw the tools at the data to see what happens. And, truth be told, this is what is needed a lot of the time. Yet the ‘throw everything at the wall and see what sticks’ approach can only be exploration, and does not the foundations of a sound methodology make. Once that’s done, there needs to be a process of reflecting in light of the questions, to be led by what is analytically interesting, and not to be led by what is technically possible.

2. Be open to an iterative evolution of your methodology

There is a while where it feels like you’re floating headlessly in space, unsure of your footing or where you’re going. After our first day, which I felt had been a complete, chaotic mess, I asked our facilitator how he felt our day went, and his first word was ‘structured. Because you have a clear research question’. Just to give you an idea.

The long and short of it is that the experimental approach to new tools means you try things out, things break, you fix them, try again, and again. It is an incredibly iterative process without a clearly linear project plan, but instead morphs to what happens, and the deviations from the original line of thinking are also insightful.

3. You will still need local knowledge to do internet research

Image result for context meme

Quantitative digital methods are not the be-all-and-end all; we need humans, insight, and local knowledge to make meaning of it all, much in the same way as a statistical test just spits out a number and you have to make sense of it.

There are several examples of picking a topic, running data scraping on it, and finding absolutely nothing – only to be later told by somebody with more local knowledge that that particular issue was about vulnerable people who wanted to hide, rather than expose their views, on the internet, for fear of persecution. Just an example of how context, once again, is everything.

4. Mapping silences requires some serious methodological creativity

In the same vein as above, there are usually good reasons why you aren’t finding what you want to find. The trick, then, becomes whether the tools can *show* those silences, or that noise. The tools and representation have to then be inverted, triangulated, and brought into dialogue with one another – and mapping what isn’t there requires some lateral thinking.

5. You need large datasets; so think global

It’s not just that a small N is not statistically valid sample. It’s that many of the tools work on machine learning, which will only give a smidgen of accuracy if there is an opportunity for many iterations and a large data set. For instance, topic modelling on a small sample will produce absolutely useless nonsense (we tried, we really tried). In this sense, trying to adapt these digital methods to a more traditional ethnographic mindset is less helpful because your sample size is dramatically narrowed from the get-go; for example, searching for opinions on one particular policy during one election year in one particular country is very limited. Instead, think of issues and questions that could, theoretically, span and sweep the entire web.

6. Cleaning your data is vital, but long and laborious

Image result for clean data meme

Statistics codes which assumptions are embedded in your data set, but this is still missing as our methods evolve into the digital. Especially in large datasets there will be a lot of things to clean out. Outliers, for one, can be interesting, but do need to be taken out. When doing text mining, a lot of words or phrases that are specific to your text will need to be cleaned out. This means you’ll need to take a look at the data itself, not just the tool’s interface, and keep going back and forth between one and the other. For instance, if you are scraping blog posts, in all likelihood you will have copyright phrases, ‘brought to you by WordPress’, and menu items or advertisement blocks.

As you get through the last iterations, there is a certain joy in staring at the screen hoping that this time it’ll pop out something useable.

7. To use or not to use a clean research browser/smartphone/laptop

The vast majority of browsers and internet search engines track, and store, your behaviour on the internet, to create a profile of you over time, even if you have privacy settings in place. These profiles influence what you are shown – so if you are using the browser for your research, your results will be affected.

In some cases, it is recommended to use a ‘clean’ research browser; one that has been unused, has no user profiles, and has no prior ‘life’ on it, so as not to skew results. However, in some cases and depending on the RQ, this may prove to be unhelpful – for instance, one group searching for queer narratives could not find them using a ‘clean’ browser, but only when using a browser that had been ‘trained’ (i.e. used over the last year) by a feminist. As always, either is fine as long as you’re aware and explicit.

With thanks to the wonderful team of folks who worked with us for a week – I couldn’t have asked for a more creative, reflective and dedicated group! Also thanks to the Digital Methods Initiative for organising the week. More to follow, probably.

Methodology as an entryway to ethical data research

There is a growing call for ethical oversight of AI research, and rightly so. Problem is, ethical oversight hasn’t always stopped past research with questionable ethical compasses. Part of that, I argue, is that the ethical concerns raised largely by social scientists come from a completely different world view to those from a more technical background. While AI research is raising new problems, particularly with regards to correlation vs causation in research, but the tools we have to solve them haven’t changed that much.

With this blog I want to question – can methodology help social and technical experts speak the same language?

Since my masters degree I’ve been fascinated by the fact that people working in different disciplines or types of work will have completely different approaches to the same problem.

Like in this article on flooding in Chennai, I found that ‘the answers’ to solving flooding all already existed on the ground, it’s just the variety of knowledges weren’t being integrated because of the different ways that they’re valued.

I was recently speaking with my brilliant colleague and friend, who is a social constructivist scientist working in a very digital technology oriented academic department faculty. This orientation is important to note, because the methodology deployed for science and research there and the questions being asked are influenced to a large degree by the capacities and possibilities afforded by digital technologies and data. As a result, the space scientists see for answers can be very different.

In reviewing student research proposals, she found she was struggling because some research hypotheses completely ignored the ethical implications of the proposed research.

In talking it through, we realised that most of the problems arose from the assumptions that are made in framing those questions.

To take a classic example, in the field of remote sensing to identify slums, it is relatively common to see that implicit assumption that what defines a slum is the area’s morphology, that that definition is by the city planners and not the residents, and how locals interpret the area or the boundaries of the neighbourhood may differ completely. The ethical problem, beyond epistemology, is what can then be done in terms of policy based on the answers that that research provides.

To go back to  that paper that caused the controversy about identifying people’s sexual orientation from profile pictures downloaded from a dating site. It’s based on a pre-natal hormone theory of sexual orientation, which is a massive assumption in and of itself.  Even the responses to the article have basically boiled down to ‘AI can predict sexuality’, even though that’s blatant generalisation and doesn’t look at who was actually in the dataset (every single non-straight person? Only white people?). That, and then the fact that they basically ‘built the bomb to warn us of the dangers’, has a lot of assumptions about your view of ethics in the first place.

Like my 10th grade history teacher used to say, to assume makes an ASS of U and ME. (Thanks Mr. Desmarais)

More precisely, to assume without making the assumptions explicit. Not clearly articulating what your assumptions are is a *methodological* problem for empirical research, with ethical *implications*. Unexamined assumptions mean bad science. Confounding variables and all that.

For reference, in statistics there is an entire elaborate, standardized system of dealing with assumptions by codifying them into different tests. You apply one statistical test, which has a particular name, because of the assumptions you have – i.e. I assume this data has a normal distribution.

If you’re using mixed methods, it becomes much harder to have a coherent system to talk about assumptions because the questions that are asked may not yield data that is amenable to statistical analyses and therefore cannot be interpreted with statistical significance.

All the all the more important here to make assumptions explicit so they can be discussed and scrutinized.

Some ethical concerns can be dealt with more easily when we remember methodological scrutiny and transparency, bringing research back to the possibility of constructive criticism and not only fast publication potential.

How this process is dealt with currently in academia is ethical review, hence the call for ‘ethical watchdogs’.

Thing is, In terms of the process of doing science in academic settings, ethical review is often the final check before approval to carry out the research. When I did my BSC. In psychology, sending the proposal to the ethics review board felt like an annoyingly mandatory tick-box affair.

The problem with this end-of-the-line ethical review is:

  • It’s not clear why the ethics is important to actually carry out the research
  • If the ethics board declines, you’re essentially back to the drawing board and have to start again.

Particularly under the pressure for fast publication, there aren’t many incentives to do good ethics unless you’re concerned about it from the outset.


Image result for assumptions cartoon


What if we shifted the focus from ethics as an evaluation to ethics as methodology?

Rather than having an ethics review at the end of the process of formulating hypotheses and research proposals, could there be a way to incorporate an ethics review in the middle of the ‘research life cycle’?

One would then get feedback not only on the ethics but it could provide the opportunity to explain the research’s unexamined assumptions which ultimately makes for better science.

I understand this ideal situation implies quite a significant shift in institutional processes which are notorious for moving about as fast as stale syrup. Perhaps instead there could be a list of questions researchers could ask themselves to as a self-evaluation?

In this way, you could open an entryway to an ethical discussion as a question of methodology, rather than ontology or ethics per se, which are far too easily just troubled waters in terms of interdisciplinary discussions.

Do you know of any examples of structurally incorporating these ideas as a way to effective multidisciplinary dialogue?


My thanks go to my colleague who sparked this discussion and thought it through with me, who for reasons of their position, will remain anonymous.