Society is not a flock of birds: self-learning algorithms, risk governance, and moving beyond fear.

When a self-driving car governed by self-learning algorithms kills a cyclist in a ridiculously stupid error of judgement, what can we learn for the future of an algorithmic society?

‘Find some place to hide’ by Thomas Hawk: CC BY-NC 2.0

Franken-algorithms exist, and the challenges that unpredictability can raise for digital agency are real. But beyond grief and the shock factor, we need to move past the fear that blindsides us. We certainly need to do more than only focus on how unpredictability can have fatal consequences. Rather, we need a more nuanced conversation which re-examines the tools we have to construct an algorithmic future we actually want to live in.

As a social scientist working on Global Data Justice, I am acutely aware of how critical voices are too often silenced or ignored. I am writing this blog because I am concerned that there are very smart people reading the same things I am, yet who don’t seem to acknowledge that if we stagnate in fear rather than critique, we can actually play into a discourse which reifies the antagonism to the regulation of innovation in the tech sector.

If we stagnate in fear rather than critique, we can actually play into a discourse which reifies the antagonism to the regulation of innovation in the tech sector.

As a society, the increasing use of algorithms in decision-making re-confronts us with how we manage and accept risk, and we need to discern what different types of risk mean for different approaches to governance. The unpredictability of self-learning algorithms, which evolve and adapt over time and so display emergent behaviours beyond the control of the original programmer, is one of these risks. Instead of shivering in fear and denial, why not see what we can learn from the history of applied risk governance?

Big data analytics are based on the premise of emergent patterns in enormous data sets, and the proliferation of sensors, instrumented environments and interconnected databases provides the fodder for inductive, data-driven decision-making and the phenomenon of smart urbanism (Kitchin 2015). These shifts mean we are dealing with fundamentally different logics of emergence, and we have to at least engage with a complexity perspective.

The shift to inductive reasoning and data-driven decision-making means we have to shift our perspective on the social consequences of these fundamentally different logics, and think through some of the tools and concepts of critical complexity theory.
Complexity is fascinating because it is a fundamental transformation of our perspective from linear control to emergence. Non-predictability and emergence challenge our understanding of how to interact with the world because we have to let go of the reins a little bit. A broadly-defined complexity perspective decenters the human agent, from the center of the universe to just another being in the ecosystem, and this destabilisation can be profoundly emotionally unsettling (whether one likes to admit it or not). Much like the shift from a Ptolemaic to a Copernican view, when the realisation dawned that the sun rather than the earth is at the center of the solar system, we are forced to confront the reality that we are not in charge of what is about to happen.

The thing is, in the face of the unknown, we tend to be a bit like a deer in the headlights and freeze. We lump everything together into an indecipherable, scary mass of uncertainty and mentally try to run away in the other direction. In a 2018 paper called Wickedness and the anatomy of complexity, Andersson and Törnberg call this scary mass of uncertainty an intuitive sense of ‘overwhelmingness’. It’s a great word, and I fully support active attempts to integrate this into everyday language.

Importantly, ‘overwhelmingness’ is different from complexity. To be very brief: in the scientific sense of the term, complexity refers very specifically to systems where the individual units are of the same class, express the same function, and collectively display emergent and unpredictable behaviours. This is different from complicated systems, which have many parts that are all functionally different. A flock of birds is complex, a car engine is complicated.
Society is not completely like a flock of birds. Some social phenomena are more amenable to complexity thinking and big data analytics than others; it is no accident that most data-driven smart city applications begin with the urban problems of traffic and energy networks. In other areas, it is less easy to avoid the fact that socio-historical conditions and institutional arrangements are differentiated and enduring. Social functions are not complex phenomena in the strict sense of the word; as Andersson and Törnberg argue, societies are more often a combination of complicated and complex, creating their own flavours of ‘wicked’ problems.

Institutional arrangements are differentiated and enduring. Social functions are not complex phenomena in the strict sense.

This means that when we investigate the challenges of applied self-learning algorithms and discover social complexity theory, we are not completely reinventing the wheel. Yes, complexity matters. And, quite frankly, it’s cool. But rather than freezing in the face of overwhelmingness, we need to acknowledge that social ‘systems’ are not flocks of birds à la Hitchcock. Instead, we need to rethink how we understand the governance of risks and how these are socially constructed.

There is a lot of thinking already done around structural transformations in the face of systemic risk that we can learn from, particularly in the field of environmental governance. The problems we face are new, but just like the data patterns, they emerge from sociopolitical historical conditions.
Self-driving cars are a good starting point. They are emblematic of our changing modernity: at once the evolving symbol of having ‘made it’, the threat of vaguely-defined robots taking over the world, and the test case for letting algorithms take decisions. As symbols, they are useful anchors to start a conversation around algorithmic futures. However, if the only examples of an algorithmic future you look at centre on car crashes, modern warfare and aviation, you are looking only at life-threatening sectors, where risk tolerance drops sharply and moral responsibility is undeniable. These are the warnings, the extremes, the measuring sticks.
In the case of Elaine Herzberg, the consequences of how we deal with unpredictability were indeed fatal. The self-driving car crashed into Herzberg not because it didn’t recognize her, but because the decision-making encoded in the algorithm was tuned too far towards avoiding false positives.

While there is an important conversation here about algorithmic decision-making, here I focus on another aspect which doesn’t get as much attention: the delicate balance between risk and the supposedly unstoppable progress of innovation. The self-learning algorithm was calibrated vis-à-vis a particular understanding of the risks of getting it wrong: don’t make too many errors, or you stop the relentless charge of progress and innovation. But who gets to decide what tradeoffs are acceptable? What should we be optimising for? How do we make choices about innovation in an algorithmic society? In other words, the challenge is governance.

How do we make choices about innovation in an algorithmic society? In other words, the challenge is governance.

Certain discourses would have you believe that innovation is only about being bold and daring, and that risk-aversion suffocates innovation. In this simplistic framing, fear is associated with stasis, regulation is an attack on freedom, and responsibility for outcomes is impossible in a complex world.
In a 2018 report called ‘Clearly Opaque: Privacy Risks in the Internet of Things’, the IoT Privacy Forum unpacks this false duality, which frames the precautionary principle as anathema to any sort of innovation. Drawing from environmental governance theory, the report shows that the only place where this is actually true, where Thou Shalt Not Innovate Unless Thou Can Prove It Won’t Hurt Anybody, is in sectors where there is a serious risk of loss of life and harm, such as health and transport, i.e. cars.

Otherwise, risk governance, like most things, is a spectrum. There are different gradations of risk, different gradations of governance responses. Rather than throwing the baby out with the bathwater, the trick is nuanced understanding in order to find appropriate responses. This is the role of experienced regulators.

As a result, we cannot use the symbol of self-driving cars and their regulatory challenges as emblematic of all questions around risk and the governance of algorithms. Even the phrase ‘governance of algorithms’ is an oversimplification, similar to reducing all of the things to ‘technology’. If decision-makers see only fear and overwhelmingness, we don’t have a hope in hell of appropriate regulation. When we are faced with unpredictability in algorithmic futures, we do not need to collectively lose our minds.

When we are faced with unpredictability in algorithmic futures, we do not need to collectively lose our minds.

If data is the new oil (in the minds of many, anyway), we need to learn from the experience of environmental governance as we create algorithmic futures. Environmental governance, risk management, critical complexity and the new challenges of the digital economy come together in ways we are only beginning to think through. We need critical voices, we need critique, and we need fearless, constructive conversations that learn from the work that has already been done. This is a path forward.

What next for privacy and data protection? My CPDP2018 takeaways

The CPDP conference is the central hub of the European data protection community, bringing together policy wonks and makers, the burgeoning privacy industry, and a plethora of legal scholars. It is huge, successful, and a lot of fun. Now that I’ve had a week or so to chew on and process all my notes, here are the main themes that I’ve taken away as the next big issues in data, privacy and data protection – keeping in mind that I am not a lawyer, and I work on issues of global data justice.

YAY GDPR! But we still have our work cut out for us.

It’s a major accomplishment, and it was not easy to get this level of protection – see the fantastic documentary Democracy for the story following Jan Albrecht. I am proud to be European and to have this protection. Let’s celebrate it.

Still, the GDPR is also limited in several ways – ways that legal scholars are working on, and which formed much of the basis of the conference. I cannot possibly do it all justice, but there are a few things that I can say from an interdisciplinary perspective:

  1. Relying on notification and consent is not scalable – see what smartphone notifications look like when you have 8 million followers;
  2. There are debates about whether the concept of personal data, upon which the GDPR is based, is now so broad and inclusive that it is no longer relevant – see the recent ERC by my colleague Nadya Purtova and her fantastic panel on the subject with Peter Hustinx;
  3. The responsibilities in the GDPR are assigned to particular actors in particular sectors, but the multiple uses of data and the interconnection of databases are blurring the boundaries between sectors, which is problematic for the GDPR;
  4. The GDPR is still based on individual rights and doesn’t easily address collective harms, nor does it help most people, who are not well informed or for whom harms are not visible.

This isn’t going to develop into a well-structured critique of the GDPR, but rather these are the points that jumped out at me.

Data ethics is not enough. Trust is certainly not enough.

Considering the size and global interconnectivity of the data market, the dynamics of surveillance capitalism and legal enclosures that enable it mean that we need structural elements to create economic incentives and new directions. Much like CSR wouldn’t shift the underlying structures of the global economy towards fair trade and sustainability, data ethics won’t solve everything.

Indeed, as Mireille Hildebrandt commented, this is precisely why we have the rule of law, to prevent us from relying on this.

In a same-same-but-different sort of way, trust is not something we can rely on – trust implies that we do not have to verify. ‘I trust you to pay me back’ means I don’t have to see any proof that you’ll pay me back. Relying on trust as a concept means the individual is inherently vulnerable and the power lies in the hands of the provider of services, not the other way around.

There is a growing call for trust certification and trust marks as a positive incentive for companies to see data protection as an asset – the ‘carrot’ instead of the stick – and there was some very interesting work presented on these issues at this panel on private law. Yet let us be realistic: this is never going to restructure the global data economy as a whole. Therefore:

We need to regulate the data market.

Clearly, at a conference with a critical mass of legal scholars, you saw this one coming. Yet rather than broad sweeping statements, there was a deep dive into how to do it and what it might mean.

For instance, there was a major panel on regulating monopolies, well worth watching, drawing on discussions of anti-trust law. Barry Lynn in particular, of the Open Markets Institute (the one that infamously got kicked out of New America for standing up to Google), was an inspiring mix of preacher’s rant, sound economic advice, and provocation. For someone with a background in development studies like me, it was an entryway into discerning between the neoliberal Chicago school and the potential of citizen-centred markets.

Individual rights are not the best way to solve collective problems.

There are several issues which are generally bundled into the tension between the individual and the collective.

The first is exemplified by last week’s Strava security scandal, where a data visualisation of the routes taken by users of the fitness app unintentionally revealed the sensitive information of army base locations. Aggregated information can reveal vulnerabilities and risks with very concrete consequences without necessarily saying anything about a particular person or individuals. For a more in-depth discussion you can also check Taylor et al.’s book on group privacy, written by an interdisciplinary group of people who can’t seem to agree on anything except that it’s important.

The second is a cross-cultural question: the liberal individual upon which the human rights framework is based is a Western perspective that we will need to move beyond for a truly global conversation. It requires us to think about identity and the collective in ways that we aren’t quite sure how to do yet. Still, just because it’s a difficult question doesn’t mean we shouldn’t ask it. (Indeed, I tend to be that person in the room who asks the question that gets uncomfortable and awkward looks from panellists. #sorrynotsorry) This is precisely why I’m going to pursue this further in my work on global data justice.

Lastly – and this might need some more substantiating from a legal scholar with a better understanding of the nuances (any volunteers?) – in court cases and legal studies, people now speak of fundamental rights and the ‘essence’ of those rights as being even more fundamental, and questions are being asked as to whether the essence of a right was violated in a particular case or not. As I’m sure you can gather, I can’t speak much to this, but it struck me as a non-lawyer that the concept of fundamental rights is on shaky ground.

Where to next?

There are three strands that jumped out as warranting further exploration:

  1. Moving beyond a distributive paradigm and seeing data as an asset, which would require a reformulation of what data is and a reshaping of the market;
  2. To what extent paternalism as an approach is legitimate, and when – for which there was another excellent panel here (last round on a Friday evening, ooof);
  3. There is a lot to learn from an ecological approach, both in terms of interdependency and limits – something I am going to explore further in a paper for the Data Justice Conference in May (Come along!).


Just because these are all hard questions, doesn’t mean we shouldn’t be asking them. On the contrary.


7 key issues for digital methods in social science: dmi18 takeaways

What happens when you add one varied toolkit for digital methods, a research question, an enthusiastic team, the equivalent of a collective IV drip of caffeine, and pile them into a room for a week?

Last week I had the pleasure of co-facilitating a project at the Digital Methods Initiative Winter School 2018 at the University of Amsterdam. The idea is that rather than trying to take analogue methods and apply them to the internet, we take tools that are only possible because of the unique characteristics of the web.

It’s an intense week of experimentation, research, and learning, all rolled into one; if you’ve worked in hackerspaces and startups, the general vibe of the place will be familiar. It’s like an explosion of several tangential whirlwinds that kind of finds a way to coagulate and settle within a week.

While my organisation/control-freak tendencies were a little overwhelmed, that one-week sprint probably saved us a good three months of work we would otherwise have spent faffing around. I would like to save you some of the same headaches by making some of the key methodological points & assumptions very clear. (You know how I like to clarify assumptions.)

1. Follow the question, not the tools

With such a glorious collection of tools for digital methods, it is tempting to just throw the tools at the data to see what happens. And, truth be told, this is what is needed a lot of the time. Yet the ‘throw everything at the wall and see what sticks’ approach can only be exploration, and does not the foundations of a sound methodology make. Once that’s done, there needs to be a process of reflecting in light of the questions, to be led by what is analytically interesting, and not to be led by what is technically possible.

2. Be open to an iterative evolution of your methodology

There is a while where it feels like you’re floating headlessly in space, unsure of your footing or where you’re going. After our first day, which I felt had been a complete, chaotic mess, I asked our facilitator how he felt our day went, and his first word was ‘structured. Because you have a clear research question’. Just to give you an idea.

The long and short of it is that the experimental approach to new tools means you try things out, things break, you fix them, try again, and again. It is an incredibly iterative process without a clearly linear project plan; the work instead morphs to what happens, and the deviations from the original line of thinking are also insightful.

3. You will still need local knowledge to do internet research


Quantitative digital methods are not the be-all and end-all; we need humans, insight, and local knowledge to make meaning of it all, much in the same way as a statistical test just spits out a number and you have to make sense of it.

There are several examples of picking a topic, running data scraping on it, and finding absolutely nothing – only to be told later by somebody with more local knowledge that the issue concerned vulnerable people who wanted to hide their views on the internet, rather than expose them, for fear of persecution. Just an example of how context, once again, is everything.

4. Mapping silences requires some serious methodological creativity

In the same vein as above, there are usually good reasons why you aren’t finding what you want to find. The trick, then, becomes whether the tools can *show* those silences, or that noise. The tools and representation have to then be inverted, triangulated, and brought into dialogue with one another – and mapping what isn’t there requires some lateral thinking.
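One way to make a silence visible – as a toy sketch of the lateral thinking involved, not an actual DMI tool – is to invert the usual frequency count: start from the terms you expected to find, and report which ones are absent or rare in the corpus. Everything here (the documents, the expected-term list) is made up for illustration.

```python
from collections import Counter

def map_silences(documents, expected_terms, min_count=1):
    """Return the expected terms that are absent (or rarer than
    min_count) in the corpus -- the silences rather than the signal."""
    counts = Counter(
        word
        for doc in documents
        for word in doc.lower().split()
    )
    return sorted(t for t in expected_terms if counts[t.lower()] < min_count)

docs = [
    "Smart city pilots focus on traffic and energy",
    "Energy networks generate continuous sensor data",
]
# Terms we expected the debate to mention (hypothetical list)
expected = ["traffic", "energy", "housing", "eviction"]
print(map_silences(docs, expected))  # ['eviction', 'housing']
```

The output is the inverse of a word cloud: what the corpus does *not* talk about, which is often where the interesting questions start.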

5. You need large datasets, so think global

It’s not just that a small N is not a statistically valid sample. It’s that many of the tools rely on machine learning, which will only reach a smidgen of accuracy if there is an opportunity for many iterations on a large data set. For instance, topic modelling on a small sample will produce absolutely useless nonsense (we tried, we really tried). In this sense, trying to adapt these digital methods to a more traditional ethnographic mindset is less helpful because your sample size is dramatically narrowed from the get-go; for example, searching for opinions on one particular policy during one election year in one particular country is very limited. Instead, think of issues and questions that could, theoretically, span and sweep the entire web.

6. Cleaning your data is vital, but long and laborious


Statistics has codified ways of making the assumptions embedded in your data set explicit, but this is still missing as our methods evolve into the digital. Especially in large datasets there will be a lot of things to clean out. Outliers, for one, can be interesting, but do need to be taken out. When doing text mining, a lot of words or phrases that are specific to your source will need to be cleaned out. This means you’ll need to look at the data itself, not just the tool’s interface, and keep going back and forth between one and the other. For instance, if you are scraping blog posts, in all likelihood you will have copyright phrases, ‘brought to you by WordPress’, and menu items or advertisement blocks.
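As a minimal sketch of what this cleaning step can look like (the phrases and patterns below are placeholders for whatever boilerplate your own scrape turns up, not a complete pipeline):

```python
import re

# Hypothetical boilerplate phrases that recur in scraped blog posts
BOILERPLATE = [
    r"brought to you by wordpress",
    r"all rights reserved",
    r"share this:.*",            # trailing share-widget text
]

def clean_post(text):
    """Remove known boilerplate and collapse leftover whitespace."""
    for pattern in BOILERPLATE:
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

raw = "Great post on smart cities. Share this: Twitter Brought to you by WordPress"
print(clean_post(raw))  # "Great post on smart cities."
```

The back-and-forth the text describes is exactly this loop: look at the raw data, spot a new recurring phrase, add it to the list, re-run, repeat.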

As you get through the last iterations, there is a certain joy in staring at the screen hoping that this time it’ll pop out something usable.

7. To use or not to use a clean research browser/smartphone/laptop

The vast majority of browsers and internet search engines track, and store, your behaviour on the internet, to create a profile of you over time, even if you have privacy settings in place. These profiles influence what you are shown – so if you are using that same browser for your research, your results will be affected.

In some cases, it is recommended to use a ‘clean’ research browser; one that has been unused, has no user profiles, and has no prior ‘life’ on it, so as not to skew results. However, in some cases and depending on the RQ, this may prove to be unhelpful – for instance, one group searching for queer narratives could not find them using a ‘clean’ browser, but only when using a browser that had been ‘trained’ (i.e. used over the last year) by a feminist. As always, either is fine as long as you’re aware and explicit.

With thanks to the wonderful team of folks who worked with us for a week – I couldn’t have asked for a more creative, reflective and dedicated group! Also thanks to the Digital Methods Initiative for organising the week. More to follow, probably.

AI & Inclusion Symposium: Questions rattling my brain as I prep for it

This week I have the pleasure of representing TILT at the Network of Centres’ Artificial Intelligence and Inclusion Symposium in Rio de Janeiro. (HELL YES omigodomiggod. Ok, composure regained.)

The event […] will identify, explore, and address the opportunities and challenges of artificial intelligence (AI) as we seek to build a better, more inclusive, and diverse world together. It is co-organized on behalf of the NoC by the Institute for Technology and Society of Rio de Janeiro (ITS Rio) and the Berkman Klein Center for Internet & Society at Harvard University.

Considering the research project I’m working on is specifically about data justice and social justice issues, the timing couldn’t be better. There is not only a focus on AI, but also specific attention to inclusion and how these issues manifest specifically in the global South.

The program is one of the better ones I’ve seen, not only in terms of topics but also in its attention to first setting a common baseline of understanding (making sure ‘we’re all speaking the same language’) based on pre-meeting surveys, splitting into breakout groups, and building common knowledge with the specific aim to

‘identify intervention points for collaboration, communication, and synergy. Which ideas, initiatives, and projects in AI & Inclusion should be discussed further, emphasized, or reconceptualized?’

As an ex-English teacher specialised in the architecture of discussion to facilitate genuine communication, this makes me happy.

I mean, some of those leading the discussion in the plenaries are called ‘Firestarters’ (cue the 90s theme song)

They’ve also put together a fantastic reading list on AI and inclusion, accessible to the public here.

In preparation, I’m thinking through some of the questions that I’ll be bringing along.

How can we value social knowledges when AI deepens the dependence on STEM?

As I underlined in my paper on varieties of knowledge for urban planning, the way a society or institution values particular types of knowledge has implications for whether complementary sets of knowledges get taken up and mainstreamed into development planning, or not. In particular, the dominance of STEM-based knowledges (science, technology, engineering, maths) means that very concrete insights from the social sciences or non-formalised knowledges don’t get taken up, despite their potential for new perspectives towards solutions.


In some cases, such as in my experience in India, this is done to depoliticise planning – which, in sociological circles, may sound like heresy, but in certain contexts, it’s a valid way to combat corruption.

This clearly goes beyond the question of AI – on the contrary, the ‘newness’ of AI may only serve to cement this distinction. Especially in areas where such a valuation is very distinct, how can we continue to bring in non-STEM-based insights into the usage of AI for development?

Why do we talk about inclusion and not social justice?

I’m not sure the symposium will be able to answer this, but I am working on it. The program talks about social inequalities and social good, and (perhaps intentionally?) doesn’t mention social justice – even though much of the reaction to algorithmic bias is closely linked to social justice advocates. The Data Justice Lab at Cardiff University, for instance, explicitly describes its work as being ‘for social justice’.

It’s on my mind because I’m reading Young’s seminal book Justice and the Politics of Difference, which outlines the basic theory of social justice. She argues that the vast majority of justice theories sit within a distributive paradigm, meaning they focus on rights as ‘things’ you can ‘have’. The problem is that this obscures the fact that power, and injustice, are relationships, often determined by institutional structures.

While I’m still working through the book, and while I personally agree with much of the social justice movement, if I’m honest I still quibble a bit with the image that gets evoked in my brain when I think of social justice, namely that of angry Tumblr users. So perhaps it’s a political decision not to mention social justice specifically. However, I think the focus on institutional structures that social justice highlights is absolutely foundational for resolving anything.

How can AI be reliable in areas characterised by data gaps & informality?

An algorithm is only as good as the data set it is based on. What if the data set is incomplete? This is nothing revolutionary – the question is more about whether there are specific applications of AI that circumvent this problem, if at all.

I should probably know this, but I don’t, yet.

Should we not be starting with the structure rather than the technology?

Gurses and van Hoboken just came out with a fantastic chapter on ‘Privacy after the Agile Turn‘. They explain how the evolution of software development has changed the structure of the internet in terms of its infrastructure, complexity and modularity, with significant implications for how we think about, and try to tackle, privacy and data protection. They suggest adapting differential privacy approaches to deal with this modular, distributed system of data collection and analysis (which would explain why dynamic consent is gaining popularity).

It’s well worth a read, but the bottom line is that by focusing on algorithms or data minimisation, we focus on how data as a thing is consumed, but we do not address the overarching structures of the political economy of the internet, nor do we focus on the flows of power created by institutional structures.

NOTE – This is not to discredit the field of AI ethics; this stuff needs to be thought through. I work on data governance more broadly, so it’s kind of normal that I always go back to the institutions. I also literally just read this paper and it’s been swimming around in my head. So it is a question that acts as a lens through which I’ll be engaging with the conference.

Who is best placed to broker information about best practices?

More an implementation question – information brokerage also came up as necessary in the workshop I attended on EU health data, and it’s always going to be a key role. What I’m curious about is: where is the specific need for sharing best practices? How could this help direct the development of AI towards more inclusion? Whose voice would realistically have the most impact, and for what audience?

I’m very much looking forward to this conference, and there will likely be a flurry of activity afterwards, stay tuned!

7 simple ways to protect your online privacy in 15 minutes

Working on data issues, I’m concerned about online privacy. Not in the sense that I have something to hide, but more that it pisses me off that I’m constantly being monitored so my data can be used to line someone else’s pockets.

That’s the crux of it – a lot of the internet is now free because ad networks and others track you as you browse and collect your data to build up profiles, which can then be sold to better target ads at you. Your data is valuable, and you’re giving it away for free.

To blatantly overgeneralise, this is what cookies do: they track you as you browse across the internet. Especially third-party cookies, and those are the ones you want to avoid. When cookies were invented, there was a significant lobby to avoid them being called ‘spybots’.

Altaweel et al. (2015) found that if you visit the top 100 websites on the internet, you easily download over 6,000 cookies to your computer. That research has some serious, and seriously scary, gems.
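To make this concrete, here is a small sketch using Python’s standard library to parse a made-up third-party `Set-Cookie` header of the kind an ad network might send; the domain, the ID, and the expiry date below are all invented for illustration. The tracking-style identifier and the far-future expiry are what make such cookies useful for building profiles.

```python
from http.cookies import SimpleCookie

# A made-up Set-Cookie header of the kind an ad network might send
header = (
    "uid=a1b2c3d4e5; Domain=.ads.example.com; Path=/; "
    "Expires=Fri, 01 Jan 2027 00:00:00 GMT"
)

cookie = SimpleCookie()
cookie.load(header)

morsel = cookie["uid"]
print(morsel.value)       # the tracking identifier
print(morsel["domain"])   # set by a third-party domain, not the site you visited
print(morsel["expires"])  # a far-future expiry, so the identifier persists
```

Blocking third-party cookies, as the tips below suggest, simply stops your browser from storing and sending identifiers like this one.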

It took me 30 minutes to implement several privacy measures, record them, and write this post. Super quick, super simple. Like many of us, I don’t have the time right now to learn how to use new software or figure out how to read code.

Each of these tips takes a grand total of 2 minutes to implement. No excuses.

Use Mozilla Firefox as a Browser

Back in the day Chrome was meant to be faster, but I’ve been using Firefox for years and don’t notice a difference. It has more privacy settings, Chrome automatically tracks you, and Mozilla has a foundation which is actually committed to net neutrality and to putting it into practice.

Use DuckDuckGo instead of Google

It’s a search engine that doesn’t track you. Install it here, and take a look at this guy’s blog post on all the reasons why you should.

What I appreciate is that there are no filter bubbles – you don’t get shown what Google thinks you want to be shown.

You can also set it as your default search engine so that when you type into the address bar you’re searching with DuckDuckGo, and the instructions are on the installation page. It takes 30 seconds.

Turn off third party cookies

In Firefox, it's right there under Options → Privacy &amp; Security.
These are what my settings look like:

I do keep some cookies, but you could also just choose to browse in private mode all the time. Do be aware that this means the browser can't remember your passwords, and you'll have to fill them in manually each time.

Turn on ‘Do not Track’

Scrolling down in Firefox's privacy settings, you can send out a 'do not track' signal. You can also use a private window to ensure you're not tracked at all, but I tend not to do this often because then I have to log back in every time I want to check my email.

Install Privacy Badger

Developed by the Electronic Frontier Foundation, this is another way to block third party cookies. It runs in the background and you don’t have to do anything. Install it here.

Use Signal instead of Whatsapp, and tell your friends

Signal does the same thing as Whatsapp, except it doesn't track you, it's not owned by Facebook, and fewer people are on it, so you don't get added to all those random groups that are socially awkward to back out of.

Check it out here – it's also available in your app store on Android or iOS.

Disable your browser’s geolocation

This one looks scary but I promise it’s not.

In Firefox, type ‘about:config’ into the address bar. Click ‘I accept the risk’ or ‘I promise I’ll be careful’.

In the search bar, type ‘geo.enabled’. You’ll have an option pop up. Double-click it, and it turns to ‘false’.

That’s it.
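If you're comfortable editing a text file, the same Firefox preferences covered in these tips can also be pre-set in a `user.js` file in your Firefox profile folder. This is just a sketch – the preference names below are the standard Firefox ones, but do double-check them against current Firefox documentation before relying on them:

```js
// user.js – Firefox applies these preferences on startup
user_pref("network.cookie.cookieBehavior", 1);        // 1 = block third-party cookies
user_pref("privacy.donottrackheader.enabled", true);  // send the 'Do Not Track' signal
user_pref("geo.enabled", false);                      // disable browser geolocation
```

Same effect as clicking through the settings, but it survives profile resets and is easy to copy to a new machine.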

If you’re using another browser (tsk tsk), you can see how to do it here.

Other things I'm considering but haven't gotten around to yet

These take a bit longer than 15 minutes, and I need to read the documentation more carefully first: GNUPG/PGP encryption for email, the TOR browser, and the ORC cloud service (a Dropbox alternative). I'm also considering moving my email away from the Google multiverse.


So there you have it, a few simple strategies that will help tremendously. These should be the baseline.

You should also check your email privacy settings, especially if you’re using Gmail.

What did I learn from the EU Health Data workshop? Trust, consent, infrastructures & individuals

Last week I had a fun day participating in the workshop 'Towards a European Ecosystem for Health Care Data', hosted by the Digital Enlightenment Forum. While health data is not my area of expertise (or, let's be honest, deep interest), the issues of how to organise regulation and cooperation around data are. And you always need a topic to 'point to' to bring people across sectors and types of work together. So the workshop and its excellent discussions were an insight into high-level minds working on policy issues and trying out solutions to the concrete problems of the changing digital landscape.

While the panels were pretty varied, ranging from the specifics of the GDPR to technical solutions to policy questions of harmonisation (you can access the presentations and check the twitter hashtag #euhealthdata), there were a few threads of conversation running through the workshop that, as a relative outsider, I found very interesting to pick up.

People are willing to share their health data

What’s always been interesting about health data is that if you talk to somebody off the street with no direct relation to data discussions, they can immediately and intuitively understand the importance of the privacy of health records. Health data has always been an entry point to public discussions.

Which is why I was pleasantly surprised when Despina Spanou, Director for Consumers at the DG Justice and Consumers, presented the European Commission’s position and results of their recent public consultation on health data. 70% of respondents were individuals, and 90% said they would be willing to share their health data. I don’t have the exact numbers on hand, but you can get more information here.

So it’s incredibly encouraging that there is a public understanding of the use of data for the greater good.  (The gif is slightly less relevant, but I cannot utter the phrase ‘the greater good’ without thinking about it, so there you go).

We need to talk about control of data, not ownership

Ownership is a tricky issue when it comes to data, especially from the legal perspective.

On top of a presentation full of comics – my favourite kind – Petra Wilson from Health Connect gave an excellent analogy: you can rent a house, and you don’t own it, but you still have the right to stop people from entering.

This also explains why I have seen some interesting ideas flying by about learning from property rights and regulating data giants in similar ways to public utilities.

Instead of talking about ownership, the discussion is about control. But what does control mean, and how is that operationalised? Does it mean access, does it mean ability to remove your data from databases, do you need to give consent for every transaction?

Don’t trust, but build trustworthy systems

Trust means that you don’t know what is going on, but you decide to ‘trust’ that what’s happening is in your best interest. For instance, I don’t know that you’ll pay me back when you say you will, but it’s ok, I trust you.

If we take that and apply it to working with data, it becomes problematic, because there are fewer safeguards. You trust that the heart surgeon knows what they are doing because they’ve been to years of medical school, etc. etc. Relying on trust is not adequate – not only ethically, but also in practice; several research projects are failing because of the lack of trust and resulting inability to access data.

Rather than expecting trust, we need to build systems that safeguard trust within the infrastructure of the data system and how it's used. Two specific ways were discussed, which I'll address in turn below. Blockchain, as the current fad in trust-less systems, was mentioned in passing but not given substantial attention.

Dynamic consent is the new favourite

Moving on to questions of how, several of the technological solutions presented at the workshop were precisely about creating an infrastructure to shift the locus of control firmly to the individual.

While the GDPR places a lot of emphasis on consent, and consent needs to be given for each purpose, how to implement this in practice becomes tricky.

‘Even though GDPR emphasises informed consent, most patients have no idea what that means or does.’ Bian Young, NTNU

Dynamic consent came up several times in the workshop as a possible solution (rather than, say, questioning whether consent is the right model: it's in the regulation, so let's move forward from there). For example, there was a presentation on piloting verifiable credentials: essentially, you hold a personal key and set access details separately for each website, which you can change at any time. Say, you allow the hospital to access a and b, but the shopping centre only a and c. There was also discussion of homomorphic encryption.

While I'm not a technical person, and so have a wee bit of trouble relating it all back now, at the time it made sense, and I encourage you to look at the presentations if you want to follow up further.
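To give a feel for the per-recipient idea, here is a deliberately simple sketch of what dynamic consent could look like as a data structure. Everything here is hypothetical and illustrative, not how any of the presented systems actually work:

```python
# The data holder keeps a policy mapping each recipient to the record
# fields it may see; the policy can be edited (consent changed) at any time.
record = {"a": "blood type", "b": "medication list", "c": "loyalty card history"}

consent_policy = {
    "hospital": {"a", "b"},         # the hospital may access a and b
    "shopping_centre": {"a", "c"},  # the shopping centre may access a and c
}

def share(recipient, record, policy):
    """Return only the fields the recipient has consent to see."""
    allowed = policy.get(recipient, set())  # unknown recipients get nothing
    return {k: v for k, v in record.items() if k in allowed}

print(share("hospital", record, consent_policy))
# {'a': 'blood type', 'b': 'medication list'}

# Revoking consent is just editing the policy:
consent_policy["shopping_centre"] = set()
print(share("shopping_centre", record, consent_policy))  # {}
```

The hard parts in practice – verifying who the recipient really is, and enforcing the policy once data has left your hands – are exactly what the verifiable-credentials and encryption work is about; the sketch only captures the control logic.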

Data cooperatives as democratic data infrastructures

Several presenters discussed citizen-driven, collaborative, democratic infrastructure models piloting on health data: the from Switzerland, the from Catalonia, and the Data for Good Foundation in Denmark.

All were trying to deal with the systemic change in data governance, and this is where my ears prick up. What strikes me is that there is a lot of work being done on the data commons, and I am sure there are plenty of lessons to learn from the history of agricultural cooperatives.

What it also means: ‘It’s about organising the stakeholders, it’s not about the technology’ – Claus Nielsen, Data for Good Foundation

This is why data governance is important.

Interoperability needs information brokerage & oversight

What's particular about health data and trying to create an EU ecosystem is that health is a member state issue. The idea, then, is to create harmonised investment and an interoperable system, while how that is operationalised into the healthcare system is up to individual member states.

Sonja Marcovic of RAND made a great point that there is a specific need for information brokerage. How do you ensure the information is out there, accessible, fed back into the system, and kept up to date? How do you keep it sensitive to stakeholder needs and member state priorities, and patient-centred? How do you distill the memory of a project after that project has ended?

We need to share best, and worst practices. Feedback is one of my research interests, and being open to learning from what went wrong is also important.


We also need to be sure that we maintain data quality and reliability, especially as health data does not only come from hospitals, which are relatively easier to standardize for interoperability, but also increasingly from apps, from individuals, etc.

The individual as the ultimate data unit in an Enlightenment context

Throughout the workshop there was an assumption, sometimes explicit and sometimes unspoken, about the individual at the centre of data. This showed up in a few ways:

  • Linked to the control and ownership discussion above, there was a resounding view that individuals should have control over their data, and that because citizens have the same needs EU-wide, this focus on the individual will also help interoperability concerns within a European framework.
  • Building patient-centred care, and putting patients’ needs at the forefront rather than those of companies or other actors. Entirely admirable and a positive movement.
  • The individual as the 'ultimate integrator' of data, put forward by Ernst Hafen. This makes sense on a technical level, where, I presume, you are looking at categories of data and how to organise them, and the individual is the smallest category to integrate. There is an idea here that needs to be worked out further, but for now this will do.
  • The overarching framework of ‘Digital Enlightenment’, harkening back to the values of the 18th century enlightenment.

As I'm working on a data justice project with a global focus and have been reading about the political philosophy underlying justice, I've been twirling around ideas about liberalism, what it means, and how to understand the relation of the individual to the whole. I would not argue that individuals shouldn't have control of their data, but I'm thinking through the implications of the liberal framework we work within, and what that might mean if we take this specifically European framework and apply (part of) it to other contexts.

And to close, this fantastic slide:

Full disclosure: I’d been working with DigEnlight for a year on the communications, and the workshop was my final shebang on working with DigEnlight in an official capacity. It’s a fantastic network of people working actively on an ethical transformation of society and technology, so I’ll still be connected with the network, especially once the upcoming trusted community platform is launched, probably in the next two weeks.

4 ways lawyers turned my assumptions on their head

I've been working at the Tilburg Institute of Law and Technology for a month and a half now, and several times my jaw has practically dropped through the floor – not as a reflection on this awesome institute but more at the shock of uncovering how deep some of my own assumptions run. Chatting with lawyers has really challenged several of my assumptions about what it means to do research, to be an academic, and what is important to look at.

I firmly believe that asking questions and accepting you don’t know everything is the only way to actually know more. Which requires admitting a certain vulnerability, and generally makes you a nicer person to work with because you can still know your shit but remain humble.


The shock of disciplinary differences

It's incredibly exciting to work as a social scientist in a law-dominated environment. I'd expected to learn a lot, and I am, though of course I could not have anticipated learning the things I have. Which, funnily enough, are largely about methodology. The importance of methodology seems to be on my mind a lot recently – see a previous post on how methodology relates to ethical research, and this tweet for, well, general astonishment.

It's not the first time that I am confronted with the gaping assumptions of my own discipline. Previously I worked at the department of geography, urban development and international development studies at the University of Amsterdam, which was heavily influenced by human geography and anthropology. When I first joined the department as a masters student, it was a shock to my system, because I'd done my bachelors in a positivist psychology department, and talking with cultural anthropologists about the meaning of science and truth was rather a revelation.

It can actually make you question everything you think you know – not only because of the assumptions statements are based on, but also because of the ways that those truths are arrived at. This is a big part of what motivated my work on flood management and how to get different communities of experts to actually speak the same language.

But on to the fun stuff:

What I’ve learnt from lawyers

I am still learning, so this might be totally off, but these are my first impressions coming into a new field.

1. Research is not necessarily empirical.

From psychology to development studies, my understanding of research was very much about 'doing' something. Either you create an experiment and calculate the stats (quantitative), or you go out into the world, talk to people, and analyse the text (qualitative). Massive oversimplifications, but you get the idea.

So imagine my surprise when I realised that there are whole swathes of rigorous researchers who focus on 'black letter law', i.e. really looking at what the law says and how to interpret it. Desk research is still research, and does not necessarily require 'fieldwork'.

It's still a bit of a shock, because so much of my research history has been about how to do, carry out, organise, and analyse empirical work. Personally, it feels a bit empty without it.


I can imagine that this must change the dynamics of how you build your network as a scholar.

2. Analytical is not necessarily better than descriptive.

The idea being that analytical work takes an analytical framework, a theory which strings together two concepts, which can then be used as a lens to look at the empirical data. In my masters, this critical perspective was very much a process of 'elevating' the work, and when I finally understood the difference between descriptive and analytical work it was a major step forward.

So recently a colleague commented that they only wanted analytical, not descriptive, work. To me this was self-evident, and saying it explicitly could only benefit up-and-coming researchers trying to understand the difference.

What I hadn't expected was for this to stir quite a discussion among my lawyer colleagues. The response was: 'If there's no room for description, what is the point of lawyers?'

This was earth-shatteringly groundbreaking. See, a large part of legal scholarship is seeing how the law is to be interpreted in relation to other parts of the law and to society itself. This requires substantial description, and the 'analytical framework' becomes redundant in the exercise.

Different purposes, different approaches.

It also means that there is a distinctly different 'flavour' to styles of writing articles: with an emphasis on logical structure and argumentation, the legal articles I've read are more often than not structured in a much more linear way, rather than narratively, as you find more frequently in the social sciences. That said, I still have a lot of legal work to read, so this may be a totally flimsy statement.

3. Not all textual analysis is about revealing the narrative

I guess I shouldn't be surprised, with my background in cognitive psychology, where we studied a lot about the structure of language and breaking meaning down into specific units to be rearranged as a reflection of the workings of the mind.


Still, working in social science, the idea of a discourse is incredibly central to explaining how we make sense of the world. We have visions or stories that we aspire to, and these stories, the narrative, shape how we talk about something and, as a result, what opportunities we see for solving the problem. Personally I find it fascinating to reveal these discourses; there's something primal about it.

Yet rather than focusing on the overarching narratives that shape the meanings of how we talk, lawyers have a tendency to analyse language in a very different, almost microscopic way: with an incredibly detailed focus on the meanings and interpretations of individual words. There is a reason for each, specific word, that can be unpacked for why we use that word and not another. And it matters.

While you find this in the social sciences too – we should use this word and not that – the almost instinctive use of individual words as a unit of analysis is a completely different approach than what I'm used to.

4. Being an engaged scholar means keeping really up-to-date with the news

Perhaps this is slightly biased because I am in a department that a) is top-notch, and b) where everybody’s work is being reshaped by the upcoming General Data Protection Regulation, which is changing all the things.

However, the lawyers I've been speaking to are all incredibly engaged with current affairs: following new regulations and the debates around them, writing commentaries in public arenas, keeping up to date with proposed changes from government, etc. All logical, engaged things.

Not everybody is like this. Some people work on historical events. Some people work on the odd, off-the-cuff thing. And that's fine. But in this case, people are working on something with direct, immediate relevance, and there is something incredibly powerful about that and the role it brings to academia.

And it is deemed relevant not just by scholars but also by everyday people. Questions of data governance are all the rage precisely because we can see for ourselves, as citizens and consumers and persons, that the world is changing before our eyes and we feel like we’re losing our grip just a wee tad. That’s why I’m on twitter more than ever before, so much to learn.

It’s inspiring to have this injection of relevance and urgency. But that’s for another discussion, methinks.


 Did I miss something?

Dear new colleagues – don’t take it personally, I’m finding the meeting of disciplinary boundaries incredibly fascinating 😉

Summary outline of Sen’s The Idea of Justice

Do you ever do that thing where you cite a particular seminal work so frequently that you think you know it well, but you've actually sort of forgotten the finer argument? It's incredibly embarrassing, not to mention counter-productive, if you get caught out. (!)

As I am laying the foundations for my doctoral research on data justice, I am working through some literature reviews on big concepts to help frame the direction of work. The idea is to set a good base to stop that kind of thing from happening.

If it can be useful for you, I'm sharing here a 4-page summary outline of Sen's 'The Idea of Justice', an important treatise on what justice is and how to achieve it.

Summary in 3 lines:
  1. A theory of justice needs to be useful in order to judge how to reduce injustice.
  2. Most theories of justice focus on what ‘the perfectly just world’ would look like, negating point #1. We need a comparative approach considering the lives people actually lead.
  3. Justice requires impartiality, which requires a certain objectivity and rationality, especially public reasoning; we therefore need public discussion and democracy as 'government by discussion'.

Methodology as an entryway to ethical data research

There is a growing call for ethical oversight of AI research, and rightly so. Problem is, ethical oversight hasn't always stopped past research with questionable ethical compasses. Part of that, I argue, is that the ethical concerns raised, largely by social scientists, come from a completely different world view than those from a more technical background. While AI research is raising new problems, particularly with regard to correlation vs causation, the tools we have to solve them haven't changed that much.

With this blog I want to ask: can methodology help social and technical experts speak the same language?

Since my masters degree I’ve been fascinated by the fact that people working in different disciplines or types of work will have completely different approaches to the same problem.

Like in this article on flooding in Chennai, where I found that 'the answers' to solving flooding already existed on the ground; it's just that the variety of knowledges wasn't being integrated, because of the different ways they're valued.

I was recently speaking with my brilliant colleague and friend, a social constructivist scientist working in a very digital-technology-oriented academic department. This orientation is important to note, because the methodology deployed for science and research there, and the questions being asked, are influenced to a large degree by the capacities and possibilities afforded by digital technologies and data. As a result, the space scientists see for answers can be very different.

In reviewing student research proposals, she found she was struggling because some research hypotheses completely ignored the ethical implications of the proposed research.

In talking it through, we realised that most of the problems arose from the assumptions that are made in framing those questions.

To take a classic example, in the field of remote sensing to identify slums, it is relatively common to see the implicit assumption that what defines a slum is the area's morphology, and that the definition comes from city planners rather than residents; how locals interpret the area or its boundaries may differ completely. The ethical problem, beyond epistemology, is what can then be done in terms of policy based on the answers that research provides.

To go back to the paper that caused controversy by identifying people's sexual orientation from profile pictures downloaded from a dating site: it's based on a pre-natal hormone theory of sexual orientation, which is a massive assumption in and of itself. Even the responses to the article have basically boiled down to 'AI can predict sexuality', even though that's a blatant generalisation and doesn't look at who was actually in the dataset (every single non-straight person? Only white people?). That, and the fact that they basically 'built the bomb to warn us of the dangers', carries a lot of assumptions about your view of ethics in the first place.

Like my 10th grade history teacher used to say, to assume makes an ASS of U and ME. (Thanks Mr. Desmarais)

More precisely, to assume without making the assumptions explicit. Not clearly articulating what your assumptions are is a *methodological* problem for empirical research, with ethical *implications*. Unexamined assumptions mean bad science. Confounding variables and all that.

For reference, in statistics there is an entire elaborate, standardized system of dealing with assumptions by codifying them into different tests. You apply one statistical test, which has a particular name, because of the assumptions you have – i.e. I assume this data has a normal distribution.
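To make that concrete, here's a toy check of one such assumption, normality, sketched in pure Python. The data, the skewness measure, and the 0.5 cut-off are all illustrative choices of mine, not a real diagnostic procedure:

```python
import statistics

def skewness(xs):
    # Moment-based skewness g1 = m3 / m2**1.5: roughly 0 for symmetric data.
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / len(xs)
    m3 = sum((x - mean) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 1, 1, 2, 2, 3, 10, 40]

# Making the assumption explicit before choosing the test:
for data in (symmetric, skewed):
    g1 = skewness(data)
    verdict = "normality plausible, parametric test" if abs(g1) < 0.5 else "skewed, consider a non-parametric test"
    print(round(g1, 2), "->", verdict)
```

The point is not the cut-off itself, but that the assumption is written down and checked, rather than silently baked into the choice of test.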

If you’re using mixed methods, it becomes much harder to have a coherent system to talk about assumptions because the questions that are asked may not yield data that is amenable to statistical analyses and therefore cannot be interpreted with statistical significance.

All the more important, then, to make assumptions explicit so they can be discussed and scrutinized.

Some ethical concerns can be dealt with more easily when we remember methodological scrutiny and transparency, bringing research back to the possibility of constructive criticism and not only fast publication potential.

How this process is dealt with currently in academia is ethical review, hence the call for ‘ethical watchdogs’.

Thing is, in terms of the process of doing science in academic settings, ethical review is often the final check before approval to carry out the research. When I did my BSc in psychology, sending the proposal to the ethics review board felt like an annoyingly mandatory tick-box affair.

The problem with this end-of-the-line ethical review is:

  • It's not clear why the ethics matters for actually carrying out the research
  • If the ethics board declines, you’re essentially back to the drawing board and have to start again.

Particularly under the pressure for fast publication, there aren’t many incentives to do good ethics unless you’re concerned about it from the outset.




What if we shifted the focus from ethics as an evaluation to ethics as methodology?

Rather than having an ethics review at the end of the process of formulating hypotheses and research proposals, could there be a way to incorporate an ethics review in the middle of the ‘research life cycle’?

One would then get feedback not only on the ethics; it would also provide an opportunity to surface the research's unexamined assumptions, which ultimately makes for better science.

I understand this ideal situation implies quite a significant shift in institutional processes, which are notorious for moving about as fast as stale syrup. Perhaps instead there could be a list of questions researchers could ask themselves as a self-evaluation?

In this way, you could open an entryway to an ethical discussion as a question of methodology, rather than ontology or ethics per se, which are far too easily just troubled waters in terms of interdisciplinary discussions.

Do you know of any examples of structurally incorporating these ideas as a way to effective multidisciplinary dialogue?


My thanks go to my colleague who sparked this discussion and thought it through with me, who for reasons of their position, will remain anonymous.