What next for privacy and data protection? My CPDP2018 takeaways

The CPDP conference is the central hub of the European data protection community, bringing together policy wonks and makers, the burgeoning privacy industry, and a plethora of legal scholars. It is huge, successful, and a lot of fun. Now that I’ve had a week or so to chew on and process all my notes, here are the main themes that I’ve taken away as the next big issues in data, privacy and data protection – keeping in mind that I am not a lawyer, and I work on issues of global data justice.

YAY GDPR! But we still have our work cut out for us.

The GDPR is a major accomplishment, and it was not easy to get this level of protection – see the fantastic documentary Democracy for the story following Jan Albrecht. I am proud to be European and to have this protection, so let’s celebrate it.

Still, the GDPR is also limited in several ways, which legal scholars are working on and which formed much of the basis of the conference. I cannot possibly do it all justice – but there are a few things that I can say from an interdisciplinary perspective:

  1. Relying on notification and consent is not scalable – see what smartphone notifications look like when you have 8 million followers;
  2. There are debates about whether the concept of personal data, upon which the GDPR is based, is now so broad and inclusive that it is no longer relevant – see the recent ERC by my colleague Nadya Purtova and her fantastic panel on the subject with Peter Hustinx;
  3. The responsibilities in the GDPR are based on particular actors in particular sectors, but the multiple uses of data and the connecting of databases are making boundaries between sectors blurry, which is problematic for the GDPR;
  4. The GDPR is still based on individual rights and doesn’t so easily address collective harms, nor does it help most people who are not well informed, or cases where harms are not visible.

This isn’t going to develop into a well-structured critique of the GDPR, but rather these are the points that jumped out at me.

Data ethics is not enough. Trust is certainly not enough.

Considering the size and global interconnectivity of the data market, the dynamics of surveillance capitalism and legal enclosures that enable it mean that we need structural elements to create economic incentives and new directions. Much like CSR wouldn’t shift the underlying structures of the global economy towards fair trade and sustainability, data ethics won’t solve everything.

Indeed, as Mireille Hildebrandt commented, this is precisely why we have the rule of law, to prevent us from relying on this.

In the same-same-but-different sort of way, trust is not something we can rely on – trust implies that we do not have to ensure. ‘I trust you to pay me back’ means I don’t have to see any proof that you’ll pay me back. Relying on trust as a concept means the individual is inherently vulnerable and the power lies in the hands of the provider of services, not the other way around.

There is a growing call for trust certification and trust marks as a positive incentive for companies to see data protection as an asset – the ‘carrot’ instead of the stick – and there was some very interesting work presented on these issues at this panel on private law. Yet let us be realistic, this is never going to restructure the global data economy as a whole. Therefore –>

We need to regulate the data market.

Clearly at a conference with a critical mass of legal scholars you saw this one coming. Yet rather than the broad sweeping statement there was a deep dive into how and what that might mean.

For instance, there was a major panel on regulating monopolies which is well worth watching, drawing on discussions of anti-trust law. Barry Lynn of the Open Markets Institute (the one that infamously got kicked out of New America because they stood up to Google) in particular was an inspiring mix of a preacher’s rant, sound economic advice, and a provocation. For someone with a background in development studies like me, it was an entry way into discerning between the neoliberal Chicago school and the potential of citizen-centred markets.

Individual rights are not the best way to solve collective problems.

There are several issues which are generally bundled into the tension between the individual and the collective.

The first is exemplified by last week’s Strava security scandal, where a data visualisation of the routes taken by the users of the fitness app unintentionally revealed the sensitive information of army base locations. Aggregated information can reveal vulnerabilities and risks with very concrete consequences without necessarily saying anything about any one person. For a more in-depth discussion you can also check Taylor et al.’s book on group privacy, which has an interdisciplinary group of people who can’t seem to agree on anything except that it’s important.
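To make the aggregation point concrete, here is a minimal sketch of how a heatmap can expose a location even after every user identity has been thrown away. It is not how Strava built its visualisation; the function names, the grid size, and the coordinates are all made up for illustration.

```python
from collections import Counter

def heatmap(tracks, cell=0.01):
    """Count GPS points per grid cell. User ids are ignored on purpose:
    the output contains no information about who was where."""
    counts = Counter()
    for _user_id, points in tracks.items():
        for lat, lon in points:
            # Snap each point to a coarse grid cell (hypothetical cell size).
            counts[(round(lat / cell), round(lon / cell))] += 1
    return counts

def hotspots(counts, threshold):
    """Cells with at least `threshold` points, i.e. places many people go."""
    return {cell for cell, n in counts.items() if n >= threshold}

# Made-up tracks: two runners circle the same small area, one runs elsewhere.
tracks = {
    "runner_a": [(10.001, 20.001), (10.002, 20.002), (10.003, 20.001)],
    "runner_b": [(10.002, 20.003), (10.001, 20.002)],
    "runner_c": [(55.500, 3.200)],
}

counts = heatmap(tracks)
busy = hotspots(counts, threshold=4)
print(busy)
```

The hotspot says nothing about any individual runner, yet it still points at a place. That is the group-level risk that individual rights struggle to capture.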

The second is a cross-cultural question: the liberal individual upon which the human rights framework is based is a Western perspective that we will need to move beyond for a truly global conversation. It requires us to think about identity and the collective in ways that we aren’t quite sure how to do yet. Still, just because it’s a difficult question doesn’t mean we shouldn’t ask it. (Indeed, I tend to be that person in the room that asks the question that gets uncomfortable and awkward looks from panellists. #sorrynotsorry) This is precisely why I’m going to pursue this further in my work on global data justice.

Lastly, and this might need some more substantiating from a legal scholar who has a better understanding of the nuances of the issue (any volunteers?) – but in court cases and legal studies, they speak now of fundamental rights and the essence of these rights, as being even more fundamental, and there are questions being asked as to whether the essence of the rights was violated in a particular case or not. As I’m sure you can gather I can’t speak much to this, but it struck me as a non-lawyer that the concept of fundamental rights was on shaky ground.

Where to next?

There are three strands that jumped out as warranting further exploration:

  1. Moving beyond a distributive paradigm and seeing data as an asset, which would require a reformulation of what data is and a reshaping of the market;
  2. To what extent paternalism as an approach is legitimate and when, for which there was another excellent panel here (last round on a Friday evening, ooof);
  3. There is a lot to learn from an ecological approach, both in terms of interdependency and limits – something I am going to explore further in a paper for the Data Justice Conference in May (come along!).


Just because these are all hard questions, doesn’t mean we shouldn’t be asking them. On the contrary.

Here is some fodder which has continued to resonate for inspiration:

7 key issues for digital methods in social science: dmi18 takeaways

What happens when you add one varied toolkit for digital methods, a research question, an enthusiastic team, the equivalent of a collective IV drip of caffeine, and pile them into a room for a week?

Last week I had the pleasure of co-facilitating a project at the Digital Methods Initiative Winter School 2018 at the University of Amsterdam. The idea is that rather than trying to take analogue methods and apply them to the internet, we take tools that are only possible because of the unique characteristics of the web.

It’s an intense week of experimentation, research, and learning, all rolled into one; if you’ve worked in hackerspaces and startups, the general vibe of the place will be familiar. It’s like an explosion of several tangential whirlwinds that kind of finds a way to coagulate and settle within a week.

While my organisation/control-freak tendencies were a little overwhelmed, with that one-week sprint we probably saved ourselves a good three months of work where we would have been faffing around. I would like to perhaps save you some of the same headaches, by making very clear some of the key methodological points & assumptions. (You know how I like to clarify assumptions.)

1. Follow the question, not the tools

With such a glorious collection of tools for digital methods, it is tempting to just throw the tools at the data to see what happens. And, truth be told, this is what is needed a lot of the time. Yet the ‘throw everything at the wall and see what sticks’ approach can only be exploration, and does not the foundations of a sound methodology make. Once that’s done, there needs to be a process of reflecting in light of the questions, to be led by what is analytically interesting, and not to be led by what is technically possible.

2. Be open to an iterative evolution of your methodology

There is a while where it feels like you’re floating headlessly in space, unsure of your footing or where you’re going. After our first day, which I felt had been a complete, chaotic mess, I asked our facilitator how he felt our day went, and his first word was ‘structured. Because you have a clear research question’. Just to give you an idea.

The long and short of it is that the experimental approach to new tools means you try things out, things break, you fix them, try again, and again. It is an incredibly iterative process without a clearly linear project plan, one that instead morphs in response to what happens, and the deviations from the original line of thinking are also insightful.

3. You will still need local knowledge to do internet research


Quantitative digital methods are not the be-all and end-all; we need humans, insight, and local knowledge to make meaning of it all, much in the same way as a statistical test just spits out a number and you have to make sense of it.

There are several examples of picking a topic, running data scraping on it, and finding absolutely nothing – only to be later told by somebody with more local knowledge that that particular issue was about vulnerable people who wanted to hide, rather than expose their views, on the internet, for fear of persecution. Just an example of how context, once again, is everything.

4. Mapping silences requires some serious methodological creativity

In the same vein as above, there are usually good reasons why you aren’t finding what you want to find. The trick, then, becomes whether the tools can *show* those silences, or that noise. The tools and representation have to then be inverted, triangulated, and brought into dialogue with one another – and mapping what isn’t there requires some lateral thinking.

5. You need large datasets; so think global

It’s not just that a small N is not a statistically valid sample. It’s that many of the tools work on machine learning, which will only give a smidgen of accuracy if there is an opportunity for many iterations and a large data set. For instance, topic modelling on a small sample will produce absolutely useless nonsense (we tried, we really tried). In this sense, trying to adapt these digital methods to a more traditional ethnographic mindset is less helpful because your sample size is dramatically narrowed from the get-go; for example, searching for opinions on one particular policy during one election year in one particular country is very limited. Instead, think of issues and questions that could, theoretically, span and sweep the entire web.

6. Cleaning your data is vital, but long and laborious


Statistics codifies which assumptions are embedded in your data set, but this is still missing as our methods evolve into the digital. Especially in large datasets there will be a lot of things to clean out. Outliers, for one, can be interesting, but do need to be taken out. When doing text mining, a lot of words or phrases that are specific to your text will need to be cleaned out. This means you’ll need to take a look at the data itself, not just the tool’s interface, and keep going back and forth between one and the other. For instance, if you are scraping blog posts, in all likelihood you will have copyright phrases, ‘brought to you by WordPress’, and menu items or advertisement blocks.
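As a rough illustration of that back-and-forth, here is a minimal sketch of cleaning one scraped blog post before counting words. It is not the pipeline we used during the week. The patterns, the stop-word list, and the sample post are assumptions, and in practice you only discover what belongs in them by looking at your own data.

```python
import re
from collections import Counter

# Hypothetical boilerplate patterns; extend them as you inspect your own scrape.
BOILERPLATE_PATTERNS = [
    re.compile(r"brought to you by wordpress", re.IGNORECASE),
    re.compile(r"^\s*(copyright|©)", re.IGNORECASE),
    # A line consisting only of menu items separated by "|".
    re.compile(r"^\s*(home|about|contact)(\s*\|\s*(home|about|contact))*\s*$",
               re.IGNORECASE),
]

# A tiny illustrative stop-word list, not a complete one.
CUSTOM_STOPWORDS = {"the", "a", "is", "of", "and", "to", "in"}

def clean_post(raw):
    """Drop empty lines and lines matching any boilerplate pattern."""
    kept = [
        line for line in raw.splitlines()
        if line.strip() and not any(p.search(line) for p in BOILERPLATE_PATTERNS)
    ]
    return "\n".join(kept)

def word_counts(text):
    """Lowercase, tokenise on letters and apostrophes, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in CUSTOM_STOPWORDS)

raw_post = """Home | About | Contact
Data justice is a question of power.
Power is relational.
Copyright 2018 Example Blog
Brought to you by WordPress"""

cleaned = clean_post(raw_post)
counts = word_counts(cleaned)
print(counts.most_common(3))
```

Without the cleaning step, ‘wordpress’ and ‘copyright’ would show up in every post and drown out the words you actually care about.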

As you get through the last iterations, there is a certain joy in staring at the screen hoping that this time it’ll pop out something useable.

7. To use or not to use a clean research browser/smartphone/laptop

The vast majority of browsers and internet search engines track, and store, your behaviour on the internet, to create a profile of you over time, even if you have privacy settings in place. These profiles influence what you are shown – so if you are using the browser for your research, your results will be affected.

In some cases, it is recommended to use a ‘clean’ research browser; one that has been unused, has no user profiles, and has no prior ‘life’ on it, so as not to skew results. However, in some cases and depending on the RQ, this may prove to be unhelpful – for instance, one group searching for queer narratives could not find them using a ‘clean’ browser, but only when using a browser that had been ‘trained’ (i.e. used over the last year) by a feminist. As always, either is fine as long as you’re aware and explicit.

With thanks to the wonderful team of folks who worked with us for a week – I couldn’t have asked for a more creative, reflective and dedicated group! Also thanks to the Digital Methods Initiative for organising the week. More to follow, probably.

AI & Inclusion Symposium: Questions rattling my brain as I prep for it

This week I have the pleasure of representing TILT at the Network of Centres’ Artificial Intelligence and Inclusion Symposium in Rio de Janeiro. (HELL YES omigodomiggod. Ok, composure regained.)

The event […] will identify, explore, and address the opportunities and challenges of artificial intelligence (AI) as we seek to build a better, more inclusive, and diverse world together. It is co-organized on behalf of the NoC by the Institute for Technology and Society of Rio de Janeiro (ITS Rio) and the Berkman Klein Center for Internet & Society at Harvard University.

Considering the research project I’m working on is specifically about data justice and social justice issues, the timing couldn’t be better. There is not only a focus on AI, but also specific attention to inclusion and how these issues manifest specifically in the global South.

The program is one of the better ones I’ve seen, not only in terms of topics but also with the attention to first setting a common baseline of understanding together (making sure ‘we’re all speaking the same language’) based on pre-meeting surveys, splitting into breakout groups, and building common knowledge with the specific aim to

‘identify intervention points for collaboration, communication, and synergy. Which ideas, initiatives, and projects in AI & Inclusion should be discussed further, emphasized, or reconceptualized?’

As an ex-English teacher specialised in the architecture of discussion to facilitate genuine communication, this makes me happy.

I mean, some of those leading the discussion in the plenaries are called ‘Firestarters’ (cue the 90s theme song)

They’ve also put together a fantastic reading list on AI and inclusion, accessible to the public here.

In preparation, I’m thinking through some of the questions that I’ll be bringing along.

How can we value social knowledges when AI deepens the dependence on STEM?

As I underlined in my paper on varieties of knowledge for urban planning, the way a society or institution values particular types of knowledge has implications for whether complementary sets of knowledges can get taken up and mainstreamed into development planning, or not. In particular, the dominance of STEM-based knowledges (science, technology, engineering, maths) means that very concrete insights from the social sciences or non-formalised knowledges don’t get taken up, despite the potential for new perspectives towards solutions.


In some cases, such as in my experience in India, this is done to depoliticise planning – which, in sociological circles, may sound like heresy, but in certain contexts, it’s a valid way to combat corruption.

This clearly goes beyond the question of AI – if anything, the ‘newness’ of AI may only serve to cement this distinction. Especially in areas where such a valuation is very distinct, how can we continue to bring non-STEM-based insights into the usage of AI for development?

Why do we talk about inclusion and not social justice?

I’m not sure the symposium will be able to answer this, but I am working on it. The program talks about social inequalities, social good, and (perhaps intentionally?) doesn’t mention social justice – even though much of the reaction to algorithmic bias is very linked to social justice advocates. For instance, the Data Justice Lab at Cardiff University is expressed as ‘for social justice’.

It’s on my mind because I’m reading Young’s seminal book Justice and the Politics of Difference, which outlines the basic theory of social justice. She argues the vast majority of justice theories are within a distributive paradigm, meaning they focus on rights as ‘things’ you can ‘have’. Problem is, that obscures that power, and injustice, are relationships, often determined by institutional structures.

While I’m still working through the book, and while I personally agree with much of the social justice movement, if I’m honest I still quibble a bit with the image that gets evoked in my brain when I think of social justice, namely that of angry Tumblr users. So perhaps it’s a political decision not to mention social justice specifically. However, I think the focus on institutional structures that social justice highlights is absolutely foundational for resolving anything.

How can AI be reliable in areas characterised by data gaps & informality?

An algorithm is only as good as the data set it is based on. What if the data set is incomplete? This is nothing revolutionary – the question is more about whether there are specific applications of AI that circumvent this problem, if at all.

I should probably know this, but I don’t, yet.

Should we not be starting with the structure rather than the technology?

Gurses and van Hoboken just came out with a fantastic chapter on ‘Privacy after the Agile Turn’. They explain how the evolution of software development has changed the structure of the internet in terms of its infrastructure, complexity and modularity, with significant implications for how we think about, and try to tackle, privacy and data protection. They suggest adapting differential privacy approaches to deal with this modular, distributed system of data collection and analysis (which would explain why dynamic consent is gaining popularity).

It’s well worth a read, but the bottom line is that by focusing on algorithms or data minimisation, we focus on how data as a thing is consumed, but we do not address the overarching structures of the political economy of the internet, nor do we focus on the flows of power created by institutional structures.

NOTE – This is not to discredit the field of AI ethics, and this stuff needs to be thought through. I work on data governance more broadly, so it’s kind of normal that I always go back to the institutions. I also literally just read this paper and it’s been swimming around in my head. So it is a question that acts as a lens through which I’ll be engaging in the conference.

Who is best placed to broker information about best practices?

More an implementation question – information brokerage also came up as necessary in the workshop I attended on EU health data, and it’s always going to be a key role. What I’m curious about is where the specific need for sharing best practices is. How could this help direct the development of AI towards more inclusion? Whose voice would realistically have the most impact for what audience?

I’m very much looking forward to this conference, and there will likely be a flurry of activity afterwards. Stay tuned!

What did I learn from the EU Health Data workshop? Trust, consent, infrastructures & individuals

Last week I had a fun day participating in the workshop ‘Towards a European Ecosystem for Health Care Data’, hosted by the Digital Enlightenment Forum. While health data is not my area of expertise (or, let’s be honest, deep interest), the issues of how to organise regulation and cooperation around data are. And you always need a topic to ‘point to’ to bring people across sectors and types of work together. So the workshop and excellent discussions were an insight into high-level minds working on policy issues and trying out solutions to deal with the concrete issues of the changing digital landscape.

While the panels were pretty varied, ranging from the specifics of the GDPR to technical solutions to policy questions of harmonisation (you can access the presentations and check the twitter hashtag #euhealthdata), there were a few threads of conversation that ran through the workshop that as a relative outsider were very interesting to pick up.

People are willing to share their health data

What’s always been interesting about health data is that if you talk to somebody off the street with no direct relation to data discussions, they can immediately and intuitively understand the importance of the privacy of health records. Health data has always been an entry point to public discussions.

Which is why I was pleasantly surprised when Despina Spanou, Director for Consumers at the DG Justice and Consumers, presented the European Commission’s position and results of their recent public consultation on health data. 70% of respondents were individuals, and 90% said they would be willing to share their health data. I don’t have the exact numbers on hand, but you can get more information here.

So it’s incredibly encouraging that there is a public understanding of the use of data for the greater good. (The gif is slightly less relevant, but I cannot utter the phrase ‘the greater good’ without thinking about it, so there you go.)

We need to talk about control of data, not ownership

Ownership is a tricky issue when it comes to data, especially from the legal perspective.

On top of a presentation full of comics – my favourite kind – Petra Wilson from Health Connect gave an excellent analogy: you can rent a house, and you don’t own it, but you still have the right to stop people from entering.

This also explains why I have seen some interesting ideas flying by about learning from property rights and regulating data giants in similar ways to public utilities.

Instead of talking about ownership, the discussion is about control. But what does control mean, and how is that operationalised? Does it mean access, does it mean ability to remove your data from databases, do you need to give consent for every transaction?

Don’t trust, but build trustworthy systems

Trust means that you don’t know what is going on, but you decide to ‘trust’ that what’s happening is in your best interest. For instance, I don’t know that you’ll pay me back when you say you will, but it’s ok, I trust you.


If we take that and apply it to working with data, it becomes problematic, because there are fewer safeguards. You trust that the heart surgeon knows what they are doing because they’ve been to years of medical school, etc. etc. Relying on trust is not adequate – not only ethically, but also in practice; several research projects are failing because of the lack of trust and resulting inability to access data.

Rather than expecting trust, we need to build systems that safeguard trust within the infrastructure of data systems and how they’re used. Two specific ways were discussed, which I’ll address in turn below. Blockchain as the current fad of trust-less systems was mentioned in passing, but not given substantial attention.

Dynamic consent is the new favourite

Moving on to questions of how, several of the technical or technological solutions presented at the workshop were precisely about how to create a technical infrastructure to shift the locus of control firmly to the individual.

While the GDPR places a lot of emphasis on consent and consent needs to be given for each topic, how to implement this in practice becomes tricky.

‘Even though GDPR emphasises informed consent, most patients have no idea what that means or does.’ Bian Young, NTNU

Dynamic consent came up several times in the workshop as a possible solution (rather than, say, questioning if consent is the right model: it’s in the regulation, so now let’s move forward from there). For example, there was a presentation on piloting verifiable credentials – basically, you have a personal key with which you set your security details for each website, and you can change them per website. Say, only allow the hospital to access a and b, but the shopping centre only to access a and c, or something like that. There was also discussion of homomorphic encryption.

Which, while I’m not a technical person and so have a wee bit of trouble relating it back now, at the time made sense, and I encourage you to look at the presentations if you want further follow up.
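For the less technical among us (me included), the hospital and shopping centre example can be sketched as a small lookup of who may see what. This only illustrates the idea of per-recipient consent that you can change later. It is not a verifiable credentials implementation and contains no cryptography; the class name, method names, and the record are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRegistry:
    """Hypothetical registry: maps each recipient to the attributes they may see."""
    grants: dict = field(default_factory=dict)

    def grant(self, recipient, *attributes):
        # Add attributes to whatever this recipient may already see.
        self.grants.setdefault(recipient, set()).update(attributes)

    def revoke(self, recipient, *attributes):
        # Consent is dynamic: it can be withdrawn per attribute at any time.
        self.grants.get(recipient, set()).difference_update(attributes)

    def release(self, recipient, record):
        # Return only the fields this recipient currently has consent for.
        allowed = self.grants.get(recipient, set())
        return {k: v for k, v in record.items() if k in allowed}

# Made-up personal record with fields a, b and c, as in the example above.
record = {"a": "blood type O+", "b": "allergy: penicillin", "c": "postcode 1234"}

consent = ConsentRegistry()
consent.grant("hospital", "a", "b")
consent.grant("shopping_centre", "a", "c")

hospital_view = consent.release("hospital", record)

# Later the individual changes their mind about the shopping centre.
consent.revoke("shopping_centre", "c")
shop_view = consent.release("shopping_centre", record)
unknown_view = consent.release("advertiser", record)
```

The design choice worth noticing is the default: a recipient with no grant gets nothing, so control sits with the individual rather than with whoever asks.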

Data cooperatives as democratic data infrastructures

Several presenters discussed citizen-driven, collaborative, democratic infrastructure models piloting on health data: the midata.coop from Switzerland, the salus.coop from Catalonia, and the Data for Good Foundation in Denmark.

All were trying to deal with the systemic change in data governance, and this is where my ears prick up. What strikes me is that there is a lot of work being done on the data commons, and I am sure there are plenty of lessons to learn from the history of agricultural cooperatives.

What it also means: ‘It’s about organising the stakeholders, it’s not about the technology’ – Claus Nielsen, Data for Good Foundation

This is why data governance is important.

Interoperability needs information brokerage & oversight

What’s particular about health data and trying to create an EU ecosystem is that health is a member state issue. Which means the idea is to create harmonised investment and an interoperable system, but how that is operationalised into the healthcare system is up to individual member states.

Sonja Marcovic of RAND made a great point that there is a specific need for information brokerage. How do you ensure the information is out there? And accessible? And then fed back into and kept up to date? And sensitive to stakeholder needs, member state priorities, and patient-centred? How do you distill the memory of a project after that project has ended?

We need to share best and worst practices. Feedback is one of my research interests, and being open to learning from what went wrong is also important.


We also need to be sure that we maintain data quality and reliability, especially as health data does not only come from hospitals, which are relatively easier to standardize for interoperability, but also increasingly from apps, from individuals, etc.

The individual as the ultimate data unit in an Enlightenment context

Throughout the workshop there was an assumption, sometimes explicit and sometimes unspoken, about the individual at the centre of data. This showed up in a few ways:

  • Linked to the control and ownership discussion above, it was the deafening voice that individuals should have control over their data, and that because citizens have the same needs EU-wide, this focus on the individual will also help interoperability concerns within a European framework.
  • Building patient-centred care, and putting patients’ needs at the forefront rather than those of companies or other actors. Entirely admirable and a positive movement.
  • The individual as the ‘ultimate integrator’ of data, put forward by Ernst Hafen of midata.coop. This makes sense from a technical level, where, I presume, you are looking at categories of data and how to organise them, and the individual is the smallest category to integrate. There is an idea here that needs to be further worked out, but for now this will do.
  • The overarching framework of ‘Digital Enlightenment’, harkening back to the values of the 18th century enlightenment.

As I’m working on a data justice project with a global focus and have been reading about the political philosophy underlying justice, I’ve been twirling around with ideas about liberalism and what that means, and how we understand the relation of the individual to the whole. I would not argue that individuals shouldn’t have control of their data, but I’m thinking through the implications of the liberal framework we work within, and what that might mean if we take this specifically European framework and apply (part of) it to other contexts.

And to close, this fantastic slide:

Full disclosure: I’d been working with DigEnlight for a year on the communications, and the workshop was my final shebang on working with DigEnlight in an official capacity. It’s a fantastic network of people working actively on an ethical transformation of society and technology, so I’ll still be connected with the network, especially once the upcoming trusted community platform is launched, probably in the next two weeks.