What next for privacy and data protection? My CPDP2018 takeaways

The CPDP conference is the central hub of the European data protection community, bringing together policy wonks and makers, the burgeoning privacy industry, and a plethora of legal scholars. It is huge, successful, and a lot of fun. Now that I’ve had a week or so to chew on and process all my notes, here are the main themes that I’ve taken away as the next big issues in data, privacy and data protection – keeping in mind that I am not a lawyer, and I work on issues of global data justice.

YAY GDPR! But we still have our work cut out for us.

The GDPR is a major accomplishment, and it was not easy to get this level of protection – see the fantastic documentary Democracy, which follows Jan Albrecht through the process. I am proud to be European and to have this protection, and we should celebrate it.

Still, the GDPR is also limited in several ways – ways that legal scholars are working on and that formed much of the basis of the conference. I cannot possibly do it all justice, but there are a few things I can say from an interdisciplinary perspective:

  1. Relying on notification and consent is not scalable – see what smartphone notifications look like when you have 8 million followers;
  2. There are debates about whether the concept of personal data, on which the GDPR is based, has become so broad and inclusive that it is no longer relevant – see the recent ERC project of my colleague Nadya Purtova and her fantastic panel on the subject with Peter Hustinx;
  3. The responsibilities in the GDPR are assigned to particular actors in particular sectors, but the multiple uses of data and the connecting of databases are blurring the boundaries between sectors, which is problematic for the GDPR;
  4. The GDPR is still based on individual rights and doesn’t easily address collective harms; nor does it help most people, who are not well informed, or cases where harms are not visible.

This isn’t going to develop into a well-structured critique of the GDPR, but rather these are the points that jumped out at me.

Data ethics is not enough. Trust is certainly not enough.

Considering the size and global interconnectivity of the data market, the dynamics of surveillance capitalism and the legal enclosures that enable it mean that we need structural elements to create economic incentives and new directions. Much like corporate social responsibility (CSR) wouldn’t shift the underlying structures of the global economy towards fair trade and sustainability, data ethics won’t solve everything.


Indeed, as Mireille Hildebrandt commented, this is precisely why we have the rule of law: so that we don’t have to rely on ethics alone.

In a same-same-but-different sort of way, trust is not something we can rely on – trust implies that we do not require assurance. ‘I trust you to pay me back’ means I don’t have to see any proof that you’ll pay me back. Relying on trust as a concept means the individual is inherently vulnerable and the power lies in the hands of the provider of services, not the other way around.

There is a growing call for trust certification and trust marks as a positive incentive for companies to see data protection as an asset – the ‘carrot’ instead of the stick – and there was some very interesting work presented on these issues at this panel on private law. Yet let us be realistic: this is never going to restructure the global data economy as a whole. Therefore…

We need to regulate the data market.

Clearly, at a conference with a critical mass of legal scholars you saw this one coming. Yet rather than a broad, sweeping statement, there was a deep dive into how to do it and what it might mean.

For instance, there was a major panel on regulating monopolies, well worth watching, which drew on discussions of anti-trust law. Barry Lynn of the Open Markets Institute (the one that was infamously kicked out of New America after standing up to Google) in particular delivered an inspiring mix of preacher’s rant, sound economic advice, and provocation. For someone with a background in development studies like me, it was an entryway into distinguishing between the neoliberal Chicago school and the potential of citizen-centred markets.

Individual rights are not the best way to solve collective problems.

There are several issues which are generally bundled into the tension between the individual and the collective.

The first is exemplified by last week’s Strava security scandal, where a data visualisation of the routes taken by users of the fitness app unintentionally revealed the sensitive locations of army bases. Aggregated information can reveal vulnerabilities and risks with very concrete consequences without necessarily saying anything about a particular person or individuals. For a more in-depth discussion you can also check Taylor et al.’s book on group privacy, written by an interdisciplinary group of people who can’t seem to agree on anything except that it’s important.

The second is a cross-cultural question: the liberal individual upon which the human rights framework is based is a Western perspective that we will need to move beyond for a truly global conversation. It requires us to think about identity and the collective in ways that we aren’t quite sure how to do yet. Still, just because it’s a difficult question doesn’t mean we shouldn’t ask it. (Indeed, I tend to be that person in the room who asks the question that gets uncomfortable and awkward looks from panellists. #sorrynotsorry) This is precisely why I’m going to pursue this further in my work on global data justice.

Lastly – and this might need some more substantiating from a legal scholar with a better understanding of the nuances (any volunteers?) – in court cases and legal scholarship there is now talk of fundamental rights and the ‘essence’ of these rights as something even more fundamental, and questions are being asked about whether the essence of a right was violated in a particular case or not. As I’m sure you can gather, I can’t speak much to this, but it struck me as a non-lawyer that the concept of fundamental rights is on shaky ground.

Where to next?

There are three strands that jumped out as warranting further exploration:

  1. Moving beyond a distributive paradigm and seeing data as an asset, which would require a reformulation of what data is and a reshaping of the market;
  2. To what extent, and when, paternalism is a legitimate approach, for which there was another excellent panel here (last round on a Friday evening, ooof);
  3. What there is to learn from an ecological approach, both in terms of interdependency and limits – something I am going to explore further in a paper for the Data Justice Conference in May (come along!)

 

Just because these are all hard questions, doesn’t mean we shouldn’t be asking them. On the contrary.

Here, for inspiration, is some fodder that has continued to resonate:

AI & Inclusion Symposium: Questions rattling my brain as I prep for it

This week I have the pleasure of representing TILT at the Network of Centres’ Artificial Intelligence and Inclusion Symposium in Rio de Janeiro. (HELL YES omigodomigod. Ok, composure regained.)

The event […] will identify, explore, and address the opportunities and challenges of artificial intelligence (AI) as we seek to build a better, more inclusive, and diverse world together. It is co-organized on behalf of the NoC by the Institute for Technology and Society of Rio de Janeiro (ITS Rio) and the Berkman Klein Center for Internet & Society at Harvard University.

Considering the research project I’m working on is specifically about data justice and social justice issues, the timing couldn’t be better. There is not only a focus on AI, but also specific attention to inclusion and how these issues manifest specifically in the global South.

The program is one of the better ones I’ve seen, not only in terms of topics but also with the attention to first setting a common baseline of understanding together (making sure ‘we’re all speaking the same language’) based on pre-meeting surveys, splitting into breakout groups, and building common knowledge with the specific aim to

‘identify intervention points for collaboration, communication, and synergy. Which ideas, initiatives, and projects in AI & Inclusion should be discussed further, emphasized, or reconceptualized?’

As an ex-English teacher specialised in the architecture of discussion to facilitate genuine communication, this makes me happy.

I mean, some of those leading the discussion in the plenaries are called ‘Firestarters’ (cue the 90s theme song)

They’ve also put together a fantastic reading list on AI and inclusion, accessible to the public here.

In preparation, I’m thinking through some of the questions that I’ll be bringing along.

How can we value social knowledges when AI deepens the dependence on STEM?

As I underlined in my paper on varieties of knowledge for urban planning, the way a society or institution values particular types of knowledge has implications for whether complementary sets of knowledges get taken up and mainstreamed into development planning, or not. In particular, the dominance of STEM-based knowledges (science, technology, engineering, maths) means that very concrete insights from the social sciences or from non-formalised knowledges don’t get taken up, despite their potential to open new perspectives towards solutions.


In some cases, such as in my experience in India, this is done to depoliticise planning – which, in sociological circles, may sound like heresy, but in certain contexts it’s a valid way to combat corruption.

This clearly goes beyond the question of AI – on the contrary, the ‘newness’ of AI may only serve to cement this distinction. Especially in areas where such a valuation is very pronounced, how can we continue to bring non-STEM-based insights into the use of AI for development?

Why do we talk about inclusion and not social justice?

I’m not sure the symposium will be able to answer this, but I am working on it. The program talks about social inequalities and social good, and (perhaps intentionally?) doesn’t mention social justice – even though much of the reaction to algorithmic bias is closely linked to social justice advocates. For instance, the Data Justice Lab at Cardiff University describes itself as working ‘for social justice’.

It’s on my mind because I’m reading Young’s seminal book Justice and the Politics of Difference, which outlines the basic theory of social justice. She argues that the vast majority of justice theories sit within a distributive paradigm, meaning they focus on rights as ‘things’ you can ‘have’. The problem is that this obscures the fact that power, and injustice, are relationships, often determined by institutional structures.

While I’m still working through the book, and while I personally agree with much of the social justice movement, if I’m honest I still quibble a bit with the image that gets evoked in my brain when I think of social justice, namely that of angry Tumblr users. So perhaps it’s a political decision not to mention social justice specifically. However, I think the focus on institutional structures that social justice highlights is absolutely foundational for resolving anything.

How can AI be reliable in areas characterised by data gaps & informality?

An algorithm is only as good as the data set it is based on. What if the data set is incomplete? This is nothing revolutionary – the question is more whether there are specific applications of AI that can circumvent this problem at all.

I should probably know this, but I don’t, yet.

Should we not be starting with the structure rather than the technology?

Gurses and van Hoboken just came out with a fantastic chapter on ‘Privacy after the Agile Turn’. They explain how the evolution of software development has changed the structure of the internet in terms of its infrastructure, complexity and modularity, with significant implications for how we think about, and try to tackle, privacy and data protection. They suggest adapting differential privacy approaches to deal with this modular, distributed system of data collection and analysis (which would explain why dynamic consent is gaining popularity).

It’s well worth a read, but the bottom line is that by focusing on algorithms or data minimisation we focus on how data as a thing is consumed, and we do not address the overarching structures of the political economy of the internet, nor the flows of power created by institutional structures.
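
To make the differential privacy idea concrete, here is a minimal sketch – the function, data, and numbers are purely my own illustration, not from the chapter – of how an aggregate query can be published with calibrated noise, so the released figure says something about the group without revealing much about any one individual:

```python
import numpy as np

def dp_count(values, predicate, epsilon=1.0):
    """Return a differentially private count of records matching `predicate`.

    Laplace noise is calibrated to the query's sensitivity (1 for a count),
    so the published figure reveals little about any single individual.
    """
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical example: publish how many patients are over 65
# without exposing whether any particular patient is in that group.
ages = [34, 71, 52, 68, 45, 80]
print(dp_count(ages, lambda age: age > 65, epsilon=0.5))
```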

NOTE – This is not to discredit the field of AI ethics; this stuff needs to be thought through. I work on data governance more broadly, so it’s kind of normal that I always come back to the institutions. I also literally just read this paper and it’s been swimming around in my head. So it is a question that acts as a lens through which I’ll be engaging with the conference.

Who is best placed to broker information about best practices?

More an implementation question – information brokerage also came up as necessary in the workshop I attended on EU health data, and it’s always going to be a key role. What I’m curious about is where the specific need for sharing best practices lies. How could this help direct the development of AI towards more inclusion? Whose voice would realistically have the most impact, and for what audience?

I’m very much looking forward to this conference, and there will likely be a flurry of activity afterwards – stay tuned!

What did I learn from the EU Health Data workshop? Trust, consent, infrastructures & individuals

Last week I had a fun day participating in the workshop ‘Towards a European Ecosystem for Health Care Data’, hosted by the Digital Enlightenment Forum. While health data is not my area of expertise (or, let’s be honest, deep interest), the issues of how to organise regulation and cooperation around data are. And you always need a topic to ‘point to’ to bring people across sectors and types of work together. So the workshop and its excellent discussions were an insight into high-level minds working on policy issues and trying out solutions to deal with the concrete issues of the changing digital landscape.

While the panels were pretty varied, ranging from the specifics of the GDPR to technical solutions to policy questions of harmonisation (you can access the presentations and check the Twitter hashtag #euhealthdata), there were a few threads of conversation running through the workshop that, as a relative outsider, I found very interesting to pick up.

People are willing to share their health data

What’s always been interesting about health data is that if you talk to somebody off the street with no direct relation to data discussions, they can immediately and intuitively understand the importance of the privacy of health records. Health data has always been an entry point to public discussions.

Which is why I was pleasantly surprised when Despina Spanou, Director for Consumers at DG Justice and Consumers, presented the European Commission’s position and the results of their recent public consultation on health data: 70% of respondents were individuals, and 90% said they would be willing to share their health data. I don’t have the exact numbers on hand, but you can get more information here.

So it’s incredibly encouraging that there is a public understanding of the use of data for the greater good.  (The gif is slightly less relevant, but I cannot utter the phrase ‘the greater good’ without thinking about it, so there you go).

We need to talk about control of data, not ownership

Ownership is a tricky issue when it comes to data, especially from the legal perspective.

On top of a presentation full of comics – my favourite kind – Petra Wilson from Health Connect gave an excellent analogy: you can rent a house, and you don’t own it, but you still have the right to stop people from entering.

This also explains why I have seen some interesting ideas flying by about learning from property rights and regulating data giants in similar ways to public utilities.

Instead of talking about ownership, the discussion is about control. But what does control mean, and how is it operationalised? Does it mean access, does it mean the ability to remove your data from databases, do you need to give consent for every transaction?

Don’t trust, but build trustworthy systems

Trust means that you don’t know what is going on, but you decide to ‘trust’ that what’s happening is in your best interest. For instance, I don’t know that you’ll pay me back when you say you will, but it’s ok, I trust you.


If we take that and apply it to working with data, it becomes problematic, because there are fewer safeguards. You trust that the heart surgeon knows what they are doing because they’ve been to years of medical school, etc. etc. Relying on trust is not adequate – not only ethically, but also in practice; several research projects are failing because of the lack of trust and resulting inability to access data.

Rather than expecting trust, we need to build systems that safeguard trust within the infrastructure of the data system and the way it is used. Two specific ways were discussed, which I’ll address in turn below. Blockchain, as the current fad of trust-less systems, was mentioned in passing but not given substantial attention.

Dynamic consent is the new favourite

Moving on to questions of how, several of the technical solutions presented at the workshop were precisely about creating an infrastructure to shift the locus of control firmly to the individual.

While the GDPR places a lot of emphasis on consent, and consent needs to be given for each purpose, how to implement this in practice becomes tricky.

‘Even though GDPR emphasises informed consent, most patients have no idea what that means or does.’ Bian Young, NTNU

Dynamic consent came up several times in the workshop as a possible solution (rather than, say, questioning whether consent is the right model – it’s in the regulation, so let’s move forward from there). For example, there was a presentation on piloting verifiable credentials: basically, you hold a personal key and set access permissions separately for each service, permissions you can change at any time. Say, you allow the hospital to access a and b, but the shopping centre only a and c, or something like that. There was also discussion of homomorphic encryption.

Which, while I’m not a technical person and so have a wee bit of trouble relating it back now, at the time made sense, and I encourage you to look at the presentations if you want further follow up.
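
As a rough sketch of the per-recipient permission idea (not of verifiable credentials themselves – the structure and names below are entirely my own illustration, not what was presented), dynamic consent could look something like this:

```python
from dataclasses import dataclass, field

@dataclass
class ConsentProfile:
    """Illustrative only: a per-recipient map of which data categories
    a person allows each service to read."""
    grants: dict = field(default_factory=dict)

    def allow(self, recipient, categories):
        # Grant a recipient access to a set of data categories.
        self.grants[recipient] = set(categories)

    def revoke(self, recipient, category):
        # Consent is dynamic: a single category can be withdrawn at any time.
        self.grants.get(recipient, set()).discard(category)

    def may_access(self, recipient, category):
        return category in self.grants.get(recipient, set())

profile = ConsentProfile()
profile.allow("hospital", {"a", "b"})          # hospital may read a and b
profile.allow("shopping_centre", {"a", "c"})   # the shop may read a and c
profile.revoke("shopping_centre", "c")         # later, consent is withdrawn
print(profile.may_access("hospital", "b"))          # True
print(profile.may_access("shopping_centre", "c"))   # False
```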

Data cooperatives as democratic data infrastructures

Several presenters discussed citizen-driven, collaborative, democratic infrastructure models piloting on health data: the midata.coop from Switzerland, the salus.coop from Catalonia, and the Data for Good Foundation in Denmark.

All were trying to deal with the systemic change in data governance, and this is where my ears prick up. What strikes me is that there is a lot of work being done on the data commons, and I am sure there are plenty of lessons to learn from the history of agricultural cooperatives.

What it also means: ‘It’s about organising the stakeholders, it’s not about the technology’ – Claus Nielsen, Data for Good Foundation

This is why data governance is important.

Interoperability needs information brokerage & oversight

What’s particular about health data and trying to create an EU ecosystem is that health is a member state issue. Which means the idea is to create harmonised investment and an interoperable system, but how that is operationalised into the healthcare system is up to individual member states.

Sonja Marcovic of RAND made a great point that there is a specific need for information brokerage. How do you ensure the information is out there, and accessible? How is it fed back into the system and kept up to date? How do you keep it sensitive to stakeholder needs and member state priorities, and patient-centred? How do you distil the memory of a project after that project has ended?

We need to share best – and worst – practices. Feedback is one of my research interests, and being open to learning from what went wrong is just as important.


We also need to make sure that we maintain data quality and reliability, especially as health data no longer comes only from hospitals, which are relatively easy to standardise for interoperability, but increasingly from apps, from individuals, and so on.

The individual as the ultimate data unit in an Enlightenment context

Throughout the workshop there was an assumption, sometimes explicit and sometimes unspoken, about the individual at the centre of data. This showed up in a few ways:

  • Linked to the control and ownership discussion above, there was a resounding view that individuals should have control over their data, and that because citizens have the same needs EU-wide, this focus on the individual will also help interoperability concerns within a European framework.
  • Building patient-centred care, and putting patients’ needs at the forefront rather than those of companies or other actors. Entirely admirable and a positive movement.
  • The individual as the ‘ultimate integrator’ of data, put forward by Ernst Hafen of midata.coop. This makes sense from a technical level, where, I presume, you are looking at categories of data and how to organise them, and the individual is the smallest category to integrate. There is an idea here that needs to be further worked out, but for now this will do.
  • The overarching framework of ‘Digital Enlightenment’, harkening back to the values of the 18th century enlightenment.

As I’m working on a data justice project with a global focus and have been reading about the political philosophy underlying justice, I’ve been twirling around with ideas about liberalism – what it means, and how to understand the relation of the individual to the whole. I would not argue that individuals shouldn’t have control of their data, but I’m thinking through the implications of the liberal framework we work within, and what it might mean if we take this specifically European framework and apply (part of) it to other contexts.

And to close, this fantastic slide:

Full disclosure: I’d been working with DigEnlight for a year on communications, and the workshop was my final shebang working with them in an official capacity. It’s a fantastic network of people working actively on an ethical transformation of society and technology, so I’ll stay connected with the network, especially once the upcoming trusted community platform is launched, probably in the next two weeks.

Methodology as an entryway to ethical data research

There is a growing call for ethical oversight of AI research, and rightly so. The problem is that ethical oversight hasn’t always stopped research with a questionable ethical compass in the past. Part of that, I argue, is that the ethical concerns raised largely by social scientists come from a completely different world view than those from a more technical background. And while AI research is raising new problems, particularly with regard to correlation vs causation, the tools we have to solve them haven’t changed that much.

With this blog post I want to ask: can methodology help social and technical experts speak the same language?

Since my master’s degree I’ve been fascinated by the fact that people working in different disciplines or types of work will have completely different approaches to the same problem.

As in this article on flooding in Chennai, I found that ‘the answers’ to the flooding already existed on the ground; it’s just that the variety of knowledges weren’t being integrated, because of the different ways they’re valued.

I was recently speaking with a brilliant colleague and friend, a social constructivist scientist working in a very digital-technology-oriented academic department. This orientation is important to note, because the methodology deployed for science and research there, and the questions being asked, are influenced to a large degree by the capacities and possibilities afforded by digital technologies and data. As a result, the space scientists see for answers can be very different.

In reviewing student research proposals, she found she was struggling because some research hypotheses completely ignored the ethical implications of the proposed research.

In talking it through, we realised that most of the problems arose from the assumptions that are made in framing those questions.

To take a classic example, in the field of remote sensing to identify slums it is relatively common to see the implicit assumption that what defines a slum is the area’s morphology – a definition made by city planners rather than by residents – while how locals interpret the area, or the boundaries of the neighbourhood, may differ completely. The ethical problem, beyond the epistemology, is what can then be done in terms of policy based on the answers that such research provides.

Or go back to that paper that caused the controversy about identifying people’s sexual orientation from profile pictures downloaded from a dating site. It is based on a pre-natal hormone theory of sexual orientation, which is a massive assumption in and of itself. Even the responses to the article have basically boiled down to ‘AI can predict sexuality’, even though that is a blatant generalisation that doesn’t look at who was actually in the dataset (every single non-straight person? only white people?). That, plus the fact that they basically ‘built the bomb to warn us of the dangers’, rests on a lot of assumptions about ethics in the first place.

Like my 10th grade history teacher used to say, to assume makes an ASS of U and ME. (Thanks Mr. Desmarais)

More precisely, to assume without making the assumptions explicit. Not clearly articulating what your assumptions are is a *methodological* problem for empirical research, with ethical *implications*. Unexamined assumptions mean bad science. Confounding variables and all that.

For reference, in statistics there is an entire elaborate, standardised system for dealing with assumptions by codifying them into different tests. You apply one particular, named statistical test because of the assumptions you hold – e.g. ‘I assume this data has a normal distribution.’
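
As a tiny, purely illustrative example (using SciPy; the data and thresholds are mine), this is what codifying an assumption looks like in practice: test the normality assumption explicitly, and let the result decide which comparison test you are allowed to apply.

```python
from scipy import stats

def compare_groups(sample_a, sample_b, alpha=0.05):
    """Choose a test based on whether the normality assumption holds.

    Shapiro-Wilk makes the assumption explicit and testable; if either
    sample fails it, fall back to the non-parametric Mann-Whitney U test.
    """
    _, p_a = stats.shapiro(sample_a)
    _, p_b = stats.shapiro(sample_b)
    if p_a > alpha and p_b > alpha:
        return "t-test", stats.ttest_ind(sample_a, sample_b)
    return "Mann-Whitney U", stats.mannwhitneyu(sample_a, sample_b)

# Illustrative data: two small groups measured on the same outcome.
group_a = [4.1, 3.8, 5.0, 4.4, 4.7, 3.9, 4.2, 4.6]
group_b = [5.2, 5.8, 4.9, 6.1, 5.5, 5.7, 5.0, 6.0]
print(compare_groups(group_a, group_b))
```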

If you’re using mixed methods, it becomes much harder to have a coherent system to talk about assumptions because the questions that are asked may not yield data that is amenable to statistical analyses and therefore cannot be interpreted with statistical significance.

All the more reason, then, to make assumptions explicit so they can be discussed and scrutinised.

Some ethical concerns can be dealt with more easily when we remember methodological scrutiny and transparency, bringing research back to the possibility of constructive criticism and not only fast publication potential.

The way this process is currently dealt with in academia is ethical review – hence the call for ‘ethical watchdogs’.

Thing is, in the process of doing science in academic settings, ethical review is often the final check before approval to carry out the research. When I did my BSc in psychology, sending the proposal to the ethics review board felt like an annoying, mandatory tick-box affair.

The problem with this end-of-the-line ethical review is:

  • It’s not clear why the ethics matters for actually carrying out the research
  • If the ethics board declines, you’re essentially back to the drawing board and have to start again.

Particularly under the pressure for fast publication, there aren’t many incentives to do good ethics unless you’re concerned about it from the outset.

 


 

What if we shifted the focus from ethics as an evaluation to ethics as methodology?

Rather than having an ethics review at the end of the process of formulating hypotheses and research proposals, could there be a way to incorporate an ethics review in the middle of the ‘research life cycle’?

One would then get feedback not only on the ethics; it would also provide an opportunity to surface the research’s unexamined assumptions, which ultimately makes for better science.

I understand this ideal situation implies quite a significant shift in institutional processes, which are notorious for moving about as fast as stale syrup. Perhaps instead there could be a list of questions researchers could ask themselves as a self-evaluation?

In this way, you could open an entryway to an ethical discussion as a question of methodology, rather than of ontology or ethics per se, which far too easily become troubled waters in interdisciplinary discussions.

Do you know of any examples of structurally incorporating these ideas as a route to effective multidisciplinary dialogue?

 

My thanks go to my colleague who sparked this discussion and thought it through with me, who for reasons of their position, will remain anonymous.

 

People first, then tech: How context solves eGovernment platform problems

This post draws on classic social perspectives to present a whirlwind tour of how understanding context is crucial to designing more effective platforms – particularly as the Netherlands is one of the nations most advanced in digital governance, and others are following suit. Right now, the major problem is that people are getting fined for things they didn’t know they were responsible for.
