The ethical footprints of big data, business intelligence and data mining


By Andrew Collins
Wednesday, 06 March, 2013



There’s a lot of talk these days about carbon footprint. I’d like to talk about ethical footprints: the impact our day-to-day activities have on others. In particular, I want to look at the effects of technology.

On the one hand, there are the obvious examples of ethical failures in tech: companies using child labour in their factories or having work conditions so bad that scores of employees commit suicide. The companies in these instances should be named and shamed until such practices cease. And we should all be aware of the effect that our purchasing of these products has on the world around us.

But with the increasing popularity of big data, business intelligence, and data mining and analytics, our ethical footprints are becoming both more expansive and harder to foresee.

Take, for example, Facebook’s Graph Search, which the company revealed just last month. The tool allows users to search Facebook’s database of friends and interests using specific queries in something resembling natural language.

Facebook provided several examples of such queries, including: “music liked by people who like Mitt Romney”, “people who have been product managers and who have been founders” and “languages my friends speak”.
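Mechanically, there is nothing exotic about such queries: once profile data sits in a structured store, each one reduces to a filter or a join. Here is a minimal Python sketch of the idea (the schema and field names are invented for illustration; Facebook’s internals are not public):

```python
# A hypothetical sketch of what one of the example queries reduces to once
# profile data sits in a structured store. The schema and field names are
# invented for illustration; Facebook's internals are not public.
profiles = [
    {"name": "Alice", "roles": {"product manager", "founder"}},
    {"name": "Bob", "roles": {"engineer"}},
    {"name": "Carol", "roles": {"product manager"}},
]

def graph_search(*required_roles):
    """Return every profile that has held all of the given roles."""
    return [p["name"] for p in profiles
            if set(required_roles) <= p["roles"]]

# The equivalent of "people who have been product managers and who have
# been founders":
print(graph_search("product manager", "founder"))  # ['Alice']
```

The natural-language front end is a convenience; the decisive ingredient is that the data already exists in queryable form.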

But the tool has attracted attention for the way it could be used maliciously. Tom Scott, a Facebook user with early access to Graph Search, set up a Tumblr account to demonstrate specific examples of how a malicious user could exploit the tool. The account is called ‘Actual Facebook Graph Searches’ (http://actualfacebookgraphsearches.tumblr.com/).

Scott entered a few queries into the search engine and posted the results on his Tumblr (with identifying information such as people’s names and faces redacted). One query was “married people who like prostitutes”. It returned a list of people who had set their relationship status to married on their Facebook profile and had also listed prostitutes as an interest. From the search results page, Scott could get in touch with the spouses of these people.

That example is more amusing than alarming - it’s pretty unlikely that any married person in their right mind who makes use of such services is going to list prostitutes as an interest, especially if they’re Friends with their spouse on Facebook.

But Scott also provides pretty terrifying examples of queries that could be used maliciously, like “family members of people who live in China and like Falun Gong” and “Islamic men interested in men who live in Tehran, Iran”.

These queries could easily be used by organisations or private citizens with malicious intent to persecute individuals.

To be fair, the tool doesn’t aim to open up every detail of every Facebook user to every person on the internet. In the company’s own words: “With Graph Search, you can look up anything shared with you on Facebook, and others can find stuff you’ve shared with them, including content set to Public.”

But the fact that Scott’s searches returned so many results shows that many people either aren’t aware of how to change their privacy settings or don’t realise that there is a need to do so.

Also, consider that Facebook has previously made sudden changes to its privacy settings. What happens if you go on a six-month holiday and Facebook changes how its privacy settings work (again), revealing some part of your profile that you thought you’d hidden from everyone but your friends?

The obvious solution in this scenario is to avoid putting private information on Facebook and the like in the first place.

But that doesn’t always work. With the sophistication of today’s data analysis tools, organisations can use the little information that you do give them to make shockingly accurate inferences about you.

Take, for example, the story of how retailer Target found out that a teenage girl was secretly pregnant and started a chain of events that led to her secret being revealed to her parents.

A fascinating 2012 New York Times article detailed how Target developed a way of determining whether a customer was pregnant, even if she didn’t want the company to know. New parents are a goldmine for retailers, so the company wanted a way to identify expectant mothers and reach them with baby-related advertising before the birth.

When it can, Target assigns each of its customers a unique ‘Guest ID’ and uses that to keep track of what those customers are buying, as well as other demographic information.

The company’s statisticians were able to look at that information - for example, that a customer had recently been buying scent-free soap, large bags of cotton balls, magnesium supplements and the like - and statistically infer how likely that customer was to be pregnant, and even estimate her due date. Once identified, the customer could be fed advertising in the build-up to her due date, in the hope that Target would become her one-stop shop post-birth.
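How might such an inference work in practice? Here is a minimal Python sketch, assuming a logistic-regression-style scoring model; the feature names and weights are invented, since the article does not disclose Target’s actual model:

```python
import math

# Hypothetical weights over purchase signals, of the kind that could be
# learned from historical transaction data. All values here are invented
# for illustration; Target's actual model is not public.
WEIGHTS = {
    "scent-free soap": 1.2,
    "cotton balls (large bag)": 0.9,
    "magnesium supplement": 1.4,
}
BIAS = -3.0  # baseline log-odds: most customers are not pregnant

def pregnancy_score(purchases):
    """Combine a guest's recent purchases into a probability-like score."""
    z = BIAS + sum(WEIGHTS.get(item, 0.0) for item in purchases)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function

recent = ["scent-free soap", "cotton balls (large bag)", "magnesium supplement"]
print(f"Estimated likelihood of pregnancy: {pregnancy_score(recent):.0%}")
# Estimated likelihood of pregnancy: 62%
```

Tie a score like this to a Guest ID and a running purchase history, and no confession is needed: the inference falls out of routine transaction data.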

This strategy backfired when the company sent baby-related advertising material to a teenage girl, still in high school, in the US. The girl was in fact pregnant, but her parents didn’t know. At least, not until their daughter started receiving advertising for baby clothes and cribs, and questions started being asked.

The girl’s pregnancy wasn’t revealed by a careless Facebook status update, but by the workings of a data analytics program. All it took was for her to be alive in the modern age and her privacy was busted wide open.

The above examples probably seemed quite innocent to their creators when they were first thought up. But these technologies have great capacity to inflict harm on those around us - and ourselves - if we’re not careful.

Get thinking about your ethical footprint and the dodgy practices you yourself are being exposed to. We all need to be more aware of what effect we’re having on the world and what effect the world is having on us.

See something you don’t like? Speak up.
