Big data: real-world benefits and ethical implications
The meaning of the term ‘big data’ seems stupidly obvious at first glance: you assemble a bunch of data and presumably you then run some analytics to get some meaningful insight out of it. But like many terms in IT, big data means different things to different people, and the ethical considerations around the tech are complicated.
We’ve assembled a panel of experts to discuss the issue, comprising:
- Kyle Evans, Chief Data Officer, RP Data
- Professor Pascal Perez, Research Director, SMART Infrastructure Facility, University of Wollongong
- Jonathan Lui, COO and Co-founder, Airtasker
- Ian Bertram, Asia Pacific Head of Research, Gartner
Q. What is big data to you?
Our panellists’ use of data covers a broad spectrum. Evans said that RP Data - which offers real estate market information services - analyses log files of 120,000 end users on the company’s systems, including its web interfaces.
“We track every piece of behavioural information that those users do,” Evans says. Historically, the company could only house about two months’ - 30 GB - worth of this data, which Evans says is insufficient to establish trends. Now, “by using big data technologies and custom big data solutions, we can put 18 months’ worth of data together and start to really see some trends and patterns”.
He describes big data as “data of a magnitude that cannot be handled by traditional database management technologies and processes”. Specifically, he subscribes to the ‘three Vs’ idea of big data: data stores of increasing volume (amounts of data), velocity (speed of data moving through a system) and variety (types of data, like text, images, audio, video and so on).
Perez said that the SMART Infrastructure Facility project at Wollongong Uni - which is mandated to look at infrastructure systems like water, transport and energy from an integrated perspective - is not (yet) in the big data category. Instead, he labels it as “smart data”.
Over the last two years his project has pooled “any kind of data we could grab” related to energy and water consumption, solid waste or sewage pollution, transport network usage, electricity distribution, road and railway usage - and more. They’ve also mixed in data from the Bureau of Meteorology and the Bureau of Statistics.
Airtasker is a small start-up whose website aims to connect those with odd jobs that need doing with those that will do them for a fee. Lui says that for Airtasker, big data is about “collecting as much information as possible … from our users and from our platform. We use all our data sources and try to link them all together.”
Q. What real-world insights have you gained from your analyses?
Airtasker examined the impact of several variables - including server performance and specific website features - on conversion rates on its website, then made changes that improved those rates.
One such change was the introduction of a bartering system that allowed users to negotiate the price for a given task, instead of having only static prices for tasks.
“Overnight it increased our conversions by 30%, and it sustained,” Lui said.
Perez’s unit at the University of Wollongong is not aiming so much to glean insights of its own, but rather to provide big data tools to third parties in Australia like local councils looking to plan for the future in their regions. To this end they’ve incorporated business intelligence tools from vendor Yellowfin into their system.
RP Data, which uses services from Bridge Point, has used its analyses to help inform the pricing of its products. “What we try and do through that mining of user activity is to get a good understanding of ... are we charging appropriately? Are our services presenting appropriate value for our customers?” Evans says.
Big data has also given the company a better idea of what types of properties are seeing the most interest. A popular traditional method of assessing the market - examining auction clearance rates - is limited, Evans says, as it relies on a small sample set.
With big data, the company examines data on property sales and analyses factors like discount rate (the difference between a property’s listing price and what it actually sells for), time on market and the number of properties on the market.
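As a rough illustration of the discount rate described above, the sketch below computes it for a handful of sales. It is not RP Data's method: the function name, the choice to express the discount as a fraction of the listing price, and all the figures are assumptions made up for the example.

```python
def discount_rate(listing_price: float, sale_price: float) -> float:
    """Return the discount as a fraction of the listing price.

    Assumption: the article only defines the discount as the difference
    between listing price and sale price; dividing by the listing price
    is our choice, made so that sales at different price points compare.
    """
    if listing_price <= 0:
        raise ValueError("listing_price must be positive")
    return (listing_price - sale_price) / listing_price


# Hypothetical (listing price, sale price) pairs, not real market data.
sales = [
    (800_000, 776_000),
    (650_000, 650_000),      # sold at the asking price: no discount
    (1_200_000, 1_140_000),
]

rates = [discount_rate(listing, sold) for listing, sold in sales]
average_discount = sum(rates) / len(rates)

print(f"Average discount: {average_discount:.1%}")
```

A falling average across successive periods would correspond to the kind of "much less discounting" trend Evans describes for Sydney.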
For example, in Sydney RP Data has seen “a trend towards much less discounting - so vendors are getting pretty much what they’re asking for the property - and the time on market has dropped dramatically”.
Evans says that with the assembly of that sort of information, “we can actually see how the market is trending. What we’re seeing in Sydney at the moment is that it’s had a great run, but it’s starting to show a little bit of tiredness in that run.”
Q. How will the ethical and privacy concerns around big data play out?
One concern around big data is that it may harm personal privacy. There’s one particular example that gets trotted out pretty frequently to illustrate this worry (including in Technology Decisions Feb/Mar - so it may sound familiar to regular readers).
According to a 2012 New York Times article, retailer Target, using big data techniques designed to identify which of its customers were pregnant, accidentally revealed to the parents of a teenage girl in the US that the teen was pregnant - before the girl had told her folks.
But while this example concerns some, Bertram says that the last four times he’s presented that story to a room full of people and asked them if it crossed a ‘creepy line’, only 50% said “yes”. Some people simply consider it an extension of the profile-based purchasing incentivisation that organisations have been attempting for years, he says.
Bertram says that the public will be somewhat tolerant of governments that cross that line for security purposes, because most citizens are willing to make some concessions where their personal safety is concerned. But they won’t be so forgiving with commercial organisations.
“In a commercial world your brand is your livelihood. If you cross that creepy line to a point where you will get consumer backlash, then your brand and your products go down the toilet,” Bertram says.
He says that we’ll see more examples of creepy behaviour from organisations. In fact, he predicts that a major brand will cross that line so savagely that the public rejects them outright. “I don’t know which brand, but someone’s going to step over that line and that brand is going to crumble.”
Perez says he and his cohorts have “tried to avoid the issue from the start”, deciding early on to stay within safe boundaries by focusing data at the ABS’s Statistical Area Level 1 (SA1) - some level of abstraction away from person-specific data, with one ‘SA1’ covering 200-800 people.
Evans points to the impending changes to the Privacy Act, due next year, but says by the time they are made law, they will be out of date. The law simply can’t keep up with the technology, he says.
Given this legal lethargy, Evans says, consumers can’t rely on legislation to protect their privacy - they need to educate themselves.
He also says: “There’s a responsibility for people like myself, as custodians of data, to use the data in a way that a consumer would be happy with, and accepting of. There’s always going to be an opportunity to outpace the legislation. But you have a moral obligation to do what’s right for consumers.”
Evans’s company has a data review board that examines any proposed new way of using data and judges whether it is appropriate. Along similar lines, Bertram says that “organisations need their own policies and guidelines” and that “more companies will put more ethical policies and guidelines in place”.
Looking outside of the organisation, Evans says, “there needs to be best practice standards and [industry] bodies that encourage appropriate behaviour”. Then, consumers should demand companies associate with those bodies and follow the standards.
Despite these concerns, Bertram, Evans and Lui all emphasise that good can come out of these big data techniques.
Bertram hypothesises about a truck driver who wears a medical tattoo monitoring his vitals. “If I’m an employer,” he says, “should I monitor that person so that if his vital signs show that he’s about to have a heart attack, I can do something about it, so I can stop him from having an accident and killing a family driving up the Pacific Highway?
“Or am I crossing the creepy line in that case, because I’m collecting his vital signs?” Bertram asks.