Have you got a 'big data problem'?


By Hugh Rogers*
Tuesday, 10 September, 2013


Many organisations are overwhelmed by the challenge of understanding their own data: they have the data, and they know that buried within it is a wealth of information that could be used to gain better insight and drive better business outcomes. The challenge varies by organisation, but it is essentially the same: traditional information management approaches struggle to deal with the complexity of modern data.

The problem of Volume, Velocity and Variety, often referred to as the three Vs, was first identified by META Group (now Gartner) analyst Doug Laney in 2001 and has become the catchcry of big data. Later writers have played with the concept to add other Vs, such as Value and Veracity. A problem does not have to exhibit all of the Vs; in the real world, any one of them is enough to give an organisation a ‘big data problem’.

The big data problem is not the exclusive preserve of large organisations; many mid-sized organisations face it too. A large enterprise might be challenged by the need to handle petabytes of data, while for a mid-sized organisation even 50 terabytes can be beyond its capacity to deal with, at a cost it can afford, using traditional technologies. In other words, it’s all relative.

A lot of the confusion and debate about the merit of big data is due to the ubiquitous nature of the term. There are discussions about how many Vs make a big data problem. There are different technologies: appliances, Hadoop-based platforms and NoSQL databases, to name three of the more common approaches. Each has its champions and its critics, and more technologies are likely to emerge in the future.

But that debate doesn’t invalidate the concept. Think of the predecessor to big data, the relational database: what is it? Anything from a few small tables in a Microsoft Access database handling mailing lists, to a large DB2 operational database running on an IBM mainframe, to an Oracle data warehouse running on Unix. Relational databases can be highly normalised or denormalised to enhance response times. They can store well-defined, structured data or hold textual data and blobs such as documents or pictures.

Big data is no more ubiquitous than the relational database; we just lack 25 years of experience with it to intuitively understand what it is.

Big data as a concept has been around since at least 1997 when two researchers at NASA wrote: “Visualisation provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk and even remote disk. We call this the problem of big data.” Yet its newness as a concept in mainstream thinking helps to explain why it seems to generate equal amounts of interest and confusion.

In our view, we should worry less about theoretical discussions of what big data is and focus more on the practicalities of how it can deliver value to an organisation. The question should not be “What is big data?” The question should be “What problems can big data help solve?” Big data should become part of an organisation’s technical arsenal, to be drawn on when the problem is understood and big data can provide the right solution.

One of the most commonly cited use cases for big data is gaining insight into the customer’s digital experience. Examples include analysing an organisation’s weblogs to understand how people interact with its website. Sentiment analysis is also regularly mentioned in this context: “What are people saying in social media about the organisation, is it positive or negative?”
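
As a rough illustration of the weblog case (a sketch, not the author’s implementation; the log format and paths below are assumptions), even a few lines of Python can turn raw access-log lines into per-page traffic counts:

```python
import re
from collections import Counter

# Apache Common Log Format: host ident user [time] "METHOD path HTTP/x" status bytes
LOG_LINE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+" (\d{3})')

def page_counts(lines):
    """Count successful (2xx) requests per path."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group(2).startswith("2"):
            counts[m.group(1)] += 1
    return counts

sample = [
    '10.0.0.1 - - [10/Sep/2013:12:00:00 +1000] "GET /products HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Sep/2013:12:00:01 +1000] "GET /products HTTP/1.1" 200 512',
    '10.0.0.3 - - [10/Sep/2013:12:00:02 +1000] "GET /missing HTTP/1.1" 404 128',
]
print(page_counts(sample).most_common(1))  # → [('/products', 2)]
```

The big data challenge is not the logic, which is trivial, but running it over billions of such lines arriving continuously; that is where distributed platforms earn their keep.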

Another often-cited use case is helping companies that have large amounts of meter and monitoring data, which is problematic to deal with using typical relational database technologies both because of the amount of data (volume) and the regularity with which it arrives (velocity). Utility companies are the most obvious candidates for such a solution, but there are also some interesting case studies of how it has been applied in scientific research and in hospitals.
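
To make the volume and velocity point concrete, here is a minimal sketch (assumed data shapes, not drawn from the article) of the kind of rollup such platforms perform continuously: collapsing high-frequency meter readings into hourly totals per meter:

```python
from collections import defaultdict
from datetime import datetime

def hourly_totals(readings):
    """Roll up (meter_id, timestamp, kwh) readings into per-meter hourly totals."""
    totals = defaultdict(float)
    for meter_id, ts, kwh in readings:
        bucket = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        totals[(meter_id, bucket)] += kwh
    return dict(totals)

readings = [
    ("m1", datetime(2013, 9, 10, 9, 15), 1.5),
    ("m1", datetime(2013, 9, 10, 9, 45), 2.5),
    ("m1", datetime(2013, 9, 10, 10, 5), 0.5),
]
print(hourly_totals(readings))
```

A smart meter reporting every few minutes across millions of households produces this stream faster than a single relational server can comfortably ingest it, which is precisely the velocity problem described above.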

However, our experience is that if you start with the problem, there are many other possible applications. We have worked with an organisation that was in the process of reengineering its analytical platform to cope with a new business model, and while it wanted to avoid the expense of migrating its legacy data to the new paradigm, it still wanted access to the older data for historical analysis. Our solution was to use a Hadoop-based big data platform to archive the historical data but still make it available to analysts.

In another scenario, an organisation had more data than it knew what to do with, but believed that one day the data would be valuable. Our answer was to use a big data platform to store all the data, and to worry about making sense of individual pieces only as they became useful.
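
This “store now, interpret later” approach is often called schema-on-read. A minimal Python sketch (the record shapes are assumptions for illustration): raw records are kept verbatim, and a schema is applied only when a question is finally asked of them:

```python
import json

# Raw events kept verbatim -- no schema is imposed at load time.
raw_events = [
    '{"type": "login", "user": "alice", "ts": 1378771200}',
    '{"type": "click", "user": "bob", "page": "/pricing"}',
    'malformed line that would break a strict loader',
]

def events_of_type(lines, wanted):
    """Apply a schema at read time, skipping records that don't parse."""
    out = []
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # tolerate junk instead of rejecting the whole load
        if event.get("type") == wanted:
            out.append(event)
    return out

print(events_of_type(raw_events, "login"))  # one matching event, parsed on demand
```

The design choice is that nothing is thrown away at ingest: when a new question arises, a new parser is written against the same raw store.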

The point of these examples is that if big data is seen as part of the data toolkit, it can sometimes be useful in unexpected ways: not because of a need to find a use for big data, but because it is available to apply when the problem is understood and the tool fits.

In 25 years, big data will be old hat. By then we will understand what it is as well as we understand what a relational database is today. Just as very few relational database practitioners today have heard of CJ Date, whose writings codified the theory behind the first relational databases, so in 25 years very few people will worry about “What is big data?” Instead, organisations will be using it to solve a myriad of challenges we can’t even predict today, and driving real business value from it.

*Hugh Rogers is Practice Manager, Information Management, at Oakton.


  • All content Copyright © 2024 Westwick-Farrow Pty Ltd