Beyond the big data hype curve


Tuesday, 18 December, 2012


Big data is here and IT leaders are wondering how to pull together the technical and business elements into a coherent strategy that delivers value. Anthony Caruana spoke with a panel of experts to find out how to achieve big data success.

How do you know if your company needs to have a big data strategy?

In Gartlan’s view, “Companies of all sizes should consider implementing a big data strategy if they want to stay competitive in today’s fast-paced environment. IDC forecasts that this year alone we’ll be generating 2.7 zettabytes of data, as the adoption of social mediums and cloud-based tools continues to grow at a fast pace.”

Adrian suggests that if you have problems with your data handling today, you may have a big data problem. He says, “If you’ve got data-related problems that aren’t being adequately handled with the existing technology stack that need something more than more, more than just different, more than more of the same, you may have a big data problem.”

Interestingly, Gartlan added that everyone will have a big data problem. “Given the explosion of data over the last couple of years, both inside the company - everything is now being monitored, tracked and measured - and outside the company, all companies have a big data problem. But do all companies choose to address it with a big data or even an overall data strategy?”

One of the challenges that many businesses face is understanding what big data is. Adrian says, “What’s different about this is volume that exceeds system capacity, variety that requires new tools and new approaches, velocity that is highly variable and isn’t easily accommodated and typically complexity because of the differences.”

In Foster’s view, it also comes down to the use of the data and ability to use it to make decisions. “It is important to note that ‘big’ is not an absolute measure - if you have potential access to data that you are not leveraging to make better business decisions, then you need a strategy for big data.”

What is the biggest obstacle to successfully implementing a big data strategy?

“The biggest obstacle is not technical - there are big data tools out there for any job. This is part of the challenge - organisations need to understand the problem they are trying to solve and the business benefits prior to jumping on a particular technology bandwagon,” according to Rabie.

While tools are important, people are perhaps even more critical. Both Foster and Adrian agreed when asked about obstacles. “Skills, by far. Most people simply don’t have the skills today for what we describe as big data. Either because it’s data we have left alone for good reason - we didn’t know how to work with it.”

Foster added, “In a recent survey conducted by SAS in Australia and New Zealand, executives indicated that the top two challenges to successfully implementing a big data strategy were the lack of appropriate skills and access to quality data.”

Once the skills and tools issues are considered, it’s interesting to note Adrian’s observations about data that’s already in businesses but not used. “You need to begin with an information audit. Most organisations begin with a lot of ‘dark data’. It is sitting in systems because auditors tell us to keep it, the government tells us to keep it, or just because we always have.”

From that point, engagement is key - especially when it comes to getting the right people engaged. “We have found that the big data implementations that deliver the most value are aimed at decision makers and data consumers, rather than just a few data scientists. People extract value from data, so ensuring mass distribution of your data will make your big data asset more valuable. Find a way to make big data available and useful for everyone,” said Rabie.

Are there technical limitations to creating a big data infrastructure?

Every member of our panel agreed that existing approaches to managing data need to be revisited in the big data era.

When we delved into this question a little further, it was clear that there was no one-size-fits-all solution. Rabie explained, “Your typical organisation’s big data issues are not the same as a large telco, which in turn doesn’t compare to internet giants like Facebook or Google. Every big data tool has its own limitations, but if you are using the right tool for the job, your only limitations are financial.”

According to Adrian, “People walk in the door and say ‘we need the software stack and hardware platform. And, by the way, it’s probably going to be different to the hardware platforms I’m running today’.”

Gartlan said, “Generally, new infrastructure is required to implement a big data infrastructure and companies need to ask themselves, Will this investment generate the results required by the business in the required time frame? Or can I leverage additional cloud-based infrastructure to assist my business?”

So, as is always the case, a business case needs to be established and followed through so that a return on what can be a significant investment can be realised.

The deployment options vary. “The most frequently deployed option is node-by-node scale-out. This has been the way many in-house solutions have been developed. As the need arose, more servers were purchased. Except for those who do it in the cloud. Amazon started two million clusters last year. Even if those were just ‘Hello World’, there’s still a lot of work being done and there wasn’t any capex - people didn’t have to buy computers or software,” added Adrian.

Foster’s view was that perhaps the future would include three different approaches that work with what he calls ‘high-performance analytics’.

He advocates an approach that combines a number of different capabilities to help get past traditional limitations. These include grid computing that leverages commodity infrastructure in a horizontally scalable architecture to support growing ranks of analytical users and data within an organisation. In-database processing pushes more advanced analytical processes down to the database engine, which means less data movement, increased data governance and better performance. These are married with in-memory analytics for running analytical algorithms completely in-memory, scaling across commodity infrastructure, resulting in significant performance gains and the ability to analyse much larger data volumes.
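The in-database idea can be illustrated with a small sketch. It is not Foster’s implementation or any vendor’s product - it simply uses Python’s built-in sqlite3 module as a stand-in for a database engine, with a made-up `sales` table and hypothetical function names, to show how pushing an aggregation down to the engine reduces the number of rows that have to move.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 100.0), ("north", 50.0), ("south", 70.0), ("south", 30.0)],
    )
    return conn


def totals_client_side(conn):
    """Pull every row to the application, then aggregate there."""
    totals = {}
    rows_moved = 0
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
        rows_moved += 1
    return totals, rows_moved


def totals_in_database(conn):
    """Let the database engine aggregate; only the summary rows move."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)
```

Both functions return the same totals, but the second moves one row per region rather than one row per sale. On four rows the difference is trivial; on billions of rows it is the difference Foster describes.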

What do I need to do to get the rest of my IT infrastructure ready for big data?

So, what does it take to get your business ready for a big data project? According to Adrian, it might be happening in isolation from the IT department.

“Big data projects often begin in isolation. Many of the biggest ones start and have nothing to do with IT. The people who built them would run as fast as they can in the other direction if they saw an IT person coming. And this is even some who have been successful enough to speak at conferences. IT are often considered blockers rather than enablers.”

Gartlan suggested that you should “leverage the current infrastructure you already have in place, including existing enterprise applications - continue to get more out of those investments - and look to cloud-based business intelligence solutions which can assist in integrating additional data sources. Businesses need to focus on mining the data already within the organisation and look for the optimal way to combine it with both cloud-application data and unstructured data such as social media and web logs.”

Contrasting those views, it’s clear that a significant challenge is balancing what you expect from a big data project with not knowing exactly what you’re going to find. As Adrian puts it, “Typically, either the data source is an entirely new one or it is data that is fundamentally not being used today and it’s only downstream from it that we combine the results of the big data project with some other existing data.”

Foster took a more traditional approach, suggesting that you must “consider the entire life cycle of big data in the context of infrastructure requirements”.

He recommended working through each stage in turn: acquiring, storing, cleansing and transforming, managing and then leveraging the information.
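The life-cycle stages Foster lists can be sketched as a chain of small steps. This is an illustration only, assuming a toy web log as the data source; the function names, the log format and the `RAW_LOG` sample are all hypothetical and are not taken from Foster or any product.

```python
from collections import Counter

# Made-up sample input: date, method, path, status (plus some unusable lines).
RAW_LOG = [
    "2012-12-18 GET /home 200",
    "2012-12-18 GET /pricing 200",
    "",
    "2012-12-18 GET /home 500",
    "malformed line",
]


def acquire(source):
    """Acquire: read raw records from a source (here, an in-memory list)."""
    return list(source)


def store(records, archive):
    """Store: keep the raw records untouched so they can be reprocessed."""
    archive.extend(records)
    return archive


def cleanse_and_transform(records):
    """Cleanse and transform: drop unusable lines, parse the rest into dicts."""
    cleaned = []
    for line in records:
        parts = line.split()
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # skip blank or malformed lines
        date, method, path, status = parts
        cleaned.append(
            {"date": date, "method": method, "path": path, "status": int(status)}
        )
    return cleaned


def manage(records):
    """Manage: attach simple descriptive metadata to the cleaned data set."""
    return {"source": "web_log", "row_count": len(records), "rows": records}


def leverage(dataset):
    """Leverage: answer a business question, e.g. which pages get the most hits."""
    return Counter(row["path"] for row in dataset["rows"])
```

A run would chain the stages in order: `archive = store(acquire(RAW_LOG), [])`, then `leverage(manage(cleanse_and_transform(archive)))`. Keeping the raw archive separate from the cleaned data set is a design choice in this sketch, so that cleansing rules can change without re-acquiring the data.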

Rabie reiterated the importance of having the right skills and knowledge. “Make sure your team is familiar with the different big data solutions available. There are tools that specialise in storage, search, documents, images, video, social media, machine data, analytics and many others. Short-list the tools that best address your issues and do a thorough evaluation. In the big data world, choosing the right tools for the job usually means planning for a mixed architecture.”

Once the architecture and people issues are resolved, it’s critical not to set unrealistic expectations in the business. “That said, companies do need to be cautious of not underestimating the effort, resource and operational challenges of standing up and maintaining an infrastructure to support a big data project,” said Gartlan.

Is there an optimal IT infrastructure model that will best support a big data project?

Most of our panellists answered us similarly, best summarised by Adrian. “Frankly you could take the ‘big’ out and ask the same question. The answer is ‘it depends’.”

Gartlan suggested, “The optimal IT infrastructure varies from one company to the next; however, cloud-based business intelligence can provide a cost-effective and scalable solution which delivers faster results compared with building your big data infrastructure internally.”

Drawing on his experience, Foster offered guidance on what he considers best-practice approaches.

“Create a centralised, enterprise analytical platform. Many organisations have historically centralised their data platforms, but left analytical capabilities at the departmental or individual user level. By centralising the infrastructure to support analytical activities, performance can be improved, costs can be lowered and opportunities for collaboration within the business increased.

“Leverage your existing assets and platforms. If an enterprise data warehouse exists, implement in-database analytics to better utilise that capability.”

In contrast, Rabie took a more definitive approach, utilising the popular Hadoop platform, although he cautions against expecting Hadoop to deal with all your data needs. “For business intelligence environments on big data, the optimal model is to have your BI tool querying a fast analytical database. The analytical database pulls data from your operational data sources or data warehouse and ensures production is not affected by too many complex or concurrent queries. We are also starting to see more companies moving to tools such as Hadoop, which is an extremely scalable and flexible data store. Scalability, or volume, is a different big data issue to velocity, or speed, and there is a common misperception that Hadoop is also very good at ad hoc queries. In our experience, it is slow compared to analytical databases, and most analytical databases today have Hadoop connectors.”

Adrian supported this view. “There are different optimal structures for streaming data that needs to be processed in real time than there are for massive amounts of historical data that need to be mined periodically for trend analysis. You’re going to configure those systems very differently to one another. You’ll architect them differently, use different tools and deploy on different platforms.”
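The contrast Adrian draws can be shown in miniature. The sketch below is an assumption-laden illustration, not a description of any panellist’s system: `SlidingWindowAverage` and `average_by_period` are hypothetical names, and real streaming and batch platforms are far more involved. The point is only that the two styles hold different state and are invoked differently.

```python
from collections import deque
from statistics import mean


class SlidingWindowAverage:
    """Streaming style: update a result as each reading arrives,
    keeping only a small fixed-size window in memory."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # old readings fall off automatically

    def update(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)


def average_by_period(history):
    """Batch style: scan the full history in one pass and summarise per period.

    `history` is an iterable of (period, value) pairs.
    """
    grouped = {}
    for period, value in history:
        grouped.setdefault(period, []).append(value)
    return {period: mean(values) for period, values in grouped.items()}
```

The streaming class is called once per event and never sees the full history, while the batch function is called occasionally and needs all of it - which is why, as Adrian says, the two end up on differently configured systems.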

Image credit ©iStockphoto.com/loops7


  • All content Copyright © 2024 Westwick-Farrow Pty Ltd