Does your big data strategy deliver?
Three years ago, concepts like BYOD, big data and the cloud were barely on the CIO’s roadmap. Today, they dominate discussions whenever technology is mentioned in the C-suite. Of those, big data is the one with the greatest potential impact on your company’s bottom line.
There’s an old Chinese curse that says “May you live in interesting times”. If you’re the person responsible for developing and delivering your business’s technology strategy, you probably feel like you’re trying to fly a kite in a hurricane - you know you have the right equipment, but the environment has changed so radically that most of your assumptions about how it will all work have been shattered.
In our view, big data is really about three technical elements. It’s about large volumes of data - many millions or even billions of records. It’s about unstructured data - not your relatively easy-to-manage databases but documents, social media, video and other non-database data. And it’s about multiple data sources - the data isn’t sitting in one place but is spread across multiple repositories.
Without all three of these, all you’re really dealing with is data. However, it’s our view that the term ‘big data’ is just a transient one that will disappear in a year or two. All we’ll be talking about then is plain, old data.
For many businesses, their ongoing success will be measured by their capacity to effectively use big data. According to Gartner’s Senior Vice-President for Research Peter Sondergaard, “Leading businesses of the future will be judged by the strength of their predictive algorithms.” Think about that - your capacity to see what is happening both within and around your business, analyse that information and then use it to make better decisions will be a key determinant of your business’s longevity.
Strategic data vs operational data
Decision-making processes fall into two groups - operational and strategic. The systems you put in place for each of these require different data and different analysis tools, and deliver very different outcomes. However, they do share a common goal.
Think about when you’re driving your car and want to make a turn at an intersection. To make the turn safely, your brain processes hundreds of pieces of data in a fraction of a second and comes up with a plan - not just a single decision but a series of related outcomes - for completing the turn safely and within the confines of the traffic laws. This is really a big data problem. We take lots of data and use it to make decisions. The challenge of big data is to create systems that are able to make human-like decisions in human-like time frames.
The systems we put in place to manage operational data need to be fast enough to let us make decisions that support the business without causing surprises. These systems are more rule-driven, designed to detect and arrest unwanted business outcomes.
The Australian electricity grid is designed to operate at a frequency of 50 Hz. Over 50,000 data points are collected every few seconds in order to monitor this. If the frequency either increases or decreases outside a defined tolerance level, systems are automatically activated to arrest the frequency change within a few seconds and bring it back into the acceptable range within minutes.
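To make the pattern concrete - this is a minimal sketch, not the actual grid control system - here’s a rule-driven check in Python. The 50 Hz target comes from the example above; the tolerance band and the corrective actions are our own illustrative assumptions.

```python
# Rule-driven operational monitoring, sketched. The 50 Hz nominal frequency
# matches the grid example; the tolerance and the responses are illustrative
# assumptions, not real grid parameters.
NOMINAL_HZ = 50.0
TOLERANCE_HZ = 0.15  # hypothetical acceptable deviation


def arrest_deviation(deviation: float) -> None:
    """Placeholder for the automatic response that pulls frequency back."""
    # Under-frequency usually means demand exceeds supply, and vice versa.
    action = "shed load" if deviation < 0 else "reduce generation"
    print(f"Deviation of {deviation:+.2f} Hz detected - triggering: {action}")


def check_frequency(reading_hz: float) -> None:
    """Compare one incoming telemetry reading against the tolerance band."""
    deviation = reading_hz - NOMINAL_HZ
    if abs(deviation) > TOLERANCE_HZ:
        arrest_deviation(deviation)


# Simulated stream of readings; a real system sees thousands every second.
for reading in (50.02, 49.98, 50.21, 49.79):
    check_frequency(reading)
```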
With long-term planning for the power system, a different approach is used. Data from numerous sources is collated, processed and analysed to deliver a 20-year view of the power system. While the velocity of incoming data is less of a factor, there are dozens of data sources, high volumes of data and the structure varies significantly with the sources. Although the requirements and outcomes are different, this is no less a big data issue than the operational data.
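The planning side of the problem looks more like this hedged sketch: collating differently structured sources into one long-horizon view. The file names and columns here are hypothetical placeholders, not real power-system data sets.

```python
# Collating multiple, differently structured sources for long-term analysis.
# File names and columns are hypothetical placeholders.
import pandas as pd

demand = pd.read_csv("historical_demand.csv", parse_dates=["date"])
weather = pd.read_csv("weather_observations.csv", parse_dates=["date"])

# Align the sources on date, then roll daily records up to yearly figures -
# the sort of granularity a 20-year outlook works with.
combined = demand.merge(weather, on="date", how="inner")
combined["year"] = combined["date"].dt.year
yearly = combined.groupby("year").mean(numeric_only=True)
print(yearly.head())
```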
Skills
Every analyst and expert we spoke to gave the same answer when we asked what the biggest obstacle to successfully implementing a big data strategy was: skills. While the volume of data continues to grow, the supply of people able to manage and analyse that data isn’t keeping pace with demand.
According to Gartner’s Peter Sondergaard, “By 2015, 4.4 million IT jobs globally will be created to support big data.” That demand is not being met by the market. Of those jobs, about 1 million will be in the Asia-Pacific region, according to Gartner’s research.
But what are those skills? Is it simply a case of finding more DBAs and business analysts? Or do we need something new?
The role many analysts point to is that of the ‘data scientist’. What’s interesting is that this is often considered a new discipline that requires specific qualifications and experience. However, our view is that this is really an evolution of existing skills rather than something completely new.
In a recent report on technology trends for 2012, Deloitte said, as did almost every other analyst organisation, that a shortage of people with the right skills to turn big data into a valuable business tool was imminent. However, Harvey Lewis of Deloitte says, “The skills needed are not just statistics and mathematics, but in being able to align the data with the business.”
So, we are back to the same challenge IT has faced for the last three decades - the problem is not a technical one. It remains the classic issue of business and IT alignment.
The best way, in our view, to address this looming skills deficit is to manage it from within the business. In our experience, and through observing many different organisations, it’s clear that if you understand the data, you understand the business. By creating cross-functional teams that combine technical skills from the IT department with domain knowledge from the business units, you can build the skills needed for a successful big data strategy.
The technical skills needed to support the infrastructure and operate the systems will come from IT, but the ability to look into the data and ask the right questions will come from the business.
Big data tools
Every major software vendor developing database and analytic tools is a player in the big data business. But like so many other enterprise tools, it’s the cloud that is grabbing attention.
Perhaps the most significant cloud tool impacting big data is Apache Hadoop. Named after a toy elephant that belonged to the original developer’s son, Hadoop delivers a scalable system for data storage and applications that runs on relatively inexpensive hardware. The platform has been adopted by Microsoft as part of its cloud-based Azure platform.
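To give a sense of the programming model, here’s the canonical word-count example written for Hadoop Streaming, which lets you supply the map and reduce steps as ordinary scripts - Python, in this sketch. The file names are our own and the submission command is indicative only; the exact path to the streaming jar varies between distributions.

```python
#!/usr/bin/env python3
# mapper.py - emit a (word, 1) pair for every word arriving on stdin
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word.lower()}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py - sum the counts for each word. Hadoop sorts mapper output
# by key, so all lines for a given word arrive together.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

Submitted to a cluster, the job looks something like this (the jar path and the HDFS directories are placeholders):

```
hadoop jar /path/to/hadoop-streaming.jar \
  -files mapper.py,reducer.py \
  -mapper mapper.py -reducer reducer.py \
  -input /data/books -output /data/wordcounts
```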
Although any mention of a choice between SQL Server, Oracle, MySQL or some other database platform is likely to launch a technical religious war, the reality is that every major database platform has been improving in flexibility and performance for the last decade. We think the real changes have been coming from improvements in the way storage is designed.
Traditional spinning drives still dominate the storage market. Interfaces are faster and spin speeds have increased but the fundamental technology is no longer keeping up with the rapid growth in volume - a nine-fold increase according to recent research from IDC - and our desire to carry out real-time analysis.
The hardware we depend on for storage and data access is evolving and is, we believe, at a transition point. Flash memory-based systems are becoming more prevalent, but the cost is still too high for most businesses to consider them a full replacement for spinning drives.
What we’re now seeing is hybrid storage appliances, deployed either locally or at cloud-based service providers, that combine flash storage with spinning drives. Every major vendor is now delivering hardware that balances the low cost of spinning drives with the performance of flash memory. Smart caching moves data from the spinning media to flash media based on how heavily that data is being used. Faster systems go even further, doing everything in RAM and moving data to hard drives or other, slower media when it is no longer in heavy use.
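As a rough illustration of that caching idea - a toy sketch, not any vendor’s actual algorithm - the following Python promotes frequently read blocks to a flash tier and demotes the coldest resident block when the tier fills. The capacity and threshold values are arbitrary assumptions.

```python
# Toy model of tiered storage with heat-based promotion. Capacities and
# thresholds are arbitrary; real appliances use far more elaborate policies.
from collections import Counter

FLASH_CAPACITY = 3   # hypothetical: blocks that fit in the flash tier
PROMOTE_AFTER = 2    # hypothetical: reads before a block is promoted


class TieredStore:
    def __init__(self) -> None:
        self.flash: set[str] = set()          # hot tier
        self.access_counts: Counter = Counter()

    def read(self, block_id: str) -> str:
        self.access_counts[block_id] += 1
        if block_id in self.flash:
            return f"{block_id}: served from flash"
        if self.access_counts[block_id] >= PROMOTE_AFTER:
            self._promote(block_id)           # future reads will hit flash
        return f"{block_id}: served from disk"

    def _promote(self, block_id: str) -> None:
        if len(self.flash) >= FLASH_CAPACITY:
            # Demote the least-used resident block back to spinning disk.
            coldest = min(self.flash, key=self.access_counts.__getitem__)
            self.flash.discard(coldest)
        self.flash.add(block_id)


store = TieredStore()
for block in ["a", "b", "a", "a", "c", "b"]:
    print(store.read(block))
```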
The challenge here is that these new ways of managing storage are quite different. The technology is less mature than traditional storage and may require a new approach to the rest of your infrastructure and operations. Processing and data transfer bottlenecks are going to move. Again, it’s critical to develop new skills in your system architects.
Management engagement
We’re going to make a bet with you. Other than the CIO, we don’t think anyone in the C-suite or boardroom is talking about big data. We’re certain that they are talking about performance and KPIs, strategic planning and tracking against the plan, and about going beyond gut feel to find real data on how business units are tracking. But they won’t care about big data.
We’re not even sure that the business cares about data specifically. Their attention is on information that can be used to drive decisions.
In order to get the business engaged in a ‘big data project’, the focus can’t be on the technology. It must be on the business outcomes. In our view, the first step along that path starts with an analysis of how the business uses the data it has.
What reports are being produced? Which of those are being read? What data is being accessed the most? How is it being used? Are there lots of satellite systems in the business? For example, is a lot of the reporting to management and the board being done via spreadsheets? If that’s the case, are they all carrying out calculations and manipulations of the data in the same way, or are different business units using the same pool of data in different ways?
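As a trivial illustration of the first of those questions, the sketch below tallies which reports are actually being opened. The log file and its comma-separated format are hypothetical; in practice your reporting platform’s own usage logs would be the source.

```python
# Count report views from a hypothetical access log containing one
# "timestamp,report,user" line per view.
from collections import Counter

report_views: Counter = Counter()
with open("report_access.log") as log:
    for line in log:
        _, report, _ = line.rstrip("\n").split(",")
        report_views[report] += 1

for report, views in report_views.most_common(10):
    print(f"{report}: {views} views")
```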
In order to deliver a successful big data project, you need to start with the business. And don’t name it The Big Data Project - there’s no surer way to ensure that the business won’t be interested.
Before getting stuck into the technical solution, it is critical to come up with a business case. While it may sound like a good idea to start collecting and storing social media data and then looking for how negative and positive brand sentiment correlates with sales, there’s no point making that leap unless the business can use the information.
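For what it’s worth, the correlation itself is the easy part once the two series are lined up - the hard part is the business case. In this hedged sketch, the file and column names are hypothetical placeholders for your social media pipeline and your sales system.

```python
# Hypothetical check of whether weekly brand sentiment tracks weekly sales.
import pandas as pd

sentiment = pd.read_csv("weekly_sentiment.csv", parse_dates=["week"])
sales = pd.read_csv("weekly_sales.csv", parse_dates=["week"])

combined = sentiment.merge(sales, on="week")
correlation = combined["sentiment_score"].corr(combined["sales_total"])
print(f"Correlation between sentiment and sales: {correlation:.2f}")
```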
Final word
Big data is likely to appear on every card in a game of Buzzword Bingo. Along with BYOD, it’s probably the most hyped technology of the last year or so. However, we think the hype bubble will burst. That doesn’t mean big data is going away. It’s simply that we’ll all stop talking about big data as if it’s something special and simply get back to calling it what it really is - data.