Big data skills to pay the big data bills
Machine-generated data is beginning to outstrip human-generated data, and the availability of skilled big data experts to analyse it is not keeping up.
For some time now, analysts have reported that Australian organisations are having problems getting big data projects off the ground because they can’t find staff with the skills that these big data projects require.
In February this year, Deloitte spoke of “the skills gap left by underinvestment in data management capabilities”.
“Australian universities are doing their part [in developing data management skills] but simply do not have the resources to meet the scale of the challenge,” Deloitte consulting partner Tim Nugent said at the time.
“In the current skills landscape, every company that is serious about using big data needs a plan to develop and access external data talent. Without this type of open engagement, Australian industry risks an encounter with critical shortages in data skills,” Nugent said.
Seven months on, the story seemed the same. In September, Gartner released the results of its annual big data survey in a report titled ‘Survey Analysis: Big Data Investment Grows but Deployments Remain Scarce in 2014’.
According to Gartner’s figures, almost a third (31%) of organisations that had already invested in big data reported that “obtaining skills and capabilities needed” was one of their top challenges. A similar proportion (36%) of organisations that were in the big data planning phase also reported obtaining the required skills as one of their top problems.
In fact, Gartner found that skills and capabilities is the third biggest challenge organisations reported having with big data - and the problem has increased from last year, according to Gartner Managing VP Ian Bertram. “Skills are still an issue for them - and that’s organisational skills and IT skills.”
Kyle Evans, chief data officer at property information services business RP Data, agrees: “I think there is a skills shortage. We’ve already got a shortage, and the problem is going to get worse before it gets better.”
Evans pointed to the idea that the amount of machine-generated data - “things like web logs, sensor data, database logs” - will soon outstrip human-generated data, and after that point, that gap will continue to grow.
“More data - more work to be done, more analyses to be done. The production of data is going to put even more and more demand on the skill set that can manage and harness the data,” and that skill set is already scarce, Evans said.
A brief history of big data
Dr Mukund Deshpande - who leads the analytics practice at Persistent Systems - said that to understand the reasons for the big data skills shortage, you need to look at the history of the technology.
Most computing technologies on the market were developed by vendors to be sold as products to customers, he said. “But big data is special. Big data technologies came out of the e-commerce or the web world.”
The companies that developed them - like Google, Facebook, LinkedIn and Yahoo - “were not in the business of building software. They were offering digital services to the customer.”
Some of the technologies these businesses relied on were just not adequate for the jobs that were being asked of them. On top of that, the companies were offering their services at a very, very low cost, so they were forced to innovate, Deshpande said. Thus, they created the technologies behind what we call big data.
These companies were not in the software business, so they made their creations open source. As a result, “What you have is a technology … in these large organisations working 24/7 in extremely demanding conditions, and it’s available for free, so everybody thinks: ‘wow that’s great, I have a similar problem, can I just use the technology in an enterprise?’
“What people forget is that this technology was built and run by folks who were technologists and computer scientists” who built the technology to suit their needs. “They really didn’t care about building it for a typical IT organisation. And that is the reason why you see that these technologies - even though they are available, and they are powerful - that there is a huge gap in making them usable by average IT professionals.
“A lot of companies are already using this and employing a lot of these people,” Deshpande said. So if you’re not one of these companies, bringing those skills into your organisation may be very difficult indeed.
Not just tech skills
But the skills shortage goes beyond mere tech skills, according to Evans, who said that on top of being able to “program in whatever language or use whatever tool is required”, analysts “also need to have the commercial acumen and be able to engage with internal stakeholders. The combination of those three things is a very, very hard thing to find.
“It can be found,” Evans said, but often you’ll “pay a lot of money for them, and sometimes your business case doesn’t stack up”.
With technical skills, “anyone who is clever technically can learn to work in the environment you need them to work in. But it’s very hard to teach the business skills and the communication skills,” he said.
Bertram has a similar perspective on combinations of skills.
“I don’t think we’ve got a shortage in technology skills,” Bertram said. “The big shortage is the combination of business and technology skills. That’s where some of the big data analysts - or the big data business analysts, or the data scientists - that’s where they’re coming from.
“You don’t have to have a double PhD to be a data scientist. You really have to have a good understanding of business problems, a good way of being creative and expressing those problems and a good understanding of the data itself. So it’s a combination of the three,” he said.
An industry-level approach
Evans thinks ‘sandwich course’ workplace placement programs - which involve students “going out from the university and spending time with a commercial organisation and actually seeing how their technical skills translate into the real world and start to deliver real world solutions” - can help address skills gaps.
His organisation, RP Data, is affiliated with Sydney’s Macquarie University.
“Within their management graduate school they have a program where PhD and Honours students can come in on a rotational basis. We have an agreement in place with them where we’ll actually receive some of their students. That’s one of the ways that we’re looking to address this issue,” Evans said.
However, he did note that universities are lagging when it comes to offering data science degrees.
“As much as the industry has moved to a need for data scientists, you can’t actually go and do that degree just yet.”
So the answer lies in “a combination of these things - doing the sandwich courses, and having commercial businesses aligning with educational institutions to make sure that you can help craft people as they go through their process of becoming educated”.
And while certification programs are a feature of some spaces in IT - like those from networking and OS vendors and industry groups - Bertram doesn’t think this will happen in the big data space.
This is, in part, because “big data is heading towards the drop of disillusionment in [Gartner’s] hype cycle - it will likely in the next 12 months hit that drop”.
This means that “the likelihood of things being just ‘big data’ related is going to lessen, and we’re going to start to just talk about how do we manage all of the varied sources of data that are available to us”.
Having a great variety of sources means the handling of information is still complex, even though you may not have ‘big’ volumes of it. “We’re going to start to talk about just data,” Bertram said.
As such, “I think we’re going to continue to invest in statistical analysis - but what we’re going to start to see is more rounded courses being offered from a base level. Business acumen, business management, philosophical skills, artistic skills, communication skills, along with some of those IT skills,” he said.
What you can do
On a more local, immediate level, if you’re struggling to find skilled data staff, Bertram recommends searching your own company. “There are already many of these people within organisations - we just call them something different. We call them a financial analyst, a business analyst, a data analyst.”
These staff already have a good deal of the skills required for big data work. “We just need to supplement and continue to invest in their development” by identifying the gaps in their skill sets and investing in filling those gaps, he said.
“Look at the people that you’ve got in your organisations today, and identify where some of those gaps are across the multiple characteristics that you’re looking for. You’re looking for: communications skills, business acumen, business process skills, technology skills, statistical skills,” he said.
If you can define the characteristics you’re looking for and develop a gap analysis, you may find that by investing in one area, you don’t need to recruit new staff.
“Alternatively, you can try and fill those gaps by different sourcing methodologies. Many of the vendors today offer some of those skills as a practice,” Bertram said.
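The gap analysis Bertram describes can be sketched in a few lines of Python. This is only an illustration: the five skill names come from his list, but the 0-5 rating scale, the target level of 3 and the individual staff ratings are hypothetical values invented for the example, not figures from Gartner or RP Data.

```python
# Characteristics Bertram lists; the 0-5 rating scale is an assumption.
SKILLS = (
    "communication",
    "business acumen",
    "business process",
    "technology",
    "statistics",
)


def skill_gaps(target, current):
    """Return the shortfall for each skill where the current rating is below target."""
    return {
        skill: target[skill] - current.get(skill, 0)
        for skill in SKILLS
        if current.get(skill, 0) < target[skill]
    }


# Hypothetical target: a rating of 3 in every skill.
target = {skill: 3 for skill in SKILLS}

# Hypothetical ratings for two existing roles.
staff = {
    "financial analyst": {
        "communication": 3,
        "business acumen": 4,
        "business process": 3,
        "technology": 1,
        "statistics": 2,
    },
    "data analyst": {
        "communication": 2,
        "business acumen": 2,
        "business process": 2,
        "technology": 4,
        "statistics": 4,
    },
}

for role, ratings in staff.items():
    print(role, skill_gaps(target, ratings))
```

With these made-up ratings, the financial analyst's gaps are in technology and statistics, while the data analyst's are in communication and business skills, which suggests where training would be spent before any decision to recruit.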
Evans has a similar approach to upskilling. RP Data focuses on identifying junior staff with potential, and on developing their skills to a level the business desires.
“As we take in graduates, we look to try and focus on people that we think can get to the level we want. They obviously don’t have to be there at [recruitment], but they need to have, more than anything, a passion for the data. You want to see people that go, yes, I’m hungry and I want to work hard and get to the data,” Evans said.
Your ideal candidates have business skills - they “have a sense of what commercial outputs you can get from data” - as well as communications skills.
“The technical skills are important - you need to at least prove that you know how to do things technically - but those other skill sets are more important. As we go through our recruitment process, we have a technical test: go and see if you can program in this language or that language or whatever the requirement might be.”
But Evans actually places more emphasis on scenario testing, which he described as a way to get candidates to demonstrate how they think.
“You might say to somebody: we’re a property information services business. What model do you think would be good to build for a business like ours? Who would you sell that to, and why do you think that’s of value to them? They are the sort of questions you try and explore,” he said.
Evans also uses scenarios that are more abstract, in order to get an understanding of candidates’ critical thinking skills. For example, he may ask how many light bulbs there are in Australia.
“You know what? I’ve got no idea what the answer to that is. But I’d want someone to go: well there’s 23 million population, in each average room there’s whatever, and extrapolate out, just to show some level of rational and considered thinking to a seemingly impossible question. It’s not about getting the right answer - it’s about having an approach.”
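The kind of reasoning Evans is looking for can be written out as a short calculation. In the sketch below, only the 23 million population figure comes from his example; the household size, rooms per household and bulbs per room are placeholder guesses chosen for illustration, and the estimate covers homes only, ignoring offices, shops and street lighting.

```python
def estimate_light_bulbs(population, people_per_household,
                         rooms_per_household, bulbs_per_room):
    """Rough estimate of household light bulbs from a chain of stated assumptions."""
    households = population / people_per_household
    return households * rooms_per_household * bulbs_per_room


estimate = estimate_light_bulbs(
    population=23_000_000,     # figure quoted by Evans
    people_per_household=2.5,  # assumption
    rooms_per_household=6,     # assumption
    bulbs_per_room=2,          # assumption
)
print(f"Roughly {estimate:,.0f} household light bulbs")
```

Under these assumptions the answer is about 110 million. As Evans says, the number matters less than the approach: each input is explicit and can be challenged or refined.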
Managerial misunderstanding
And while there’s a lack of big data skills on the ground - in the IT or analyst staff - there’s also a lack of big data capabilities in management. This can manifest in misunderstanding what big data actually is, or how to get the most out of a big data project.
“When everyone’s on the same page of what big data actually is, then you tend to have a better understanding of what then potentially the outcomes should be. If the executives think big data is one thing and the IT staff thinks it’s another thing, that just creates the conflict,” Bertram said.
Evans agreed that there’s frequently a misunderstanding about big data at a senior executive level. He said that quite often, big data is the answer, “but no-one really understands what the question is yet”.
“And so there’s this perception that ‘we’ve got to be good at it, we’ve got to have all the answers, and we’ve got all this data, let’s try and get the answers’.” But you must be aware of what business question you’re trying to answer with big data, Evans said.
Well-executed big data projects involve “an articulation of the problem you’re trying to solve, data that you think can help support that and then a process to actually deliver that”, he said.