Six challenges for big data by Graham Oakes

Big data is about more than Hadoop and a bunch of fancy technology: there are some very real organisational barriers too.

Big Data is a bit of a mirage. As soon as you get your head around it, it ceases to exist.

How so? The most widely quoted definition of Big Data talks about exploiting “data sets whose size is beyond the ability of commonly used tools to process it within tolerable time”. By that definition, as soon as you’re comfortably handling the data, it ceases to be big.

Nonetheless, Big Data is clearly trending amongst the tech analysts, and it’s doing so for good reasons. The volume of data we’re handling is growing dramatically, driven by social media, the internet of things and the mass of data produced by smart electric grids, intelligent traffic systems and the like.

By one widely quoted estimate, 90% of the data ever created has been created in the last two years…

And yes, it’s not just about size. Gartner’s “3Vs” (Volume, Velocity, Variety) are all growing. We’re being asked to process data ever more quickly so we can respond to events as they happen, and that data is coming from an ever wider array of channels, sensors and formats.

Our data is fast and complex as well as big.

So let’s all go out and buy Hadoop, and our problems will be solved. Hurrah!

Not so fast. I can see at least six things that are going to get in the way of Big Data in the typical organisation:
1. Infrastructure
2. Applications
3. Skills
4. Attitude
5. Fragmentation
6. Valuation

Of those six challenges, the first two, infrastructure and applications, are fairly straightforward. The tools we need are (largely) there. We just need to learn how to use them and to fine-tune their economics.

It’s in the next two that the challenge lies: building multi-skilled teams with the right attitude. Right now, many Big Data projects are merely playing with the data, exploring the tools and shifting data around within its silos.

If we could build some stable, cross-functional teams and focus them on business-led experimentation, then we’d probably begin to find real value in the data we have stashed away. And along the way, we’d start to break down some of the silos that have grown around our data.