Big-data and cloud computing have become the new chocolate and peanut butter: very good on their own -- but, wow, what a match if they could be brought together.
The problem is that even as the cost and capability benefits of cloud become more attractive to IT planners (as they seek to manage ever-exploding data sets), security and privacy remain stubborn concerns. No cupcakes quite yet.
So what is the proper recipe to make the cloud safe for your big-data workloads? The topic was a hot one at the recent Open Group winter conference in Newport Beach, Calif.
The convergence of big-data and the cloud means that the security "has to be done right," or the trend may never get off the ground, said Mary Ann Mezzapelle, security strategist for HP Enterprise Services, at the conference. (Disclosure: Both HP and The Open Group are sponsors of my BriefingsDirect podcasts.)
Speakers emphasized that the way a typical enterprise protects its data -- big or otherwise -- can't be easily reapplied to the cloud. That's because the big-data infrastructure inside enterprises is not itself secure; it's just kept behind secure perimeters. The data infrastructure was not initially architected to be secure.
This is why corporate boards and governance leaders are now identifying big-data security and privacy as a top new risk.
What's more, when the move to the cloud is contemplated, it's not just the big-data itself that needs to be adequately protected; the analysis needs to be secured as well, which means the applications and business-intelligence suites banging on the data must be considered too.
So here's where the recipe gets tricky. Over-applying security can also backfire. Making security protection too onerous, complex, or costly means that analytics practitioners won't be able to scale to the volume of data they need to make it, well, big-data.
To reach the right balance -- enough security, not too much overhead, and huge volumes of data -- you need to understand why big-data is different, said Adrian Lane, analyst and CTO, Securosis, who also addressed the conference.
Lane's research at Securosis shows that big-data means lots of data nodes, distributed storage, the need for fast data insertion, a lot of parallelism, and often distributed management. Also, big-data installations are usually hardware agnostic (read: commodity hardware), with the systems designed for failure (read: easily replaceable parts). So, the infrastructure is both inexpensive and accessible. These are often environments ripe for open-source software, too, with a heavy dependence on Hadoop clusters and NoSQL databases.
Big-data is different. It has been architected differently, and securing it can't just be a re-run of, say, securing typical enterprise applications or web server infrastructure, said Lane.
What's worse, most third-party security products so far amount to bolt-on access-control appliances. Yet this model doesn't scale well, and can easily grow very expensive and complex, given all the data nodes inherent in big-data installations, said Lane.
Until that market gets sorted out, Lane said, the best tools are the security benefits already built into cloud environments, including deployment validation tools and logging. File-layer encryption with sound key management is also a good fit for big-data, said Lane. He advises liberal use of other built-in security tools, such as Kerberos, to validate nodes and clients.
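As a concrete illustration of Lane's Kerberos advice, a Hadoop cluster can be made to require Kerberos authentication for its nodes and clients through two well-known settings in core-site.xml. This is only a minimal sketch of the relevant properties, not a complete hardening guide:

```xml
<!-- core-site.xml: require Kerberos authentication cluster-wide -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <!-- the default, "simple", performs no real authentication -->
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <!-- also enforce service-level authorization checks -->
    <value>true</value>
  </property>
</configuration>
```

Each daemon additionally needs its own Kerberos principal and keytab configured (for example, the NameNode's keytab in hdfs-site.xml), and a KDC must be reachable by every node -- details that vary by distribution and deployment.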
The bottom line is that you need to choose your cloud provider with an eye to its security record and ongoing advancements. And you want a cloud provider that takes a lifecycle approach to security processes, with deep risk-management methods in place, said Mezzapelle.
The experts said that context-aware security is also more important than ever, and that it’s essential to have ongoing training and documented policies and methods for all those who can access the big-data and analysis applications.
So, while many enterprises would like to have their cake and eat it too when it comes to doing big-data activities in the cloud, this is a time for cautious adoption as the risks of big-data in the cloud become fully understood and mitigated.