7 Myths on Big Data

1. Big Data is purely about volume—NOT TRUE

Besides volume, several industry leaders have also touted variety, variability, velocity, and value.

2. Traditional SQL doesn’t work with Hadoop—NOT TRUE

Given that so much of the world's data is managed through SQL, many companies and projects offer ways to bridge Hadoop and SQL—this is how Hive, Pig, and Sqoop were ultimately hatched.
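The core idea behind tools like Hive is that a familiar SQL query can be compiled down into map and reduce phases that run across a cluster. The toy sketch below (plain Python, not Hive's actual execution engine, with made-up sample data) illustrates how a query like `SELECT page, COUNT(*) FROM visits GROUP BY page` decomposes into those phases:

```python
from collections import defaultdict

# Hypothetical sample data standing in for a table of page visits.
visits = [
    {"page": "/home"}, {"page": "/about"},
    {"page": "/home"}, {"page": "/home"},
]

# Map phase: emit a (key, 1) pair for every row.
mapped = [(row["page"], 1) for row in visits]

# Shuffle: group the intermediate pairs by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: sum the counts for each key.
counts = {key: sum(values) for key, values in groups.items()}

print(counts)  # {'/home': 3, '/about': 1}
```

The same decomposition scales from four rows on one machine to billions of rows across a cluster, which is why an SQL front end on Hadoop is a natural fit rather than a contradiction.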

3. Kill the Mainframe! Hadoop is the only new IT data platform—NOT TRUE

While the mainframe isn’t being buried by companies, it definitely needs a new strategy to grow new legs and expand on the value of its existing investment. For many of our customers who run into issues with mainframe speed, scale, or cost, there are incremental ways to evolve the big iron data platform and actually get more use out of it.

4. Virtualized Hadoop takes a performance hit—NOT TRUE

Major Hadoop distributions from MapR, Hortonworks, Cloudera, and Greenplum all support Project Serengeti and Hadoop Virtualization Extensions (HVE) for this reason.

5. Hadoop only works in your data center—NOT TRUE

There are SaaS-based cloud solutions, like Cetas, that allow you to run Hadoop, SQL, and real-time analytics in the cloud without investing the time and money it takes to build a large project inside your data center.

6. Hadoop doesn’t make financial sense to virtualize—NOT TRUE

Hadoop is typically explained as running on a bank of commodity servers—so one might conclude that adding a virtualization layer adds extra cost but no extra value. In practice, though, virtualization can pay for itself through higher hardware utilization, faster provisioning, and the ability to share infrastructure with other workloads.

7. Hadoop doesn’t work on SAN or NAS—NOT TRUE

Hadoop runs on local disks, but it can also run well in a shared SAN environment for small- to medium-sized clusters, with different cost and performance characteristics.