Hadoop, the software at the heart of successful companies such as Google, Facebook, Yahoo!, and others, has proven itself time and again at those companies as a dependable data management platform, keeping data secure and fault-tolerant while spread across multiple servers. It isn’t the easiest piece of software to configure, however, which is why Cloudera has just announced a freely downloadable, easier-to-use custom distribution of Hadoop to bring the power of entities like Google to smaller businesses.

Cloudera began in October of 2008, offering paid support for Apache’s Hadoop, the software that manages big data stored on up to thousands of servers. It has already delivered training, selling, and consulting services, as well as support contracts, to varied industries including biotech, finance, advertising, mobile, and research. Now Cloudera is offering a free distribution of Hadoop that is more easily deployed and configured, aimed at helping smaller businesses with smaller IT staffs run Hadoop with the efficiency and potency that larger companies enjoy.
Cloudera offers its Hadoop distribution for free as an RPM package for Red Hat Linux distributions or as an image. The RPM can be downloaded from a newly launched website that provides a web-based configuration tool for the distribution. The configuration wizard asks a series of questions in plain language and, once they are answered, generates a customized configuration file covering the 300 or so settings Hadoop requires, making the set-up process much easier. The RPM is available at the end of the six-step process, and the configuration files are saved on the website for future use and alterations, if needed. The custom package creator is free of charge, as is the download, all of it under the same Apache Software License v2.0 as the core Hadoop project.
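To give a rough idea of what turning plain-language answers into a configuration file can look like, here is a minimal Python sketch. It is not Cloudera’s wizard: the answer keys (`master_host`, `namenode_port`, `jobtracker_port`, `replication`), the host name, and the port numbers are hypothetical, and only three of the roughly 300 settings are shown. The property names and the `<configuration>`/`<property>` layout follow Hadoop’s XML configuration format.

```python
import xml.etree.ElementTree as ET


def render_hadoop_site(answers):
    """Map hypothetical wizard answers to a few Hadoop properties and return XML text."""
    master = answers["master_host"]
    properties = {
        # Address of the HDFS NameNode that clients connect to.
        "fs.default.name": f"hdfs://{master}:{answers['namenode_port']}",
        # Address of the MapReduce JobTracker.
        "mapred.job.tracker": f"{master}:{answers['jobtracker_port']}",
        # How many copies of each data block HDFS keeps.
        "dfs.replication": str(answers["replication"]),
    }

    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative answers only; a real wizard would collect many more.
answers = {
    "master_host": "master.example.com",
    "namenode_port": 8020,
    "jobtracker_port": 8021,
    "replication": 3,
}

xml_text = render_hadoop_site(answers)
print(xml_text)
```

The point of the sketch is the shape of the mapping: a handful of human-level answers expands into many low-level properties, which is the work a hand-written configuration would otherwise require.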
The company is also making available free, preconfigured VMware images to run on Linux, Windows, or Mac machines. The image includes example code and all components needed to use the Cloudera Distribution for Hadoop, including a master server and single node.
According to Cloudera, Hadoop is generally deployed on single- or multi-CPU systems with at least two cores per CPU, at least 16 GB of RAM, and a couple of terabytes of hard drive space. However, the specifications for optimal performance depend more on the data being managed and the analyses to be run than on Hadoop itself. As it is, Hadoop runs well on older hardware; users can build smaller clusters of more powerful machines or larger clusters of older, less powerful ones.
Though there currently aren’t any projects to port Hadoop to other systems, Cloudera CEO Mike Olson says that could be a possibility in the future:
…it depends on what our customers tell us they want. The Hadoop community has a home at the Apache Software Foundation, but the community’s not the Foundation. It includes committers at companies like Facebook, Yahoo!, Microsoft, Cloudera and others. The community will continue to innovate and enhance the system, and that may mean new ports.
In addition to its free Hadoop distribution and other free services, Cloudera offers free community support as well as paid professional support and training. Basic training is also free.
Olson explained that Cloudera intends to bring the data technologies of successful companies like Google and Facebook to the many smaller companies around the globe:
Smaller companies generally have smaller IT staff, and there aren’t many companies that can match Google for the number of PhDs employed. Our aim in making Hadoop easy to acquire, install, configure and operate is to make it much easier for smaller, more traditional companies to use. We don’t think that Hadoop is a tool for big companies. Instead, it’s a powerful data analytics engine for companies that have big data problems. There are many, many enterprises that collect or generate more than a terabyte of unstructured data… Hadoop’s extraordinarily well-suited for processing information like that inexpensively and reliably. We’re convinced that there’s a great opportunity to bring the power of Google to companies worldwide.
No one says it better than the people at the top of Cloudera in their video below:
Correction to summary: Google does not use Hadoop in its infrastructure. It uses Hadoop only to teach MapReduce concepts, in partnership with IBM.