30 May, 2014
Big Data Landscape CategoryAnalytics and Visualization
Employees41 (Management Committee)
Hadoop is an open source software framework by the Apache Software Foundation that helps facilitate data analysis for structured and unstructured databases. Hadoop can perform its functions across distributed computer networks—meaning that data can be moved to and from separate applications, helping to integrate the data analysis process and guarding against data loss.
Hadoop software uses MapReduce, a Google-produced model for programming that processes large sets of data by dividing applications into smaller work sets that can be run in any distinct node (computer or other device in a network) in a cluster (networked group of such devices). There is some disagreement over where and how MapReduce began to proliferate the data processing space, but most point to a paper from Google titled MapReduce: Simplified Data Processing on Large Clusters, written by Jeffrey Dean and Sanjay Ghemawat. The release of this paper is seen by many as the springboard for MapReduce’s massive growth, as well as Hadoop’s creation.
Not long after the release of this paper, a software engineer named Doug Cutting saw the potential that MapReduce had for increasing the scale of work that could be done by data analysis software on the web, particularly for two important projects he had been working on—Nutch, a spider or crawler, and Lucene, a search indexer. Along with Mike Cafarella, Cutting created Hadoop in 2005, naming the framework after his son’s toy elephant. In 2006, Cutting joined Yahoo!, and the company soon adopted Hadoop.
Since 2005, Hadoop has been used as a data analysis solution for thousands of enterprises. Those numbers have grown with the explosion of digital business intelligence technology, as companies have been established with the sole purpose of producing business intelligence applications for Hadoop. The software that these companies create act as an application layer between Hadoop and the end user, expanding the number of business intelligence tools available through Hadoop and making Hadoop functionality more accessible to engineers and lay-users alike. In 2009, Doug Cutting left Yahoo! for just such a company—Cloudera—and now works producing applications that run over Hadoop.