At the Hadoop Summit ’09, the large auditorium at the Santa Clara Marriott was overflowing with people. There was huge enthusiasm for this open source software and its applications. Some key takeaways from the event:
The Hadoop ecosystem provides a framework for running big data applications that use computer science techniques (distributed systems, algorithms, machine learning, …) faster and at much lower cost.
The underlying compute infrastructure is cloud computing, pioneered by Amazon, which makes it cheap to run many computers on an as-needed basis. The ease of scaling quickly to thousands of computers makes cloud computing elastic. This elasticity shortens the time to innovate and solve problems by deploying scalable parallel computing resources. Hadoop simplifies the programming of large distributed systems with the map-reduce programming model and a distributed file system.
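To give a feel for the map-reduce model, here is a minimal single-process sketch of a word count in Python. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. In a real cluster the framework would run the map and reduce steps in parallel across many machines and handle the grouping in between.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

The point of the model is that the programmer writes only the mapper and the reducer; distributing the work and moving the data between the two steps is left to the framework.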
In the 3 years since its inception, a large ecosystem has grown up around Hadoop, including Pig, Hive, HBase, Zookeeper, Cascading, Mahout, Katta, … Cloudera has stepped in to provide a distribution that eases the deployment of Hadoop.
The range of problems that can be solved with Hadoop is expanding: from gene sequence matching, to Large Hadron Collider data analysis, to matching potential mates, to spam filtering, to classifying and organizing photos, to machine understanding of human language, to analyzing large volumes of mobile call data. Practitioners in these diverse fields say that by using Hadoop on a cloud infrastructure, they can solve their problems faster and at much lower cost: in hours or days rather than weeks.
LinkedIn (at ScaleCamp the night before) claimed that they could produce revenue-generating products in less than 2 weeks using Hadoop and machine learning on their data. For example, LinkedIn is analyzing profile data to map out career paths. Other companies like Facebook are banking on growth strategies that leverage information extracted from the large volumes of subscriber data their services collect.
There was excitement at the Hadoop Summit. Hadoop and its ecosystem offer a promising platform to address big data problems and applications.
Copyright (c) 2009 by Waiming Mok