Andy Feng of Yahoo presented his work on Apache Storm. This picture shows the three types of processing scenarios on Hadoop 2:
- Hadoop batch processing (MapReduce, or the newer Tez, which provides DAG-based processing)
- Spark iterative processing (for machine learning, where the algorithms crunch on the same data repeatedly to minimize some objective function). Spark also supports the Directed Acyclic Graph (DAG) processing model, and its capabilities have drawn increasing interest.
- Storm stream processing for real-time data
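
To make the iterative case above concrete, here is a minimal sketch in plain Python. It is not Spark code and does not come from the talk. It only illustrates the pattern of passing over the same data many times to shrink an objective function, here mean squared error for a straight-line fit. The function names `fit_line` and `mean_squared_error`, the learning rate, and the sample data are all assumptions made for illustration.

```python
# Plain-Python illustration of iterative processing; no Spark API is used.

def mean_squared_error(points, w, b):
    """Objective function: average squared error of y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in points) / len(points)


def fit_line(points, learning_rate=0.05, iterations=500):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(iterations):
        # Each pass re-reads the same dataset. This repeated access is
        # the reason iterative jobs benefit from keeping data in memory.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical sample data lying exactly on y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = fit_line(data)
print(w, b, mean_squared_error(data, w, b))
```

In a batch model, each of those passes would typically be a separate job that reloads its input, which is the overhead an iterative engine tries to avoid.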
The Hadoop 2 platform presents an “operating system”-like set of functions to manage a cluster's compute and storage:
- HDFS to manage storage
- YARN to manage compute resources
- MapReduce, Tez, Storm, and Spark to schedule and run tasks
All open source and changing quickly.