What is Spark?
1,Apache Spark is a fast and general engine for large-scale data processing
2,Speed
3,Ease of Use
4,Generality
5,Integrated with Hadoop
One Stack to rule them all: Spark Streaming, Ad-hoc Queries, and Batch Processing all run on the same SPARK engine.
Hadoop Data Sharing And Spark Data Sharing:
PS: Hadoop interacts with data on disk very frequently, and repeatedly serializes and deserializes that data.
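The difference between the two data-sharing styles can be sketched in plain Python. This is only a toy model and not Hadoop or Spark code: the function names `disk_shared_pipeline` and `memory_shared_pipeline` are made up for illustration, and `pickle` plus a temporary file stand in for the serialization and disk I/O that happen between MapReduce jobs.

```python
import os
import pickle
import tempfile


def disk_shared_pipeline(data, stages):
    """Hadoop-style sharing (toy model): every stage reads its input from
    disk and writes its output back to disk."""
    round_trips = 0
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage.bin")
        with open(path, "wb") as f:
            pickle.dump(list(data), f)          # initial input lives on disk
        for stage in stages:
            with open(path, "rb") as f:
                current = pickle.load(f)        # deserialize from disk
            current = [stage(x) for x in current]
            with open(path, "wb") as f:
                pickle.dump(current, f)         # serialize back to disk
            round_trips += 1
        with open(path, "rb") as f:
            result = pickle.load(f)
    return result, round_trips


def memory_shared_pipeline(data, stages):
    """Spark-style sharing (toy model): intermediate results stay in memory,
    so no per-stage disk round trip is needed."""
    current = list(data)
    for stage in stages:
        current = [stage(x) for x in current]
    return current, 0


stages = [lambda x: x + 1, lambda x: x * 2]
disk_result, disk_trips = disk_shared_pipeline([1, 2, 3], stages)
mem_result, mem_trips = memory_shared_pipeline([1, 2, 3], stages)
```

Both pipelines compute the same result; the disk-based one pays one read and one write per stage, which is the overhead the note above refers to.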
Analogy diagram for Hadoop MapReduce:
Why is Spark fast?
1,Memory based computation
2,DAG (Directed Acyclic Graph)
3,Thread model
4,Optimization (eg:delay scheduling)
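Points 1 and 2 above can be illustrated with a small pure-Python sketch. It is a toy, not Spark's RDD API: the class `LazyDataset` and its methods are hypothetical names chosen for this example. Transformations are only recorded as a (linear) graph of operations; nothing runs until `collect()` is called, at which point the records are streamed through all steps in memory without materializing intermediate results.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (NOT Spark's API)."""

    def __init__(self, source, ops=()):
        self._source = source          # the input records
        self._ops = tuple(ops)         # recorded transformations, in order

    def map(self, fn):
        # Record the step; do not execute it yet.
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        # Record the step; do not execute it yet.
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def plan(self):
        """Return the names of the recorded steps (the execution plan)."""
        return [name for name, _ in self._ops]

    def collect(self):
        """Action: run the whole plan in one pipelined pass over the data."""
        stream = iter(self._source)
        for name, fn in self._ops:
            stream = map(fn, stream) if name == "map" else filter(fn, stream)
        return list(stream)


calls = []

def traced_double(x):
    calls.append(x)                    # remember when the function really runs
    return x * 2

ds = LazyDataset(range(5)).map(traced_double).filter(lambda x: x > 4)
calls_before_action = list(calls)      # still empty: nothing has executed
result = ds.collect()                  # the plan executes here
```

Because the whole plan is known before execution starts, an engine built this way can chain steps together and schedule them as a unit, instead of running each step as a separate job.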
API: Scala, Python, Java, R
Cluster Manager: Local, Standalone, YARN, Mesos
Dependency: narrow, wide
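The two dependency types can be sketched with partitions modeled as plain Python lists. This is an illustrative toy, not Spark code: `narrow_map`, `wide_reduce_by_key`, and `bucket_of` are hypothetical helpers, and the CRC32-based partitioner is just an assumption made to keep the example deterministic. In a narrow dependency each output partition reads exactly one parent partition; in a wide dependency (a shuffle) an output partition may read from many parent partitions.

```python
import zlib


def narrow_map(partitions, fn):
    """Narrow dependency: output partition i depends only on input partition i."""
    return [[fn(record) for record in part] for part in partitions]


def bucket_of(key, num_out):
    """Deterministic toy partitioner (an assumption of this sketch)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_out


def wide_reduce_by_key(partitions, reduce_fn, num_out):
    """Wide dependency: records are regrouped by key, so one output
    partition may need data from several input partitions."""
    buckets = [{} for _ in range(num_out)]
    parents = [set() for _ in range(num_out)]   # input partitions each output reads
    for idx, part in enumerate(partitions):
        for key, value in part:
            b = bucket_of(key, num_out)
            parents[b].add(idx)
            if key in buckets[b]:
                buckets[b][key] = reduce_fn(buckets[b][key], value)
            else:
                buckets[b][key] = value
    return [sorted(bucket.items()) for bucket in buckets], parents


partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
scaled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 10))
reduced, parents = wide_reduce_by_key(partitions, lambda x, y: x + y, 2)
merged = {key: value for part in reduced for key, value in part}
```

Here the key `"a"` appears in both input partitions, so the output partition that holds it depends on both of them, which is what makes the dependency wide.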
Original article: http://www.cnblogs.com/wsongmao/p/5765777.html