Spark Study Notes - 20160812

Date: 2016-08-12 18:22:33


What is Spark?

1. Apache Spark is a fast and general engine for large-scale data processing

2. Speed

3. Ease of use

4. Generality

5. Integrated with Hadoop

One stack to rule them all:

Spark Streaming ======> Ad-hoc Queries
       \\       SPARK       //
        \\                 //
         => Batch Processing <=

Hadoop Data Sharing And Spark Data Sharing:


Note: Hadoop interacts with data on disk frequently, and it repeatedly serializes and deserializes that data.
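The difference between the two data-sharing styles can be sketched in plain Python. This is only a toy model, not Hadoop or Spark code: the function names `run_via_disk` and `run_in_memory` and the three sample stages are assumptions made up for illustration. The first function writes every intermediate result to disk and reads it back before the next stage; the second keeps intermediate results in memory.

```python
import os
import pickle
import tempfile


def run_via_disk(data, stages):
    """Disk-based sharing: each stage reads its input from disk and
    writes its output back, serializing and deserializing every time."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage.pkl")
        with open(path, "wb") as f:
            pickle.dump(list(data), f)
        for stage in stages:
            with open(path, "rb") as f:
                current = pickle.load(f)   # deserialize from disk
            result = [stage(x) for x in current]
            with open(path, "wb") as f:
                pickle.dump(result, f)     # serialize back to disk
        with open(path, "rb") as f:
            return pickle.load(f)


def run_in_memory(data, stages):
    """In-memory sharing: intermediate results never leave memory."""
    current = list(data)
    for stage in stages:
        current = [stage(x) for x in current]
    return current


def add_one(x):
    return x + 1


def double(x):
    return x * 2


stages = [add_one, double, add_one]
disk_result = run_via_disk(range(5), stages)
memory_result = run_in_memory(range(5), stages)
```

Both functions compute the same answer; the disk-based version simply pays for one read and one write per stage, which is the overhead the note above describes.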

[Figure: Hadoop MapReduce analogy diagram]

Why is Spark fast?

1. Memory-based computation

2. DAG (Directed Acyclic Graph)

3. Thread model

4. Optimization (e.g., delay scheduling)
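The DAG item can be illustrated with a small pure-Python sketch of lazy evaluation. It is a toy model under stated assumptions: `LazyDataset` is a hypothetical class written for these notes, not Spark's RDD API, and it only handles a linear chain of operations (the simplest kind of DAG). Transformations are merely recorded; nothing runs until `collect()` is called, and each element then passes through all recorded steps in a single pass.

```python
class LazyDataset:
    """Toy lazily evaluated dataset (hypothetical, not Spark's RDD)."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)  # recorded lineage: (kind, function) pairs

    def map(self, fn):
        # Record the transformation; do not run it yet.
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, fn):
        # Record the transformation; do not run it yet.
        return LazyDataset(self._source, self._ops + (("filter", fn),))

    def lineage(self):
        """Names of the recorded steps, in order."""
        return [kind for kind, _ in self._ops]

    def collect(self):
        """Action: run every recorded step, one element at a time."""
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):  # a "filter" step rejected the item
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []


def traced_square(x):
    calls.append(x)  # remember that the function actually ran
    return x * x


ds = LazyDataset(range(6)).map(traced_square).filter(lambda v: v % 2 == 0)
calls_before_action = len(calls)  # still 0: nothing has executed yet
result = ds.collect()             # the recorded steps run here
```

Because the whole chain is known before anything executes, an engine built this way can plan the work as one unit instead of materializing each intermediate dataset.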

API: Scala, Python, Java, R

Cluster Manager: Local, Standalone, YARN, Mesos
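Each cluster manager is selected through the master URL given to Spark. The helper below is a hypothetical convenience function written for these notes (`master_url` is not part of Spark), and the host names in the usage lines are placeholders. The URL formats themselves (`local[*]`, `spark://HOST:PORT`, `yarn`, `mesos://HOST:PORT`) and the default ports 7077 and 5050 follow Spark's documented conventions.

```python
def master_url(manager, host=None, port=None, cores="*"):
    """Build a Spark master URL string for a cluster manager name.

    Hypothetical helper for illustration only; it does not contact
    any cluster, it just formats the string.
    """
    manager = manager.lower()
    if manager == "local":
        return f"local[{cores}]"   # run in one JVM with the given cores
    if manager == "yarn":
        return "yarn"              # cluster location comes from Hadoop config
    defaults = {
        "standalone": ("spark", 7077),  # Spark standalone master
        "mesos": ("mesos", 5050),       # Mesos master
    }
    if manager not in defaults:
        raise ValueError(f"unknown cluster manager: {manager}")
    if host is None:
        raise ValueError(f"{manager} requires a host")
    scheme, default_port = defaults[manager]
    return f"{scheme}://{host}:{port or default_port}"


local_url = master_url("Local")
standalone_url = master_url("Standalone", host="master-host")
yarn_url = master_url("Yarn")
mesos_url = master_url("Mesos", host="mesos-host")
```

The resulting string is what would be passed as the `--master` option of `spark-submit` or set as `spark.master` in the application's configuration.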


Dependency: narrow, wide
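A minimal pure-Python sketch of the two dependency types follows. It is a toy model, not Spark code: `map_partitions`, `group_by_key`, and `partition_for` are names invented here, and the partitioner is a deliberately simple deterministic function. With a narrow dependency, output partition i is computed from input partition i alone. With a wide dependency, every output partition may need records from every input partition, which is what forces a shuffle.

```python
from collections import defaultdict

# Two input partitions of (key, value) records.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
]


def map_partitions(parts, fn):
    """Narrow dependency: output partition i reads only input partition i."""
    return [[fn(record) for record in part] for part in parts]


def partition_for(key, num_partitions):
    """Toy deterministic partitioner (an assumption for this sketch)."""
    return sum(key.encode("utf-8")) % num_partitions


def group_by_key(parts, num_partitions):
    """Wide dependency: each output partition may read from every
    input partition, so records are redistributed by key (a shuffle)."""
    out = [defaultdict(list) for _ in range(num_partitions)]
    for part in parts:
        for key, value in part:
            out[partition_for(key, num_partitions)][key].append(value)
    return [dict(p) for p in out]


scaled = map_partitions(partitions, lambda kv: (kv[0], kv[1] * 10))
grouped = group_by_key(scaled, 2)
```

In `grouped`, the values for key "a" come from both input partitions, which shows why a wide dependency cannot be computed partition by partition the way `map_partitions` can.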


Original source: http://www.cnblogs.com/wsongmao/p/5765777.html
