Spark Big Data Platform

Posted: 2016-01-04 19:38:22

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

BDAS, the Berkeley Data Analytics Stack, is an open source software stack that integrates software components being built by the AMPLab to make sense of Big Data.

Spark components and their rough counterparts in the Hadoop ecosystem and beyond:
Spark Core	<------>	Apache Hadoop MapReduce
Spark Streaming	<------>	Apache Storm
Spark SQL	<------>	Apache Hive
Spark GraphX	<------>	MPI (Taobao)
Spark MLlib	<------>	Apache Mahout

BlinkDB is a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. It allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.
It is built on two key ideas:

An adaptive optimization framework that builds and maintains a set of multi-dimensional samples from the original data over time.
A dynamic sample selection strategy that selects an appropriately sized sample based on a query’s accuracy and/or response time requirements.
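
To make the sampling idea concrete, here is a minimal Scala sketch of answering an AVG query from a sample and attaching an error bar. It is not BlinkDB's code or API: the names `ApproxQuery`, `sample`, `approxMean` and `sampleSizeFor` are hypothetical, it assumes a simple uniform random sample rather than BlinkDB's stratified multi-dimensional samples, and it uses the normal approximation (1.96 standard errors, roughly 95% confidence) for the error bar.

```scala
import scala.util.Random

// Hypothetical sketch, not BlinkDB: estimate AVG(column) from a uniform
// random sample and report a ~95% error bar via the normal approximation.
object ApproxQuery {
  final case class Estimate(value: Double, errorBar: Double) {
    def low: Double = value - errorBar
    def high: Double = value + errorBar
  }

  // Draw `n` rows uniformly at random without replacement.
  def sample(data: IndexedSeq[Double], n: Int, seed: Long): IndexedSeq[Double] =
    new Random(seed).shuffle(data).take(n)

  // Sample mean plus/minus 1.96 standard errors.
  def approxMean(rows: IndexedSeq[Double]): Estimate = {
    val n = rows.size
    require(n >= 2, "need at least two sampled rows")
    val mean = rows.sum / n
    val variance = rows.map(x => (x - mean) * (x - mean)).sum / (n - 1)
    Estimate(mean, 1.96 * math.sqrt(variance / n))
  }

  // Smallest sample size whose error bar should not exceed `maxError`,
  // given a pilot estimate of the standard deviation:
  // 1.96 * sd / sqrt(n) <= maxError  <=>  n >= (1.96 * sd / maxError)^2
  def sampleSizeFor(maxError: Double, pilotSd: Double): Int =
    math.ceil(math.pow(1.96 * pilotSd / maxError, 2)).toInt
}
```

A tighter error bar requested from `sampleSizeFor` yields a larger sample, and therefore a slower query, which is the accuracy versus response-time trade-off in miniature.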

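Why Spark is fast:

In-memory computing: intermediate data can be cached in memory and reused across operations, instead of being written to disk between jobs as in Hadoop MapReduce.
Directed Acyclic Graph (DAG) engine: the scheduler sees the whole computation graph in advance, so it can optimize it, for example by pipelining consecutive narrow transformations such as map and filter into a single stage.
Delay scheduling: when no free slot is available on a node that holds a task's input data, the task waits briefly for one before running elsewhere, improving data locality.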
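
A minimal Scala sketch of why seeing the whole graph helps: because the lineage is known before anything runs, a scheduler can cut it into stages at shuffle (wide) dependencies and pipeline chains of narrow dependencies inside one stage. `StagePlanner`, `Node`, `Narrow`, `Shuffle` and `buildStage` are hypothetical names; Spark's real DAGScheduler also reuses shared stages, handles failures and much more.

```scala
import scala.collection.mutable

// Hypothetical toy planner, not Spark's DAGScheduler.
object StagePlanner {
  sealed trait Dep { def parent: Node }
  final case class Narrow(parent: Node) extends Dep  // e.g. map, filter, flatMap
  final case class Shuffle(parent: Node) extends Dep // e.g. reduceByKey, groupByKey

  final case class Node(name: String, deps: List[Dep] = Nil)
  final case class Stage(nodes: List[String], parents: List[Stage])

  // Build the stage ending at `last`: follow narrow deps into the same stage
  // (pipelining), and start a separate parent stage behind each shuffle dep.
  // Unlike Spark, stages reachable through two shuffles are not shared.
  def buildStage(last: Node): Stage = {
    val inStage = mutable.LinkedHashSet.empty[Node]
    val parentStages = mutable.ListBuffer.empty[Stage]
    def visit(n: Node): Unit =
      if (inStage.add(n)) n.deps.foreach {
        case Narrow(p)  => visit(p)
        case Shuffle(p) => parentStages += buildStage(p)
      }
    visit(last)
    // Nodes were added from the end backwards; reverse to list them source-first.
    Stage(inStage.toList.reverse.map(_.name), parentStages.toList)
  }

  def countStages(s: Stage): Int = 1 + s.parents.map(countStages).sum
}
```

For a word-count style lineage (textFile, flatMap, map, then a shuffle for reduceByKey), this yields two stages: the three narrow steps run together as one pipelined stage, and the reduce side becomes a second stage.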

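Delay scheduling can be sketched as a single decision made whenever a node offers a free slot. This is a hypothetical toy, not Spark's TaskSetManager: in Spark the waiting time is governed by the `spark.locality.wait` setting, while `DelayScheduling`, `offer` and `localityWaitMs` here are illustrative names, and the model ignores intermediate locality levels such as rack-local.

```scala
// Hypothetical toy model of delay scheduling.
object DelayScheduling {
  final case class Task(id: Int, preferredNodes: Set[String])

  sealed trait Decision
  final case class LaunchLocal(taskId: Int) extends Decision
  final case class LaunchRemote(taskId: Int) extends Decision
  case object Wait extends Decision

  /** Decide what to do when `node` offers a free slot.
    * @param pending        tasks not yet launched, in submission order
    * @param waitedMs       how long we have been declining non-local offers
    * @param localityWaitMs how long to hold out for a data-local slot
    */
  def offer(node: String, pending: Seq[Task], waitedMs: Long, localityWaitMs: Long): Decision =
    pending.find(_.preferredNodes.contains(node)) match {
      // Some task's input lives on this node: run it here.
      case Some(t) => LaunchLocal(t.id)
      // Nothing local, but we have waited long enough: give up locality.
      case None if pending.nonEmpty && waitedMs >= localityWaitMs =>
        LaunchRemote(pending.head.id)
      // Nothing local yet (or nothing pending): decline this offer.
      case None => Wait
    }
}
```

Declining a few offers costs a short delay, but most tasks end up reading their input from a local disk or memory instead of over the network.
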
Resilient Distributed Dataset (RDD)

Internally, each RDD is characterized by five main properties:
A list of partitions
A function for computing each split
A list of dependencies on other RDDs
Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
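
The five properties map naturally onto an abstract class. The sketch below is a hypothetical, heavily simplified model that runs in a single process (`MiniRdd`, `ParallelRdd`, `MappedRdd` and the host names are illustrative); Spark's own `RDD` class expresses the same ideas through members such as `getPartitions`, `compute`, `getDependencies`, `partitioner` and `getPreferredLocations`.

```scala
// Hypothetical single-process model of the five RDD properties.
object MiniRdd {
  final case class Partition(index: Int)

  // A dependency simply points at a parent RDD.
  final case class Dependency(parent: Rdd[_])

  // Optional partitioner for key-value data: key -> partition index.
  final case class HashPartitioner(numPartitions: Int) {
    require(numPartitions > 0)
    def getPartition(key: Any): Int =
      if (key == null) 0
      else {
        val mod = key.hashCode % numPartitions
        if (mod < 0) mod + numPartitions else mod // keep the index non-negative
      }
  }

  abstract class Rdd[T] {
    def partitions: Seq[Partition]                              // 1. list of partitions
    def compute(split: Partition): Iterator[T]                 // 2. function per split
    def dependencies: Seq[Dependency]                           // 3. parent RDDs
    def partitioner: Option[HashPartitioner] = None             // 4. optional partitioner
    def preferredLocations(split: Partition): Seq[String] = Nil // 5. optional locality hints

    def map[U](f: T => U): Rdd[U] = new MappedRdd(this, f)
    def collect(): Seq[T] = partitions.flatMap(p => compute(p).toList)
  }

  // Source RDD over in-memory slices; `hosts` are made-up location hints.
  class ParallelRdd[T](slices: Seq[Seq[T]], hosts: Seq[String]) extends Rdd[T] {
    def partitions: Seq[Partition] = slices.indices.map(Partition(_))
    def compute(split: Partition): Iterator[T] = slices(split.index).iterator
    def dependencies: Seq[Dependency] = Nil
    override def preferredLocations(split: Partition): Seq[String] =
      if (hosts.isEmpty) Nil else Seq(hosts(split.index % hosts.size))
  }

  // Narrow dependency: output partition i is computed from parent partition i.
  class MappedRdd[T, U](prev: Rdd[T], f: T => U) extends Rdd[U] {
    def partitions: Seq[Partition] = prev.partitions
    def compute(split: Partition): Iterator[U] = prev.compute(split).map(f)
    def dependencies: Seq[Dependency] = Seq(Dependency(prev))
  }
}
```

Because every RDD knows its parents and how to compute each partition from them, a lost partition can be rebuilt by replaying `compute` along the dependency chain, which is where the "resilient" in the name comes from.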

Storage Strategy

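// Constructor shape from older Spark releases; later versions add a useOffHeap flag.
class StorageLevel private(
    private var useDisk_ : Boolean,
    private var useMemory_ : Boolean,
    private var deserialized_ : Boolean,
    private var replication_ : Int = 1)

object StorageLevel {
  // The private constructor is reachable only from this companion object.
  // MEMORY_ONLY: keep partitions in memory as deserialized objects, no disk, 1 replica.
  val MEMORY_ONLY = new StorageLevel(false, true, true)
}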
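
In application code a level is chosen with `rdd.persist(...)`, and `rdd.cache()` is shorthand for `persist(StorageLevel.MEMORY_ONLY)`. The stand-alone sketch below (a hypothetical `StorageLevelSketch`, not Spark's class) shows how the three flags and the replication count combine; the flag values follow the predefined levels of the older Spark releases whose constructor is shown above.

```scala
// Hypothetical stand-alone model of how StorageLevel flags combine.
object StorageLevelSketch {
  final case class Level(
      useDisk: Boolean,
      useMemory: Boolean,
      deserialized: Boolean,
      replication: Int = 1) {
    require(replication >= 1, "replication must be at least 1")

    // Human-readable summary of where and in what form partitions are kept.
    def describe: String = {
      val where =
        if (useMemory && useDisk) "memory, spilling to disk"
        else if (useMemory) "memory only"
        else if (useDisk) "disk only"
        else "not stored"
      val form = if (deserialized) "deserialized objects" else "serialized bytes"
      s"$where, as $form, ${replication}x replicated"
    }
  }

  val NONE                = Level(false, false, false)
  val DISK_ONLY           = Level(true, false, false)
  val MEMORY_ONLY         = Level(false, true, true)
  val MEMORY_ONLY_SER     = Level(false, true, false)
  val MEMORY_AND_DISK     = Level(true, true, true)
  val MEMORY_AND_DISK_SER = Level(true, true, false)
  val MEMORY_ONLY_2       = Level(false, true, true, 2) // "_2" levels keep two replicas
}
```

Serialized levels save memory at the cost of extra CPU to deserialize on access; replicated levels trade space for faster recovery when a node is lost.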

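RDD operations: transformations and actions

Lazy evaluation: transformations (such as map and filter) only record how to derive a new RDD from its parents; nothing is computed until an action (such as count or collect) asks for a result.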
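
Lazy evaluation can be illustrated with a toy dataset that stores a recipe instead of data. The names `LazyDemo`, `LazyDataset`, `parallelize` and `calls` are hypothetical, and the sketch runs locally without Spark; it only shows that transformations (`map`, `filter`) record work while actions (`collect`, `count`) perform it.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical toy, not Spark's API.
object LazyDemo {
  // Tests and demos can append here from user functions to see *when* work happens.
  val calls: ListBuffer[String] = ListBuffer.empty

  // A dataset is a recipe (a thunk producing an iterator) plus its lineage.
  final class LazyDataset[T](private val run: () => Iterator[T], val lineage: List[String]) {
    // Transformations: wrap the recipe, execute nothing.
    def map[U](f: T => U): LazyDataset[U] =
      new LazyDataset[U](() => run().map(f), lineage :+ "map")
    def filter(p: T => Boolean): LazyDataset[T] =
      new LazyDataset[T](() => run().filter(p), lineage :+ "filter")

    // Actions: execute the whole recorded recipe now.
    def collect(): List[T] = run().toList
    def count(): Long = run().size.toLong
  }

  def parallelize[T](data: Seq[T]): LazyDataset[T] =
    new LazyDataset[T](() => data.iterator, List("parallelize"))
}
```

Because each action replays the whole recipe, calling `count()` after `collect()` runs the map function a second time. Spark behaves similarly for RDDs that are not persisted, which is why `persist()` or `cache()` pays off when an RDD is reused.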

Original post: http://www.cnblogs.com/rainbow203/p/Spark-da-shu-ju-ping-tai.html