Original link: http://xiguada.org/carbondata_compile/
What is CarbonData?
CarbonData is a fully indexed columnar and Hadoop native data-store for processing heavy analytical workloads and detailed queries on big data. In customer benchmarks, CarbonData has proven to manage Petabyte of data running on extraordinarily low-cost hardware and answers queries around 10 times faster than the current open source solutions (column-oriented SQL on Hadoop data-stores).
Compiling and Installing
I wanted to give it a quick try, but the official site offers no prebuilt package, so I had to compile one myself.
Installation takes three steps (JDK 7 or JDK 8 and Maven 3.3+ are also required):
- Download Spark 1.5.0 or a newer release.
- Download and install Apache Thrift 0.9.3, and make sure it is on the system PATH.
- Download the Apache CarbonData code and build it.
1 Spark can be downloaded directly; after extracting it, add its bin directory to PATH so that spark-submit can be run.
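For example (the install path /opt/spark-1.6.2 below is an assumption; adjust it to wherever you extracted the Spark tarball):

```shell
# Assumed extraction location; change to match your own Spark directory
export SPARK_HOME=/opt/spark-1.6.2
# Put spark-submit and spark-sql on the PATH for this shell session
export PATH="$SPARK_HOME/bin:$PATH"
```

Adding the two export lines to ~/.bashrc makes the setting persistent across sessions.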
2 Before installing Thrift, install its build dependencies; on my Ubuntu VM the command is:
sudo apt-get install libboost-dev libboost-test-dev libboost-program-options-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev
Then build and install Thrift from its source directory:
./configure
sudo make
sudo make install
3 Build CarbonData:
mvn -DskipTests -Pspark-1.6 -Dspark.version=1.6.2 clean package
4 Go into the bin directory and, in the carbon-spark-sql script, change /bin/spark-submit to spark-submit.
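The same edit can be scripted with sed. The demo below applies it to a stand-in file, since the exact contents of carbon-spark-sql vary between versions (the exec line here is hypothetical; in practice run the sed command against the real script in bin):

```shell
# Stand-in for the real bin/carbon-spark-sql script (hypothetical contents)
printf '%s\n' 'exec /bin/spark-submit --class org.example.Main "$@"' > carbon-spark-sql.demo
# Replace the absolute /bin/spark-submit with the spark-submit found on PATH
sed -i 's|/bin/spark-submit|spark-submit|g' carbon-spark-sql.demo
cat carbon-spark-sql.demo
```

Note that `sed -i` as written assumes GNU sed; on BSD/macOS sed it would need a backup suffix argument (`sed -i ''`).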
5 Create a sample.csv file:
cd carbondata
cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF
6 Run the SQL shell:
./carbon-spark-sql
spark-sql> create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata';
spark-sql> load data inpath '../sample.csv' into table test_table;
spark-sql> select city, avg(age), sum(age) from test_table group by city;
The result:
shenzhen 29.0 58
wuhan 35.0 35
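As a sanity check independent of CarbonData, the same group-by aggregation over sample.csv can be reproduced with awk:

```shell
# Rebuild the sample data from step 5
cat > sample.csv << 'EOF'
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF
# Group by city (column 3), averaging and summing age (column 4);
# NR > 1 skips the header row
awk -F, 'NR > 1 { sum[$3] += $4; cnt[$3]++ }
         END { for (c in sum) printf "%s %.1f %d\n", c, sum[c]/cnt[c], sum[c] }' sample.csv
```

This prints the same two rows as the SQL query above (the order of the groups is not guaranteed).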
The output looks just like plain Spark SQL, so what is CarbonData actually doing in between, and what does it buy us? To be analyzed in a follow-up post.
Source: http://www.cnblogs.com/shenh062326/p/5774432.html