Tags: Yunfan Big Data Academy (云帆大数据学院), mahout, graphx, hive, flume, sqoop
1. Apache Hadoop (100% open source, permanently) download locations:
- http://hadoop.apache.org/releases.html
- SVN: http://svn.apache.org/repos/asf/hadoop/common/branches/
2. CDH (Cloudera's Distribution Including Apache Hadoop, 100% open source, permanently) download locations:
- http://archive.cloudera.com/cdh4/cdh/4/ (tar.gz files)
- http://archive.cloudera.com/cdh5/cdh/ (tar.gz files)
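(1) Official site: http://hadoop.apache.org
(2) Download the Hadoop package
(3) Problems with the official release
The official release was compiled in a 32-bit Linux environment, so running it on 64-bit Linux produces a warning:
- Warning: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- The native library in the binary package from the official site is 32-bit. You can check this with the following command:
$ file $HADOOP_PREFIX/lib/native/libhadoop.so.1.0.0
The output shows that the library is 32-bit:
libhadoop.so.1.0.0: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x9eb1d49b05f67d38454e42b216e053a27ae8bac9, not stripped
The downloaded hadoop-2.2.0-src.tar.gz package contains a BUILDING.txt file that describes the build steps in detail: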
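The 32-bit check above can be wrapped in a small helper. This is a minimal sketch and not part of the original article: the function name elf_bits is hypothetical, and it only pattern-matches the description text that file prints.

```shell
# elf_bits: print 32, 64 or "unknown" for one line of `file` output.
# Hypothetical helper; it inspects only the description text it is given.
elf_bits() {
  case "$1" in
    *"ELF 64-bit"*) echo 64 ;;
    *"ELF 32-bit"*) echo 32 ;;
    *) echo unknown ;;
  esac
}

# Example usage (meaningful only when HADOOP_PREFIX points at an installation):
# elf_bits "$(file "$HADOOP_PREFIX/lib/native/libhadoop.so.1.0.0")"
```

On a 64-bit host, a result of 32 for libhadoop.so.1.0.0 indicates the situation described above, where the native library cannot be loaded.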
Build instructions for Hadoop
----------------------------------------------------------------------------------
Requirements: (prerequisites)
* Unix System (this article uses the community edition CentOS 6.4, 64-bit)
* JDK 1.6+ (JDK 1.6 or later)
* Maven 3.0 or later (3.0.5 is recommended)
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code) (needed to build the native libraries)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies) (dependencies are downloaded online)
----------------------------------------------------------------------------------
Maven main modules:
hadoop (Main Hadoop project)
- hadoop-project (Parent POM for all Hadoop Maven modules.)
  (All plugins & dependencies versions are defined here.)
- hadoop-project-dist (Parent POM for modules that generate distributions.)
- hadoop-annotations (Generates the Hadoop doclet used to generate the Javadocs)
- hadoop-assemblies (Maven assemblies used by the different modules)
- hadoop-common-project (Hadoop Common)
- hadoop-hdfs-project (Hadoop HDFS)
- hadoop-mapreduce-project (Hadoop MapReduce)
- hadoop-tools (Hadoop tools like Streaming, Distcp, etc.)
- hadoop-dist (Hadoop distribution assembler)
----------------------------------------------------------------------------------
Where to run Maven from?
It can be run from any module. The only catch is that if not run from trunk, all modules that are not part of the build run must be installed in the local Maven cache or available in a Maven repository.
----------------------------------------------------------------------------------
Maven build goals:
* Clean : mvn clean
* Compile : mvn compile [-Pnative]
* Run tests : mvn test [-Pnative]
* Create JAR : mvn package
* Run findbugs : mvn compile findbugs:findbugs
* Run checkstyle : mvn compile checkstyle:checkstyle
* Install JAR in M2 cache : mvn install
* Deploy JAR to Maven repo : mvn deploy
* Run clover : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
* Run Rat : mvn apache-rat:check
* Build javadocs : mvn javadoc:javadoc
* Build distribution : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar]
* Change Hadoop version : mvn versions:set -DnewVersion=NEWVERSION
Build options:
* Use -Pnative to compile/bundle native code
* Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
* Use -Psrc to create a project source TAR.GZ
* Use -Dtar to create a TAR with the distribution (using -Pdist)
Snappy build options:
Snappy is a compression library that can be utilized by the native code. It is currently an optional component, meaning that Hadoop can be built with or without this dependency.
* Use -Drequire.snappy to fail the build if libsnappy.so is not found. If this option is not specified and the snappy library is missing, we silently build a version of libhadoop.so that cannot make use of snappy. This option is recommended if you plan on making use of snappy and want to get more repeatable builds.
* Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy header files and library files. You do not need this option if you have installed snappy using a package manager.
* Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library files. Similarly to snappy.prefix, you do not need this option if you have installed snappy using a package manager.
* Use -Dbundle.snappy to copy the contents of the snappy.lib directory into the final tar file. This option requires that -Dsnappy.lib is also given, and it ignores the -Dsnappy.prefix option.
---------------------------------------------------------------------------------
Building components separately
If you are building a submodule directory, all the hadoop dependencies this submodule has will be resolved as all other 3rd party dependencies. This is, from the Maven cache or from a Maven repository (if not available in the cache or the SNAPSHOT 'timed out').
An alternative is to run 'mvn install -DskipTests' from Hadoop source top level once; and then work from the submodule. Keep in mind that SNAPSHOTs time out after a while, using the Maven '-nsu' will stop Maven from trying to update SNAPSHOTs from external repos.
----------------------------------------------------------------------------------
Protocol Buffer compiler
The version of Protocol Buffer compiler, protoc, must match the version of the protobuf JAR.
If you have multiple versions of protoc in your system, you can set in your build shell the HADOOP_PROTOC_PATH environment variable to point to the one you want to use for the Hadoop build. If you don't define this environment variable, protoc is looked up in the PATH.
----------------------------------------------------------------------------------
Importing projects to eclipse
When you import the project to eclipse, install hadoop-maven-plugins at first.
$ cd hadoop-maven-plugins
$ mvn install
Then, generate eclipse project files.
$ mvn eclipse:eclipse -DskipTests
At last, import to eclipse by specifying the root directory of the project via
[File] > [Import] > [Existing Projects into Workspace].
----------------------------------------------------------------------------------
Building distributions: (building a release)
Create binary distribution without native code and without documentation: (binaries)
$ mvn package -Pdist -DskipTests -Dtar
Create binary distribution with native code and with documentation: (binaries + native libraries + documentation)
$ mvn package -Pdist,native,docs -DskipTests -Dtar
Create source distribution: (source)
$ mvn package -Psrc -DskipTests
Create source and binary distributions with native code and documentation: (source + binaries + native libraries + documentation)
$ mvn package -Pdist,native,docs,src -DskipTests -Dtar
Create a local staging version of the website (in /tmp/hadoop-site)
$ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
----------------------------------------------------------------------------------
Handling out of memory errors in builds (fixing out-of-memory problems)
If the build process fails with an out of memory error, you should be able to fix it by increasing the memory used by Maven, which can be done via the environment variable MAVEN_OPTS.
Here is an example setting to allocate between 256 and 512 MB of heap space to Maven
export MAVEN_OPTS="-Xms256m -Xmx512m"
----------------------------------------------------------------------------------
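This article uses the community edition CentOS 6.4, 64-bit. Download: http://www.centoscn.com/CentosSoft/
(1) Set the VMware virtual machine network mode to NAT
(2) Configure the Linux network to obtain its address from a DHCP server, so that it shares the host's network
(3) Test: ping www.baidu.com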
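Note: JDK 1.6 or later is required, in a 64-bit build (this environment uses jdk-6u45-linux-x64.bin)
(1) Use an FTP tool (WinSCP or FileZilla) to upload jdk-6u45-linux-x64.bin to the /software/ directory on the Linux system
(2) Install the JDK
cd /software/
chmod u+x jdk-6u45-linux-x64.bin     # grant execute permission
mkdir /workDir                       # create a software installation directory (personal preference)
cp jdk-6u45-linux-x64.bin /workDir   # copy the installer to /workDir
cd /workDir                          # extract inside the installation directory
./jdk-6u45-linux-x64.bin             # run the self-extracting installer
mv jdk1.6.0_45 jdk6u45               # rename the directory for convenience
(3) Configure the environment variables
vi /etc/profile
Add the following:
export JAVA_HOME=/workDir/jdk6u45
export PATH=.:$PATH:$JAVA_HOME/bin
(4) Apply the environment variables
source /etc/profile
(5) Verify that the JDK is installed correctly
java -version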
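The verification step can also be checked against the JDK 1.6+ requirement from BUILDING.txt. A minimal sketch, not from the original article: the function name jdk_at_least_1_6 is hypothetical, and it assumes a version string in the old 1.x.y_zz style such as 1.6.0_45.

```shell
# jdk_at_least_1_6: succeed when a version string such as 1.6.0_45
# denotes JDK 1.6 or newer. Hypothetical helper for illustration.
jdk_at_least_1_6() {
  major=${1%%.*}      # text before the first dot, e.g. 1
  rest=${1#*.}        # text after the first dot, e.g. 6.0_45
  minor=${rest%%.*}   # second component, e.g. 6
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]; }
}

# Example usage: java -version writes to stderr, so redirect it first.
# ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
# jdk_at_least_1_6 "$ver" && echo "JDK $ver is new enough"
```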
yum install autoconf -y
yum install automake -y
yum install libtool -y
yum install cmake -y
yum install ncurses-devel -y
yum install openssl-devel -y
yum install gcc -y
yum install gcc-c++ -y
yum install lzo-devel -y
yum install zlib-devel -y
Note: -y answers "yes" to every prompt during installation
Verify:
rpm -qa | grep autoconf
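Rather than checking one package at a time, the whole list can be verified in a loop. A minimal sketch under stated assumptions: the function name missing_packages and the PKG_QUERY override variable are hypothetical and not part of the original article; by default the query command is rpm -q.

```shell
# PKG_QUERY is a hypothetical override hook; the default is `rpm -q`.
PKG_QUERY=${PKG_QUERY:-rpm -q}

# missing_packages: print every argument that the query command does not find.
missing_packages() {
  for pkg in "$@"; do
    # $PKG_QUERY is intentionally unquoted so "rpm -q" splits into two words.
    $PKG_QUERY "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}

# Example usage on the build host:
# missing_packages autoconf automake libtool cmake ncurses-devel \
#   openssl-devel gcc gcc-c++ lzo-devel zlib-devel
```

An empty result means every listed package is installed; any name printed still needs yum install.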
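[About the yum command]:
yum (Yellowdog Updater, Modified) is a shell front-end package manager used on Fedora, Red Hat and SUSE. Built on RPM package management, it downloads RPM packages from the configured servers and installs them, resolves dependencies automatically and installs all required packages in one go, so there is no need to download and install them one by one. yum provides commands to search for, install and remove a single package, a group of packages or even all packages, and the commands are concise and easy to remember.
The general form of a yum command is: yum [options] [command] [package...]
[options] is optional and includes -h (help), -y (answer "yes" to every prompt during installation), -q (do not show the installation progress) and so on. [command] is the operation to perform, and [package ...] is what it operates on.
- Some commonly used commands:
Plugin that automatically picks the fastest mirror: yum install yum-fastestmirror
Graphical front end for yum: yum install yumex
List the groups available for batch installation: yum grouplist
- Installing
yum install package1 package2 ...   install all of the listed packages
yum install package1                install the specified package package1
yum groupinstall group1             install the package group group1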
(1) Maven version: download apache-maven-3.0.5-bin.tar.gz
Note: do not use the latest Maven 3.1.1. The Hadoop 2.2.0 source is incompatible with Maven 3.1.x, and the build fails with:
java.lang.NoClassDefFoundError: org/sonatype/aether/graph/DependencyFilter
Maven 3.0.5 is recommended.
(2) Download
URL: http://maven.apache.org/download.cgi
Select apache-maven-3.0.5-bin.tar.gz
(3) Upload to Linux and extract into the installation directory
tar -zxvf apache-maven-3.0.5-bin.tar.gz -C /workDir
(4) Set the environment variables
vi /etc/profile
Add:
export MAVEN_HOME=/workDir/apache-maven-3.0.5
export PATH=$PATH:$MAVEN_HOME/bin
Run: source /etc/profile  or  . /etc/profile
Verify:
mvn -v
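Because the article recommends the 3.0.x line and warns against 3.1.1, the verification can be turned into an explicit check. A minimal sketch: the function name maven_is_3_0 is hypothetical, and the sed expression assumes that mvn -v prints a first line beginning with "Apache Maven" followed by the version number.

```shell
# maven_is_3_0: succeed only for versions on the 3.0.x line (for example 3.0.5).
# Hypothetical helper; it compares the version string only.
maven_is_3_0() {
  case "$1" in
    3.0|3.0.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage, assuming mvn is already on the PATH:
# ver=$(mvn -v 2>/dev/null | sed -n 's/^Apache Maven \([0-9.]*\).*/\1/p')
# maven_is_3_0 "$ver" || echo "Maven $ver may not build Hadoop 2.2.0; use 3.0.5"
```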
(1) Edit the settings.xml file
Go to the directory /workDir/apache-maven-3.0.5/conf
* Change the contents of <mirrors>:
<mirror>
<id>nexus-osc</id>
<mirrorOf>*</mirrorOf>
<name>Nexus osc</name>
<url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
* Change the contents of <profiles>:
<profile>
<id>jdk-1.6</id>
<activation>
<jdk>1.6</jdk>
</activation>
<repositories>
<repository>
<id>nexus</id>
<name>local private nexus</name>
<url>http://maven.oschina.net/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>nexus</id>
<name>local private nexus</name>
<url>http://maven.oschina.net/content/groups/public/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</pluginRepository>
</pluginRepositories>
</profile>
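(2) Copy the configuration
Note: copy settings.xml to the user's home directory so that every Maven run uses this configuration
cd /home/hadoop     # check whether the home directory /home/hadoop contains a .m2 folder; create it if it does not
mkdir .m2
cp /workDir/apache-maven-3.0.5/conf/settings.xml ~/.m2     # copy the file
(3) Configure DNS
vi /etc/resolv.conf
Change it to:
nameserver 8.8.8.8
nameserver 8.8.4.4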
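The copy in step (2) above can be made repeatable, so that it works whether or not the .m2 folder already exists. A minimal sketch: the function name install_maven_settings and its optional second argument are hypothetical and not part of the original article.

```shell
# install_maven_settings SRC [HOME_DIR]
# Copy SRC to HOME_DIR/.m2/settings.xml, creating .m2 when it is missing.
# HOME_DIR defaults to the current user's home directory.
install_maven_settings() {
  src=$1
  home_dir=${2:-$HOME}
  mkdir -p "$home_dir/.m2" && cp "$src" "$home_dir/.m2/settings.xml"
}

# Example usage with the paths from this article:
# install_maven_settings /workDir/apache-maven-3.0.5/conf/settings.xml
```

Using mkdir -p avoids the error that a plain mkdir reports when the directory is already there.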
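(1) Download protobuf-2.5.0.tar.gz
https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
(2) Extract into the installation directory
cd /software
tar -zxvf protobuf-2.5.0.tar.gz -C /workDir
(3) Install the following 3 dependency packages (skip any that are already installed)
yum install gcc -y
yum install gcc-c++ -y
yum install make -y
[Note]: if these 3 dependency packages are missing, protoc cannot be built, and the Hadoop build later fails with the following error: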
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.2.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-common
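(4) Configure the build
Enter the installation directory and run the configure script
cd /workDir/protobuf-2.5.0     # enter the installation directory
./configure                    # run the configure script
(5) Build and install
make && make check && make install
Note: building protobuf requires the gcc and gcc-c++ system packages (no need to install them again if they are already present)
(6) Configure the environment variables
vi /etc/profile
Add:
export PROTOBUF_HOME=/workDir/protobuf-2.5.0
export PATH=$PATH:$PROTOBUF_HOME/bin
Apply the configuration:
source /etc/profile  or  . /etc/profile
Verify:
protoc --version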
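BUILDING.txt requires the protoc version to match the protobuf JAR, which is 2.5.0 for this build. A minimal sketch of that comparison: the function name protoc_matches is hypothetical, and it assumes protoc --version prints a single line of the form "libprotoc 2.5.0".

```shell
# protoc_matches EXPECTED ACTUAL_OUTPUT
# Succeed when the output of `protoc --version` reports exactly EXPECTED.
# Hypothetical helper for illustration.
protoc_matches() {
  expected=$1
  actual=$2
  [ "$actual" = "libprotoc $expected" ]
}

# Example usage, assuming protoc is already on the PATH:
# protoc_matches 2.5.0 "$(protoc --version)" || echo "protoc is not version 2.5.0"
```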
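(1) Download findbugs-3.0.0.tar.gz
http://sourceforge.jp/projects/sfnet_findbugs/releases/
(2) Extract into the installation directory
cd /software
tar -zxvf findbugs-3.0.0.tar.gz -C /workDir
(3) Set the environment variables
vi /etc/profile
Add the following:
export FINDBUGS_HOME=/workDir/findbugs-3.0.0
export PATH=$PATH:$FINDBUGS_HOME/bin
(4) Apply the environment variables
source /etc/profile  or  . /etc/profile
(5) Verify
findbugs -version
Important note:
If an error appears at this step, the cause is an incompatible JDK version. findbugs-2.5.0 and findbugs 3.0.0 were compiled with JDK 7 or later, so JDK 7 must be installed on the Linux system.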
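Several sections above add export lines to /etc/profile by hand. If the steps are repeated, the same line can end up in the file more than once. A minimal sketch of an idempotent alternative: the function name append_once is hypothetical and not part of the original article.

```shell
# append_once LINE FILE
# Append LINE to FILE unless an identical line is already present.
# -x matches whole lines, -F treats LINE as a fixed string, not a regex.
append_once() {
  line=$1
  file=$2
  grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage (run as root; single quotes keep $PATH unexpanded in the file):
# append_once 'export FINDBUGS_HOME=/workDir/findbugs-3.0.0' /etc/profile
# append_once 'export PATH=$PATH:$FINDBUGS_HOME/bin' /etc/profile
```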
(1) Download hadoop-2.2.0-src.tar.gz
http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz
(2) Extract into the installation directory
cd /software
tar -zxvf hadoop-2.2.0-src.tar.gz -C /workDir
(3) Patch the source package
- Important: the hadoop-2.2.0 source contains a bug, which is described in the official Apache JIRA:
JIRA: https://issues.apache.org/jira/browse/HADOOP-10110
- Fix:
Index: hadoop-common-project/hadoop-auth/pom.xml
===================================================================
--- hadoop-common-project/hadoop-auth/pom.xml (revision 1543124)
+++ hadoop-common-project/hadoop-auth/pom.xml (working copy)
@@ -54,6 +54,11 @@
</dependency>
<dependency>
<groupId>org.mortbay.jetty</groupId>
+ <artifactId>jetty-util</artifactId>
+ <scope>test</scope>
+ </dependency>
+ <dependency>
+ <groupId>org.mortbay.jetty</groupId>
<artifactId>jetty</artifactId>
<scope>test</scope>
</dependency>
As the official fix above shows, edit the pom.xml file in the directory $HADOOP_SRC_HOME/hadoop-common-project/hadoop-auth and add the following after line 55:
<dependency>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty-util</artifactId>
<scope>test</scope>
</dependency>
Otherwise the build fails with the following error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-auth: Compilation failure: Compilation failure:
[ERROR] /home/chuan/trunk/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java:[84,13] cannot access org.mortbay.component.AbstractLifeCycle
[ERROR] class file for org.mortbay.component.AbstractLifeCycle not found
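Before starting the long build, it can be worth confirming that the jetty-util dependency really is in the hadoop-auth pom.xml. A minimal sketch: the function name has_jetty_util is hypothetical, and the check only looks for the artifactId line, so it does not verify where in the file the dependency was added.

```shell
# has_jetty_util POM
# Succeed when the given pom.xml mentions the jetty-util artifact.
# Hypothetical helper; a plain text search, not an XML parse.
has_jetty_util() {
  grep -q '<artifactId>jetty-util</artifactId>' "$1"
}

# Example usage with the paths from this article:
# pom=/workDir/hadoop-2.2.0-src/hadoop-common-project/hadoop-auth/pom.xml
# has_jetty_util "$pom" || echo "jetty-util test dependency is missing from $pom"
```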
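(4) Build
Official build instructions:
Create source and binary distributions with native code and documentation: (source + binaries + native libraries + documentation)
$ mvn package -Pdist,native,docs,src -DskipTests -Dtar
The command used here builds only the binary distribution with native code:
cd /workDir/hadoop-2.2.0-src
mvn package -DskipTests -Pdist,native -Dtar
Note: if the build runs out of memory, increase the memory available to Maven:
export MAVEN_OPTS="-Xms256m -Xmx512m"
The build takes a fairly long time because the dependencies are downloaded from the Internet.
When the following output appears, the build has succeeded: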
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:53.144s
[INFO] Finished at: Fri Nov 22 16:58:32 CST 2013
[INFO] Final Memory: 70M/239M
[INFO] ------------------------------------------------------------------------
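For an unattended build, the success banner can be checked from a saved log instead of by eye. A minimal sketch under stated assumptions: the function name build_succeeded and the log file name build.log are hypothetical, and the MAVEN_OPTS default simply reuses the values from the out-of-memory note unless the variable is already set.

```shell
# Keep an existing MAVEN_OPTS; otherwise apply the heap sizes from the article.
: "${MAVEN_OPTS:=-Xms256m -Xmx512m}"
export MAVEN_OPTS

# build_succeeded LOGFILE
# Succeed when the Maven log contains the BUILD SUCCESS banner.
build_succeeded() {
  grep -q 'BUILD SUCCESS' "$1"
}

# Example usage, capturing the build output in a hypothetical build.log:
# cd /workDir/hadoop-2.2.0-src
# mvn package -DskipTests -Pdist,native -Dtar 2>&1 | tee build.log
# build_succeeded build.log && echo "native build finished"
```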
1. Inspect the build output
The build output is located in hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
cd /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
ll     # list the build output
Directories under hadoop-2.2.0 after the build:
drwxr-xr-x. 2 root root 4096 Aug 11 12:00 bin
drwxr-xr-x. 3 root root 4096 Aug 11 12:00 etc
drwxr-xr-x. 2 root root 4096 Aug 11 12:00 include
drwxr-xr-x. 3 root root 4096 Aug 11 12:00 lib
drwxr-xr-x. 2 root root 4096 Aug 11 12:00 libexec
drwxr-xr-x. 2 root root 4096 Aug 11 12:00 sbin
drwxr-xr-x. 4 root root 4096 Aug 11 12:00 share
Enter the bin directory and run the hadoop script to check the version
cd bin
./hadoop version
The version information is displayed:
[root@localhost bin]# ./hadoop version
Hadoop 2.2.0
Subversion Unknown -r Unknown
Compiled by root on 2014-08-11T18:34Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
2. Check the architecture of the native libraries
cd /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
file lib//native/*
The shared libraries are now 64-bit:
[root@localhost hadoop-2.2.0]# file lib//native/*
lib//native/libhadoop.a: current ar archive
lib//native/libhadooppipes.a: current ar archive
lib//native/libhadoop.so: symbolic link to `libhadoop.so.1.0.0'
lib//native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
lib//native/libhadooputils.a: current ar archive
lib//native/libhdfs.a: current ar archive
lib//native/libhdfs.so: symbolic link to `libhdfs.so.0.0.0'
lib//native/libhdfs.so.0.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
At this point, the build is complete.
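The final check can be automated by filtering the file listing for ELF objects that are not 64-bit. A minimal sketch: the function name non_64bit_elf is hypothetical, and it relies only on the "ELF 32-bit" and "ELF 64-bit" wording that file prints in the listings shown in this article.

```shell
# non_64bit_elf: read `file` output on stdin and print the ELF entries
# that are not 64-bit. Archives and symbolic links are ignored.
# The exit status is non-zero when nothing is printed (grep semantics).
non_64bit_elf() {
  grep 'ELF ' | grep -v 'ELF 64-bit'
}

# Example usage from the hadoop-2.2.0 build output directory:
# if file lib/native/* | non_64bit_elf; then
#   echo "some native libraries are not 64-bit"
# fi
```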
Original article: http://yfteach01.blog.51cto.com/9428662/1629703