Tags: Big Data, Hadoop pseudo-distributed
Apache Hadoop is an open-source software framework for data-intensive distributed applications, released under the Apache 2.0 license. It supports applications running on large clusters built from commodity hardware. Hadoop is an independent implementation based on the papers Google published on MapReduce and the Google File System.
The Hadoop framework transparently provides applications with both reliability and data motion. It implements the programming paradigm named MapReduce: an application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, Hadoop provides a distributed file system that stores data across all compute nodes, giving the cluster as a whole very high aggregate bandwidth. MapReduce and the distributed file system are designed so that the framework automatically handles node failures, enabling applications to work with thousands of independently computing machines and petabytes of data. The whole Apache Hadoop "platform" is now commonly taken to include the Hadoop kernel, MapReduce, the Hadoop Distributed File System (HDFS), and a number of related projects such as Apache Hive and Apache HBase. <http://zh.wikipedia.org/wiki/Apache_Hadoop>
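Before the installation steps, the MapReduce idea can be made concrete with a plain local shell pipeline for the classic word count: tokenizing plays the map role, sort stands in for the shuffle, and uniq -c acts as the reduce. This is only a local analogy (using the had.txt test file created later in this article), not Hadoop code:
$ tr -s ' ' '\n' < had.txt | sort | uniq -c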
IP: 10.15.62.228
System environment:
# uname -srn
Linux localhost 2.6.32-358.el6.x86_64
Resolve dependencies (note the group name is "Server Platform Development"):
# yum groupinstall "Server Platform Development" "Development Tools" -y
Install the JDK (binary tarball):
tar xf jdk-8u5-linux-x64.gz -C /usr/local/
cd /usr/local/
ln -sv /usr/local/jdk1.8.0_05 /usr/local/java    # the tarball unpacks to jdk1.8.0_05, not the archive name
Export the Java environment variables:
# vim /etc/profile.d/java.sh
export JAVA_HOME=/usr/local/java
export JAVA_BIN=/usr/local/java/bin
# put the new JDK ahead of any system OpenJDK on the PATH
export PATH=$JAVA_BIN:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
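The new variables only take effect in shells that have read this profile; apply them to the current shell directly (the same pattern is used for the Hadoop profile below):
# . /etc/profile.d/java.sh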
Test the installed JDK; with $JAVA_BIN placed ahead of the system paths, java -version should now report the newly installed JDK rather than any system OpenJDK:
# java -version
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
Install Hadoop
Create the user that will run Hadoop:
# useradd hadoop
# echo "password" | passwd --stdin hadoop
# tar xf hadoop-1.0.3.tar.gz -C /usr/local/
# cd /usr/local/hadoop-1.0.3/
# chown -R hadoop:hadoop ./*
# ln -sv /usr/local/hadoop-1.0.3/ /usr/local/hadoop
Export the Hadoop environment variables:
# vim /etc/profile.d/hadoop.sh
HADOOP_PREFIX=/usr/local/hadoop
PATH=$HADOOP_PREFIX/bin:$PATH
export HADOOP_PREFIX PATH
# . /etc/profile.d/hadoop.sh
# su - hadoop
Configure the hadoop user for key-based (passwordless) SSH login to the local host, so that Hadoop can remotely start the Hadoop processes on each node and carry out monitoring and other management tasks.
$ ssh-keygen -t rsa -P ''
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@localhost
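If the key was installed correctly, logging in to the local host should no longer prompt for a password; a quick check:
$ ssh hadoop@localhost 'date'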
Test the installed Hadoop. (With Hadoop 1.x, hadoop -version ends up printing the Java version, which at least confirms which JVM Hadoop has resolved; see the note after the output.)
$ hadoop -version
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
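To see the Hadoop release itself rather than the JVM, Hadoop 1.x provides a dedicated subcommand (its output, naming release 1.0.3 here, is omitted since it varies by build):
$ hadoop version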
Modify the following three Hadoop configuration files: core-site.xml, hdfs-site.xml, and mapred-site.xml.
Modify the core-site.xml configuration file; fs.default.name is best given in HOST:PORT form (fs.trash.interval is in minutes, so 1440 keeps deleted files in the trash for one day):
[hadoop@zabbix conf]$ vim /usr/local/hadoop/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/temp</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://zabbix.zkg.com:9000</value>
</property>
</configuration>
Modify the hdfs-site.xml configuration file (dfs.replication is set to 1 because a pseudo-distributed setup has only a single DataNode):
[hadoop@zabbix conf]$ vim /usr/local/hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/hadoop/temp/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/hadoop/temp/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
</configuration>
Modify the mapred-site.xml configuration file:
[hadoop@zabbix conf]$ vim /usr/local/hadoop/conf/mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<!--<value>10.15.62.228:9001</value>-->
<value>zabbix.zkg.com:9001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/hadoop/temp/local</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/hadoop/temp/system</value>
</property>
</configuration>
Create the directory specified by hadoop.tmp.dir and change its owner and group to the Hadoop run user (hadoop):
# mkdir -pv /hadoop/temp
mkdir: created directory `/hadoop'
mkdir: created directory `/hadoop/temp'
# chown -R hadoop:hadoop /hadoop
Format the NameNode (run once, as the hadoop user, before the first start):
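With Hadoop 1.x the format operation is a subcommand of the namenode command:
$ hadoop namenode -format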
Start Hadoop. Hadoop ships with its own start script, start-all.sh (and a matching stop script, stop-all.sh), in Hadoop's bin directory:
[hadoop@zabbix conf]$ start-all.sh
starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-zabbix.server.com.out
zabbix.zkg.com: Warning: $HADOOP_HOME is deprecated.
zabbix.zkg.com:
zabbix.zkg.com: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-zabbix.zkg.com.out
zabbix.zkg.com: Warning: $HADOOP_HOME is deprecated.
zabbix.zkg.com:
zabbix.zkg.com: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-zabbix.zkg.com.out
starting jobtracker, logging to /usr/local/hadoop/logs/hadoop-hadoop-jobtracker-zabbix.server.com.out
zabbix.zkg.com: Warning: $HADOOP_HOME is deprecated.
zabbix.zkg.com:
zabbix.zkg.com: starting tasktracker, logging to /usr/local/hadoop/logs/hadoop-hadoop-tasktracker-zabbix.zkg.com.out
Use jps to verify that Hadoop started correctly; in pseudo-distributed mode all five daemons should be present:
[hadoop@zabbix conf]$ jps
3168 DataNode
3281 SecondaryNameNode
3553 Jps
3059 NameNode
3369 JobTracker
3499 TaskTracker
$ ss -untlp | grep java
tcp LISTEN 0 128 :::50060 :::* users:(("java",8022,83))
tcp LISTEN 0 50 :::38829 :::* users:(("java",7683,61))
tcp LISTEN 0 128 ::ffff:127.0.0.1:56334 :::* users:(("java",8022,66))
tcp LISTEN 0 128 :::50030 :::* users:(("java",7888,79))
tcp LISTEN 0 128 :::50070 :::* users:(("java",7568,82))
tcp LISTEN 0 50 :::50010 :::* users:(("java",7683,72))
tcp LISTEN 0 128 :::50075 :::* users:(("java",7683,73))
tcp LISTEN 0 50 :::47517 :::* users:(("java",7568,61))
tcp LISTEN 0 128 :::50020 :::* users:(("java",7683,79))
tcp LISTEN 0 50 :::41540 :::* users:(("java",7803,61))
tcp LISTEN 0 128 ::ffff:10.15.62.228:9000 :::* users:(("java",7568,71))
tcp LISTEN 0 128 ::ffff:10.15.62.228:9001 :::* users:(("java",7888,68))
tcp LISTEN 0 128 :::50090 :::* users:(("java",7803,74))
tcp LISTEN 0 50 :::50698 :::* users:(("java",7888,61))
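For reference, the listening ports above correspond to the standard Hadoop 1.x defaults: 9000 (NameNode RPC, fs.default.name), 9001 (JobTracker RPC, mapred.job.tracker), 50070 (NameNode web UI), 50075 (DataNode web UI), 50090 (SecondaryNameNode web UI), 50030 (JobTracker web UI), 50060 (TaskTracker web UI), and 50010/50020 (DataNode data transfer and IPC). The remaining high ports are ephemeral.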
Running hadoop with no arguments lists the available commands.
Use hadoop fs -ls to list your HDFS home directory. If this reports "ls: Cannot access .: No such file or directory.", the home directory /user/hadoop does not exist in HDFS yet; create a test directory and then list it again, as sketched below:
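The two steps just described (the directory name test matches the upload target used next):
$ hadoop fs -mkdir test
$ hadoop fs -ls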
Create a test file locally:
$ vim had.txt
hello
word
word
This is a test file!
welcome to hadoop!
Copy the created file to /user/hadoop/test:
$ hadoop fs -put ~/had.txt test
$ hadoop fs -ls test
Found 1 items
-rw-r--r-- 1 hadoop supergroup 56 2014-07-25 14:13 /user/hadoop/test/had.txt
View the file contents (the file was uploaded under test, so that is the path to read):
$ hadoop fs -cat test/*
hello
word
word
This is a test file!
welcome to hadoop!
Leave Hadoop safe mode (a freshly started NameNode may still be in safe mode, which blocks writes):
$ hadoop dfsadmin -safemode leave
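The current safe-mode state can also be checked before forcing an exit:
$ hadoop dfsadmin -safemode get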
Run the bundled wordcount example as a test, reading input from test and writing to output:
$ hadoop jar /usr/local/hadoop/hadoop-examples-1.0.3.jar wordcount test output
14/07/25 14:29:56 INFO input.FileInputFormat: Total input paths to process : 1
14/07/25 14:29:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/07/25 14:29:56 WARN snappy.LoadSnappy: Snappy native library not loaded
14/07/25 14:29:57 INFO mapred.JobClient: Running job: job_201407251359_0001
14/07/25 14:29:58 INFO mapred.JobClient: map 0% reduce 0%
14/07/25 14:30:15 INFO mapred.JobClient: map 100% reduce 0%
14/07/25 14:30:27 INFO mapred.JobClient: map 100% reduce 100%
14/07/25 14:30:32 INFO mapred.JobClient: Job complete: job_201407251359_0001
14/07/25 14:30:32 INFO mapred.JobClient: Counters: 29
14/07/25 14:30:32 INFO mapred.JobClient: Map-Reduce Framework
14/07/25 14:30:32 INFO mapred.JobClient: Spilled Records=20
14/07/25 14:30:32 INFO mapred.JobClient: Map output materialized bytes=117
14/07/25 14:30:32 INFO mapred.JobClient: Reduce input records=10
14/07/25 14:30:32 INFO mapred.JobClient: Virtual memory (bytes) snapshot=5486075904
14/07/25 14:30:32 INFO mapred.JobClient: Map input records=5
14/07/25 14:30:32 INFO mapred.JobClient: SPLIT_RAW_BYTES=114
14/07/25 14:30:32 INFO mapred.JobClient: Map output bytes=100
14/07/25 14:30:32 INFO mapred.JobClient: Reduce shuffle bytes=117
14/07/25 14:30:32 INFO mapred.JobClient: Physical memory (bytes) snapshot=247083008
14/07/25 14:30:32 INFO mapred.JobClient: Reduce input groups=10
14/07/25 14:30:32 INFO mapred.JobClient: Combine output records=10
14/07/25 14:30:32 INFO mapred.JobClient: Reduce output records=10
14/07/25 14:30:32 INFO mapred.JobClient: Map output records=11
14/07/25 14:30:32 INFO mapred.JobClient: Combine input records=11
14/07/25 14:30:32 INFO mapred.JobClient: CPU time spent (ms)=5820
14/07/25 14:30:32 INFO mapred.JobClient: Total committed heap usage (bytes)=138809344
14/07/25 14:30:32 INFO mapred.JobClient: File Input Format Counters
14/07/25 14:30:32 INFO mapred.JobClient: Bytes Read=56
14/07/25 14:30:32 INFO mapred.JobClient: FileSystemCounters
14/07/25 14:30:32 INFO mapred.JobClient: HDFS_BYTES_READ=170
14/07/25 14:30:32 INFO mapred.JobClient: FILE_BYTES_WRITTEN=43211
14/07/25 14:30:32 INFO mapred.JobClient: FILE_BYTES_READ=117
14/07/25 14:30:32 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=71
14/07/25 14:30:32 INFO mapred.JobClient: Job Counters
14/07/25 14:30:32 INFO mapred.JobClient: Launched map tasks=1
14/07/25 14:30:32 INFO mapred.JobClient: Launched reduce tasks=1
14/07/25 14:30:32 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10875
14/07/25 14:30:32 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/07/25 14:30:32 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=17994
14/07/25 14:30:32 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/07/25 14:30:32 INFO mapred.JobClient: Data-local map tasks=1
14/07/25 14:30:32 INFO mapred.JobClient: File Output Format Counters
14/07/25 14:30:32 INFO mapred.JobClient: Bytes Written=71
Check the output directory output; if the following entries appear, the job ran successfully:
$ hadoop fs -ls output
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2014-07-25 14:30 /user/hadoop/output/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-07-25 14:29 /user/hadoop/output/_logs
-rw-r--r-- 1 hadoop supergroup 71 2014-07-25 14:30 /user/hadoop/output/part-r-00000
View the word-count results:
$ hadoop fs -cat /user/hadoop/output/part-r-00000
This	1
a	1
file!	1
hadoop!	1
hello	1
is	1
test	1
to	1
welcome	1
word	2
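To pull the result back to the local filesystem, the standard fs subcommands work; ./wordcount.txt here is just an arbitrary local path chosen for illustration:
$ hadoop fs -get output/part-r-00000 ./wordcount.txt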
Viewing Hadoop status from the web UI
After startup, node status and job status, including task execution, can be viewed on the following two pages:
http://IP:50070 (NameNode / HDFS status)
http://IP:50030 (JobTracker / job status)
This article is from the "Linux之旅" blog; please keep this attribution: http://openlinuxfly.blog.51cto.com/7120723/1688779