Common errors covered at the end of this article:
- ConnectException when running "hdfs dfs -ls"
- Incompatible clusterIDs
- Inconsistent checkpoint fields
This article provides detailed installation instructions for Hadoop 2.4.0, the latest release at the time of writing, to help reduce the difficulties encountered during installation and to explain the causes of some common errors. It covers only hadoop-common, hadoop-hdfs, hadoop-mapreduce and hadoop-yarn; HBase, Hive, Pig and the like are not included.
The deployment uses five machines, laid out as follows:

Role | Machines
NameNode | 172.25.40.171
SecondaryNameNode | 172.25.39.166
DataNodes | 10.12.154.77, 10.12.154.78, 10.12.154.79
Machine IP | Hostname
172.25.40.171 | VM-40-171-sles10-64
172.25.39.166 | VM-39-166-sles10-64
10.12.154.77 | DEVNET-154-77
10.12.154.78 | DEVNET-154-70
10.12.154.79 | DEVNET-154-79
Note that hostnames must not contain underscores; otherwise the SecondaryNameNode fails at startup with an error like the following (taken from the file hadoop-hadoop-secondarynamenode-VM_39_166_sles10_64.out):
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /data/hadoop/hadoop-2.4.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Exception in thread "main" java.lang.IllegalArgumentException: The value of property bind.address must not be null
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:971)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:953)
        at org.apache.hadoop.http.HttpServer2.initializeWebServer(HttpServer2.java:391)
        at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:344)
        at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:104)
        at org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:292)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:264)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:651)
The hostname command not only displays the current hostname but can also change it, in the form: hostname NEW_HOSTNAME.
Before the change, 172.25.40.171 was named VM_40_171_sles10_64 and 172.25.39.166 was named VM_39_166_sles10_64. Both hostnames contain underscores and therefore need to be changed. For simplicity, the underscores are just replaced with hyphens:
hostname VM-40-171-sles10-64
hostname VM-39-166-sles10-64
These commands alone are not enough: as with setting an environment variable, the change is only temporary. To make it permanent, the system configuration file must also be edited.
Different Linux distributions use different configuration files for this. On SuSE 10.1 it is /etc/HOSTNAME:
# cat /etc/HOSTNAME
VM_39_166_sles10_64
Change "VM_39_166_sles10_64" in this file to "VM-39-166-sles10-64".
On some distributions the file is /etc/hostname; on others it is /etc/sysconfig/network.
Not only the file location differs; the format may differ too. Some use a name=value form, such as HOSTNAME=yourhostname.
After editing the file, restart the network service so the change takes effect, e.g. /etc/rc.d/boot.localnet start (the command varies by system; this is the SuSE way). Running hostname again will then show the new name.
Rebooting the system also makes the change effective.
Note that after changing a hostname, you need to re-verify password-less SSH login, e.g.: ssh username@NEW_HOSTNAME.
Password-less login must work both by IP and by hostname, for all of the following combinations (a quick verification sketch follows this list):
1) The NameNode can log in to all DataNodes without a password
2) The SecondaryNameNode can log in to all DataNodes without a password
3) The NameNode can log in to itself without a password
4) The SecondaryNameNode can log in to itself without a password
5) The NameNode can log in to the SecondaryNameNode without a password
6) The SecondaryNameNode can log in to the NameNode without a password
7) Each DataNode can log in to itself without a password
8) DataNodes do not need password-less login to the NameNode, the SecondaryNameNode, or other DataNodes.
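As a quick check, a loop like the one below can be run on the NameNode and on the SecondaryNameNode (a minimal sketch; the host list and the user name hadoop are assumptions for this cluster). Each target is tried by hostname and by IP, so any pair that still prompts for a password stands out immediately:

#!/bin/sh
# Run on the NameNode and on the SecondaryNameNode (assumed user: hadoop).
# Each target is tried once; a password prompt or an error means that pair still needs fixing.
for target in VM-40-171-sles10-64 172.25.40.171 \
              VM-39-166-sles10-64 172.25.39.166 \
              DEVNET-154-77 10.12.154.77 \
              DEVNET-154-70 10.12.154.78 \
              DEVNET-154-79 10.12.154.79
do
    echo "=== hadoop@$target ==="
    ssh hadoop@"$target" hostname
done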
For ease of explanation, this article assumes the following Hadoop and JDK installation directories:

Component | Installation directory | Version | Notes
JDK | /data/jdk | 1.7.0 | ln -s /data/jdk1.7.0_55 /data/jdk
Hadoop | /data/hadoop/current | 2.4.0 | ln -s /data/hadoop/hadoop-2.4.0 /data/hadoop/current
Adjust these to suit your actual deployment.
The table below lists the ports used by the various components, for reference:

Port | Purpose
9000 | fs.defaultFS, e.g. hdfs://172.25.40.171:9000
9001 | dfs.namenode.rpc-address; DataNodes connect to this port
50070 | dfs.namenode.http-address
50470 | dfs.namenode.https-address
50100 | dfs.namenode.backup.address
50105 | dfs.namenode.backup.http-address
50090 | dfs.namenode.secondary.http-address, e.g. 172.25.39.166:50090
50091 | dfs.namenode.secondary.https-address, e.g. 172.25.39.166:50091
50020 | dfs.datanode.ipc.address
50075 | dfs.datanode.http.address
50475 | dfs.datanode.https.address
50010 | dfs.datanode.address; the DataNode data-transfer port
8480 | dfs.journalnode.rpc-address
8481 | dfs.journalnode.https-address
8032 | yarn.resourcemanager.address
8088 | yarn.resourcemanager.webapp.address; the YARN HTTP port
8090 | yarn.resourcemanager.webapp.https.address
8030 | yarn.resourcemanager.scheduler.address
8031 | yarn.resourcemanager.resource-tracker.address
8033 | yarn.resourcemanager.admin.address
8042 | yarn.nodemanager.webapp.address
8040 | yarn.nodemanager.localizer.address
8188 | yarn.timeline-service.webapp.address
10020 | mapreduce.jobhistory.address
19888 | mapreduce.jobhistory.webapp.address
2888 | ZooKeeper: used by the Leader to listen for Follower connections
3888 | ZooKeeper: used for Leader election
2181 | ZooKeeper: listens for client connections
60010 | hbase.master.info.port; HMaster HTTP port
60000 | hbase.master.port; HMaster RPC port
60030 | hbase.regionserver.info.port; HRegionServer HTTP port
60020 | hbase.regionserver.port; HRegionServer RPC port
8080 | hbase.rest.port; HBase REST server port
10000 | hive.server2.thrift.port
9083 | hive.metastore.uris
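Once the cluster is up, you can check whether a daemon is actually listening on its expected port with a command along these lines (a minimal sketch; the netstat options are for Linux, and root is needed to see the owning process names):

# Show listening TCP sockets with their processes, filtered to a few Hadoop ports.
netstat -ltnp | grep -E ':(9000|9001|50010|50070|50075|50090|8032|8042|8088)[[:space:]]'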
Checklist of the work required to run Hadoop (HDFS, YARN and MapReduce):

Task | Notes
Install a JDK | Hadoop is written in Java, so a JDK is required.
Set up password-less SSH | The NameNode controls the SecondaryNameNode and the DataNodes via ssh and scp, which must run without a password.
Install and configure Hadoop | This means HDFS, YARN and MapReduce; HBase, Hive and the like are not covered.
This article installs JDK 1.7.0. Installation also succeeds with JDK 1.8, but JDK 1.7 is recommended.
The reason is that compiling the Hadoop 2.4.0 source with JDK 1.8 produces a large number of syntax errors, while the build goes through cleanly with JDK 1.7; see the article "Compiling Hadoop 2.4.0 on Linux" for details.
Download page for the latest JDK binaries:
http://www.oracle.com/technetwork/java/javase/downloads
Download page for the JDK 1.7 binaries:
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
This article uses the 64-bit Linux JDK 1.7 package jdk-7u55-linux-x64.gz.
Avoid JDK 1.8 if you plan to build Hadoop 2.4.0 from source: it does not match Hadoop 2.4.0 and the compilation reports many errors.
Installing the JDK is easy: upload jdk-7u55-linux-x64.gz to the Linux machine, unpack it, then set the environment variables (in this article the archive was uploaded to /data):
1) Change into the /data directory
2) Unpack the archive: tar xzf jdk-7u55-linux-x64.gz, which creates the directory /data/jdk1.7.0_55
3) Create a symbolic link: ln -s /data/jdk1.7.0_55 /data/jdk
4) Edit /etc/profile, the profile in the user's home directory, or an equivalent file, and add the following environment variables:
export JAVA_HOME=/data/jdk
export CLASSPATH=$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
After this, log in again or source the profile file so the environment variables take effect; you can also export them by hand for immediate effect. To be sure, run java or javac and check that the commands work. If they already worked before this step, a JDK was already installed and this step can be skipped.
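A quick way to confirm the environment is a check like this (a sketch; the expected output assumes the jdk-7u55 package installed above):

source /etc/profile      # or log in again
java -version            # expect: java version "1.7.0_55"
javac -version           # expect: javac 1.7.0_55
echo $JAVA_HOME          # expect: /data/jdk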
The following applies to ssh2, not to plain ssh, and does not cover OpenSSH.
The configuration has two parts: the login (client) side and the target (server) side; the goal is password-less login from the client to the server. The commands below can be copied directly into a Linux terminal; they have all been verified on SuSE 10.1.
Step 1: on every machine that will be logged into, edit the sshd configuration file /etc/ssh2/sshd2_config:
1) Set PermitRootLogin to yes (i.e. remove the leading comment sign #)
2) Set AllowedAuthentications to publickey,password (i.e. remove the leading comment sign #)
3) Restart the sshd service: service ssh2 restart
Step 2: on every machine that will initiate the login, do the following:
1) Change into the .ssh2 directory: cd ~/.ssh2
2) ssh-keygen2 -t dsa -P''
-P specifies the passphrase, so -P'' means an empty passphrase. You can also omit -P, but then you have to press Enter three times, whereas with -P'' a single Enter is enough.
On success, this generates the private key file id_dsa_2048_a and the public key file id_dsa_2048_a.pub under the user's home directory.
3) Create the identification file: echo "IdKey id_dsa_2048_a" >> identification. Note the space after IdKey; make sure the identification file looks like this:
# cat identification
IdKey id_dsa_2048_a
4) Upload id_dsa_2048_a.pub to the ~/.ssh2 directory of every target machine: scp id_dsa_2048_a.pub root@192.168.0.1:/root/.ssh2 (here 192.168.0.1 stands for one of the target machines).
Before running scp, make sure the directory /root/.ssh2 exists on 192.168.0.1, and replace /root/ with the actual HOME directory of the root user. The environment variable $HOME normally points to the user's home directory, ~ also denotes it, and cd without arguments switches straight into it.
Step 3: on every target machine, do the following:
1) Change into the .ssh2 directory: cd ~/.ssh2
2) Create the authorization file: echo "Key id_dsa_2048_a.pub" >> authorization. Note the space after Key; make sure the authorization file looks like this:
# cat authorization
Key id_dsa_2048_a.pub
Once this is done, ssh from the client to the target no longer asks for a password.
If password-less login is not configured correctly, you will see errors like the following at startup:
Starting namenodes on [172.25.40.171]
172.25.40.171: Host key not found from database.
172.25.40.171: Key fingerprint:
172.25.40.171: xofiz-zilip-tokar-rupyb-tufer-tahyc-sibah-kyvuf-palik-hazyt-duxux
172.25.40.171: You can get a public key's fingerprint by running
172.25.40.171: % ssh-keygen -F publickey.pub
172.25.40.171: on the keyfile.
172.25.40.171: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument
Or errors like this:
Starting namenodes on [172.25.40.171]
172.25.40.171: hadoop's password:
It is a good idea to include your own IP in the names of the generated private and public key files; otherwise things get confusing.
Configure password-less login for all the combinations listed earlier. For more on password-less login, see these blog posts:
1) http://blog.chinaunix.net/uid-20682147-id-4212099.html (password-less login between two SSH2 installations)
2) http://blog.chinaunix.net/uid-20682147-id-4212097.html (password-less login from SSH2 to OpenSSH)
3) http://blog.chinaunix.net/uid-20682147-id-4212094.html (password-less login from OpenSSH to SSH2)
This part covers only the installation of HDFS, MapReduce and YARN; HBase, Hive and the like are not included.
Hadoop binary download page: http://hadoop.apache.org/releases.html#Download (or go directly to http://mirror.bit.edu.cn/apache/hadoop/common/). This article uses hadoop-2.4.0 (binary package: http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz, source package: http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.4.0/hadoop-2.4.0-src.tar.gz), which is not a stable release; the latest stable release is hadoop-2.2.0.
The official installation instructions are in the Cluster Setup guide:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
1) Upload the Hadoop package hadoop-2.4.0.tar.gz to the /data/hadoop directory
2) Change into /data/hadoop
3) In /data/hadoop, unpack the package: tar xzf hadoop-2.4.0.tar.gz
4) Create a symbolic link: ln -s /data/hadoop/hadoop-2.4.0 /data/hadoop/current
5) Edit .profile in the user's home directory (or /etc/profile) and set the Hadoop environment variables:
export HADOOP_HOME=/data/hadoop/current
export PATH=$HADOOP_HOME/bin:$PATH
Log in again for this to take effect, or run export HADOOP_HOME=/data/hadoop/current in the terminal for immediate effect.
Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh on every node and add, near the top of the file: export JAVA_HOME=/data/jdk
Note: even though JAVA_HOME has already been added to /etc/profile, hadoop-env.sh still has to be modified on every node; otherwise startup fails with errors like:
10.12.154.79: Error: JAVA_HOME is not set and could not be found.
10.12.154.77: Error: JAVA_HOME is not set and could not be found.
10.12.154.78: Error: JAVA_HOME is not set and could not be found.
10.12.154.78: Error: JAVA_HOME is not set and could not be found.
10.12.154.77: Error: JAVA_HOME is not set and could not be found.
10.12.154.79: Error: JAVA_HOME is not set and could not be found.
To save yourself unnecessary trouble, it is recommended to add the following to /etc/hosts on every node:
172.25.40.171   VM-40-171-sles10-64   # NameNode
172.25.39.166   VM-39-166-sles10-64   # SecondaryNameNode
10.12.154.77    DEVNET-154-77         # DataNode
10.12.154.78    DEVNET-154-70         # DataNode
10.12.154.79    DEVNET-154-79         # DataNode
Do not map one IP to several different hostnames, otherwise the HTTP pages may not work properly.
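To confirm the mappings are picked up on each node, a quick check along these lines can help (a minimal sketch; getent is available on most Linux systems):

# Resolve each hostname locally and make sure it returns the expected IP.
for h in VM-40-171-sles10-64 VM-39-166-sles10-64 DEVNET-154-77 DEVNET-154-70 DEVNET-154-79
do
    getent hosts "$h"
done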
A hostname such as VM-39-166-sles10-64 can be obtained with the hostname command. Because hostnames are used in the configuration, make sure you have ssh'ed to each node by hostname at least once before starting HDFS or anything else; otherwise startup fails with errors like:
VM-39-166-sles10-64: Host key not found from database.
VM-39-166-sles10-64: Key fingerprint:
VM-39-166-sles10-64: xofiz-zilip-tokar-rupyb-tufer-tahyc-sibah-kyvuf-palik-hazyt-duxux
VM-39-166-sles10-64: You can get a public key's fingerprint by running
VM-39-166-sles10-64: % ssh-keygen -F publickey.pub
VM-39-166-sles10-64: on the keyfile.
VM-39-166-sles10-64: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument
This error means the machine has never ssh'ed to VM-39-166-sles10-64 by hostname. Fix it as follows:
ssh hadoop@VM-39-166-sles10-64
Host key not found from database.
Key fingerprint:
xofiz-zilip-tokar-rupyb-tufer-tahyc-sibah-kyvuf-palik-hazyt-duxux
You can get a public key's fingerprint by running
% ssh-keygen -F publickey.pub
on the keyfile.
Are you sure you want to continue connecting (yes/no)? yes
Host key saved to /data/hadoop/.ssh2/hostkeys/key_36000_137vm_13739_137166_137sles10_13764.pub
host key for VM-39-166-sles10-64, accepted by hadoop Thu Apr 17 2014 12:44:32 +0800
Authentication successful.
Last login: Thu Apr 17 2014 09:24:54 +0800 from 10.32.73.69
Welcome to SuSE Linux 10 SP2 64Bit Nov 10,2010 by DIS
Version v2.6.20101110
No mail.
Edit $HADOOP_HOME/etc/hadoop/slaves on the NameNode and the SecondaryNameNode and add the slave node IPs (the corresponding hostnames also work), one per line, like this:
> cat slaves
10.12.154.77
10.12.154.78
10.12.154.79
The configuration files live in $HADOOP_HOME/etc/hadoop. In Hadoop 2.3.0 and 2.4.0, the core-site.xml, yarn-site.xml, hdfs-site.xml and mapred-site.xml in that directory are empty. If you start without configuring them, e.g. by running start-dfs.sh, you will hit all kinds of errors.
You can copy the *-default.xml files from under $HADOOP_HOME/share/doc/hadoop into $HADOOP_HOME/etc/hadoop and edit them from there (the commands below can be copied and run directly; note that in 2.3.0 the default.xml files are at different paths than in 2.4.0):
# change into $HADOOP_HOME
cd $HADOOP_HOME
cp ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml ./etc/hadoop/core-site.xml
cp ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml ./etc/hadoop/hdfs-site.xml
cp ./share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml ./etc/hadoop/yarn-site.xml
cp ./share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml ./etc/hadoop/mapred-site.xml
Next, the default core-site.xml, yarn-site.xml, hdfs-site.xml and mapred-site.xml need some adjustments, otherwise startup will still fail.
The changes to core-site.xml involve the properties in the following table:

Property | Value | Scope
fs.defaultFS | hdfs://172.25.40.171:9000 | all nodes
hadoop.tmp.dir | /data/hadoop/current/tmp | all nodes
dfs.datanode.data.dir | /data/hadoop/current/data | all DataNodes (this property also exists in hdfs-site.xml)

Note that the configured directories must be created before startup, e.g. create /data/hadoop/current/tmp. For detailed configuration, see:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
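As an illustration, the table above could be applied like this (a minimal sketch that creates the directories and writes a standalone core-site.xml containing only these properties; if you copied core-default.xml as described earlier, edit the values in place instead. dfs.datanode.data.dir is shown with hdfs-site.xml further below):

# Create the directories referenced by the configuration.
mkdir -p /data/hadoop/current/tmp /data/hadoop/current/data

# Minimal core-site.xml with just the properties from the table.
cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://172.25.40.171:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/data/hadoop/current/tmp</value></property>
</configuration>
EOF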
The changes to hdfs-site.xml involve the properties in the following table:

Property | Value | Scope
dfs.namenode.rpc-address | 172.25.40.171:9001 | all nodes
dfs.namenode.secondary.http-address | 172.25.39.166:50090 | NameNode, SecondaryNameNode
dfs.namenode.name.dir | /data/hadoop/current/dfs/name | NameNode, SecondaryNameNode
dfs.datanode.data.dir | /data/hadoop/current/data | all DataNodes

For detailed configuration, see:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
If dfs.namenode.rpc-address is not configured, startup reports:
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Both the IP and the port must be given. If only the IP is specified, e.g. <value>172.25.40.171</value>, startup prints:
Starting namenodes on []
After changing it to <value>172.25.40.171:9001</value>, startup prints:
Starting namenodes on [172.25.40.171]
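Put together, the hdfs-site.xml changes could look like this (a minimal standalone sketch of just the properties in the table; unspecified properties fall back to their built-in defaults, and the same file can be used on every node since entries irrelevant to a node type are ignored there. If you copied hdfs-default.xml earlier, edit the values in place instead):

cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>dfs.namenode.rpc-address</name><value>172.25.40.171:9001</value></property>
  <property><name>dfs.namenode.secondary.http-address</name><value>172.25.39.166:50090</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/data/hadoop/current/dfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/data/hadoop/current/data</value></property>
</configuration>
EOF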
The changes to mapred-site.xml involve the property in the following table:

Property | Value | Scope
mapreduce.framework.name | yarn |

For detailed configuration, refer to the mapred-default.xml reference documentation.
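The single setting above can be applied in the same way (a minimal sketch; edit the copied mapred-default.xml in place if you followed that route):

cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
EOF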
The changes to yarn-site.xml involve the properties in the following table:

Property | Value | Scope
yarn.resourcemanager.hostname | 172.25.40.171 | ResourceManager, NodeManagers
yarn.nodemanager.hostname | 0.0.0.0 | all NodeManagers

If yarn.nodemanager.hostname were set to a specific IP such as 10.12.154.79, every NodeManager would need a different configuration. For detailed configuration, see:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
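And the corresponding yarn-site.xml sketch (minimal, with just the two properties from the table; edit the copied yarn-default.xml in place if you followed that route):

cat > $HADOOP_HOME/etc/hadoop/yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>yarn.resourcemanager.hostname</name><value>172.25.40.171</value></property>
  <property><name>yarn.nodemanager.hostname</name><value>0.0.0.0</value></property>
</configuration>
EOF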
Before starting HDFS, the NameNode must be formatted.
1) Change into the $HADOOP_HOME/bin directory
2) Format the NameNode: ./hdfs namenode -format
If the output contains "INFO util.ExitUtil: Exiting with status 0", the format succeeded.
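If you prefer to check the outcome from a script rather than by eye, a minimal sketch along these lines works (the log path /tmp/format.log is just an assumption for illustration):

cd $HADOOP_HOME/bin
./hdfs namenode -format 2>&1 | tee /tmp/format.log
grep -q "Exiting with status 0" /tmp/format.log && echo "NameNode format succeeded" || echo "NameNode format FAILED"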
If the hostname-to-IP mapping "172.25.40.171 VM-40-171-sles10-64" has not been added to /etc/hosts, formatting fails with an error like:
14/04/17 03:44:09 WARN net.DNS: Unable to determine local hostname -falling back to "localhost"
java.net.UnknownHostException: VM-40-171-sles10-64: VM-40-171-sles10-64: unknown error
        at java.net.InetAddress.getLocalHost(InetAddress.java:1484)
        at org.apache.hadoop.net.DNS.resolveLocalHostname(DNS.java:264)
        at org.apache.hadoop.net.DNS.<clinit>(DNS.java:57)
        at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:945)
        at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:573)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:144)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:845)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1256)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1370)
Caused by: java.net.UnknownHostException: VM-40-171-sles10-64: unknown error
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:907)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1479)
        ... 8 more
1) Change into the $HADOOP_HOME/sbin directory
2) Start HDFS: ./start-dfs.sh
If you see the following error at startup, the NameNode cannot log in to itself without a password. If password-less login by IP worked before, the usual cause is that you have never logged in to it by hostname; the fix is to ssh to it once by hostname, e.g. ssh hadoop@VM_40_171_sles10_64, and then start again.
Starting namenodes on [VM_40_171_sles10_64]
VM_40_171_sles10_64: Host key not found from database.
VM_40_171_sles10_64: Key fingerprint:
VM_40_171_sles10_64: xofiz-zilip-tokar-rupyb-tufer-tahyc-sibah-kyvuf-palik-hazyt-duxux
VM_40_171_sles10_64: You can get a public key's fingerprint by running
VM_40_171_sles10_64: % ssh-keygen -F publickey.pub
VM_40_171_sles10_64: on the keyfile.
VM_40_171_sles10_64: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument
1) Use the jps command shipped with the JDK to check whether the expected processes are running
2) Check the log and out files under $HADOOP_HOME/logs for exceptions (see the sketch below).
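For example, a quick pass over the logs might look like this (a sketch; log file names follow the pattern hadoop-<user>-<daemon>-<hostname>.log):

jps                                          # list running Java processes
ls -lt $HADOOP_HOME/logs | head              # most recently written log files first
grep -iE "error|exception" $HADOOP_HOME/logs/*.log | tail -n 20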
On a DataNode, running jps should show the DataNode process:
$ jps
18669 DataNode
24542 Jps
On the NameNode, running jps should show the NameNode process:
$ jps
18669 NameNode
24542 Jps
On the SecondaryNameNode, running jps should show:
$ jps
24542 Jps
3839 SecondaryNameNode
Run some HDFS commands to further verify that the installation and configuration are correct. For usage information, just run hdfs or hdfs dfs and the help text is printed.
"hdfs dfs -ls" takes one argument. If the argument starts with "hdfs://URI" it accesses HDFS; otherwise it behaves like a local ls. The URI is the NameNode IP or hostname and may include a port number, namely the value of dfs.namenode.rpc-address in hdfs-site.xml.
"hdfs dfs -ls" assumes port 8020 by default; if the NameNode RPC port is configured to something else (9001 in this setup), the port must be given explicitly, otherwise it can be omitted — much like URLs in a browser. Example:
> hdfs dfs -ls hdfs://172.25.40.171:9001/
The trailing slash after 9001 is required; without it, the argument is treated as a file name. If the port 9001 is omitted, the default 8020 is used. "172.25.40.171:9001" is the value of dfs.namenode.rpc-address in hdfs-site.xml.
Clearly, "hdfs dfs -ls" can operate on different HDFS clusters simply by giving a different URI.
After a file is uploaded, it is stored under the DataNode's data directory (given by dfs.datanode.data.dir in the DataNode's hdfs-site.xml), e.g.:
$HADOOP_HOME/data/current/BP-139798373-172.25.40.171-1397735615751/current/finalized/blk_1073741825
The "blk" in the file name stands for block. By default, blk_1073741825 here is a complete block of the file; Hadoop does not apply any extra processing to it.
Example of uploading a file:
> hdfs dfs -put /etc/SuSE-release hdfs://172.25.40.171:9001/
Example of deleting a file:
> hdfs dfs -rm hdfs://172.25.40.171:9001/SuSE-release
Deleted hdfs://172.25.40.171:9001/SuSE-release
1) Change into the $HADOOP_HOME/sbin directory
2) Run start-yarn.sh to start YARN
If the start succeeds, running jps on the master node shows the ResourceManager:
> jps
24689 NameNode
30156 Jps
28861 ResourceManager
Running jps on the slave nodes shows the NodeManager:
$ jps
14019 NodeManager
23257 DataNode
15115 Jps
List all NodeManagers in the YARN cluster, e.g.:
> yarn node -list
Total Nodes:3
         Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
 localhost:45980         RUNNING     localhost:8042                             0
 localhost:47551         RUNNING     localhost:8042                             0
 localhost:58394         RUNNING     localhost:8042                             0
Check the status of a specific NodeManager, e.g.:
> yarn node -status localhost:47551
Node Report :
        Node-Id : localhost:47551
        Rack : /default-rack
        Node-State : RUNNING
        Node-Http-Address : localhost:8042
        Last-Health-Update : 星期五 18/四月/14 01:45:41:555GMT
        Health-Report :
        Containers : 0
        Memory-Used : 0MB
        Memory-Capacity : 8192MB
        CPU-Used : 0 vcores
        CPU-Capacity : 8 vcores
The share/hadoop/mapreduce subdirectory of the installation directory contains ready-made example programs:
hadoop@VM-40-171-sles10-64:~/current> ls share/hadoop/mapreduce
hadoop-mapreduce-client-app-2.4.0.jar         hadoop-mapreduce-client-jobclient-2.4.0-tests.jar
hadoop-mapreduce-client-common-2.4.0.jar      hadoop-mapreduce-client-shuffle-2.4.0.jar
hadoop-mapreduce-client-core-2.4.0.jar        hadoop-mapreduce-examples-2.4.0.jar
hadoop-mapreduce-client-hs-2.4.0.jar          lib
hadoop-mapreduce-client-hs-plugins-2.4.0.jar  lib-examples
hadoop-mapreduce-client-jobclient-2.4.0.jar   sources
Try running one of the examples:
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount ./in ./out
When wordcount finishes, the results are saved in the out directory, in files named like "part-r-00000".
Two things to note when running this example (see the sketch after this list):
1) The in directory must contain text files, or in can itself be the text file to be counted; it can be a file or directory on HDFS, or a local file or directory
2) The out directory must not exist; the program creates it automatically and reports an error if it already exists.
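A minimal end-to-end run might look like this (a sketch; the relative paths in and out resolve under the user's HDFS home directory, and the sample input file is an assumption):

cd $HADOOP_HOME
hdfs dfs -mkdir -p in                  # create the input directory in HDFS
hdfs dfs -put /etc/SuSE-release in/    # upload a sample text file to count
hdfs dfs -rm -r -f out                 # make sure the output directory does not exist
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount in out
hdfs dfs -cat out/part-r-00000 | head  # inspect the word counts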
The jar hadoop-mapreduce-examples-2.4.0.jar contains several example programs. Run it without arguments to see the usage:
> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount
Usage: wordcount <in> <out>
> hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
If "hdfs dfs -ls" fails with a ConnectException (a full failure log is shown below), the likely cause is that the port being used, 9000 here, is wrong: the port to use is the one given by dfs.namenode.rpc-address in hdfs-site.xml, i.e. the NameNode's RPC port (9001 in this setup).
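To confirm which address the cluster is configured with, and whether the NameNode is actually listening on it, a check along these lines helps (a sketch; run on the NameNode, netstat as root):

hdfs getconf -confKey dfs.namenode.rpc-address   # the address clients should use
netstat -ltnp | grep 9001                        # is the NameNode actually listening there?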
hdfs dfs -ls hdfs://172.25.40.171:9000
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/04/17 12:04:02 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /data/hadoop/hadoop-2.4.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/04/17 12:04:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/04/17 12:04:03 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/04/17 12:04:03 WARN conf.Configuration: mapred-site.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
ls: Call From VM-40-171-sles10-64/172.25.40.171 to VM-40-171-sles10-64:9000 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
The "Incompatible clusterIDs" error is caused by not clearing the DataNodes' data directory before running "hdfs namenode -format" again.
Some articles and posts on the web say the directory to clear is tmp. That may be true for their setups, but for Hadoop 2.4.0 it is the data directory, as the log itself points out with "/data/hadoop/hadoop-2.4.0/data". So do not follow online fixes blindly; look carefully at the actual message when you hit a problem.
From the above it is clear that the fix is to empty the data directory on every DataNode, taking care not to delete the data directory itself. The data directory is the one given by the dfs.datanode.data.dir property (listed earlier in the core-site.xml and hdfs-site.xml tables).
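On each DataNode the cleanup could be done like this (a sketch using the directory layout assumed in this article; the rm is destructive, so double-check the path first):

# Empty the DataNode data directory but keep the directory itself.
DATA_DIR=/data/hadoop/current/data    # value of dfs.datanode.data.dir in this article
rm -rf "$DATA_DIR"/*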
2014-04-17 19:30:33,075 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/hadoop/hadoop-2.4.0/data/in_use.lock acquired by nodename 28326@localhost
2014-04-17 19:30:33,078 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool <registering> (Datanode Uuid unassigned) service to /172.25.40.171:9001
java.io.IOException: Incompatible clusterIDs in /data/hadoop/hadoop-2.4.0/data: namenode clusterID = CID-50401d89-a33e-47bf-9d14-914d8f1c4862; datanode clusterID = CID-153d6fcb-d037-4156-b63a-10d6be224091
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:472)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
        at java.lang.Thread.run(Thread.java:744)
2014-04-17 19:30:33,081 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to /172.25.40.171:9001
2014-04-17 19:30:33,184 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
        at java.lang.Thread.run(Thread.java:744)
2014-04-17 19:30:33,184 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2014-04-17 19:30:33,184 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:861)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837)
        at java.lang.Thread.run(Thread.java:744)
2014-04-17 19:30:35,185 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2014-04-17 19:30:35,187 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2014-04-17 19:30:35,189 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at localhost/127.0.0.1
************************************************************/
The "Inconsistent checkpoint fields" error on the SecondaryNameNode is most likely caused by hadoop.tmp.dir not being set properly in core-site.xml on the SecondaryNameNode.
2014-04-17 11:42:18,189 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Log Size Trigger    :1000000 txns
2014-04-17 11:43:18,365 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: Inconsistent checkpoint fields.
LV = -56 namespaceID = 1384221685 cTime = 0 ; clusterId = CID-319b9698-c88d-4fe2-8cb2-c4f440f690d4 ; blockpoolId = BP-1627258458-172.25.40.171-1397735061985.
Expecting respectively: -56; 476845826; 0; CID-50401d89-a33e-47bf-9d14-914d8f1c4862; BP-2131387753-172.25.40.171-1397730036484.
        at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:135)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:518)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:383)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:349)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:345)
        at java.lang.Thread.run(Thread.java:744)
In addition, make sure the directory-related properties on the SecondaryNameNode, such as dfs.datanode.data.dir in hdfs-site.xml, are set to suitable values. The hadoop.tmp.dir setting used in this article is:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop/current/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
Related articles:
"HBase 0.98.0 Distributed Installation Guide"
"Hive 0.12.0 Installation Guide"
"ZooKeeper 3.4.6 Distributed Installation Guide"
"Reverse-Engineering the Hadoop 2.3.0 Source Code"
"Compiling Hadoop 2.4.0 on Linux"
"Accumulo 1.5.1 Installation Guide"
"Drill 1.0.0 Installation Guide"
"Shark 0.9.1 Installation Guide"
For more, see the author's technical blog: http://aquester.cublog.cn
Original article: http://www.cnblogs.com/jhcelue/p/6784631.html