Hadoop [the name comes from a toy elephant]
Broad sense: the ecosystem built around the Apache Hadoop software (Hive, Sqoop, Spark, Flink, ...)
Narrow sense: the Apache Hadoop software itself
You will use these sites often from now on:
hadoop.apache.org
hive.apache.org
spark.apache.org
The Hadoop software itself:
1.x  not used in enterprises
2.x  the mainstream line
3.x  few dare to use it yet; you hit the pitfalls yourself [because Hadoop has to work with the rest of the ecosystem, and those projects may be developed by different companies]
CDH (www.cloudera.com), however, solves two problems: 1. version compatibility 2. unified deployment and management. It is very popular in enterprises [because you can deploy it from a web UI just by clicking Next]. A comparable distribution is Hortonworks' HDP.
Both CDH and HDP take the Apache Hadoop source code, repackage it as their own Hadoop release, and maintain their own patches.
CDH has a paid edition and a free edition. The paid edition adds technical support and some reporting features, but the free edition is fine for enterprise use; China Unicom and China Mobile, for example, run the free edition.
http://archive.cloudera.com/cdh5/cdh/5/   the archive of components packaged by CDH
hadoop-2.6.0-cdh5.7.0.tar.gz   280M
hadoop-2.6.0-cdh5.16.2.tar.gz  400M
hive-1.1.0-cdh5.16.2.tar.gz
Versions used in companies before: CDH 5.4.8, 5.8.6, 5.12.0, 5.16.1
Use CDH 5.11.0 with caution; that release has a bug.
##############################################################################################################
The Hadoop software:
1. Storage: HDFS [distributed file system], needs to be deployed. Common storage systems: HDFS, Hive, HBase, Kudu.
2. Compute: MapReduce [distributed computing], runs as jobs. Enterprises rarely write raw MapReduce any more; they use the engines on top instead (Hive SQL, whose syntax is very close to MySQL's, Spark, Flink).
3. Resources (memory, CPU) and job scheduling [e.g. placing a job on machine #55]: handled by YARN, which also needs to be deployed.
Some background:
1 machine: for our exercises we install Hadoop on a single server.
In the job-track class we use a cluster [multiple servers] (CDH Hadoop cluster, Kafka cluster, plus the advanced-class and in-person-class projects), rented by the hour at 0.5 RMB/hour.
Goal: big data development, with IDEA installed on your own laptop.
A typical enterprise server costs 80,000 to 120,000 RMB or more.
##############################################################################################################
Now install Hadoop [we use the pre-built binary tarball and simply extract it]
1. wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz   # this is a binary package; extract and use
   or upload it with rz [the tarball is already in our course directory]
2. Create a dedicated maintenance user, hadoop
[root@hadoop001 ~]# useradd hadoop
[root@hadoop001 ~]# su - hadoop
[hadoop@hadoop001 ~]$ pwd
/home/hadoop
[hadoop@hadoop001 ~]$
3. Deploy the JDK
1. Get the package:
Download jdk-8u45-linux-x64.gz from the official site:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
2. Extract the package
2.1 Create a java directory under /usr/:
mkdir /usr/java
2.2 Change into the java directory:
cd /usr/java
2.3 Place the jdk-8u45-linux-x64.gz file in the /usr/java directory.
2.4 Extract the archive:
tar -zxvf jdk-8u45-linux-x64.gz
The archive can then be deleted:
rm -f jdk-8u45-linux-x64.gz
3. Set the JDK environment variables
Use the global approach: edit /etc/profile, which holds the environment variables shared by all users.
[root@hadoop000 bin]# vi /etc/profile
Add the following at the end of the file:
export JAVA_HOME=/usr/java/jdk1.8.0_45
export PATH=$JAVA_HOME/bin:$PATH
Make the variables take effect:
source /etc/profile
Check that the configuration is correct:
[root@hadoop000 bin]# echo $JAVA_HOME
/usr/java/jdk1.8.0_45
4. Verify the installation
[root@hadoop000 bin]# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
[hadoop@hadoop001 ~]$ which java
/usr/java/jdk1.8.0_45/bin/java
[hadoop@hadoop001 ~]$
Directory: the JDK must be installed under /usr/java/ [because some CDH scripts assume that path]
Fix the owner and group of the extracted JDK directory
Configure the environment variable globally and prepend $JAVA_HOME/bin to PATH; a sketch of both fixes follows below
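A minimal sketch of those two fixes, assuming the JDK was extracted to /usr/java/jdk1.8.0_45 (the owner root:root and the exact profile lines are illustrative, not taken from the original notes):
# fix the owner and group of the extracted JDK (tarballs often carry the build user's uid/gid)
[root@hadoop001 ~]# chown -R root:root /usr/java/jdk1.8.0_45
# in /etc/profile, prepend $JAVA_HOME/bin so this JDK is found before any system java
export JAVA_HOME=/usr/java/jdk1.8.0_45
export PATH=$JAVA_HOME/bin:$PATH
[root@hadoop001 ~]# source /etc/profile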
4. Directory layout planned for production (creating it is sketched below):
~/app/
~/software/
~/data/
~/logs/
At J哥's company the layout lives under:
/opt/software/
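A sketch of creating the planned layout as the hadoop user:
[hadoop@hadoop001 ~]$ mkdir -p ~/app ~/software ~/data ~/logs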
5. Extract: create a directory /home/hadoop/software yourself, extract the Hadoop tarball above into it, then create a symlink named hadoop inside the software directory pointing to it, and make hadoop the owner and group of the directory.
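A sketch of the commands this step describes, assuming the tarball sits in the hadoop user's home directory (paths and the tarball name are the ones used in these notes):
[hadoop@hadoop001 ~]$ mkdir -p ~/software
[hadoop@hadoop001 ~]$ tar -xzvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/software
[hadoop@hadoop001 ~]$ ln -s ~/software/hadoop-2.6.0-cdh5.7.0 ~/software/hadoop
# if the tarball was extracted as root instead, fix the ownership as root:
[root@hadoop001 ~]# chown -R hadoop:hadoop /home/hadoop/software/hadoop-2.6.0-cdh5.7.0
The listing of the extracted directory then looks like this: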
[hadoop@hadoop001 hadoop]$ ll
total 76
drwxr-xr-x. 2 hadoop hadoop 4096 Mar 24 2016 bin        executable scripts
drwxr-xr-x. 2 hadoop hadoop 4096 Mar 24 2016 bin-mapreduce1
drwxr-xr-x. 3 hadoop hadoop 4096 Mar 24 2016 cloudera
drwxr-xr-x. 6 hadoop hadoop 4096 Mar 24 2016 etc        configuration files
drwxr-xr-x. 5 hadoop hadoop 4096 Mar 24 2016 examples
drwxr-xr-x. 3 hadoop hadoop 4096 Mar 24 2016 examples-mapreduce1
drwxr-xr-x. 2 hadoop hadoop 4096 Mar 24 2016 include
drwxr-xr-x. 3 hadoop hadoop 4096 Mar 24 2016 lib        jar libraries
drwxr-xr-x. 2 hadoop hadoop 4096 Mar 24 2016 libexec
-rw-r--r--. 1 hadoop hadoop 17087 Mar 24 2016 LICENSE.txt
-rw-r--r--. 1 hadoop hadoop 101 Mar 24 2016 NOTICE.txt
-rw-r--r--. 1 hadoop hadoop 1366 Mar 24 2016 README.txt
drwxr-xr-x. 3 hadoop hadoop 4096 Mar 24 2016 sbin       start/stop scripts for the Hadoop components
drwxr-xr-x. 4 hadoop hadoop 4096 Mar 24 2016 share
drwxr-xr-x. 17 hadoop hadoop 4096 Mar 24 2016 src
[hadoop@hadoop001 hadoop]$ pwd
hadoop-2.6.0-cdh5.7.0-src.tar.gz   # source code, has to be compiled
hadoop-2.6.0-cdh5.7.0.tar.gz       # nothing after the version (or a name ending in bin.tar.gz) means a pre-built package; this binary package is the one we downloaded
6. Official documentation
CDH depends on the JDK
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0/
Now you are ready to start your Hadoop cluster in one of the three supported modes:
Local (Standalone) Mode     0 daemons, 1 machine
Pseudo-Distributed Mode     pseudo-distributed: multiple daemons on 1 machine; this is how we deploy in the basics class
Fully-Distributed Mode      cluster: multiple daemons across n machines
7. Configuration
Configuration
Use the following:
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>   # the replication factor is 1 because this is a single-node deployment
</property>
</configuration>
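Not covered in these notes, but the same pseudo-distributed guide also has you point Hadoop at the JDK in etc/hadoop/hadoop-env.sh; a minimal sketch, reusing the JDK path installed above:
# etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_45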
8. Generate a key pair for passwordless login
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:   # if the ssh above already works without a password, skip the rest of this step
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
The commands above come from the official docs; it is simpler to just run ssh-keygen as below:
[hadoop@hadoop001 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
44:3c:e9:e7:09:51:c1:34:47:e8:b7:12:3b:c2:e8:04 hadoop@hadoop001
The key's randomart image is:
+--[ RSA 2048]----+
| ..==+o |
| .= oo |
| ..+ |
| E .o + . |
| . oS+ = . |
| o o * . |
| o . o |
| . |
| |
+-----------------+
[hadoop@hadoop001 .ssh]$ cat id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA2IeNSO6ruio3gXf/t6wtZlDu+Y+zimuuePwVtWB6f8GZZrrHmIysvV/LN8L4tXjJ0vOo2Zv
+IQjrNeqy6tiYGGQEV9xLKaX7bM55HuiBMOw4wOFVJseVR4gfNdV7u7oX349VZ8YRxoD3gw0tFQWZcbpzCcVdUTTj1T2tJULf7a07h0x9WT0+ftMHQ7UZp2V42H39tWuO0JnYJrKRncZ2OVZbu1dGWoItOKS9KwxB
+q1s9jhs/TCjAbXpzvBmK655IQE/6S9nYbpt86gQOkAKllj6EW3OPs7sMk6rW62+v+ZGZeGdTX0xyjDNhmxLpVtkKKXoL/EYjJ87TEerNXvgjQ== hadoop@hadoop001
[hadoop@hadoop001 .ssh]$ cat id_rsa.pub >> authorized_keys
[hadoop@hadoop001 .ssh]$ ssh localhost date   # this will fail
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 89:72:a6:6b:e2:b2:5a:4f:a4:0e:ec:ce:09:05:ec:02.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
hadoop@localhost's password:
Permission denied, please try again.
hadoop@localhost's password:
Permission denied, please try again.
hadoop@localhost's password:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[hadoop@hadoop001 .ssh]$ chmod 600 authorized_keys
[hadoop@hadoop001 .ssh]$ ssh localhost        ssh into the localhost machine
[hadoop@hadoop001 .ssh]$ ssh localhost date   ssh into localhost, run date and return the result; it does not switch your shell to the remote machine, i.e. the command runs remotely and you come straight back
Sun Jun 30 21:36:58 CST 2019
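Why the chmod 600 above mattered: sshd ignores key files whose permissions are too open. A sketch of the usual requirements (standard sshd behavior, not something specific to these notes):
chmod 700 ~/.ssh                      # the .ssh directory must not be group/world writable
chmod 600 ~/.ssh/authorized_keys      # same for authorized_keys (this is the fix applied above)
chmod 600 ~/.ssh/id_rsa               # the private key must be readable only by its owner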
9. Now start for real
Format the filesystem:
$ bin/hdfs namenode -format   this formats the HDFS filesystem; the machine itself normally uses ext4 or xfs, and HDFS sits on top of that as its own, separate filesystem
19/06/30 21:45:52 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:50070/
http://localhost:50070/
Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
10. Use the jps command
[hadoop@hadoop001 sbin]$ jps
4887 DataNode
5208 Jps
4795 NameNode
5100 SecondaryNameNode
11. HDFS command format: hdfs dfs -xxx
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -mkdir /user   create a directory in the HDFS filesystem
19/06/30 21:58:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user
19/06/30 21:59:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /
19/06/30 21:59:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2019-06-30 21:58 /user
[hadoop@hadoop001 hadoop]$
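A few more hdfs dfs subcommands in the same hdfs dfs -xxx format, for reference (the paths are illustrative):
bin/hdfs dfs -put localfile.txt /user/hadoop/    # upload a local file into HDFS
bin/hdfs dfs -cat /user/hadoop/localfile.txt     # print an HDFS file to the terminal
bin/hdfs dfs -get /user/hadoop/localfile.txt .   # copy an HDFS file to the local filesystem
bin/hdfs dfs -rm -r /user/hadoop/output          # remove an HDFS directory recursively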
Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input   if input has no leading path, it is uploaded to /user/hadoop/input, where hadoop is the current user
Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output   the first output also has no leading path, so it actually refers to /user/hadoop/output
$ cat output/*
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user/hadoop/output
19/06/30 22:07:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2019-06-30 22:05 /user/hadoop/output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 197 2019-06-30 22:05 /user/hadoop/output/part-r-00000
[hadoop@hadoop001 hadoop]$
View the output files on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
When you're done, stop the daemons with:
$ sbin/stop-dfs.sh
------------------------------------
1. Write a blog post on the pseudo-distributed HDFS deployment
2. Write a blog post summarizing ssh; pay special attention to permissions
Original article: https://www.cnblogs.com/python8/p/11927606.html