Tags: hadoop cdh cloudera manager
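Hadoop is a complex combination of systems, and setting up a production-grade Hadoop environment is a lot of trouble. Fortunately, there are always experts out there who solve seemingly painful problems for you; if nobody has yet, it is only a matter of time. CDH is Cloudera's packaged Hadoop distribution. For an introduction to CDH, please see www.cloudera.com; I won't repeat it here. This series describes how to use CDH 5.3 to install a Hadoop environment suitable for production. Cloudera has solved the Hadoop installation problem for you, but a new one follows: installing Cloudera Manager is no simpler than installing Hadoop, and it has plenty of pitfalls. We will step into them one by one in the following posts.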
Part 1: Environment Preparation
I. Server preparation
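We prepare a small cluster of 12 servers, all running Red Hat Enterprise Linux 6.4 Server x64. The hostnames are uniformly server[1-12].cdhwork.org and the internal IP addresses are 192.168.10.[1-12]. Every server must have a DNS server configured (for example 202.96.209.5 or 8.8.8.8), and the root password must be the same on all servers.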
Server (cdhwork.org) | IP address | Installed roles |
---|---|---|
server1 | 192.168.10.1 | Local CDH mirror, Cloudera Manager, time server |
server2 | 192.168.10.2 | Cloudera Management Service Host Monitor, Cloudera Management Service Service Monitor |
server3 | 192.168.10.3 | HDFS NameNode, Hive Gateway, Impala Catalog Server, Cloudera Management Service Alert Publisher, Spark Gateway, ZooKeeper Server |
server4 | 192.168.10.4 | HDFS SecondaryNameNode, Hive Gateway, Impala StateStore, Solr Server, Spark Gateway, YARN (MR2 Included) ResourceManager, ZooKeeper Server |
server5 | 192.168.10.5 | HDFS Balancer, Hive Gateway, Hue Server, Cloudera Management Service Activity Monitor, Oozie Server, Spark Gateway, Sqoop 2 Server, ZooKeeper Server |
server6 | 192.168.10.6 | HBase Master, Hive Gateway, MapReduce JobTracker, Solr Server, Spark Gateway, YARN (MR2 Included) JobHistory Server, ZooKeeper Server |
server7 | 192.168.10.7 | HBase REST Server, HBase Thrift Server, Hive Metastore Server, HiveServer2, Key-Value Store Indexer Lily HBase Indexer, Cloudera Management Service Event Server, Spark History Server |
server8 | 192.168.10.8 | HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager |
server9 | 192.168.10.9 | HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager |
server10 | 192.168.10.10 | HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager |
server11 | 192.168.10.11 | HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager |
server12 | 192.168.10.12 | HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager |
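Perform all of the following steps as root, identically on every server.
1. Disable the firewall
/etc/init.d/iptables stop    # stop the firewall now
chkconfig iptables off       # do not start the firewall service at boot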
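Because every step below has to be repeated on all 12 machines, a small loop saves a lot of typing. The following is a minimal sketch, assuming key-based (passwordless) root SSH from your workstation to every node, which this post does not set up; the helper names `cluster_hosts` and `run_on_all` are hypothetical.

```bash
# Print the FQDNs server1.cdhwork.org ... server12.cdhwork.org, one per line.
cluster_hosts() {
  i=1
  while [ "$i" -le 12 ]; do
    echo "server${i}.cdhwork.org"
    i=$((i + 1))
  done
}

# run_on_all CMD...: hypothetical helper that runs CMD as root on each host in turn.
# Assumes key-based root SSH; stops at the first host where the command fails.
run_on_all() {
  for host in $(cluster_hosts); do
    echo "== ${host}"
    ssh "root@${host}" "$@" || return 1
  done
}
```

For example, `run_on_all 'chkconfig iptables off'` applies one step everywhere; stopping at the first failure makes a half-configured node easy to spot.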
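To confirm that the firewall stays off after a reboot, you can inspect the runlevel table that chkconfig maintains. A minimal sketch, using a hypothetical helper `iptables_disabled_at_boot` that parses the output of `chkconfig --list iptables`:

```bash
# iptables_disabled_at_boot: succeed if no runlevel has iptables set to "on".
# Returns 2 if chkconfig itself fails (e.g. the service is not registered).
iptables_disabled_at_boot() {
  status=$(chkconfig --list iptables 2>/dev/null) || return 2
  # chkconfig prints e.g. "iptables  0:off 1:off 2:on 3:on ..."
  ! printf '%s\n' "$status" | grep -q ':on'
}
```

`service iptables status` additionally tells you whether the firewall is running right now.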
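2. Disable SELinux
Run setenforce 0 on the command line (this switches SELinux to permissive mode immediately), then edit the config file so the setting persists after a reboot: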
vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
Change the line to SELINUX=disabled, then save and exit.
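If you would rather not edit this file by hand on 12 machines, the same change can be scripted. A minimal sketch, assuming GNU sed as shipped with RHEL 6; `disable_selinux_config` and `selinux_permissive_now` are hypothetical helper names.

```bash
# disable_selinux_config [FILE]: set SELINUX=disabled, replacing any existing value.
# Lines such as SELINUXTYPE=targeted are left alone because the pattern needs "SELINUX=".
disable_selinux_config() {
  conf="${1:-/etc/selinux/config}"
  if grep -q '^SELINUX=' "$conf"; then
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
  else
    echo 'SELINUX=disabled' >> "$conf"
  fi
}

# selinux_permissive_now: the runtime half, i.e. "setenforce 0" only when currently enforcing.
selinux_permissive_now() {
  if command -v getenforce >/dev/null 2>&1 && [ "$(getenforce)" = "Enforcing" ]; then
    setenforce 0
  fi
}
```

The config change takes full effect (SELinux disabled rather than permissive) only after the next reboot.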
3. Minimize swapping
Run: sysctl vm.swappiness=0
Then edit the config file so the setting persists after a reboot:
vi /etc/sysctl.conf
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

vm.swappiness = 0
Add the line vm.swappiness = 0, then save and exit.
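The same edit can be made idempotent, so that re-running it on a node never adds a second vm.swappiness line and an existing value is overwritten. A sketch with a hypothetical helper `set_sysctl_key`, assuming GNU sed:

```bash
# set_sysctl_key KEY VALUE [FILE]: ensure "KEY = VALUE" is the only setting for KEY in FILE.
set_sysctl_key() {
  key="$1"; value="$2"; conf="${3:-/etc/sysctl.conf}"
  # Escape the dots in the key so "vm.swappiness" is matched literally.
  re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^[[:space:]]*${re}[[:space:]]*=" "$conf"; then
    sed -i "s|^[[:space:]]*${re}[[:space:]]*=.*|${key} = ${value}|" "$conf"
  else
    echo "${key} = ${value}" >> "$conf"
  fi
}
```

Usage: `set_sysctl_key vm.swappiness 0 && sysctl -p`, where `sysctl -p` reloads /etc/sysctl.conf into the running kernel.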
4. Disable Red Hat's transparent hugepage defragmentation
Run: echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
Then edit the config file so the setting persists after a reboot:
vi /etc/rc.local
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
touch /var/lock/subsys/local
Add the line echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag, then save and exit.
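As with sysctl.conf, the rc.local edit is safer when it cannot be duplicated. A minimal sketch that appends the line only if it is not already present, plus a check of the running kernel; the helper names are hypothetical, and the sysfs path is the RHEL 6 one used above.

```bash
THP_DEFRAG=/sys/kernel/mm/redhat_transparent_hugepage/defrag   # RHEL 6 path

# append_line_once LINE FILE: append LINE to FILE unless an identical line already exists.
append_line_once() {
  line="$1"; file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# thp_defrag_disabled: succeed if the kernel reports "never" as the active mode.
# The active value is the bracketed one, e.g. "always madvise [never]".
thp_defrag_disabled() {
  grep -q '\[never\]' "$THP_DEFRAG"
}
```

Usage: `echo never > "$THP_DEFRAG"` for the running system, then `append_line_once "echo never > $THP_DEFRAG" /etc/rc.local` for the next boot.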
5. Edit /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# Local CDH mirror
192.168.10.1 archive.cloudera.com
# Cloudera Manager
192.168.10.1 server1.cdhwork.org
# Cloudera Management Service Host Monitor,Cloudera Management Service Service Monitor
192.168.10.2 server2.cdhwork.org
# HDFS NameNode,Hive Gateway,Impala Catalog Server,Cloudera Management Service Alert Publisher,Spark Gateway,ZooKeeper Server
192.168.10.3 server3.cdhwork.org
# HDFS SecondaryNameNode,Hive Gateway,Impala StateStore,Solr Server,Spark Gateway,YARN (MR2 Included) ResourceManager,ZooKeeper Server
192.168.10.4 server4.cdhwork.org
# HDFS Balancer,Hive Gateway,Hue Server,Cloudera Management Service Activity Monitor,Oozie Server,Spark Gateway,Sqoop 2 Server,ZooKeeper Server
192.168.10.5 server5.cdhwork.org
# HBase Master,Hive Gateway,MapReduce JobTracker,Solr Server,Spark Gateway,YARN (MR2 Included) JobHistory Server,ZooKeeper Server,Postgresql-9.2
192.168.10.6 server6.cdhwork.org
# HBase REST Server,HBase Thrift Server,Hive Metastore Server,HiveServer2,Key-Value Store Indexer Lily HBase Indexer,Cloudera Management Service Event Server,Spark History Server
192.168.10.7 server7.cdhwork.org
# HBase RegionServer,HDFS DataNode,Impala Daemon,MapReduce TaskTracker,YARN (MR2 Included) NodeManager
192.168.10.8 server8.cdhwork.org
192.168.10.9 server9.cdhwork.org
192.168.10.10 server10.cdhwork.org
192.168.10.11 server11.cdhwork.org
192.168.10.12 server12.cdhwork.org
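Since the hostnames and addresses follow a fixed pattern, the cluster entries can be generated rather than typed on each machine. A minimal sketch (the function name `gen_cluster_hosts` is hypothetical); it prints only the address lines, without the role comments:

```bash
# gen_cluster_hosts: print the cluster's /etc/hosts entries.
gen_cluster_hosts() {
  echo "192.168.10.1 archive.cloudera.com"   # local CDH mirror, hosted on server1
  i=1
  while [ "$i" -le 12 ]; do
    echo "192.168.10.${i} server${i}.cdhwork.org"
    i=$((i + 1))
  done
}
```

Back up the file first, e.g. `cp /etc/hosts /etc/hosts.bak && gen_cluster_hosts >> /etc/hosts`, then check name resolution with `getent hosts server3.cdhwork.org`.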
6. Configure yum repositories
cd /etc/yum.repos.d/
mv rhel-source.repo rhel-source.repo.bak
vi rhel-source.repo
[base]
name=CentOS-6.6 - Base
baseurl=http://mirrors.163.com/centos/6.6/os/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
exclude=postgresql*

#released updates
[updates]
name=CentOS-$releasever - Updates
baseurl=http://mirrors.163.com/centos/6.6/updates/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
exclude=postgresql*

#packages used/produced in the build but not released
#[addons]
#name=CentOS-$releasever - Addons
#baseurl=http://mirrors.163.com/centos/6.6/addons/x86_64/
#gpgcheck=1
#gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6

#additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras
baseurl=http://mirrors.163.com/centos/6.6/extras/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus
baseurl=http://mirrors.163.com/centos/6.6/centosplus/x86_64/
gpgcheck=1
enabled=0
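Save and exit. Switching the yum repositories makes the later installation faster. This assumes, of course, that all servers can reach the Internet. If they cannot, either set up a proxy server and access the Internet through it, or build your own mirror of the yum repositories. I recommend building your own yum mirror: it is convenient and saves effort, and its only drawback is the disk space it takes.
To build a mirror site, you can copy everything under the target site with wget -r <target> and serve it over the web with the httpd service. If your copy lives in /usr/site, you can simply create a symbolic link under /var/www/html/: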
ln -s /usr/site /var/www/html/site
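Put together, mirroring a repository and publishing it could look like the sketch below. It rests on several assumptions: the wget options beyond `-r` (`-np` to stay below the starting directory, `-nH` to drop the hostname directory level, `-R 'index.html*'` to skip generated listing pages) are my additions, and both helper names are hypothetical.

```bash
# mirror_site URL DEST: recursively copy URL into DEST (the "wget -r <target>" step).
mirror_site() {
  url="$1"; dest="$2"
  mkdir -p "$dest"
  wget -r -np -nH -R 'index.html*' -P "$dest" "$url"
}

# publish_site SRC [DOCROOT] [NAME]: expose SRC under httpd's document root as DOCROOT/NAME.
# -n makes a re-run replace an existing link instead of creating a link inside SRC.
publish_site() {
  src="$1"; docroot="${2:-/var/www/html}"; name="${3:-site}"
  ln -sfn "$src" "${docroot}/${name}"
}
```

httpd itself still has to be running (`service httpd start`, `chkconfig httpd on`) and allowed to follow symbolic links (`Options FollowSymLinks` for the document root).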
7. Update the servers to the latest packages
yum update
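This minimizes errors during the Cloudera installation later, because Cloudera often inexplicably reports that a dependent RPM package does not exist or that its version is too old. At first I had no idea how to fix this; eventually I bit the bullet and ran a full system update, and to my surprise it solved the problem! I still don't know exactly why, but the problem is gone, isn't it? So please take the trouble to update; with a reasonably fast network it won't take long.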
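To see whether a node still has pending updates, you can rely on the exit status of `yum check-update`: 100 when updates are available, 0 when there are none, and 1 on error. The sketch below (hypothetical helper `updates_pending`) only maps those codes:

```bash
# updates_pending: 0 = updates available, 1 = up to date, 2 = yum reported an error.
updates_pending() {
  rc=0
  yum -q check-update >/dev/null 2>&1 || rc=$?
  case "$rc" in
    100) return 0 ;;   # packages can be updated
    0)   return 1 ;;   # nothing to update
    *)   return 2 ;;   # network, repository or configuration problem
  esac
}
```

Usage: `updates_pending && yum -y update`.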
Original article: http://blog.csdn.net/duxu2004/article/details/42562249