
DataCell & Impala database cluster deployment and installation manual

Published: 2015-11-02 18:57:14


============= Base Environment Preparation ==========

1. Node plan:
The cluster has 3 nodes.
Master node: dc1 --- 172.16.100.165
Worker node: dc2 --- 172.16.100.166
Worker node: dc3 --- 172.16.100.167

2. Rename the hosts to dc1/dc2/dc3 (change only the HOSTNAME= line)
On dc1:
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=dc1.com
 
On dc2:
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=dc2.com
 
On dc3:
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=dc3.com
 
-- vim /etc/hosts  
172.16.100.165 dc1 dc1.com
172.16.100.166 dc2 dc2.com
172.16.100.167 dc3 dc3.com
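Before rebooting, it may be worth grepping /etc/hosts for addresses outside the cluster subnet. The `check_hosts` helper below is hypothetical (not part of the DataCell tooling) and assumes every cluster entry lives in 172.16.100.0/24:

```shell
# Hypothetical helper: print any IPv4 /etc/hosts entry that is neither
# loopback nor in the 172.16.100.0/24 cluster range. A mistyped prefix
# (e.g. 176.16 instead of 172.16, as debugged in the PS at the end of
# this section) shows up immediately.
check_hosts() {
    grep -E '^[0-9]' "$1" | grep -Ev '^(127\.|172\.16\.100\.)' || true
}
# Any output means a suspicious entry:
check_hosts /etc/hosts
```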

3. Reboot Linux for the changes to take effect (reboot all 3 nodes after the edits above):
  #shutdown -r now

 
4. Create directories (required on all of dc1/dc2/dc3). (Storage space is limited in this environment, so the directories live under /home; in production, plan dedicated software and data directories.)
-- Directory that will hold datacell and impala
mkdir -p /home/geedata
 
-- Create datacell's data directories (create one directory per physical disk and mount each disk onto its directory, so data is balanced and redundant across disks)
mkdir /home/geedata/data   # holds datacell data and metadata
 
-- datacell data directories
mkdir -p /home/geedata/data/sdb  # matches <Id>yy_webmedia_detail_0_volume</Id> in DataCell.xml
mkdir -p /home/geedata/data/sdc  # matches <Id>yy_webmedia_detail_1_volume</Id> in DataCell.xml
mkdir -p /home/geedata/data/sdd  # matches <Id>yy_webmedia_detail_2_volume</Id> in DataCell.xml
mkdir -p /home/geedata/data/sde  # matches <Id>yy_webmedia_detail_3_volume</Id> in DataCell.xml
mkdir -p /home/geedata/data/sdf  # matches <Id>yy_webmedia_detail_4_volume</Id> in DataCell.xml
 
-- Create datacell's metadata directory
mkdir -p /home/geedata/data/meta4dc
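The same layout can be built in one loop; a sketch (shown under a temporary root so it runs anywhere; on the nodes set DATA_ROOT=/home/geedata/data):

```shell
# Create datacell's five volume directories plus the metadata directory
# in one pass. DATA_ROOT defaults to a scratch path here; the manual's
# real root is /home/geedata/data.
DATA_ROOT="${DATA_ROOT:-$(mktemp -d)/data}"
for d in sdb sdc sdd sde sdf meta4dc; do
    mkdir -p "$DATA_ROOT/$d"
done
ls "$DATA_ROOT"
```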
 
 
5. Set up passwordless SSH trust (disable SELinux first, or it will silently change the permissions on the .ssh directory; also disable iptables).
[root@dc1 ~]#
 
-- Run on all 3 nodes (dc1/dc2/dc3):
 
mkdir ~/.ssh
chmod 755 ~/.ssh  ## 700 caused problems in this environment; 755 is required here
 
cd ~/.ssh
ssh-keygen -t rsa  ## press Enter at all 3 prompts
 
ssh-keygen -t dsa  ## press Enter at all 3 prompts
 
 
-- Run on dc1:
-- Append dc1's public keys to the shared authorized_keys file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
 
-- Append dc2's public keys to the shared authorized_keys file
ssh dc2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   ## enter the current user's (root) password www.geedata.com
ssh dc2 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys   ## enter the current user's (root) password www.geedata.com
 
-- Append dc3's public keys to the shared authorized_keys file
ssh dc3 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   ## enter the current user's (root) password www.geedata.com
ssh dc3 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys   ## enter the current user's (root) password www.geedata.com
 
-- Copy the populated authorized_keys file to dc2 and dc3
scp ~/.ssh/authorized_keys dc2:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys dc3:~/.ssh/authorized_keys
 
 
-- Verify: run the following on each node. If the date prints, SSH trust works (the first connection may prompt to confirm the host key, which is then recorded in known_hosts and never asked again).
ssh dc1 date
ssh dc2 date
ssh dc3 date
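The verification step can be wrapped in a small loop; `check_trust` below is a hypothetical helper (the hostnames are the ones planned above), and `BatchMode=yes` makes a broken trust fail fast instead of prompting for a password:

```shell
# Hypothetical helper: report OK/FAILED per host instead of running the
# three ssh commands by hand. BatchMode=yes refuses password prompts, so
# missing key trust fails immediately rather than hanging on a prompt.
check_trust() {        # usage: check_trust <ssh-command> <host...>
    cmd=$1; shift
    for h in "$@"; do
        if "$cmd" -o BatchMode=yes "$h" date >/dev/null 2>&1; then
            echo "$h: OK"
        else
            echo "$h: FAILED"
        fi
    done
}
# On the cluster:  check_trust ssh dc1 dc2 dc3
```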
 
PS:
1) The deployment has two major phases: the datacell setup and the impala setup.
2) Example problem (make sure the IPs and the hosts file are configured correctly):
[root@dc1 ~]# ssh dc3
ssh: connect to host dc3 port 22: Connection refused
 
[root@dc3 ~]# ping dc3  (high latency: 172 had been mistyped as 176 in /etc/hosts, so the connection went out to an external address)
PING dc3 (176.16.100.167) 56(84) bytes of data.
64 bytes from dc3 (176.16.100.167): icmp_seq=1 ttl=39 time=3927 ms
64 bytes from dc3 (176.16.100.167): icmp_seq=3 ttl=39 time=2093 ms
 


============== DataCell Cluster Installation and Deployment ============
 
---------------------- Copy the "install package" ---------------------
-- Installation method: here the "install package" means copying the directory tree of an already-installed environment; no official installation media has been provided yet.
-- Copy the "package" (scp the impala and datacell directories from the development environment to the matching machines; note the machine-to-machine correspondence):
On dc1:
mkdir -p /home/geedata
scp -r 172.16.100.146:/application/yoyosys/datacell /home/geedata/
scp -r 172.16.100.146:/application/yoyosys/impala /home/geedata/
 
On dc2:
mkdir -p /home/geedata
scp -r 172.16.100.147:/application/yoyosys/datacell /home/geedata/
scp -r 172.16.100.147:/application/yoyosys/impala /home/geedata/
 
On dc3:
mkdir -p /home/geedata
scp -r 172.16.100.148:/application/yoyosys/datacell /home/geedata/
scp -r 172.16.100.148:/application/yoyosys/impala /home/geedata/
 
 
--------------------- Edit the datacell configuration files ----------------------
0. DataCell.xml
Configuration file: /home/geedata/datacell/conf/DataCell.xml
If the datacell cluster has 3 nodes, set the following values in DataCell.xml to 3; with 4 nodes, set them to 4. Otherwise the cluster cannot start or stop cleanly, because it considers the node count incomplete.
 
<Bootstrap-Pending-Threshold>3</Bootstrap-Pending-Threshold>
<Bootstrap-Start-Threshold>3</Bootstrap-Start-Threshold>
 
1. DataCell.xml
Edit on dc1, then scp to dc2 and dc3 (this part of the file is identical on every node).
Find the following entries and change them:
 
Before: <Root>/application/data/meta4dc</Root>
After:  <Root>/home/geedata/data/meta4dc</Root>
 
Before: <Name>172.16.100.146</Name>
After:  <Name>172.16.100.165</Name>
 
datacell currently holds 3 tables, and the section for each of the 3 tables must be changed, i.e. 3 places in total, as follows:
 
Before (yy_webmedia_detail):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
<Volume>yy_webmedia_detail_1_volume</Volume>
<Volume>yy_webmedia_detail_2_volume</Volume>
<Volume>yy_webmedia_detail_3_volume</Volume>
<Volume>yy_webmedia_detail_4_volume</Volume>
</Volumes>
 
 
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>1</Id>
<Name>yy_webmedia_detail</Name>
After (yy_webmedia_detail):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
</Volumes>
 
 
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>1</Id>
<Name>yy_webmedia_detail</Name>
 
 
Before (yy_news):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
</Volumes>
 
 
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>11</Id>
<Name>yy_news</Name>
 
After (yy_news):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
<Volume>yy_webmedia_detail_1_volume</Volume>
<Volume>yy_webmedia_detail_2_volume</Volume>
<Volume>yy_webmedia_detail_3_volume</Volume>
<Volume>yy_webmedia_detail_4_volume</Volume>
</Volumes>
 
 
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>11</Id>
<Name>yy_news</Name>
 
Before (yy_user_news):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
</Volumes>
 
 
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>11</Id>
<Name>yy_user_news</Name>
 
After (yy_user_news):
<!-- Volume use to generate from console-->
<Volumes>
<Volume>yy_webmedia_detail_0_volume</Volume>
<Volume>yy_webmedia_detail_1_volume</Volume>
<Volume>yy_webmedia_detail_2_volume</Volume>
<Volume>yy_webmedia_detail_3_volume</Volume>
<Volume>yy_webmedia_detail_4_volume</Volume>
</Volumes>
 
 
<!-- Structured conf-->
<Enable-Structured-Storage>true</Enable-Structured-Storage>
<Structured-Storage>
<Schema>
<Id>11</Id>
<Name>yy_user_news</Name>
 
-- Change the storage locations
Before:
</Storages>
<Volumes>
<Volume>
<Id>yy_webmedia_detail_0_volume</Id>
<Path>/application/data/data4dc</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
 
</Volumes>
 
 
</DataCell>
</Configuration>
 
After:
</Storages>
<Volumes>
<Volume>
<Id>yy_webmedia_detail_0_volume</Id>
<Path>/home/geedata/data/sdb</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
 
<Volume>
<Id>yy_webmedia_detail_1_volume</Id>
<Path>/home/geedata/data/sdc</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
 
<Volume>
<Id>yy_webmedia_detail_2_volume</Id>
<Path>/home/geedata/data/sdd</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
 
<Volume>
<Id>yy_webmedia_detail_3_volume</Id>
<Path>/home/geedata/data/sde</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
 
<Volume>
<Id>yy_webmedia_detail_4_volume</Id>
<Path>/home/geedata/data/sdf</Path>
<Type>HD</Type>
<Num-Dispatchers>1</Num-Dispatchers>
</Volume>
</Volumes>
 
 
</DataCell>
</Configuration>
 
2. On dc2/dc3: edit DataCell.xml (after scp'ing it from dc1, the IP still has to be changed, as follows):
[root@dc1 conf]# scp /home/geedata/datacell/conf/DataCell.xml 172.16.100.166:/home/geedata/datacell/conf/
[root@dc1 conf]# scp /home/geedata/datacell/conf/DataCell.xml 172.16.100.167:/home/geedata/datacell/conf/
   Then change the following:
   dc2: in DataCell.xml, set the IP to dc2's IP
   Before: <Name>172.16.100.165</Name>
   After:  <Name>172.16.100.166</Name>
 
   dc3: in DataCell.xml, set the IP to dc3's IP
   Before: <Name>172.16.100.165</Name>
   After:  <Name>172.16.100.167</Name>
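The per-node `<Name>` edit can also be done with sed instead of opening the file. A sketch, demonstrated on a scratch file; on dc2/dc3 point CFG at the real /home/geedata/datacell/conf/DataCell.xml (and drop the stand-in `echo` line), with NODE_IP=172.16.100.166 on dc2 and 172.16.100.167 on dc3:

```shell
# Patch the <Name> element to the local node's IP. The scratch file
# stands in for DataCell.xml so the command can be tried safely.
CFG=$(mktemp)
echo '<Name>172.16.100.165</Name>' > "$CFG"   # stand-in for the real file
NODE_IP="172.16.100.166"
sed -i "s#<Name>172\.16\.100\.165</Name>#<Name>${NODE_IP}</Name>#" "$CFG"
cat "$CFG"    # <Name>172.16.100.166</Name>
```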
 
 
3. agent.xml -- mainly the IP address changes
   Path: /home/geedata/datacell/conf/agent.xml
   On dc1:
   Before: <Network-Interface>172.16.100.146</Network-Interface>
   After:  <Network-Interface>172.16.100.165</Network-Interface>
 
   On dc2:
   Before: <Network-Interface>172.16.100.147</Network-Interface>
   After:  <Network-Interface>172.16.100.166</Network-Interface>
 
   On dc3:
   Before: <Network-Interface>172.16.100.148</Network-Interface>
   After:  <Network-Interface>172.16.100.167</Network-Interface>
 
   #### PS: there is also an <Agent-Address>172.16.246.131</Agent-Address> entry. Ignore it; it is a cross-subnet proxy that this environment does not need.
4. Edit bf_setenv.sh
   On dc1:
   Before: export BITSFLOW_HOME=/application/yoyosys/datacell
   After:  export BITSFLOW_HOME=/home/geedata/datacell
 
   On dc2/dc3: (scp from dc1)
   [root@dc1 ~]# scp /home/geedata/datacell/bf_setenv.sh 172.16.100.166:/home/geedata/datacell/
   [root@dc1 ~]# scp /home/geedata/datacell/bf_setenv.sh 172.16.100.167:/home/geedata/datacell/
 
 
5. Edit agent_start.sh
   On dc1:
   Before: BITSFLOW_HOME=/application/yoyosys/datacell
   After:  BITSFLOW_HOME=/home/geedata/datacell
 
   On dc2/dc3: (scp from dc1)
   [root@dc1 ~]# scp /home/geedata/datacell/agent_start.sh 172.16.100.166:/home/geedata/datacell/
   [root@dc1 ~]# scp /home/geedata/datacell/agent_start.sh 172.16.100.167:/home/geedata/datacell/
 
6. Edit agent_stop.sh
   On dc1:
   Before: BITSFLOW_HOME=/application/yoyosys/datacell
   After:  BITSFLOW_HOME=/home/geedata/datacell
 
   On dc2/dc3: (scp from dc1)
   [root@dc1 ~]# scp /home/geedata/datacell/agent_stop.sh 172.16.100.166:/home/geedata/datacell/
   [root@dc1 ~]# scp /home/geedata/datacell/agent_stop.sh 172.16.100.167:/home/geedata/datacell/
 
7. Edit agent_status.sh
   On dc1:
   Before: BITSFLOW_HOME=/application/yoyosys/datacell
   After:  BITSFLOW_HOME=/home/geedata/datacell
 
   On dc2/dc3: (scp from dc1)
   [root@dc1 ~]# scp /home/geedata/datacell/agent_status.sh 172.16.100.166:/home/geedata/datacell/
   [root@dc1 ~]# scp /home/geedata/datacell/agent_status.sh 172.16.100.167:/home/geedata/datacell/
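Steps 4-7 all make the same substitution, so on dc1 a single sed pass can patch all four scripts at once. A sketch on scratch copies; on the real node set DC_HOME=/home/geedata/datacell and drop the stand-in loop:

```shell
# Rewrite BITSFLOW_HOME in bf_setenv.sh and the three agent_*.sh scripts
# in one pass. DC_HOME points at scratch copies here so the command can
# be tried safely.
DC_HOME=$(mktemp -d)
for f in bf_setenv.sh agent_start.sh agent_stop.sh agent_status.sh; do
    echo 'BITSFLOW_HOME=/application/yoyosys/datacell' > "$DC_HOME/$f"  # stand-ins
done
sed -i 's#/application/yoyosys/datacell#/home/geedata/datacell#g' \
    "$DC_HOME"/bf_setenv.sh "$DC_HOME"/agent_*.sh
grep -h BITSFLOW_HOME "$DC_HOME"/*.sh
```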
 
8. rm -rf /home/geedata/datacell/db/groupd*.db (delete these only if you do not want to keep the existing data; keep them if you do)
 
9. Verify datacell
   Note: if this datacell cluster is on the same LAN segment as another datacell cluster, the group service will not start; it reports that a group node already exists (there was indeed a development-environment group node on this segment).
            Fix: change the port, e.g. the default 31060 to some other port such as 31068.
                    Files to change (a vim global replace works: :%s/31060/31068/g):
                    master node: DataCell.xml / agent.xml / groupd.xml  (3 config files)
                    worker nodes: DataCell.xml / agent.xml  (2 config files)
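The vim replacement can equally be done with sed across all the affected files. A sketch on scratch copies with made-up stand-in content; on the nodes run the sed line against the real conf files listed above:

```shell
# Replace the default group port 31060 with 31068 in every config file
# that mentions it. The <Port> lines below are stand-ins, not the real
# DataCell/agent/groupd XML elements.
CONF=$(mktemp -d)
for f in DataCell.xml agent.xml groupd.xml; do
    echo '<Port>31060</Port>' > "$CONF/$f"    # stand-in content
done
sed -i 's/31060/31068/g' "$CONF"/*.xml
grep -h 31068 "$CONF"/*.xml
```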
                    
   -- Start datacell (in this order; within each step the nodes run in parallel)
     cd /home/geedata/datacell
  1) On all of dc1/dc2/dc3, start the agent service:
   ./agent_start.sh
    
  2) On the master node dc1 only, start the group service:
   ./start_groupd.sh

  3) On all of dc1/dc2/dc3, start the datacell service:
   ./start_datacell.sh
 

   -- Check the datacell processes
   On dc1:
   [root@dc1 datacell]# cd /home/geedata/datacell
   [root@dc1 datacell]# ps -ef |grep `pwd`|grep -v grep  # the master node dc1 runs 3 processes, one more than a worker: the extra one is groupd
    root     28712     1  0 19:44 ?        00:00:00 /home/geedata/datacell/bin/agent -cfgfile /home/geedata/datacell/conf/agent.xml -logfile /home/geedata/datacell/logs/agent.log -pidfile /home/geedata/datacell/run/agent.pid -daemon -licfile /home/geedata/datacell/creds/yoyo.lic -tokfile /home/geedata/datacell/creds/agent.tok
    root     28733     1  0 19:44 ?        00:00:00 /home/geedata/datacell/bin/groupd -cfgfile /home/geedata/datacell/conf/groupd.xml -logfile /home/geedata/datacell/logs/groupd.log -pidfile /home/geedata/datacell/run/groupd.pid -daemon -dbdir /home/geedata/datacell/db
    root     28745     1  0 19:44 ?        00:00:00 /home/geedata/datacell/bin/DataCell -cfgfile /home/geedata/datacell/conf/DataCell.xml -logfile /home/geedata/datacell/logs/DataCell.log -pidfile /home/geedata/datacell/run/DataCell.pid -daemon
   On dc2:
   [root@dc2 datacell]# cd /home/geedata/datacell
   [root@dc2 datacell]# ps -ef |grep `pwd`|grep -v grep  # the worker node dc2 runs 2 processes
   root     26109     1  0 19:41 ?        00:00:00 /home/geedata/datacell/bin/agent -cfgfile /home/geedata/datacell/conf/agent.xml -logfile /home/geedata/datacell/logs/agent.log -pidfile /home/geedata/datacell/run/agent.pid -daemon -licfile /home/geedata/datacell/creds/yoyo.lic -tokfile /home/geedata/datacell/creds/agent.tok
   root     26176     1  0 19:42 ?        00:00:00 /home/geedata/datacell/bin/DataCell -cfgfile /home/geedata/datacell/conf/DataCell.xml -logfile /home/geedata/datacell/logs/DataCell.log -pidfile /home/geedata/datacell/run/DataCell.pid -daemon
 
   On dc3:
   [root@dc3 ~]# cd /home/geedata/datacell
   [root@dc3 datacell]# ps -ef |grep `pwd`|grep -v grep  # the worker node dc3 runs 2 processes
   root     22047     1  0 19:36 ?        00:00:00 /home/geedata/datacell/bin/agent -cfgfile /home/geedata/datacell/conf/agent.xml -logfile /home/geedata/datacell/logs/agent.log -pidfile /home/geedata/datacell/run/agent.pid -daemon -licfile /home/geedata/datacell/creds/yoyo.lic -tokfile /home/geedata/datacell/creds/agent.tok
   root     22106     1  0 19:37 ?        00:00:00 /home/geedata/datacell/bin/DataCell -cfgfile /home/geedata/datacell/conf/DataCell.xml -logfile /home/geedata/datacell/logs/DataCell.log -pidfile /home/geedata/datacell/run/DataCell.pid -daemon
 
 
   -- Stop datacell
     cd /home/geedata/datacell
  1) On all of dc1/dc2/dc3, stop the datacell service:
     ./stop_datacell.sh
    
  2) On dc1, stop the group service:
     ./stop_groupd.sh

  3) On all of dc1/dc2/dc3, stop the agent service:
     ./agent_stop.sh
    
 
   -- Verify: log in to datacell
    
   # DataCellShell -agent localhost:31060 -service 1121  # (the port must match the config files; the default is 31060, but it is changed when two groups share the same LAN segment)
    
   [root@dc1 datacell]# . bf_setenv.sh
   [root@dc1 datacell]# DataCellShell -agent localhost:31068 -service 1121  # this uses the changed port
   Open MX Channel
 
   Copyright (c) 2007-2013 Yoyo Systems. All rights reserved.
 
   Welcome to the DataCell shell. Commands end with ";"
 
   agent   : localhost:31068
   service : 1121
   login   : 2015-10-27 11:12:19.262550
 
   Type 'help;' for help. Type 'clear;' to clear the buffer.
   DataCellShell> show storages;
   Supported features: FILESYSTEM, OBJECT, STRUCTURED, TIMESERIES
   ------------------------------------------------------------------------------------
   NO    STORAGE
   ------------------------------------------------------------------------------------
   1     yy_webmedia_detail (STRUCTURED | TIMESERIES)
   2     yy_news (STRUCTURED | TIMESERIES)
   3     yy_user_news (STRUCTURED | TIMESERIES)
   ------------------------------------------------------------------------------------
   Done at 2015-10-27 11:12:41.847477, took time: 2300 microseconds
   DataCellShell>
   DataCellShell> use yy_webmedia_detail as STRUCTURED;
   The storage "yy_webmedia_detail" is ready for you to enter commands.
   Type 'close;' to closed this storage. Type 'quit;', 'bye;' or 'exit;' to back to terminal.
   Done at 2015-10-27 11:13:58.880032, took time: 41889 microseconds
   yy_webmedia_detail>
   yy_webmedia_detail> select count(*) from yy_webmedia_detail;
   Count result: 0
   Done at 2015-10-27 11:14:33.844168, took time: 6235 microseconds
   yy_webmedia_detail>
 
=============== Impala Cluster Installation and Deployment ==============
1. Hadoop config file core-site.xml -- change hadoop's temporary file directory
   Location: /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/core-site.xml
   Before: <value>/application/yoyosys/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/tmp</value>
   After:  <value>/home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/tmp</value>
 
   Before: <value>/application/yoyosys/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/hdfs/socket._PORT</value>
   After:  <value>/home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/hdfs/socket._PORT</value>
 
   Before: <value>hdfs://dc146.yoyosys.com:20500</value>
   After:  <value>hdfs://dc1.com:20500</value>
 
2. Edit setenv.sh (identical on all 3 nodes)
   On dc1:
   Before: [ -z "$IMPALA_HOME" ] && IMPALA_HOME=/application/yoyosys/impala
   After:  [ -z "$IMPALA_HOME" ] && IMPALA_HOME=/home/geedata/impala
 
   Before: export DATACELL_HOME=/application/yoyosys/datacell
   After:  export DATACELL_HOME=/home/geedata/datacell
 
   Before: export MYSQL_SERVER=dc146.yoyosys.com
   After:  export MYSQL_SERVER=dc1.com
 
   Before: export SERVER_HOST_NAME=dc146.yoyosys.com
   After:  export SERVER_HOST_NAME=dc1.com
 
   Note: MYSQL_SERVER above must be set to the local hostname, not the hostname of the MySQL database server.
 
3. On dc2/dc3: scp setenv.sh from dc1
   [root@dc1 impala]# scp /home/geedata/impala/setenv.sh 172.16.100.166:/home/geedata/impala/
   [root@dc1 impala]# scp /home/geedata/impala/setenv.sh 172.16.100.167:/home/geedata/impala/
 
4. To change the MySQL databases, edit odbc.ini (identical on all 3 nodes)
   Path: /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini
   cp -a /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini.bak
   Empty the file and fill in the following:
    
   On dc1: vim odbc.ini
    
   Set the account and the MySQL server IP as below. gee_business is one example; add one section per MySQL database.
[gee_business]
Description = MySQL connection to '***' database
Driver=MYSQL-DRIVER
Database=gee_business
Server=172.16.100.31   ---------------- MySQL server IP
User=admin             ---------------- MySQL account (note the key is User, not UserName; some environments accept UserName too, so if the MySQL connection fails, try toggling this key)
Password=admin         ---------------- MySQL password
Port=
Socket=/var/lib/mysql/mysql.sock  ----------- the actual MySQL socket path
charset = UTF8

[gee_operate]
Description = MySQL connection to '***' database
Driver=MYSQL-DRIVER
Database=gee_operate
User=admin
Password=admin
Server=172.16.100.31
Port=
Socket=/var/lib/mysql/mysql.sock
charset = UTF8

[gee_person]
Description = MySQL connection to '***' database
Driver=MYSQL-DRIVER
Database=gee_person
User=admin
Password=admin
Server=172.16.100.31
Port=
Socket=/var/lib/mysql/mysql.sock
charset = UTF8

[gee_crawler]
Description = MySQL connection to '***' database
Driver=MYSQL-DRIVER
Database=gee_crawler
User=admin
Password=admin
Server=172.16.100.178
Port=
Socket=/var/lib/mysql/mysql.sock
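Once odbc.ini is filled in, each DSN can be probed in a loop instead of one isql call at a time; `probe_dsns` below is a hypothetical helper (isql, unixODBC's test client, exits non-zero when the connection fails):

```shell
# Hypothetical helper: try every DSN defined in odbc.ini and report
# which ones connect. </dev/null keeps isql from waiting for SQL input.
probe_dsns() {       # usage: probe_dsns <isql-command> <dsn...>
    cmd=$1; shift
    for dsn in "$@"; do
        if "$cmd" "$dsn" -v </dev/null >/dev/null 2>&1; then
            echo "$dsn: connected"
        else
            echo "$dsn: FAILED"
        fi
    done
}
# On each node:  probe_dsns isql gee_business gee_operate gee_person gee_crawler
```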

   5. On dc2/dc3: scp odbc.ini from dc1
   [root@dc1 ~]# scp /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini 172.16.100.166:/home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/
   [root@dc1 ~]# scp /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini 172.16.100.167:/home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/
 
6. odbcinst.ini (change /application/yoyosys/ to /home/geedata/ wherever it occurs)
   Path: /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbcinst.ini
   Before: Driver=/application/yoyosys/impala/thirdparty/unixodbc-2.3.2/drivers/mysql/x64/5.3.2/lib/libmyodbc5w.so
   After:  Driver=/home/geedata/impala/thirdparty/unixodbc-2.3.2/drivers/mysql/x64/5.3.2/lib/libmyodbc5w.so

7. On dc2/dc3: scp odbcinst.ini from dc1
   scp /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbcinst.ini 172.16.100.166:/home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/
   scp /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbcinst.ini 172.16.100.167:/home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/
 
####### The following steps come from the README, at /home/geedata/impala/README #######
     Skip these 3 README steps:
     -- 1: mkdir /opt/yoyosys/impala ## skip
     -- 2: tar xzf impala-2.5.0-cdh5.2.0.tar.gz -C /opt/yoyosys/impala  ## skip
 -- 5: copy to every node in the cluster ## skip (this means scp'ing the impala directory to the other nodes, which was already done above)
 
8. impalactl.sh init
On dc1:
[root@dc1 impala]# . setenv.sh
[root@dc1 impala]# impalactl.sh init   ## run init only once per node: on dc1 first, then on dc2/dc3 as shown below
mkdir: cannot create directory `/home/geedata/impala/logs': File exists
mkdir: cannot create directory `/home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/logs': File exists
mkdir: cannot create directory `/home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/tmp': File exists
mkdir: cannot create directory `/home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/hdfs': File exists
sed -i "s/_hadoophome_/\/home\/geedata\/impala\/thirdparty\/hadoop-2.6.0-cdh5.4.2/g" /home/geedata/impala/conf/core-site.xml
sed -i "s/_hadoophome_/\\/home\\/geedata\\/impala\\/thirdparty\\/hadoop-2.6.0-cdh5.4.2/g" /home/geedata/impala/conf/hdfs-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/conf/core-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/conf/hive-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/conf/yarn-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/conf/mapred-site.xml
sed -i "s/_hadoophome_/\/home\/geedata\/impala\/thirdparty\/hadoop-2.6.0-cdh5.4.2/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/core-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/core-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/masters
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/slaves
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/yarn-site.xml
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/mapred-site.xml
sed -i "s/_javahome_/\/home\/geedata\/impala\/thirdparty\/jdk1.8.0_25-x64/g" /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/etc/hadoop/hadoop-env.sh
sed -i "s/_hostname_/dc1.com/g" /home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/conf/hive-site.xml
sed -i "s/_hivehome_/\/home\/geedata\/impala\/thirdparty\/hive-1.1.0-cdh5.4.2/g" /home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/conf/hive-log4j.properties
sed -i "s/_unixodbchome_/\/home\/geedata\/impala\/thirdparty\/unixodbc-2.3.2/g" /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbcinst.ini
sed -i "s/_mysqlserver_/dc1.com/g" /home/geedata/impala/thirdparty/unixodbc-2.3.2/etc/odbc.ini
/home   ....
Change is finished!
Suceed init impala.
[root@dc1 impala]#
 
On dc2/dc3:
[root@dc2 impala]# . setenv.sh
[root@dc2 impala]# impalactl.sh init
 
[root@dc3 impala]# . setenv.sh
[root@dc3 impala]# impalactl.sh init
 
9. impalactl.sh format  ## run format only once per node
On dc1:
[root@dc1 impala]# impalactl.sh format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
 
15/10/26 17:11:33 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
...........
...........
SHUTDOWN_MSG: Shutting down NameNode at dc1/172.16.100.165
************************************************************/
Suceed in format namenode.
 
On dc2/dc3:
[root@dc2 impala]# impalactl.sh format
Suceed in format namenode.
[root@dc3 impala]# impalactl.sh format
Suceed in format namenode.
 
---------------------- Startup ----------------------
10. Start hadoop:
    -- On dc1: (only the master node dc1 runs the hadoop services; the other nodes do not start it)
   [root@dc1 impala]# ./impalactl.sh start hadoop
   begin to start hadoop
   Starting namenodes on [dc1.com]
   The authenticity of host 'dc1.com (172.16.100.165)' can't be established.
   RSA key fingerprint is e3:19:42:cc:8a:5a:b2:4f:90:e4:f8:c1:f1:19:cc:64.
   Are you sure you want to continue connecting (yes/no)? yes
   dc1.com: Warning: Permanently added 'dc1.com' (RSA) to the list of known hosts.
   dc1.com: starting namenode, logging to /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/logs/hadoop-root-namenode-dc1.com.out
   dc1.com: starting datanode, logging to /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/logs/hadoop-root-datanode-dc1.com.out
   Starting secondary namenodes [0.0.0.0]
   The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
   RSA key fingerprint is e3:19:42:cc:8a:5a:b2:4f:90:e4:f8:c1:f1:19:cc:64.
   Are you sure you want to continue connecting (yes/no)? yes
   0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
   0.0.0.0: starting secondarynamenode, logging to /home/geedata/impala/thirdparty/hadoop-2.6.0-cdh5.4.2/logs/hadoop-root-secondarynamenode-dc1.com.out
   Suceed in starting hadoop.
   
   Verify:
   [root@dc1 impala]# ps -ef |grep `pwd`|grep hadoop
   
11. Start hive:
    -- On dc1: (only the master node dc1 runs the hive service; the other nodes do not start it)
   [root@dc1 impala]# ./impalactl.sh start hive -d
   begin to start hive yes
   /home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/bin/hive --service metastore 1>/dev/null 2>&1 &
   hive pid:19796, status:0
   Suceed in starting hive.
   
   Verify:
   [root@dc1 impala]# ps -ef |grep `pwd`|grep hive
   
   Log in to hive:
   [root@dc1 impala]# cd /home/geedata/impala
   [root@dc1 impala]# . setenv.sh
   [root@dc1 impala]# ./thirdparty/hive-1.1.0-cdh5.4.2/bin/hive shell
   Logging initialized using configuration in file:/home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/conf/hive-log4j.properties
   WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
   hive> show databases;
   OK
   dc
   default
   gee_business
   gee_operate
   gee_person
   Time taken: 0.669 seconds, Fetched: 5 row(s)
   hive>
 
12. Start impala (start it on every node, as follows):

     -- On dc1:
    [root@dc1 impala]# ./impalactl.sh start all -d
    begin to start statestored yes
    /home/geedata/impala/be/build/release/statestore/statestored -log_dir=/home/geedata/impala/logs -state_store_host=dc1.com -state_store_port=24000 -state_store_subscriber_port=23000 1>/dev/null 2>&1 &
    statestored pid:20121, status:0
    begin to start catalogd yes
    /home/geedata/impala/bin/start-catalogd.sh -log_dir=/home/geedata/impala/logs -use_statestore -state_store_port=24000 -state_store_host=dc1.com -catalog_service_host=dc1.com -catalog_service_port=26000 -state_store_subscriber_port=23020 1>/dev/null 2>&1 &
    catalogd pid:20148, status:0
    begin to start impalad yes
     /home/geedata/impala/bin/start-impalad.sh -log_dir=/home/geedata/impala/logs -use_statestore -state_store_port=24000 -state_store_host=dc1.com -state_store_subscriber_port=23030 -catalog_service_host=dc1.com -catalog_service_port=26000 -be_port=22001 -beeswax_port=21001 -webserver_port=25001 -hs2_port=21051 -default_pool_max_queued=1 -default_pool_max_requests=1 -default_query_options='exec_single_node_rows_threshold=0' -max_result_cache_size=21474836480 -mem_limit=6G -fe_service_threads=8 -be_service_threads=16 1>/dev/null 2>&1 &
    impalad pid:20217, status:0
    Suceed in starting all.
    [root@dc1 impala]#
   
     -- On dc2/dc3:
     [root@dc2 impala]# ./impalactl.sh -d start impala
     [root@dc3 impala]# ./impalactl.sh -d start impala
 
 
      Check the impala processes:
      [root@dc1 impala]# ps -ef |grep `pwd`|grep release
 
 
13. Verify impala

    MySQL connectivity test (run it on every node):
   [root@dc1 impala]# isql gee_business -v
   +---------------------------------------+
   | Connected!                            |
   |                                       |
   | sql-statement                         |
   | help [tablename]                      |
   | quit                                  |
   |                                       |
   +---------------------------------------+
   SQL>
    -- Log in to impala
[root@dc1 impala]# . setenv.sh
[root@dc1 impala]# ./bin/impala-shell.sh  -i dc1.com:21001
    Starting Impala Shell without Kerberos authentication
    Connected to dc1.com:21001
    Server version: impalad version 2.2.0-cdh5.4.2 RELEASE (build a0c23b5c27c4209cc22e138c72173842664fa98a)
    Welcome to the Impala shell. Press TAB twice to see a list of available commands.
   
    Copyright (c) 2012 Cloudera, Inc. All rights reserved.
   
    (Shell build version: build version not available)
    [dc1.com:21001] > show databases;
    Query: show databases
    +------------------+
    | name             |
    +------------------+
    | _impala_builtins |
    | dc               |
    | default          |
    | gee_business     |
    | gee_operate      |
    | gee_person       |
    +------------------+
   
[dc1.com:21001] > use gee_business;
[dc1.com:21001] > select * from sv_recommend;
    Query: select * from sv_recommend
    +----+----------+----------------+--------------+--------+--------------------------------------------------+---------+---------+--------+---------------------+
    | id | tenantid | recommend_type | website_type | isread | uuid                                             | ruserid | suserid | status | create_time         |
    +----+----------+----------------+--------------+--------+--------------------------------------------------+---------+---------+--------+---------------------+
    | 2  | 1        | 2              | 0            | 0      | 7-1432725332639-fb16769d17e10ed1f8175c1e9225109b | 2       | 2       | 1      | 2015-09-01 12:49:54 |
    | 9  | 1        | 2              | 0            | 0      | 1                                                | 2       | 2       | 1      | 2015-09-06 17:21:10 |
    | 10 | 1        | 2              | 0            | 0      | 1                                                | 1       | 2       | 1      | 2015-09-06 17:21:10 |
    | 11 | 1        | 2              | 0            | 0      | 1                                                | 3       | 2       | 1      | 2015-09-06 17:21:10 |
    | 12 | 1        | 2              | 0            | 0      | 2938214647                                       | 2       | 2       | 1      | 2015-09-21 15:01:42 |
    | 13 | 1        | 2              | 0            | 0      | f92d9515-3bfd-443b-84dc-681771f15afa             | 2       | 2       | 1      | 2015-09-22 16:31:43 |
    | 14 | 1        | 2              | 0            | 0      | 5e3f2f4a-46a5-4bc5-ae96-41338224921d             | 2       | 2       | 1      | 2015-09-25 16:37:57 |
    +----+----------+----------------+--------------+--------+--------------------------------------------------+---------+---------+--------+---------------------+
    Fetched 7 row(s) in 4.643058s, Rpc time:4.637681s
14. Rebuild the MySQL table mappings
    If a mapping no longer works, drop the old mapping and recreate it.
    [dc1.com:21001] > show create table yy_webmedia_detail; # root cause: the old table definition pins 'backend.odbc.executor.host'='dc147.yoyosys.com', which must now point at the new database IP
    Query: show create table yy_webmedia_detail
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | result                                                                                                                                                                                                                         |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | CREATE TABLE gee_business.yy_webmedia_detail (                                                                                                                                                                                 |
    |   id INT,                                                                                                                                                                                                                      |
    |   title VARCHAR(300),                                                                                                                                                                                                          |
    .....................
    .....................                                                                                                                                                                                                 |
    |   version INT                                                                                                                                                                                                                  |
    | )                                                                                                                                                                                                                              |
    | STORED AS TEXTFILE                                                                                                                                                                                                             |
    | LOCATION 'hdfs://dc146.yoyosys.com:20500/hive/warehouse/gee_business.db/yy_webmedia_detail'                                                                                                                                    |
    | TBLPROPERTIES ('transient_lastDdlTime'='1443598444', 'backend.odbc.dsn'='dsn=gee_business;uid=admin;pwd=admin', 'backend.odbc.executor.host'='dc147.yoyosys.com', 'backend.type'='odbc', 'column.type.mapping'='content:TEXT') |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   
    Log in to impala:
    In each database, run USE and then drop all of its tables:
    drop table sv_app_version;  
    drop table sv_classification;
    drop table sv_classification_type;
    .................
    .................
    drop table sv_collect;
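Typing each DROP statement by hand is tedious; a small shell loop can generate them into a file instead (a sketch -- the table names are just the examples listed above, and `drop_tables.sql` is an arbitrary file name):

```shell
# Generate DROP TABLE statements for a list of tables; the names here are the
# examples shown above -- extend the list with each database's remaining tables.
tables="sv_app_version sv_classification sv_classification_type sv_collect"
for t in $tables; do
  echo "drop table $t;"
done > drop_tables.sql
cat drop_tables.sql
```

The generated file could then be fed to the shell non-interactively, assuming the impala-shell.sh wrapper forwards the standard -f option: `./bin/impala-shell.sh -i dc1.com:21001 -f drop_tables.sql`.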
    If dropping a table in impala fails with an error like the one below, drop it from hive instead, then restart impala afterwards:
    [dc1.com:21001] > drop table sv_feedback                            ;
    Query: drop table sv_feedback
    ERROR:
    ImpalaRuntimeException: Error making 'dropTable' RPC to Hive Metastore:
    CAUSED BY: MetaException: java.lang.IllegalArgumentException: java.net.UnknownHostException: dc146.yoyosys.com
    
    Log in to hive:
    [root@dc1 impala]# ./thirdparty/hive-1.1.0-cdh5.4.2/bin/hive shell
    
    
    Rebuild the mapping relationships, using one table as an example:
 
15. Rebuilding the table mappings for the datacell database
    -- Log in to impala, drop the original tables, and recreate them
    If a datacell table cannot be dropped from within impala, log in to hive and drop it there, as follows:
    Log in to hive:
    [root@dc1 impala]# . setenv.sh
    [root@dc1 impala]# ./thirdparty/hive-1.1.0-cdh5.4.2/bin/hive shell
   
    Logging initialized using configuration in file:/home/geedata/impala/thirdparty/hive-1.1.0-cdh5.4.2/conf/hive-log4j.properties
    WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
    hive> show databases;
    OK
    dc
    default
    gee_business
    gee_operate
    gee_person
    Time taken: 0.669 seconds, Fetched: 5 row(s)
    hive>
        > drop database dc;
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database dc is not empty. One or more tables exist.)
    hive>
        > use dc;
    OK
    Time taken: 0.01 seconds
    hive> show tables;
    OK
    yy_news
    yy_user_news
    yy_webmedia_detail
    Time taken: 0.025 seconds, Fetched: 3 row(s)
    hive> drop table yy_news;
    OK
    Time taken: 8.812 seconds
    hive> drop table yy_user_news;
    OK
    Time taken: 7.091 seconds
    hive> drop table yy_webmedia_detail;
    OK
    Time taken: 7.098 seconds
    hive>
        > show tables;
    OK
    Time taken: 0.01 seconds
    hive>
        > drop database dc;
    OK
    Time taken: 7.07 seconds
    hive> show databases;
    OK
    default
    gee_business
    gee_operate
    gee_person
    Time taken: 0.009 seconds, Fetched: 4 row(s)
    hive>         
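The interactive session above can also be scripted with `hive -e`. The sketch below only prints the commands it would run -- remove the leading `echo` to execute them for real (it assumes setenv.sh has been sourced and uses the hive path shown above):

```shell
# Dry run: print the hive commands that drop the dc tables and then the database.
HIVE=./thirdparty/hive-1.1.0-cdh5.4.2/bin/hive
for tbl in yy_news yy_user_news yy_webmedia_detail; do
  echo "$HIVE -e \"drop table dc.$tbl;\""
done
echo "$HIVE -e \"drop database dc;\""
```

Hive also supports `drop database dc cascade;`, which removes the database together with its tables in one step.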
 
   Then log in to impala and recreate the datacell database (the table structures of the dc database)
   [root@dc1 impala]# . setenv.sh
    [root@dc1 impala]# ./bin/impala-shell.sh  -i dc1.com:21001
 
   Create the database: dc
   [dc1.com:21001] > create database dc;  
    Query: create database dc
 
   Create the tables, using one table as an example: ## note that the port in the SQL must match the configuration file: '127.0.0.1:31060'
    CREATE TABLE dc.yy_news (
    id BIGINT,
    create_time TIMESTAMP,
    webmedia_area_type TINYINT,
    website_type TINYINT,
    update_time TIMESTAMP,
    publish_time TIMESTAMP,
    version INT,
    title VARCHAR(300),
    digest VARCHAR(500),
    dynamic_abstract VARCHAR(1000),
    content VARCHAR(5000),
    url VARCHAR(2049),
    imageurl VARCHAR(2049),
    webmedia_author VARCHAR(50),
    webmedia_original_source VARCHAR(255),
    website_icp VARCHAR(20),
    websitename VARCHAR(20),
    keywords VARCHAR(255)
    )
    TBLPROPERTIES ('backend.datacell.service'='1121', 'backend.type'='datacell', 'backend.datacell.schema'='yy_news', 'backend.datacell.agent'='127.0.0.1:31060', 'backend.datacell.storage'='yy_news');
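Before creating tables like the one above, it is worth confirming that the agent named in backend.datacell.agent is actually listening on 127.0.0.1:31060. A quick check using bash's built-in /dev/tcp (a sketch, not part of the product's own tooling):

```shell
# Returns 0 if host:port accepts a TCP connection within 2 seconds.
port_open() {
  timeout 2 bash -c ": < /dev/tcp/$1/$2" 2>/dev/null
}

if port_open 127.0.0.1 31060; then
  echo "agent port 31060 is listening"
else
  echo "agent port 31060 is NOT reachable -- check ./agent_start.sh"
fi
```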
 
   After creating them, verify:
   [dc1.com:21001] > show tables;
    Query: show tables
    +--------------------+
    | name               |
    +--------------------+
    | yy_news            |
    | yy_user_news       |
    | yy_webmedia_detail |
    +--------------------+
    Fetched 3 row(s) in 0.004810s, Rpc time:0.003613s
    [dc1.com:21001] > select count(*) from yy_webmedia_detail;
    Query: select count(*) from yy_webmedia_detail
    +----------+
    | count(*) |
    +----------+
    | 0        |
    +----------+
    Fetched 1 row(s) in 3.508856s, Rpc time:3.507191s
    [dc1.com:21001] >
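The same verification can be run non-interactively; the standard impala-shell accepts a -q option for one-off queries. The sketch below only prints the commands (remove the leading `echo` to execute) and assumes the impala-shell.sh wrapper forwards options to the underlying impala-shell:

```shell
# Dry run: print non-interactive verification queries.
IMPALA="./bin/impala-shell.sh -i dc1.com:21001"
echo "$IMPALA -q 'show tables in dc'"
echo "$IMPALA -q 'select count(*) from dc.yy_webmedia_detail'"
```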

16. Overall verification
    After logging in to impala, connect to each mysql database and to the datacell database. If DDL statements (create/drop) and DML statements (select/update/delete) all work correctly, the deployment as a whole has succeeded.


17. Automatic environment setup
Run the following commands on dc1, dc2, and dc3 so that the environment variables are set automatically (this removes the need to run . setenv.sh and . bf_setenv.sh by hand every time):
[root@dc1 ~]# cp -a ~/.bash_profile ~/.bash_profile.bak
[root@dc1 ~]# echo "source /geedata/application/impala/setenv.sh" >> ~/.bash_profile
[root@dc1 ~]# echo "source /geedata/application/datacell/bf_setenv.sh" >> ~/.bash_profile
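To make the same change on dc2 and dc3 without logging in to each node by hand, the lines can be pushed over the passwordless ssh configured earlier. This sketch only prints the per-node commands (remove the leading `echo` to execute); the grep -qxF guard avoids appending duplicate lines if it is run twice:

```shell
# Dry run: print one command per node and per line to append.
for host in dc1 dc2 dc3; do
  for line in "source /geedata/application/impala/setenv.sh" \
              "source /geedata/application/datacell/bf_setenv.sh"; do
    echo ssh "$host" "\"grep -qxF '$line' ~/.bash_profile || echo '$line' >> ~/.bash_profile\""
  done
done
```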
--------------------------------------------------      
---- PS: related test commands ----
--------------------------------------------------
-- ODBC connections
[root@dc1 impala]# cd /home/geedata/impala/thirdparty/unixodbc-2.3.2/bin
[root@dc1 bin]# ./odbcinst -q -s
[odbc-mysql]
[odbc-oracle]
[odbc-iq]
 
-- Test connectivity to mysql
[root@dc1 impala]# isql gee_business -v
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL>
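For automation, the connectivity test can be wrapped in a function that returns a status instead of an interactive prompt (a sketch; -b puts unixODBC's isql into batch mode, and gee_business is the DSN listed above):

```shell
# Returns 0 if a trivial query succeeds through the named DSN.
check_dsn() {
  echo "select 1;" | isql "$1" -b >/dev/null 2>&1
}

check_dsn gee_business && echo "DSN gee_business OK" || echo "DSN gee_business FAILED"
```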
 
-- Starting and stopping impala

   // Start impala
   dc1:
   [root@dc1 impala]# ./impalactl.sh start all -d
    
   dc2/dc3:
   [root@dc2 impala]#  ./impalactl.sh -d start impala
   [root@dc3 impala]#  ./impalactl.sh -d start impala

    
   // Log in to impala
   [root@dc1 impala]# . setenv.sh
   [root@dc1 impala]# ./bin/impala-shell.sh  -i dc1.com:21001
   
   // Stop impala:
   dc1:
   [root@dc1 impala]# ./impalactl.sh stop all
    
   dc2:
   [root@dc2 impala]# ./impalactl.sh -d stop impala
    
   dc3:
   [root@dc3 impala]# ./impalactl.sh -d stop impala

-- Starting and stopping hive

    // Start hive:
   [root@dc1 impala]# ./impalactl.sh start hive -d

    // Log in to hive:
   [root@dc1 impala]# cd /home/geedata/impala
   [root@dc1 impala]# . setenv.sh
   [root@dc1 impala]# ./thirdparty/hive-1.1.0-cdh5.4.2/bin/hive shell

    // Stop hive
   [root@dc1 impala]# ./impalactl.sh stop hive

-- Starting and stopping hadoop
   
    // Start hadoop:
   [root@dc1 impala]# ./impalactl.sh start hadoop
   
    // Stop hadoop
   [root@dc1 impala]# ./impalactl.sh stop hadoop


-- Starting and stopping datacell

   // Start datacell (in the order below; within each step, the nodes run in parallel)
     cd /home/geedata/datacell
     1) Run the following on dc1, dc2, and dc3 -- start the agent service
      ./agent_start.sh
      
     2) Start the group service on the master node dc1 only
      ./start_groupd.sh
    
     3) Run the following on dc1, dc2, and dc3 -- start the datacell service
      ./start_datacell.sh
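From dc1, the three steps can be driven in order over the passwordless ssh set up earlier. This sketch only prints the commands it would run -- remove the leading `echo` to execute -- and assumes /home/geedata/datacell exists on every node:

```shell
# Dry run: print the cluster-wide datacell start sequence.
DC_HOME=/home/geedata/datacell
NODES="dc1 dc2 dc3"

# 1) start the agent service on every node
for h in $NODES; do echo ssh "$h" "\"cd $DC_HOME && ./agent_start.sh\""; done
# 2) start the group service on the master node only
echo ssh dc1 "\"cd $DC_HOME && ./start_groupd.sh\""
# 3) start the datacell service on every node
for h in $NODES; do echo ssh "$h" "\"cd $DC_HOME && ./start_datacell.sh\""; done
```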
 

   // Check the datacell processes
    [root@dc1 datacell]# cd /home/geedata/datacell
    [root@dc1 datacell]# ps -ef |grep `pwd`|grep -v grep
 
 
   // Stop datacell
     cd /home/geedata/datacell
     1) Run the following on dc1, dc2, and dc3 -- stop the datacell service
        ./stop_datacell.sh
      
     2) Stop the group service on dc1 only
        ./stop_groupd.sh
     
     3) Run the following on dc1, dc2, and dc3 -- stop the agent service
        ./agent_stop.sh

Original source: http://www.cnblogs.com/junrong624/p/4930866.html
