
Hadoop (10) - Hive Installation and User-Defined Functions



I. Hive Installation

Hive only needs to be installed on a single node.

1. Upload the tar package
2. Extract it: tar -zxvf hive-0.9.0.tar.gz -C /cloud/
3. Configure the MySQL metastore (switch to the root user)

3.1 Set the HIVE_HOME environment variable
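
For example, appending the following to /etc/profile and then running "source /etc/profile" would do it; the install path here is an assumption based on the extraction directory in step 2:

export HIVE_HOME=/cloud/hive-0.9.0
export PATH=$PATH:$HIVE_HOME/bin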

3.2 Install MySQL

Check for previously installed MySQL packages: rpm -qa | grep mysql
Force-remove the old package: rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps
Install MySQL: rpm -ivh MySQL-server-5.1.73-1.glibc23.i386.rpm
  rpm -ivh MySQL-client-5.1.73-1.glibc23.i386.rpm
Run the setup script: /usr/bin/mysql_secure_installation (note: remove the anonymous users and allow remote connections)
Log in to MySQL: mysql -u root -p

4. Configure Hive
cp hive-default.xml.template hive-site.xml
Edit hive-site.xml: delete all of its contents except the outer <configuration></configuration> element, then add the following properties inside it:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop00:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123</value>
  <description>password to use against metastore database</description>
</property>


5. Once Hive and MySQL are installed, copy the MySQL JDBC driver jar into the $HIVE_HOME/lib directory.
If a permission error appears when Hive connects, grant access in MySQL (run on the machine where MySQL is installed):
mysql -uroot -p
# (run the statements below; *.* means every table in every database, % means any IP address or host may connect)
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123' WITH GRANT OPTION;
FLUSH PRIVILEGES;
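
To confirm the grant took effect, you can try connecting from another node; the hostname below is assumed from the JDBC URL configured earlier:

mysql -h hadoop00 -uroot -p123 -e 'show databases;'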

6. Create a table (tables are internal/managed by default)
create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';
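
As a quick sanity check, a tab-delimited local file can be loaded into the table and queried; the file path here is a placeholder:

load data local inpath '/root/trade_detail.txt' into table trade_detail;
select * from trade_detail limit 10;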

Create a partitioned table:

create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';

Create an external table:

create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';
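
Unlike a managed table, dropping an external table removes only the metastore entry; the data files under /td_ext remain on HDFS. A quick way to see this from the Hive CLI (the second command still lists the files after the drop):

drop table td_ext;
dfs -ls /td_ext;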

7. Creating partitioned tables
Difference between a normal table and a partitioned table: build a partitioned table when large amounts of data keep being added.
create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';

Loading data into a partitioned table:
load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');
load data local inpath '/root/data.am' into table beauty partition (nation="USA");

select nation, avg(size) from beauty group by nation order by avg(size);
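
The payoff of the partition column is partition pruning: a filter on it reads only the matching partition's directory instead of scanning the whole table. For example, with the book table above, this query touches only the pubdate=2010-08-22 partition:

select * from book where pubdate='2010-08-22';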


II. UDF

A custom UDF must extend the org.apache.hadoop.hive.ql.exec.UDF class and implement an evaluate method:

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class AreaUDF extends UDF {

	// Static lookup table mapping area codes to area names
	private static Map<Integer, String> areaMap = new HashMap<Integer, String>();

	static {
		areaMap.put(1, "Beijing");
		areaMap.put(2, "Shanghai");
		areaMap.put(3, "Guangzhou");
	}

	// Hive calls evaluate() once per input row; codes not in the map fall back to "Other"
	public Text evaluate(Text in) {
		String result = areaMap.get(Integer.parseInt(in.toString()));
		if (result == null) {
			result = "Other";
		}
		return new Text(result);
	}
}
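
Before the jar can be added in the Hive shell, the class has to be compiled against the Hive and Hadoop jars and packaged; a rough sketch, with the classpath and jar name as assumptions:

javac -cp "$HIVE_HOME/lib/*:$HADOOP_HOME/*" AreaUDF.java
jar cf NUDF.jar AreaUDF*.class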

Calling a custom function:
1. Add the jar (run inside the Hive CLI)
hive> add jar /root/NUDF.jar;

2. Create a temporary function
hive> create temporary function getNation as 'cn.itcast.hive.udf.NationUDF';

3. Call it
hive> select id, name, getNation(nation) from beauty;

4. Save the query results back to a table on HDFS
hive> create table result row format delimited fields terminated by '\t' as select * from beauty order by id desc;
hive> select id, getAreaName(id) as name from tel_rec;
create table result row format delimited fields terminated by '\t' as select id, getNation(nation) from beauty;
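
Because result is a managed table, its data lands under the Hive warehouse directory on HDFS; assuming the default warehouse location (set by hive.metastore.warehouse.dir), it can be inspected from the Hive CLI:

hive> dfs -ls /user/hive/warehouse/result;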



Original article: http://blog.csdn.net/zdp072/article/details/42197905
