CREATE TABLE t1(name string, id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
LOAD DATA LOCAL INPATH '/Users/***/Desktop/test.txt' INTO TABLE t1;
Then check it on HDFS (NameNode web UI on port 50070):
dfs -ls /user/wyq/hive;
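A quick sanity check from the Hive CLI as well (using the t1 table created above):
select * from t1 limit 5;
describe t1;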
---------------------------------------------------
Build the UDF as a Java project in Eclipse and package it (jar cvf demoudf.jar ///.java):
import java.util.Date;
import java.text.DateFormat;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class UnixTodate extends UDF {
    // Convert a Unix timestamp (in seconds) into a formatted date string.
    public Text evaluate(Text text) {
        if (text == null) return null;
        long timestamp = Long.parseLong(text.toString());
        return new Text(toDate(timestamp));
    }

    private String toDate(long timestamp) {
        Date date = new Date(timestamp * 1000);
        return DateFormat.getInstance().format(date);
    }
}
ADD JAR /Users/wyq/Desktop/demoudf.jar;
create temporary function userdate as 'demoudf.UnixTodate';
create table test(id string, unixtime string) row format delimited fields terminated by ',';
load data local inpath '/Users/wyq/Desktop/udf_test.txt' into table test;
select * from test;
select id, userdate(unixtime) from test;
cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
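A temporary function disappears when the session ends; a hedged sketch of registering it permanently instead (requires Hive 0.13+; the HDFS jar location here is an assumption, not from the original notes):
CREATE FUNCTION userdate AS 'demoudf.UnixTodate'
  USING JAR 'hdfs:///user/wyq/jars/demoudf.jar';   -- hypothetical HDFS path for the jar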
-----------------------------------------------------
Transforming table columns with Python (TRANSFORM)
row format delimited fields terminated by '\t'
load data local inpath '///' into table test
ADD FILE ///.py;
insert overwrite table u_data_new
select transform (col1, col2, unixtime) using 'python ...py' as (col1, col2, unixtime)
from u_data;
python:
import sys
import datetime

for line in sys.stdin:
    line = line.strip()
    col1, col2, unixtime = line.split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print('\t'.join([col1, col2, str(weekday)]))
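The INSERT OVERWRITE above assumes a target table u_data_new already exists; a minimal sketch of a matching definition (the column types are assumptions based on the transform output):
create table u_data_new (
  col1 string,
  col2 string,
  weekday int)
row format delimited fields terminated by '\t';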
-------------------------------------------------------
Hive:
1. Component architecture: HiveServer2 (Beeline), Hive, and the metastore DB.
Among these, the Execution Engine is the component which executes the execution plan created by the compiler. The plan is a DAG of stages. The execution engine manages the dependencies between these different stages of the plan and executes them on the appropriate system components.
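The stage DAG the compiler produces can be inspected with EXPLAIN; a minimal example against the test table defined earlier:
EXPLAIN select id, userdate(unixtime) from test;
-- the output lists STAGE DEPENDENCIES and STAGE PLANS, i.e. the DAG of stages the engine will run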
2. Ways to connect to HiveServer2: GUI, CLI, JDBC (Beeline).
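A minimal sketch of a Beeline/JDBC connection (localhost, the default port 10000, and the hive user are assumptions for a default local install):
beeline -u jdbc:hive2://localhost:10000 -n hive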
3. Data sources: use Kafka, Sqoop, etc. to ingest data into HDFS. The data comes in all kinds of structures: relational database tables, MongoDB or JSON documents, and logs.
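For the relational-database source, a hedged one-line Sqoop import sketch (the connection string, credentials, table and target directory are all assumptions, not from the original notes):
sqoop import --connect jdbc:mysql://localhost/sourcedb --username hive --password 123456 --table t1 --target-dir /user/wyq/t1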
4. How is HQL executed? Behind the scenes it runs MapReduce or Tez jobs (much as Pig Latin scripts run Pig), and each job prints a tracking URL; see the engine sketch after this point. For example: insert into test values("wangyuq","123");
Stage? Before your data is moved to its final destination it sits in a staging directory for a while; once the job completes, the staging files are gone.
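A minimal sketch of picking the execution engine for a session (hive.execution.engine is the property name; using tez assumes Tez is actually installed, otherwise keep the default mr):
SET hive.execution.engine=mr;   -- or tez, if installed
insert into test values("wangyuq","123");
-- the console then prints the launched job and its tracking URL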
5. Pros, cons, and evaluation. Pig is a good ETL tool for unstructured data.
Hive is not a relational database; it only maintains metadata about data stored on HDFS, so that working with big data feels like working with SQL tables, although HQL differs slightly from SQL. Hive keeps table definitions in the metastore; the default is Derby, but it can be swapped for another database.
It lets us express MapReduce jobs in SQL and run queries over data in HDFS.
---But:
Hive makes no promises about optimization; it aims for simplicity, so its performance cannot support real-time workloads.
Indexes and views are limited (partitions and buckets are used instead).
Read-only: UPDATE is not supported.
Data types do not match SQL's exactly.
New partitions can be inserted, but existing rows cannot be updated.
6. Relationship with HDFS? Hive's data lives inside HDFS.
7. So how is data processed? (partitions, buckets; semi-structured data -> structured; see the sketch after this point)
LOAD statement: moves data from its HDFS location into Hive, so the original HDFS path no longer holds it; the actual files are simply relocated into Hive's warehouse directory.
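A hedged sketch tying these together: a partitioned table plus a non-LOCAL load that shows the move semantics just described (table, column and path names are assumptions):
create table logs (col1 string, col2 string)
partitioned by (dt string)
row format delimited fields terminated by '\t';

-- non-LOCAL load: the source file is moved into the warehouse, not copied
load data inpath '/user/wyq/input/logs.txt' into table logs partition (dt='2017-10-27');

-- bucketing is likewise declared at creation time, e.g. clustered by (col1) into 4 buckets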
8. So how is data stored? The data lives on HDFS; the schema lives in the metastore.
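Both halves are visible from the Hive CLI; a quick check against the t1 table from above:
describe formatted t1;
-- the Location line shows where the files sit on HDFS;
-- the columns, SerDe and table parameters come from the metastore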
9. Installation and errors
MySQL (user management issues):
step 1: SET PASSWORD = PASSWORD('your new password');
step 2: ALTER USER 'root'@'localhost' PASSWORD EXPIRE NEVER;
step 3: flush privileges;
1. $ mysql -u root -p
2. mysql> create user 'hive' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
3. mysql> grant all privileges on *.* to 'hive' with grant option;
Query OK, 0 rows affected (0.00 sec)
4. mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
create user 'hive'@'%' identified by 'hive';
grant all privileges on *.* to 'hive'@'%' with grant option;
flush privileges;
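With the hive MySQL user created, Hive still has to be pointed at MySQL instead of the default Derby metastore; a hedged hive-site.xml sketch (the host, metastore database name, password, and having the MySQL connector jar on Hive's classpath are assumptions):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- assumed local MySQL and a metastore database named "hive" -->
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>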
Start Hadoop (hadoop namenode -format is only for first-time setup, since it wipes HDFS):
hadoop namenode -format; start-all.sh
Original article: http://www.cnblogs.com/yumanman/p/7745858.html