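There are multiple ways to modify data in Hive: LOAD, INSERT (into tables from queries, into filesystem directories from queries, or into tables from SQL values), UPDATE, and DELETE.
EXPORT and IMPORT commands are also available (as of Hive 0.8).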
Hive does not do any transformation while loading data into tables. Load operations are currently pure copy/move operations that move datafiles into locations corresponding to Hive tables.
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
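filepath can be a relative path, such as project/data1
an absolute path, such as /user/hive/project/data1
or a full URI with a scheme and (optionally) an authority, such as hdfs://namenode:9000/user/hive/project/data1
If the keyword LOCAL is specified, the load command looks for filepath in the local file system. A full URI can be given for local files as well, such as file:///user/hive/project/data1
If LOCAL is not specified and filepath has no scheme or authority, Hive uses the scheme and authority from the Hadoop configuration variable fs.default.name that specifies the Namenode URI. If the path is not absolute, Hive interprets it relative to /user/<username>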
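The path forms above can be sketched with two loads. This is only an illustration: the table page_view_raw, its columns, and the local file /tmp/pv_2008-06-08.txt are hypothetical names that do not come from the original text.

```sql
-- Hypothetical target table; the column layout is an assumption for illustration.
CREATE TABLE page_view_raw (viewTime INT, userid BIGINT, page_url STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- LOCAL: the file is read from the local file system and copied into the partition.
-- OVERWRITE replaces whatever the partition already contains.
LOAD DATA LOCAL INPATH '/tmp/pv_2008-06-08.txt'
OVERWRITE INTO TABLE page_view_raw PARTITION (dt='2008-06-08');

-- No LOCAL: the relative path is resolved under /user/<username> on the default
-- file system, and the file is moved (not copied) into the partition.
LOAD DATA INPATH 'project/data1'
INTO TABLE page_view_raw PARTITION (dt='2008-06-09');
```

Because no transformation happens during the load, the file contents must already match the table's declared row format.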
Query Results can be inserted into tables by using the insert clause.
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT OVERWRITE overwrites any existing data in the table or partition unless IF NOT EXISTS is provided for a partition (as of Hive 0.9.0).
In dynamic partition inserts, users can give partial partition specifications, which means specifying only the list of partition column names in the PARTITION clause. The column values are optional. If a partition column value is given, we call this a static partition; otherwise it is a dynamic partition. Each dynamic partition column has a corresponding input column from the select statement, so the value of that input column determines which partition is created. The dynamic partition columns must be specified last among the columns in the SELECT statement and in the same order in which they appear in the PARTITION() clause.
Dynamic partition inserts are disabled by default. These are the relevant configuration properties for dynamic partition inserts:
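Configuration property | Default | Note |
---|---|---|
hive.exec.dynamic.partition | false | Needs to be set to true to enable dynamic partition inserts |
hive.exec.dynamic.partition.mode | strict | In strict mode, the user must specify at least one static partition, in case the user accidentally overwrites all partitions; in nonstrict mode, all partitions are allowed to be dynamic |
hive.exec.max.dynamic.partitions.pernode | 100 | Maximum number of dynamic partitions allowed to be created in each mapper/reducer node |
hive.exec.max.dynamic.partitions | 1000 | Maximum number of dynamic partitions allowed to be created in total |
hive.exec.max.created.files | 100000 | Maximum number of HDFS files created by all mappers/reducers in a MapReduce job |
hive.error.on.empty.partition | false | Whether to throw an exception if dynamic partition insert generates empty results |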
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt
Here the country partition will be dynamically created by the last column from the SELECT clause (i.e. pvs.cnt). Note that the name is not used. In nonstrict mode the dt partition could also be dynamically created.
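As a sketch of the nonstrict case just mentioned, the variant below makes both dt and country dynamic. It assumes, hypothetically, that page_view_stg also carries dt and country columns; the original example does not say so. The two SET statements use the standard Hive property names for enabling dynamic partitioning.

```sql
-- Enable dynamic partition inserts and allow every partition column to be dynamic.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Assumption: page_view_stg has dt and country columns (hypothetical).
-- The dynamic partition columns come last in the SELECT list, in the same
-- order as in the PARTITION clause; they are matched by position, not by name.
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION (dt, country)
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url,
       null, null, pvs.ip,
       pvs.dt, pvs.country;
```

In strict mode the same statement would be rejected, because no static partition value is supplied.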
Query results can be inserted into filesystem directories by using a slight variation of the syntax above:
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
  SELECT ... FROM ...
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
      [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
      [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
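A minimal sketch of the directory form, reusing the page_view_stg table from the earlier example. The output directory /tmp/pv_export is a hypothetical path chosen for illustration, and the ROW FORMAT clause assumes Hive 0.11.0 or later, as noted in the syntax.

```sql
-- Write query results to a directory on the local file system as
-- comma-delimited text. /tmp/pv_export is a hypothetical path.
-- OVERWRITE replaces the existing contents of that directory.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/pv_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT pvs.userid, pvs.page_url
FROM page_view_stg pvs;
```

Dropping the LOCAL keyword would write to the default (typically HDFS) file system instead.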
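The directory can be a full URI. If the scheme or authority is not specified, Hive uses the scheme and authority from the Hadoop configuration variable fs.default.name that specifies the Namenode URI.
The INSERT...VALUES statement can be used to insert data into tables directly from SQL.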
Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal
Hive does not support literals for complex types (array, map, struct, union), which means that users cannot insert data into complex datatype columns using the INSERT INTO...VALUES clause.
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, `from` STRING)
  PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;
INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
  VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);
INSERT INTO TABLE pageviews PARTITION (datestamp)
  VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');
Standard Syntax:
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
Standard Syntax:
DELETE FROM tablename [WHERE expression]
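A short sketch of UPDATE and DELETE in use. It assumes that ACID transactions are enabled on the cluster and that the target is a transactional ORC table; students_acid is a hypothetical table modeled on the students example, not one taken from the original text.

```sql
-- Hypothetical transactional table. UPDATE and DELETE only work on tables
-- that support ACID, hence the 'transactional'='true' property.
CREATE TABLE students_acid (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
CLUSTERED BY (age) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE students_acid
VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);

-- Only non-partitioning, non-bucketing columns can be updated,
-- so gpa is changed here rather than age.
UPDATE students_acid SET gpa = 3.10 WHERE name = 'fred flintstone';

-- Remove every row matching the WHERE expression.
DELETE FROM students_acid WHERE age < 33;
```

Omitting the WHERE clause applies the statement to every row in the table.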
[Hive - LanguageManual] Hive Data Manipulation Language
Original article: http://www.cnblogs.com/xiejin/p/4248075.html