
Working with the Hive data warehouse from Python

Date: 2020-05-27 01:04:26


Using the PyHive library

Installation: pip install PyHive -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

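from pyhive import hive   # or: from pyhive import presto
# Replace every **** with your own host, port, user, database, table, and row count.
conn = hive.Connection(host='****', port=****, username='****', database='****')
cursor = conn.cursor()   # the cursor must be created before execute() is called
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
for i in range(****):
    # value and username are assumed to be defined earlier by the caller
    sql = "INSERT INTO **** VALUES ({},'username{}')".format(value, str(username))
    cursor.execute(sql)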
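
The loop above builds each INSERT with str.format, which breaks on values that contain quotes and leaves the statement open to SQL injection. A minimal sketch of an alternative, assuming the driver accepts pyformat-style %s placeholders (PyHive declares paramstyle = 'pyformat'). The helper name insert_users and the example table and values are hypothetical; the function takes any DB-API cursor, so it does not need a live Hive server to try out.

```python
import re

# Table names cannot be bound as parameters, so they are validated instead.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def insert_users(cursor, table, rows):
    """Insert (value, username) pairs and return the number of rows sent.

    `cursor` is any DB-API 2.0 cursor whose paramstyle accepts %s
    placeholders (an assumption; check your driver's paramstyle).
    """
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError("unexpected table name: {!r}".format(table))
    sql = "INSERT INTO {} VALUES (%s, %s)".format(table)
    count = 0
    for value, username in rows:
        # The driver escapes the values; no manual quoting is needed.
        cursor.execute(sql, (value, username))
        count += 1
    return count
```

A call would look like insert_users(cursor, 'my_db.my_table', [(1, 'alice'), (2, "o'brien")]), where the table and rows are made-up examples.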
 
 
# The following is the code from the official documentation:
from pyhive import presto  # or import hive
cursor = presto.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
print(cursor.fetchone())
print(cursor.fetchall())
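
Note that fetchone() and fetchall() share one position in the result set, so the fetchall() call above returns only the rows that fetchone() has not already consumed. A small illustration, using the standard-library sqlite3 module purely as a stand-in DB-API driver (an assumption made so that the snippet runs without a Presto or Hive server; the table contents are made up):

```python
import sqlite3

# sqlite3 stands in for presto/hive here; the fetch semantics come from PEP 249.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE my_awesome_data (id INTEGER, name TEXT)")
cursor.executemany(
    "INSERT INTO my_awesome_data VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

cursor.execute("SELECT id, name FROM my_awesome_data ORDER BY id LIMIT 10")
first = cursor.fetchone()   # consumes the first row
rest = cursor.fetchall()    # returns only the remaining rows
after = cursor.fetchone()   # the result set is now exhausted

print(first)
print(rest)
print(after)
conn.close()
```

To get every row in one list, call fetchall() on its own right after execute().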

  

impyla

Installation:

pip install impyla -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

from impala.dbapi import connect
conn = connect(host='****', port=****)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)   # print the schema of the result set
results = cursor.fetchall()
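
Under PEP 249, cursor.description is a sequence with one 7-item entry per column: name, type_code, display_size, internal_size, precision, scale, null_ok. A minimal sketch of pulling out just the name and type; describe_columns is a hypothetical helper, and sqlite3 is used only as a stand-in driver so the snippet runs without a cluster (sqlite3 reports None for the type, while other drivers may report a type name or code).

```python
import sqlite3


def describe_columns(cursor):
    """Return [(name, type_code), ...] for the cursor's current result set."""
    # PEP 249: description is None when no result-producing query has run.
    if cursor.description is None:
        return []
    return [(col[0], col[1]) for col in cursor.description]


conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
before = describe_columns(cursor)            # nothing executed yet
cursor.execute("SELECT 1 AS id, 'x' AS name")
schema = describe_columns(cursor)
print(schema)
conn.close()
```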

Hive and pandas with PyHive or impyla

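from pyhive import hive
import pandas as pd

def LinkHive(sql_select):
    """Run a query against Hive and return the result as a pandas DataFrame."""
    connection = hive.Connection(host='localhost')
    cur = connection.cursor()
    cur.execute(sql_select)
    # Column names are the first field of each cursor.description entry
    columns = [col[0] for col in cur.description]
    result = [dict(zip(columns, row)) for row in cur.fetchall()]
    # Passing columns keeps the column order and also works for an empty result
    return pd.DataFrame(result, columns=columns)

sql = "select * from database_name.table_name"
df = LinkHive(sql)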
Or:

from impala.dbapi import connect
from impala.util import as_pandas
conn = connect(host='10.161.20.11', port=21050)
cur = conn.cursor()
cur.execute('SHOW TABLES')
# Each execute() replaces the previous result set
cur.execute('SELECT * FROM businfo')
data = as_pandas(cur)
print(data)
print(type(data))
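
Both snippets above leave the cursor and the connection open. A minimal sketch of releasing them with contextlib.closing, which calls close() on exit even if the query raises. fetch_records is a hypothetical helper and sqlite3 is only a stand-in driver for the demo; with PyHive the connect argument could be something like lambda: hive.Connection(host='localhost').

```python
import sqlite3
from contextlib import closing


def fetch_records(connect, sql):
    """Run `sql` and return (columns, rows), closing cursor and connection.

    `connect` is a zero-argument callable that returns a DB-API connection.
    """
    with closing(connect()) as conn, closing(conn.cursor()) as cur:
        cur.execute(sql)
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()   # fetch everything before the cursor closes
    return columns, rows


# Demo with an in-memory sqlite3 database standing in for Hive.
columns, rows = fetch_records(
    lambda: sqlite3.connect(":memory:"),
    "SELECT 1 AS a, 'x' AS b",
)
print(columns, rows)
```

The returned pair can then be handed to pandas as pd.DataFrame(rows, columns=columns).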

 

Usage

Impyla implements the Python DB API v2.0 (PEP 249) database interface (refer to it for API details):

from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()

The Cursor object also exposes the iterator interface, which is buffered (controlled by cursor.arraysize):

cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)
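
PEP 249 also defines fetchmany(size), with cursor.arraysize as its default size, which makes the batching explicit when rows should be processed in chunks. A sketch under the assumption that the driver follows PEP 249 here; iter_batches is a hypothetical helper, and sqlite3 again stands in for a real Impala or Hive connection.

```python
import sqlite3


def iter_batches(cursor, batch_size):
    """Yield lists of at most batch_size rows until the result set is empty."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE mytable (n INTEGER)")
cursor.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(5)])

cursor.execute("SELECT n FROM mytable ORDER BY n")
sizes = [len(batch) for batch in iter_batches(cursor, 2)]
print(sizes)
conn.close()
```

Five rows read two at a time arrive as batches of 2, 2, and 1.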

Furthermore, the Cursor object exposes information about the columns returned by the query through cursor.description. This is useful for exporting your data as a CSV file.

import csv

cursor.execute('SELECT * FROM mytable LIMIT 100')
columns = [datum[0] for datum in cursor.description]
targetfile = '/tmp/foo.csv'

with open(targetfile, 'w', newline='') as outcsv:
    writer = csv.writer(outcsv, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerow(columns)
    for row in cursor:
        writer.writerow(row)

You can also get back a pandas DataFrame object:

from impala.util import as_pandas
df = as_pandas(cursor)
# carry df through scikit-learn, for example

 


Original article: https://www.cnblogs.com/SunshineKimi/p/12969751.html
