HiveServer2 provides a remote interface for executing Hive queries, built on Thrift RPC, and also supports multi-user concurrency and authentication.
For Python users, the pyhs2 module can be used to connect to HiveServer2, execute queries, and fetch results.
The pyhs2 project is hosted on GitHub:
https://github.com/BradRuderman/pyhs2
It can be installed with pip: pip install pyhs2
If installation fails, try installing these dependencies first:
```shell
yum install cyrus-sasl-plain
yum install cyrus-sasl-devel
```
Here is a simple test script. pyhs2 covers the basics well, and because query results are returned as lists, it is convenient for daily scheduled scripts and lightweight automation.
```python
__author__ = 'knktc'
__version__ = '0.1'

import pyhs2


class HiveClient:
    """Thin wrapper around a pyhs2 connection to HiveServer2."""

    def __init__(self, db_host, user, password, database, port=10000,
                 authMechanism="PLAIN"):
        # Open the connection once; it is reused by every query() call.
        self.conn = pyhs2.connect(host=db_host,
                                  port=port,
                                  authMechanism=authMechanism,
                                  user=user,
                                  password=password,
                                  database=database)

    def query(self, sql):
        # The cursor is closed automatically when the with block exits.
        with self.conn.cursor() as cursor:
            cursor.execute(sql)
            return cursor.fetch()

    def close(self):
        self.conn.close()


def main():
    hive_client = HiveClient(db_host='hiveserver2.hadoop',
                             port=10000,
                             user='hdfs',
                             password='mypass',
                             database='test_log',
                             authMechanism='PLAIN')
    result = hive_client.query('select * from t_test limit 10')
    print(result)
    hive_client.close()


if __name__ == '__main__':
    main()
```
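Since the results come back as plain lists, a scheduled script can pass them straight to the standard library. The following is only a minimal sketch of that idea, not part of pyhs2: the helper name rows_to_csv, the column names, and the sample rows are all made up for illustration, and it assumes each row is itself a list of column values. It is written for Python 3; under Python 2 the in-memory text buffer would need to be replaced with a byte buffer.

```python
import csv
import io


def rows_to_csv(rows, header=None):
    """Serialize a list of row lists into CSV text.

    rows   -- assumed shape: a list where each element is one row,
              itself a list of column values.
    header -- optional list of column names (hypothetical; supplied
              by the caller, not read from Hive here).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    if header:
        writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample rows standing in for the output of hive_client.query(...).
sample_rows = [[1, "alice", "2015-01-01"], [2, "bob", "2015-01-02"]]
csv_text = rows_to_csv(sample_rows, header=["id", "name", "dt"])
print(csv_text)
```

In a real job, sample_rows would be replaced by the result of a query, and the returned text could be written to a file or attached to a report email.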