Reading HBase data from a Python program via Thrift

1. Environment

    master:          node26.puppet.com
    slaves (two):    node27.puppet.com
                     node28.puppet.com
    MySQL database:  node25.puppet.com

2. Software versions

    hadoop:  hadoop-0.20.2
    sqoop:   sqoop-1.2.0-CDH3B4
    java:    1.6.0_05
    mysql:   5.1.25-rc

3. Importing MySQL data into HBase

First, check the contents of the students table in the MySQL database.

The command format for importing MySQL data into HBase is:

    sqoop import --connect jdbc:mysql://mysqlserver_IP/databaseName --username username --password password --table datatable --hbase-create-table --hbase-table hbase_tablename --column-family col_fam_name --hbase-row-key key_col_name

Notes: databaseName and datatable are the MySQL database and table names. hbase_tablename is the name of the HBase table to create. key_col_name specifies which column of datatable becomes the rowkey of the new HBase table. col_fam_name is the column family that holds every column other than the rowkey.

For example, the following command imports the MySQL students table into HBase:

    sqoop import --connect jdbc:mysql://172.16.41.25/sqoop --username sqoop --password sqoop --table students --hbase-create-table --hbase-table students --column-family stuinfo --hbase-row-key id

Check the data in HBase:

    [hadoop@node26 conf]$ hbase shell
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011

    hbase(main):001:0> scan 'students'
    ROW    COLUMN+CELL
     1     column=stuinfo:age, timestamp=1358927364631, value=29
     1     column=stuinfo:name, timestamp=1358927364631, value=abc
     2     column=stuinfo:age, timestamp=1358927364566, value=28
     2     column=stuinfo:name, timestamp=1358927364566, value=def
     3     column=stuinfo:age, timestamp=1358927368741, value=26
     3     column=stuinfo:name, timestamp=1358927368741, value=aaaa
     4     column=stuinfo:age, timestamp=1358927364563, value=60
     4     column=stuinfo:name, timestamp=1358927364563, value=efsaz
     5     column=stuinfo:age, timestamp=1358927364563, value=63
     5     column=stuinfo:name, timestamp=1358927364563, value=kiass
    5 row(s) in 1.4110 seconds

4. Using Thrift with Python

Python version used: 2.7.3. The steps are as follows.

1) Install Python 2.7.3

Note: Python 2.7.3 works with thrift without problems, while Python 2.5 does not seem to work. The Python 2.4 that ships with RHEL 5 does not support the syntax used in the generated Hbase.py file.

    tar xjvf Python-2.7.3.tar.bz2
    cd Python-2.7.3
    ./configure && make && make install

Python 2.7.3 is installed at /usr/local/bin/python. Run it and confirm that the reported version is 2.7.3:

    [hadoop@node26 ~]$ /usr/local/bin/python
    Python 2.7.3 (default, Jan 24 2013, 15:24:19)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.

2) Install thrift

    tar xzvf thrift-0.9.0.tar.gz
    cd thrift-0.9.0
    ./configure && make && make install

If configure reports a missing development package, install that package and run configure again. The configure summary looks like this:

    thrift 0.9.0

    Building C++ Library ......... : no
    Building C (GLib) Library .... : yes
    Building Java Library ........ : no
    Building C# Library .......... : no
    Building Python Library ...... : yes
    Building Ruby Library ........ : no
    Building Haskell Library ..... : no
    Building Perl Library ........ : no
    Building PHP Library ......... : no
    Building Erlang Library ...... : no
    Building Go Library .......... : no
    Building D Library ........... : no

    Python Library:
       Using Python .............. : /usr/local/bin/python

Because thrift is only needed for Python here, the "no" entries can be ignored, and make and make install then proceed normally.

3) Generate Hbase.py with thrift

    [root@hadoop1 thrift-0.9.0]# thrift -version
    Thrift version 0.9.0
    [root@hadoop1 thrift-0.9.0]# thrift --gen py /home/kyo/hbase/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
    [root@hadoop1 thrift-0.9.0]# tree gen-py/
    gen-py/
    |-- __init__.py
    `-- hbase
        |-- Hbase-remote
        |-- Hbase.py
        |-- __init__.py
        |-- constants.py
        `-- ttypes.py

    1 directory, 6 files

Copy the generated hbase package into the Python 2.7.3 package directory:

    cp -r gen-py/hbase/ /usr/local/lib/python2.7/site-packages/

Check whether the thrift Python module is present there. If it is not, create a symbolic link to it:

    ls /usr/local/lib/python2.7/site-packages/
    hbase  README  thrift  thrift-0.9.0-py2.7.egg-info

    ln -s /usr/lib/python2.7/site-packages/thrift* /usr/local/lib/python2.7/site-packages/

Start the thrift server on node26. Hadoop and HBase must already be running normally. The command is:

    hbase thrift -p 9090 start

The Python file (testThrift.py) is as follows:

    #!/usr/local/bin/python
    # coding=utf-8
    import sys
    # The .py files generated from Hbase.thrift are placed here
    sys.path.append('/usr/local/lib/python2.7/site-packages/hbase')
    from thrift import Thrift
    from thrift.transport import TSocket
    from thrift.transport import TTransport
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase
    # Types such as ColumnDescriptor are defined in hbase.ttypes
    from hbase.ttypes import *
    # Make socket
    # The address and port can be changed here
    transport = TSocket.TSocket('172.16.41.26', 9090)
    # Buffering is critical. Raw sockets are very slow
    # TFramedTransport can also be used; it is an efficient transport as well
    transport = TTransport.TBufferedTransport(transport)
    # Wrap in a protocol
    # The protocol is separate from the transport, so multiple protocols can be supported
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    # The client represents one user
    client = Hbase.Client(protocol)
    # Open the connection
    transport.open()
    # Print the table names
    print(client.getTableNames())
    # Close the connection
    transport.close()

Be sure to declare the character set on the second line. If the file contains non-ASCII characters (such as Chinese comments) and has no encoding declaration, running it fails with the following error:

    [hadoop@node26 ~]$ /usr/local/bin/python testThrift.py
      File "testThrift.py", line 2
    SyntaxError: Non-ASCII character '\xe7' in file testThrift.py on line 2, but no encoding declared; see

Query HBase with the Python program:

    [hadoop@node26 ~]$ /usr/local/bin/python testThrift.py
    ['students']
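
Listing the table names only confirms that the connection works. To read the imported rows themselves, a scan through the same Thrift client could look roughly like the sketch below. It is not part of the original walkthrough: scan_table and its batch parameter are hypothetical names chosen for illustration, and the sketch assumes the Thrift interface generated from the HBase 0.90.x Hbase.thrift, where scannerOpen takes (tableName, startRow, columns). Newer HBase releases add an extra attributes argument, so check the generated Hbase.py before reusing it.

```python
# coding=utf-8
# Sketch only: read every row of a table through the HBase Thrift gateway.
# Assumption: HBase 0.90.x Thrift interface, i.e.
#   scannerOpen(tableName, startRow, columns) -> scanner id
#   scannerGetList(id, nbRows)               -> list of TRowResult
#   scannerClose(id)


def scan_table(client, table, columns, batch=100):
    """Return {row_key: {column_name: value}} for all rows of `table`.

    `client` is an already connected Hbase.Client (or any object that
    offers the three scanner methods listed above).
    """
    rows = {}
    # An empty start row means "begin at the first row of the table".
    scanner_id = client.scannerOpen(table, '', columns)
    try:
        while True:
            results = client.scannerGetList(scanner_id, batch)
            if not results:
                # An empty list signals that the scanner is exhausted.
                break
            for result in results:
                # result.columns maps "family:qualifier" to a TCell.
                rows[result.row] = dict(
                    (name, cell.value)
                    for name, cell in result.columns.items())
    finally:
        # Always release the server-side scanner, even on errors.
        client.scannerClose(scanner_id)
    return rows
```

With the client object from testThrift.py, calling scan_table(client, 'students', ['stuinfo:']) before the transport is closed should return one dictionary entry per student row, each holding the stuinfo:age and stuinfo:name values shown in the hbase shell scan. Passing the family name with a trailing colon to select all of its columns is an assumption based on the 0.90.x interface.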