import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;          // Result, Scan
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
import org.apache.hadoop.hbase.protobuf.generated.ClientProtos;
import org.apache.hadoop.hbase.util.Base64;
import org.apache.spark.api.java.JavaPairRDD;
String tableName = "testTable";
Scan scan = new Scan();
scan.setCaching(10000);       // rows fetched per RPC round trip
scan.setCacheBlocks(false);   // skip the block cache on a full-table scan
Configuration conf = HBaseConfiguration.create();
conf.set(TableInputFormat.INPUT_TABLE, tableName);
ClientProtos.Scan proto = ProtobufUtil.toScan(scan);
String scanToString = Base64.encodeBytes(proto.toByteArray());
conf.set(TableInputFormat.SCAN, scanToString);
JavaPairRDD<ImmutableBytesWritable, Result> myRDD = sc  // sc: an existing JavaSparkContext
    .newAPIHadoopRDD(conf, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);
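As a quick sanity check, one might force the scan and decode a single cell per row. This is only a sketch; the column family "cf" and qualifier "col" are placeholders, not names from the original table:

import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaRDD;
long rows = myRDD.count();  // materializes the full-table scan
JavaRDD<String> firstCol = myRDD.map(t ->
    Bytes.toString(t._2().getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));
firstCol.take(5).forEach(System.out::println);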
Reading an HBase table from Spark through the standard Hadoop interface shown above (a full-table scan) takes 20+ minutes for roughly 500 million rows, while reading the same data stored in Hive takes under 1 minute, so the performance gap is enormous.
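For reference, the Hive-side read in that comparison would look roughly like the sketch below; the post does not show this code, and the Hive table name testTable and reuse of the JavaSparkContext sc from above are assumptions:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;
HiveContext hiveCtx = new HiveContext(sc.sc());            // reuse the JavaSparkContext from above
DataFrame hiveDF = hiveCtx.sql("SELECT * FROM testTable"); // assumed Hive table holding the same rows
System.out.println(hiveDF.count());                        // drives a full read through Hive

The usual explanation for a gap of this size is that the Hive path streams the table's underlying HDFS files directly, while the TableInputFormat path pays RegionServer RPC and per-row deserialization costs.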
Let's go straight to the code:
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
 * I like writing code.
 * Created by wangtuntun on 16-5-7.
 */
object clean {
  def main(args: Array[String]) {
    // Set up the environment
    val conf = new SparkConf().setAppName("tianchi").setMaster("local")
    val sc = new SparkContext(conf)
    val sqc = new SQLContext(sc)
    // (the rest of the original snippet is cut off here)
  }
}