Pre-filtered load
Dependency: ru.yandex.clickhouse:clickhouse-jdbc, version 0.3.1
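// Assumes a SparkSession named spark and string values beginTime and endTime are in scope.
import org.apache.spark.sql.DataFrame

// Wrap the filtering query in parentheses and give it an alias, so that
// ClickHouse evaluates the WHERE clause and Spark only receives matching rows.
val tableName = s"(SELECT CAST(longitude AS DOUBLE) longitude, CAST(latitude AS DOUBLE) latitude FROM location_log WHERE acquisition_time BETWEEN '$beginTime' AND '$endTime') tempTable"

val location: DataFrame = spark.read
  .format("jdbc")
  .option("url", "jdbc:clickhouse://172.16.16.111:8123")
  .option("fetchsize", "500000")
  .option("driver", "ru.yandex.clickhouse.ClickHouseDriver")
  .option("user", "default")
  .option("password", "Z1wXxjYzuRTcgqLm")
  .option("dbtable", tableName)
  .load()

Full-table load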
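import java.util.Properties

val prop = new Properties
prop.setProperty("user", "default")
prop.setProperty("password", "default")
prop.setProperty("driver", "ru.yandex.clickhouse.ClickHouseDriver")

// Reads location_log as a table and filters it through the DataFrame API.
val location = spark.read
  .jdbc("jdbc:clickhouse://xxx.xx.xx.xxx:8123", "location_log", prop)
  // The s prefix and the single quotes are required so that the values
  // are interpolated into the condition as string literals.
  .where(s"acquisition_time >= '$beginTime' AND acquisition_time <= '$endTime'")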
The first approach is recommended: pre-filtering inside ClickHouse reduces the amount of data loaded into Spark.
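Because the pre-filtered approach splices beginTime and endTime directly into SQL text, it can help to build the subquery in one place and check the inputs first. The following is only a minimal sketch: PrefilterQuery and its build method are hypothetical names, not part of Spark or the ClickHouse driver, and the timestamp format ("yyyy-MM-dd" or "yyyy-MM-dd HH:mm:ss") is an assumption, since the original does not say how the values are formatted.

```scala
object PrefilterQuery {
  // Assumed timestamp format: "yyyy-MM-dd" or "yyyy-MM-dd HH:mm:ss".
  private val TimestampPattern = """\d{4}-\d{2}-\d{2}( \d{2}:\d{2}:\d{2})?"""

  /** Builds a parenthesised, aliased subquery for the JDBC "dbtable" option. */
  def build(beginTime: String, endTime: String): String = {
    // The values end up inside the SQL string, so reject anything
    // that is not a plain timestamp in the assumed format.
    require(beginTime.matches(TimestampPattern), s"unexpected beginTime: $beginTime")
    require(endTime.matches(TimestampPattern), s"unexpected endTime: $endTime")

    "(SELECT CAST(longitude AS DOUBLE) longitude, CAST(latitude AS DOUBLE) latitude " +
      "FROM location_log " +
      s"WHERE acquisition_time BETWEEN '$beginTime' AND '$endTime') tempTable"
  }
}
```

The result can be passed to .option("dbtable", ...) exactly as tableName is in the pre-filtered example, for instance PrefilterQuery.build("2021-01-01 00:00:00", "2021-01-02 00:00:00"), where the dates are made-up sample values.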