Efficiently Writing a Pandas DataFrame to Google BigQuery


Datalab
Alternative 1 (pandas' to_gbq) and alternative 3 (staging the data in Google Cloud Storage and writing it with the Datalab BigQuery API) were compared using the following code:

from datalab.context import Context
import datalab.storage as storage
import datalab.bigquery as bq
import pandas as pd
import time

# DataFrame to write (tuples rather than sets: sets are unordered,
# so pandas cannot use them as rows)
my_data = [(1, 2, 3)]
for i in range(0, 100000):
    my_data.append((1, 2, 3))
not_so_simple_dataframe = pd.DataFrame(data=my_data, columns=['a', 'b', 'c'])

# Alternative 1: pandas' to_gbq
start = time.time()
not_so_simple_dataframe.to_gbq('TestDataSet.TestTable',
                               Context.default().project_id,
                               chunksize=10000,
                               if_exists='append',
                               verbose=False)
end = time.time()
print("time alternative 1 " + str(end - start))

# Alternative 3: stage in GCS, then write via the Datalab BigQuery API
start = time.time()
sample_bucket_name = Context.default().project_id + '-datalab-example'
sample_bucket_path = 'gs://' + sample_bucket_name
sample_bucket_object = sample_bucket_path + '/Hello.txt'
bigquery_dataset_name = 'TestDataSet'
bigquery_table_name = 'TestTable'

# Define the storage bucket
sample_bucket = storage.Bucket(sample_bucket_name)

# Create or overwrite the table, deriving the schema from the DataFrame
table_schema = bq.Schema.from_dataframe(not_so_simple_dataframe)
table = bq.Table(bigquery_dataset_name + '.' + bigquery_table_name)
table.create(schema=table_schema, overwrite=True)

# Write the DataFrame to GCS (Google Cloud Storage);
# %storage write is a Datalab cell magic
%storage write --variable not_so_simple_dataframe --object $sample_bucket_object

# Write the DataFrame to the BigQuery table
table.insert_data(not_so_simple_dataframe)
end = time.time()
print("time alternative 3 " + str(end - start))
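Note that the %storage write cell magic only runs inside a Datalab notebook. Outside of one, a rough plain-Python stand-in using the same datalab.storage API, reusing the names defined above, might look like this (a sketch only: the item/write_to calls and the CSV serialization are my assumptions, since the magic handles serialization internally):

import datalab.storage as storage

# Hypothetical stand-in for: %storage write --variable ... --object ...
sample_bucket = storage.Bucket(sample_bucket_name)
if not sample_bucket.exists():
    sample_bucket.create()

# Serialize the DataFrame ourselves (the cell magic does this internally)
sample_item = sample_bucket.item('Hello.txt')
sample_item.write_to(not_so_simple_dataframe.to_csv(index=False), 'text/csv')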

These are the results for n = {10000, 100000, 1000000}:

n        alternative_1   alternative_3
10000    30.72s          8.14s
100000   162.43s         70.64s
1000000  1473.57s        688.59s

The results show that alternative 3 is consistently faster than alternative 1, by roughly a factor of 2 to 4 across the tested sizes.
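As a closing note, the datalab library used here has since been deprecated, and recent versions of pandas-gbq reportedly submit a batch load job by default rather than streaming inserts, which removes much of alternative 1's handicap. For reference, here is a minimal sketch of the same batch-load idea with the maintained google-cloud-bigquery client; it is not part of the original benchmark, the project ID is a placeholder, and pyarrow must be installed:

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client(project='my-project')  # placeholder project ID

df = pd.DataFrame([(1, 2, 3)] * 100000, columns=['a', 'b', 'c'])

# A single batch load job: the client serializes the DataFrame
# (to Parquet, via pyarrow) and loads it, instead of streaming row chunks
job = client.load_table_from_dataframe(df, 'TestDataSet.TestTable')
job.result()  # block until the load job finishes
print('loaded ' + str(job.output_rows) + ' rows')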


