Merging Data with Different Time Frequencies Using pandas join (Python)

import pandas as pd
import numpy as np
import datetime

First, give the monthly data a time index at month granularity.

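# 165 month-end timestamps; end='2020/5' is parsed as 2020-05-01, so the last one is 2020-04-30.
# Note: pandas 2.2+ deprecates freq='M' in favor of 'ME' for month-end.
times = pd.date_range(periods=165, freq='M', end='2020/5')

# Reverse to newest-first, since the spreadsheet lists the newest month first.
times = times[::-1]

# Keep only year and month, e.g. '2020-04'.
times = times.strftime('%Y-%m')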

# The monthly sheet must have exactly 165 rows, one per label, or set_index raises.
df = pd.read_excel(r'data by month.xlsx')
df = df.set_index(times)

df.to_excel(r'./1.xlsx')
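
The month labels above can also be generated with a period range, which avoids reasoning about month-end timestamps. The following is only a sketch of that alternative, not the code used in this post; the helper name `month_labels_desc` is made up for illustration.

```python
import pandas as pd

def month_labels_desc(end_month: str, periods: int) -> pd.Index:
    """Return `periods` 'YYYY-MM' labels ending at `end_month`, newest first."""
    # A monthly PeriodIndex whose last element is `end_month`.
    months = pd.period_range(end=end_month, periods=periods, freq="M")
    # Reverse to newest-first and format each period as 'YYYY-MM'.
    return months[::-1].strftime("%Y-%m")

labels = month_labels_desc("2020-04", 165)
print(labels[:3].tolist())  # ['2020-04', '2020-03', '2020-02']
print(labels[-1])           # 2006-08
```

Checking `len(labels)` against the number of rows in the monthly sheet before calling `set_index` is a cheap way to catch a misaligned index early.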

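The DataFrame that serves as the base of the join (the weekly data) needs an extra column that records the original row order.

Without it, the rows of the joined result may come out in the wrong order.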

df1 = pd.read_excel(r'./data by week.xlsx', header=1, usecols=range(0, 4))

date_col = "时间(该日期所在“周”的煤炭价格)年/月/日"
dates = pd.to_datetime(df1[date_col])

# Full dates ('YYYY-MM-DD'), kept aside to restore a per-week index after the join.
tmp = dates.dt.strftime('%Y-%m-%d')

# Month key ('YYYY-MM') that matches the index of the monthly df.
df1[date_col] = dates.dt.strftime('%Y-%m')
# Original row position, used to restore the order after the join.
df1['num'] = np.arange(len(df1))
df1.set_index(date_col, inplace=True)
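
The preparation step above can be tried on a few made-up rows. This is a minimal sketch: the column names `date`, `price`, `month` and the values are invented for illustration and do not come from the real spreadsheet.

```python
import numpy as np
import pandas as pd

# Made-up weekly rows, newest first, standing in for the real sheet.
weekly = pd.DataFrame({
    "date": ["2020-04-30", "2020-04-24", "2020-03-27"],
    "price": [475.0, 480.0, 540.0],
})

dates = pd.to_datetime(weekly["date"])
weekly = weekly.assign(
    month=dates.dt.strftime("%Y-%m"),  # join key shared with the monthly table
    num=np.arange(len(weekly)),        # original row position
)

print(weekly["month"].tolist())  # ['2020-04', '2020-04', '2020-03']
print(weekly["num"].tolist())    # [0, 1, 2]
```

The month key is not unique, since several weeks fall into the same month, so it cannot be used to put the rows back in order after the join. That is the job of `num`.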

df.head(3)

	山西	内蒙古	陕西	发电量
2020-04	8595.4	8527.6	5600.6	190.1
2020-03	9479.4	8906.1	5710.2	201.1
2020-02	7596.0	8422.0	5490.0	225.0

(Column names: 山西 = Shanxi, 内蒙古 = Inner Mongolia, 陕西 = Shaanxi, 发电量 = power generation.)

df1.head(3)

	价格低值(元/吨)	价格高值(元/吨)	价格平均值(元/吨)	num
时间(该日期所在“周”的煤炭价格)年/月/日
2020-04	470	480	475.0	0
2020-04	475	485	480.0	1
2020-04	490	495	492.5	2

(Column names: 价格低值(元/吨) = low price (CNY/ton), 价格高值(元/吨) = high price (CNY/ton), 价格平均值(元/吨) = average price (CNY/ton). The index name reads "date (coal price for the week containing this date), year/month/day".)

Merge with df.join(), then sort.

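# Left join on the 'YYYY-MM' index: each weekly row picks up the values of its month.
result = df1.join(df)
# Restore the original weekly order, then drop the helper column.
result.sort_values(by='num', ascending=True, inplace=True)
result.drop(columns='num', inplace=True)
# tmp is applied by position, so the rows must be back in their original order here.
result.set_index(tmp, inplace=True)
result.to_excel(r'./2.xlsx')
result.head()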

	价格低值(元/吨)	价格高值(元/吨)	价格平均值(元/吨)	山西	内蒙古	陕西	发电量
时间(该日期所在“周”的煤炭价格)年/月/日
2020-04-30	470	480	475.0	8595.4	8527.6	5600.6	190.1
2020-04-24	475	485	480.0	8595.4	8527.6	5600.6	190.1
2020-04-17	490	495	492.5	8595.4	8527.6	5600.6	190.1
2020-04-10	505	515	510.0	8595.4	8527.6	5600.6	190.1
2020-04-03	530	535	532.5	8595.4	8527.6	5600.6	190.1
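
For readers without the spreadsheets, the whole procedure can be reproduced on a small synthetic dataset. The sketch below follows the same steps (month key, helper order column, join, sort, restore the date index); the frames `monthly` and `weekly` and the columns `price` and `output` are assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data, indexed by 'YYYY-MM'.
monthly = pd.DataFrame({"output": [190.1, 201.1]}, index=["2020-04", "2020-03"])

# Synthetic weekly data, newest first.
weekly = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-30", "2020-04-24", "2020-04-03", "2020-03-27"]),
    "price": [475.0, 480.0, 532.5, 540.0],
})

# Full dates, kept aside to become the final index.
dates = weekly["date"].dt.strftime("%Y-%m-%d")

# Index the weekly rows by month and remember their original position.
keyed = weekly.drop(columns="date").assign(num=np.arange(len(weekly)))
keyed.index = pd.Index(weekly["date"].dt.strftime("%Y-%m"), name="month")

# Join, restore the original order, and drop the helper column.
result = keyed.join(monthly).sort_values("num").drop(columns="num")

# Assign the full dates by position; this is safe because the order was restored.
result.index = pd.Index(dates, name="date")

print(result["output"].tolist())  # [190.1, 190.1, 190.1, 201.1]
```

Each monthly value is repeated for every week of that month, which is the same pattern as in the output above.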

Original article: http://outofmemory.cn/langs/715047.html
