If you must loop, you need to unpack the key and the DataFrame when iterating over a groupby object:
    import pandas as pd
    import numpy as np
    import statsmodels.api as sm
    from patsy import dmatrices

    df = pd.read_csv('data.csv')
    df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
    df = df.set_index('date')
Note the use of group_name here:
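    # Each iteration yields a (key, sub-DataFrame) pair, one per calendar month.
    # 'M' is the month-end alias; pandas 2.2 and later prefer 'ME'.
    for group_name, df_group in df.groupby(pd.Grouper(freq='M')):
        y, X = dmatrices('value1 ~ value2 + value3', data=df_group, return_type='dataframe')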
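To make the unpacking concrete, here is a minimal sketch that fits one least-squares model per month and stores the coefficients under each group_name. It rests on several assumptions that are not in the original answer: the data is synthetic rather than read from data.csv, the groups are keyed with df.index.to_period("M") instead of pd.Grouper, and np.linalg.lstsq stands in for patsy and statsmodels so that the block only needs pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for data.csv (hypothetical values).
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", "2020-03-31", freq="D")
df = pd.DataFrame(
    {"value2": rng.normal(size=len(idx)), "value3": rng.normal(size=len(idx))},
    index=idx,
)
# value1 is an exact linear function of the regressors, so the fit is easy to check.
df["value1"] = 1.0 + 2.0 * df["value2"] - 3.0 * df["value3"]

coefs = {}
# Iterating a groupby yields (key, sub-DataFrame) pairs; here the key is a monthly Period.
for group_name, df_group in df.groupby(df.index.to_period("M")):
    # Design matrix with an explicit intercept column.
    X = np.column_stack(
        [np.ones(len(df_group)), df_group["value2"], df_group["value3"]]
    )
    y = df_group["value1"].to_numpy()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs[group_name] = beta

# One row of coefficients per month.
result = pd.DataFrame.from_dict(
    coefs, orient="index", columns=["Intercept", "value2", "value3"]
)
```

Keying the dictionary on group_name keeps each fit tied to its month, which is the main reason to unpack the key at all.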
If you want to avoid iterating, take a look at the notebook in Paul H's gist (see his comment), but a simple example using apply is:
    def do_regression(df_group, ret='outcome'):
        """Apply the function to each group in the data and return one result."""
        y, X = dmatrices('value1 ~ value2 + value3', data=df_group, return_type='dataframe')
        if ret == 'outcome':
            return y
        else:
            return X

    outcome = df.groupby(pd.Grouper(freq='M')).apply(do_regression, ret='outcome')
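The same apply pattern can return fitted coefficients rather than the design matrices. The sketch below is illustrative only: fit_group is a hypothetical helper, the data is synthetic, the groups are keyed with df.index.to_period("M"), and np.linalg.lstsq replaces patsy and statsmodels to keep the block dependency-light.

```python
import numpy as np
import pandas as pd

def fit_group(df_group):
    """Hypothetical helper: least-squares coefficients for one group, as a Series."""
    X = np.column_stack(
        [np.ones(len(df_group)), df_group["value2"], df_group["value3"]]
    )
    beta, *_ = np.linalg.lstsq(X, df_group["value1"].to_numpy(), rcond=None)
    return pd.Series(beta, index=["Intercept", "value2", "value3"])

# Synthetic daily data standing in for data.csv (hypothetical values).
rng = np.random.default_rng(1)
idx = pd.date_range("2021-01-01", "2021-03-31", freq="D")
df = pd.DataFrame(
    {"value2": rng.normal(size=len(idx)), "value3": rng.normal(size=len(idx))},
    index=idx,
)
df["value1"] = 0.5 + 1.5 * df["value2"] + 4.0 * df["value3"]

# Because fit_group returns a Series with the same labels for every group,
# apply stacks the results into a DataFrame with one row per month.
params = df.groupby(df.index.to_period("M")).apply(fit_group)
```

Returning a Series with fixed labels is what lets apply assemble a tidy table without any explicit loop.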