You can take a random sample of the unique values returned by df.some_key.unique(), use it to slice df, and finally groupby on the result:
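In [337]: import random
     ...: import pandas as pd
     ...: df = pd.DataFrame({'some_key': [0,1,2,3,0,1,2,3,0,1,2,3],
     ...:                    'val': [1,2,3,4,1,5,1,5,1,6,7,8]})

In [338]: # list(...) is needed because random.sample expects a sequence
     ...: print(df[df.some_key.isin(random.sample(list(df.some_key.unique()), 2))].groupby('some_key').mean())
               val
some_key
0         1.000000
2         3.666667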
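The same steps can be wrapped in a small helper so that the draw is reproducible. This is only a sketch: the function name sample_group_means and its seed parameter are made up for illustration and are not part of pandas.

```python
import random
import pandas as pd

def sample_group_means(df, key, n, seed=None):
    """Group means for n randomly chosen values of the column `key`."""
    rng = random.Random(seed)  # local generator, so a seed gives a repeatable draw
    # tolist() turns the NumPy array into a plain list, which random.sample requires
    chosen = rng.sample(df[key].unique().tolist(), n)
    # keep only rows whose key was drawn, then aggregate per key
    return df[df[key].isin(chosen)].groupby(key).mean()

df = pd.DataFrame({'some_key': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
                   'val': [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})
sampled = sample_group_means(df, 'some_key', n=2, seed=0)
print(sampled)
```

Because the rows are filtered before the groupby, only the sampled groups are aggregated.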
If there is more than one groupby key:
In [358]: df = pd.DataFrame({'some_key1': [0,1,2,3,0,1,2,3,0,1,2,3],
     ...:                    'some_key2': [0,0,0,0,1,1,1,1,2,2,2,2],
     ...:                    'val': [1,2,3,4,1,5,1,5,1,6,7,8]})

In [359]: gby = df.groupby(['some_key1', 'some_key2'])

In [360]: # .ix has been removed from pandas; .loc accepts the list of key tuples
     ...: print(gby.mean().loc[random.sample(list(gby.indices.keys()), 2)])
                     val
some_key1 some_key2
1         1          5.0
3         2          8.0
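As an alternative to random.sample, the aggregated frame can be sampled directly with DataFrame.sample. This swaps in a different technique from the one above, and the seed value 0 is an arbitrary choice for the sketch.

```python
import pandas as pd

df = pd.DataFrame({'some_key1': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
                   'some_key2': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'val': [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})

# one row per (some_key1, some_key2) group
means = df.groupby(['some_key1', 'some_key2']).mean()

# draw two of those rows without replacement; random_state makes it repeatable
picked = means.sample(n=2, random_state=0)
print(picked)
```

Like gby.mean() followed by a lookup, this computes the mean of every group before sampling, so it is convenient rather than cheaper.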
However, if you only want the values of each group, you don't even need groupby; a MultiIndex will do:
In [372]: # sorted(...) turns the set into a sequence, which random.sample requires
     ...: idx = random.sample(sorted(set(pd.MultiIndex.from_product((df.some_key1, df.some_key2)).tolist())), 2)
     ...: print(df.set_index(['some_key1', 'some_key2']).loc[idx])
                     val
some_key1 some_key2
2         0            3
3         1            5
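Note that from_product enumerates every combination of the two columns, which can include pairs that never occur together in the data. A possible variation, sketched here with an arbitrary seed, is to sample from the pairs that actually appear in the index:

```python
import random
import pandas as pd

df = pd.DataFrame({'some_key1': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
                   'some_key2': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'val': [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})

# move both keys into a sorted MultiIndex
indexed = df.set_index(['some_key1', 'some_key2']).sort_index()

# only key pairs that really exist in the data, as a list of tuples
pairs = indexed.index.unique().tolist()

rng = random.Random(0)        # seeded so the draw is repeatable
idx = rng.sample(pairs, 2)    # two distinct (some_key1, some_key2) pairs
result = indexed.loc[idx]     # all rows belonging to the sampled pairs
print(result)
```

With this data every pair is present exactly once, so both approaches draw from the same twelve pairs; the difference only matters when some combinations are missing.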