Using pandas cut to bin values into intervals, then groupby to group the data (Python)

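import numpy as np
import pandas as pd

ages = np.array([1,5,10,40,36,12,2,2,67,45,90,3,6,8,23,45,12,15,17,22,4,33,28,56,58,62,77,89,100,18,20,25,30,32])  # age data
# Labels, in order: infant, child, teenager, young adult, middle-aged, elderly
quartiles = pd.cut(ages, [0, 6, 12, 17, 45, 69, 100], labels=["婴幼儿", "儿童", "青少年", "青年", "中年", "老年"])
quartiles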

Output

['婴幼儿', '婴幼儿', '儿童', '青年', '青年', ..., '青年', '青年', '青年', '青年', '青年']
Length: 34
Categories (6, object): ['婴幼儿' < '儿童' < '青少年' < '青年' < '中年' < '老年']

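The data is split into 6 right-closed intervals: (0, 6], (6, 12], (12, 17], (17, 45], (45, 69], (69, 100]. For integer ages this corresponds to 1-6, 7-12, 13-17, 18-45, 46-69 and 70-100.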
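To check where each boundary value lands, the following minimal sketch calls pd.cut on the same bin edges without labels, so the intervals themselves are returned. The `sample` array and the variable names are hypothetical, chosen only to probe the edges.

```python
import numpy as np
import pandas as pd

bins = [0, 6, 12, 17, 45, 69, 100]
sample = np.array([0, 6, 7, 12, 13, 100])  # hypothetical boundary values

# Without labels, cut returns the intervals themselves as categories.
intervals = pd.cut(sample, bins)
print(intervals.categories)

# Intervals are right-closed by default: 6 falls in (0, 6], 7 in (6, 12].
# 0 lies outside (0, 6], so it is not assigned to any bin (NaN, code -1).
print(intervals.codes)

# include_lowest=True closes the first interval on the left, so 0 is kept.
with_lowest = pd.cut(sample, bins, include_lowest=True)
print(with_lowest.codes)
```

If an age of 0 can occur in the data, include_lowest=True (or a lower first edge such as -1) keeps it from turning into a missing value.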
The Categorical object returned by cut can be passed directly to groupby:

ages_s = pd.Series(ages)
ages_g = ages_s.groupby(quartiles)
# Iterating yields one (label, Series) tuple per group
for i in ages_g:
    print(i)

Output

('婴幼儿', 0     1
1     5
6     2
7     2
11    3
12    6
20    4
dtype: int32)
('儿童', 2     10
5     12
13     8
16    12
dtype: int32)
('青少年', 17    15
18    17
dtype: int32)
('青年', 3     40
4     36
9     45
14    23
15    45
19    22
21    33
22    28
29    18
30    20
31    25
32    30
33    32
dtype: int32)
('中年', 8     67
23    56
24    58
25    62
dtype: int32)
('老年', 10     90
26     77
27     89
28    100
dtype: int32)
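Since each item produced by the loop is a (label, Series) tuple, it can also be unpacked directly. The sketch below is one way to do that with the same data; the names `labels`, `grouped` and `sizes` are illustrative, and observed=False is passed explicitly as an assumption, to keep every category and to avoid the default-change warning that recent pandas versions emit for categorical groupers.

```python
import numpy as np
import pandas as pd

ages = np.array([1, 5, 10, 40, 36, 12, 2, 2, 67, 45, 90, 3, 6, 8, 23, 45, 12,
                 15, 17, 22, 4, 33, 28, 56, 58, 62, 77, 89, 100, 18, 20, 25,
                 30, 32])
labels = pd.cut(ages, [0, 6, 12, 17, 45, 69, 100],
                labels=["婴幼儿", "儿童", "青少年", "青年", "中年", "老年"])

# observed=False keeps all categories of the Categorical grouper.
grouped = pd.Series(ages).groupby(labels, observed=False)

# Unpack each (label, Series) pair and record the group size.
sizes = {label: len(group) for label, group in grouped}
for label, n in sizes.items():
    print(label, n)
```

The sizes match the group listings above: 7, 4, 2, 13, 4 and 4 values, 34 in total.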

Compute aggregate statistics on the grouped data

functions =['min','max','count','mean']
ages_g.agg(functions)

This gives the following result for each group

        min  max  count       mean
婴幼儿     1    6      7   3.285714
儿童       8   12      4  10.500000
青少年    15   17      2  16.000000
青年      18   45     13  30.538462
中年      56   67      4  60.750000
老年      77  100      4  89.000000
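As a variation on the list of function names, the same statistics can be requested with named aggregation (available in pandas 0.25 and later), which lets each result column be given its own name. This is a sketch rather than the article's method; the column names `youngest`, `oldest`, `n` and `average` are made up for illustration.

```python
import numpy as np
import pandas as pd

ages = np.array([1, 5, 10, 40, 36, 12, 2, 2, 67, 45, 90, 3, 6, 8, 23, 45, 12,
                 15, 17, 22, 4, 33, 28, 56, 58, 62, 77, 89, 100, 18, 20, 25,
                 30, 32])
labels = pd.cut(ages, [0, 6, 12, 17, 45, 69, 100],
                labels=["婴幼儿", "儿童", "青少年", "青年", "中年", "老年"])
grouped = pd.Series(ages).groupby(labels, observed=False)

# Named aggregation: keyword = output column name, value = function name.
stats = grouped.agg(youngest="min", oldest="max", n="count", average="mean")
print(stats)
```

The numbers are the same as in the table above; only the column headers differ.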

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/langs/870683.html
