多索引 | 自定义排序
最后修改日期:2021年11月17日
多索引
- 自定义排序
- 知识点
- 聚合
- 多索引
- 提取索引值
- 添加 / 删改
- 添加
- 删改
- 去重
- 指定列
- 示例数据
- 自定义排序
- 乱序
- 低维索引
- 现排序列表
- 自定义排序列表
- 自定义排序实现
- 高维索引
- 现排序列表
- 自定义排序列表
- 自定义排序实现
- 参考文章
- 添加
- 删改
- 自定义排序实现
###### raw data set, 'data' ###### columns labled with 'A' and 'B' ###### group function, e.g. mean(), sum(), std() data.groupby(['A', 'B'], as_index=False).mean()
多索引 提取索引值
###### raw data set, 'data' ###### multi-index data.index.get_level_values(level=0) # here, level could not exceed index levels
添加 / 删改 添加
###### raw data set, 'data' # 增加名为‘月份’的列,新列的值为int(month) data1['月份']=int(month)
删改 去重
###### raw data set, 'data' ###### subset, 用来指定特定的列,默认所有列 ###### first, 删除重复项并保留第一次出现的项 ###### inplace, 是直接在原来数据上修改还是保留一个副本 data.drop_duplicates(subset='A', keep='first', inplace=False) # keep='first'/'last'指定列
###### raw data set, 'data' # drop columns with labels 'column_name' # axis=0, rows; axis=1, columns # inplace, manipulate raw data; otherwise, return a copy of dataframe data.drop('column_name',axis=1, inplace=True)
示例数据
# example # print(data) ''' creator Num ... priority efforts group 0 Y 962 ... Highest 20.703253 P 1 R 979 ... Highest 15.027863 N ... 560 t 193 ... Low 44.173754 N ''' MultIndex = ['group', 'priority'] groupdata = data.groupby(MultIndex)['Num'].count() ## groupdata = pd.Dataframe(groupdata) groupdata = pd.Dataframe(data.groupby(MultIndex)['Num'].count()) # ''' group priority A High 5 Highest 7 Medium 36 B High 8 Highest 18 Medium 11 N High 94 Highest 88 Low 3 Medium 217 ... Name: Num, dtype: int64 '''
自定义排序 乱序 低维索引 现排序列表
groupVal = groupdata.index.get_level_values(level=0) ''' Index(['A', 'A', 'A', 'B', 'B', 'B', 'N', 'N', 'N', 'N', ...], dtype='object', name='group') ''' groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates() #自定义排序列表''' Index(['A', 'B', 'N', ...], dtype='object', name='group') ''' groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates().to_list() # ''' ['A', 'B', 'N', ...] '''
groupReorder = groupOrder[:2] + groupOrder[3:] + groupOrder[2:3] #自定义排序实现''' ['A', 'B', ..., 'N'] '''
# groupdata['groupOrder'] = groupVal groupdata['groupOrder'] = groupdata.index.get_level_values(level=0) ''' Num groupOrder group priority A High 5 A Highest 7 A Medium 36 A B High 8 B Highest 18 B Medium 11 B N High 94 N Highest 88 N Low 3 N Medium 217 N ... ''' groupdata['groupOrder'] = groupdata['groupOrder'].astype('category') groupdata['groupOrder'].cat.reorder_categories(groupReorder, ordered=True, inplace=True) #groupdata.sort_values('groupOrder', inplace=True, ascending=True) groupdata.drop(columns='groupOrder', inplace=True) ''' Num groupOrder group priority A High 5 A Highest 7 A Medium 36 A B High 8 B Highest 18 B Medium 11 B ... N High 94 N Highest 88 N Low 3 N Medium 217 N '''
高维索引 现排序列表
### 此处index level有两级 groupVal = groupdata.index.get_level_values(level=0) ''' Index(['A', 'A', 'A', 'B', 'B', 'B', 'N', 'N', 'N', 'N', ...], dtype='object', name='group') ''' groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates().to_list() ''' ['A', 'B', 'N', 'P', 'W', 'X'] ''' priorityVal = groupdata.index.get_level_values(level=1) ''' Index(['High', 'Highest', 'Medium', 'High', 'Highest', 'Medium', 'High', 'Highest', 'Low', 'Medium', 'High', 'Highest', 'Medium', 'High', 'Highest', 'Medium', 'High', 'Highest', 'Medium'], dtype='object', name='priority') ''' priorityOrder = groupdata.index.get_level_values(level=1).drop_duplicates().to_list() ''' ['High', 'Highest', 'Medium', 'Low'] '''自定义排序列表
groupReorder = groupOrder[:2] + groupOrder[3:] + groupOrder[2:3] ''' ['A', 'B', 'P', 'W', 'X', 'N'] ''' priorityReorder = priorityOrder[1:2] + priorityOrder[0:1] + priorityOrder[2:4] ''' ['Highest', 'High', 'Medium', 'Low'] '''自定义排序实现
# groupdata['groupOrder'], groupdata['priorityOrder'] = [groupVal, priorityVal] # the same with # groupdata['groupOrder'], groupdata['priorityOrder'] = [groupVal, priorityVal] groupdata = groupdata.assign(groupOrder=groupVal, priorityOrder=priorityVal) ''' Num groupOrder priorityOrder group priority A High 5 A High Highest 7 A Highest Medium 36 A Medium B High 8 B High Highest 18 B Highest Medium 11 B Medium N High 94 N High Highest 88 N Highest Low 3 N Low Medium 217 N Medium ... ''' # type(groupdata):groupdata['groupOrder'] = pd.Categorical(groupdata['groupOrder'], categories=groupReorder) groupdata['priorityOrder'] = pd.Categorical(groupdata['priorityOrder'], categories=priorityReorder) groupdata.sort_values(['groupOrder', 'priorityOrder'], inplace=True, ascending=[True, True]) groupdata.drop(columns=['groupOrder', 'priorityOrder'], axis=1, inplace=True) ''' Num group priority A Highest 7 High 5 Medium 36 B Highest 18 High 8 Medium 11 ... N Highest 88 High 94 Medium 217 Low 3 '''
参考文章
写此文时有借鉴以下资源
添加python pandas同时对多列进行赋值
替换 Pandas Dataframe 中的列值
pandas dataframe 新增单列和多列
pandas 之 groupby 聚合函数
Pandas之drop_duplicates:去除重复项
python进行数据处理——pandas的drop函数
Pandas Dataframe sort by categorical column but by specific class ordering
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)