Using Pandas (Python)

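1. Python and NumPy basics

2. Pandas basics

3. Indexing

4. Grouping

5. Reshaping

6. Joining

7. Handling missing data

8. Handling text data

9. Categorical data

10. Handling time series data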

A recommended site for learning Pandas: Joyful Pandas 1.0 documentation

1. Python and NumPy basics

List comprehensions

[m+'_'+n for m in ['a', 'b'] for n in ['c', 'd']]
[i if i <= 5 else 5 for i in L]  # L is any list of numbers; values above 5 are capped at 5
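The two comprehensions above can be run as follows. This is a small sketch; the sample list L is an assumption, since the original does not define it.

```python
# Nested comprehension: the first `for` is the outer loop
pairs = [m + '_' + n for m in ['a', 'b'] for n in ['c', 'd']]
print(pairs)    # ['a_c', 'a_d', 'b_c', 'b_d']

# Conditional expression inside a comprehension: cap every value at 5
L = [1, 2, 3, 4, 5, 6, 7]   # sample data, assumed for illustration
capped = [i if i <= 5 else 5 for i in L]
print(capped)   # [1, 2, 3, 4, 5, 5, 5]
```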

Lambda expressions

[(lambda x: 2*x)(i) for i in range(5)]

map()

list(map(lambda x: 2*x, range(5)))

zip() and enumerate()

L1, L2, L3 = list('abc'), list('def'), list('hij')
list(zip(L1, L2, L3))
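A short sketch of what zip() and enumerate() return for the three lists above. The dict construction and the unzip step are extra illustrations, not part of the original notes.

```python
L1, L2, L3 = list('abc'), list('def'), list('hij')

# zip() groups the elements that share a position
triples = list(zip(L1, L2, L3))
print(triples)   # [('a', 'd', 'h'), ('b', 'e', 'i'), ('c', 'f', 'j')]

# enumerate() yields (position, element) pairs
indexed = list(enumerate(L1))
print(indexed)   # [(0, 'a'), (1, 'b'), (2, 'c')]

# zip() over two lists is a common way to build a dict
mapping = dict(zip(L1, L2))

# zip(*...) reverses the grouping and recovers the original sequences
unzipped = list(zip(*triples))
```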

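np.array(), np.linspace(), np.arange(): create arrays

np.zeros(), np.eye(), np.full(): create special arrays

np.random.rand(), np.random.randn(), np.random.randint(), np.random.choice(): generate random arrays

.T transposes; r_ concatenates along rows and c_ along columns (when a 1-D array is combined with a 2-D array, the 1-D array is treated as a column vector)

reshape(): change the shape of an array

where(), nonzero(), argmax(), argmin(), any(), all(): filtering and lookup functions

cumprod(), cumsum(), diff(), max(), min(), mean(), median(), std(), var(), sum(), quantile(),

cov(), corrcoef(), dot(), @: common computation functions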
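The NumPy functions listed above can be tried out with a sketch like the following. All array values are made up for illustration.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)         # [[0, 1, 2], [3, 4, 5]]
lin = np.linspace(0, 1, 5)             # 5 evenly spaced points, both ends included
full = np.full((2, 3), 10)             # 2x3 array filled with 10

# r_ concatenates along rows, c_ along columns
rows = np.r_[a, full]                  # shape (4, 3)
cols = np.c_[a, full]                  # shape (2, 6)
# With c_, a 1-D array is treated as a column vector
col_vec = np.c_[np.array([7, 8]), a]   # shape (2, 4)

# Filtering and lookup functions
capped = np.where(a > 3, 3, a)         # replace values above 3 with 3
v = np.array([-1, 1, -1, 0])
nz = np.nonzero(v)[0]                  # positions of non-zero entries
peak = v.argmax()                      # position of the first maximum

# Computation functions
running = np.array([1, 2, 3]).cumsum() # [1, 3, 6]
steps = np.diff(np.array([1, 3, 6]))   # [2, 3]
mid = np.quantile(a, 0.5)              # median of 0..5
gram = a @ a.T                         # matrix product, same as a.dot(a.T)

# Random arrays (seeded so that reruns give the same numbers)
np.random.seed(0)
dice = np.random.randint(1, 7, size=(2, 2))
```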

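2. Pandas basics

read_csv(), read_table(), read_excel(): common functions for reading files. Important parameters: header=, index_col=, usecols=, parse_dates=, nrows=, sep= (sep= applies to the text formats only)

to_csv(), to_excel(): common functions for writing files. Important parameters: sep= (to_csv() only), index=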
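A minimal sketch of how the reading and writing parameters above fit together. To stay self-contained it reads from an in-memory buffer instead of a file on disk; the column names and values are invented for illustration.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (contents are made up)
raw = io.StringIO(
    "id,date,score,note\n"
    "1,2020-01-01,3.5,a\n"
    "2,2020-01-02,4.0,b\n"
    "3,2020-01-03,2.5,c\n"
)

df = pd.read_csv(
    raw,
    header=0,                         # the first line holds the column names
    index_col='id',                   # use the id column as the row index
    usecols=['id', 'date', 'score'],  # read only these columns
    parse_dates=['date'],             # parse this column as datetimes
    nrows=2,                          # read only the first two data rows
)

# Write the result as tab-separated text, leaving out the index
out = io.StringIO()
df.to_csv(out, sep='\t', index=False)
text = out.getvalue()
```

Passing a file path instead of the StringIO objects works the same way.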

values, index, dtype, name, shape: important attributes of a Series

values, index, columns, dtypes, shape: important attributes of a DataFrame

head(), tail(), info(), describe(), sum(), mean(), median(), var(), std(), max(), min(), quantile(), count(), idxmax(), unique(), nunique(), drop_duplicates(keep=), replace(), where(), mask(), round(), abs(), clip(), sort_values(), sort_index(), apply(): commonly used functions
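Several of the less self-explanatory functions in this list behave as sketched below. The small table is sample data assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'b'],
    'score': [3.0, 9.0, 5.0, 9.0],
})
s = df['score']

top_label = s.idxmax()                       # index label of the first maximum
n_distinct = s.nunique()                     # number of distinct values
deduped = df.drop_duplicates(keep='first')   # drops the repeated ('b', 9.0) row
clipped = s.clip(4, 8)                       # squeeze values into [4, 8]
kept = s.where(s > 4)                        # values failing the condition become NaN
masked = s.mask(s > 4, 0)                    # values meeting the condition become 0
ranked = df.sort_values('score', ascending=False)
spread = df[['score']].apply(lambda col: col.max() - col.min())  # applied per column
```

where() and mask() are mirror images: where() keeps the values for which the condition is True, while mask() replaces them.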

rolling(): sliding window

expanding(): expanding window

ewm(): exponentially weighted window

shift(), diff(), pct_change(): window-like functions
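The difference between the window types is easiest to see on a tiny Series. The following is a sketch with made-up numbers; the window size and alpha are arbitrary choices.

```python
import pandas as pd

s = pd.Series([1, 3, 6, 10, 15])

roll = s.rolling(window=3).mean()   # mean of the current and two previous values
expd = s.expanding().sum()          # sum of everything seen so far
smooth = s.ewm(alpha=0.5).mean()    # recent values weigh more than older ones

lagged = s.shift(1)                 # previous value (first entry becomes NaN)
step = s.diff(1)                    # current value minus previous value
growth = s.pct_change()             # relative change from the previous value
```

rolling(window=3) has no result for the first two positions because the window is not yet full, so those entries are NaN.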

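3. Indexing

df[column name] --- selects one column of the table and returns a Series

If the column name contains no spaces (and does not clash with an existing DataFrame attribute), df.column_name also works

loc[row selector, column selector] --- selects by label

iloc[row selector, column selector] --- selects by position

query() --- selects rows with a condition written as a query string

sample() --- random sampling

set_index() --- sets the index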
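The selection methods above can be compared side by side in a sketch like this one. The table contents are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cid', 'Dee'],
    'age': [23, 35, 41, 29],
    'city': ['X', 'Y', 'X', 'Z'],
})

ages = df['age']                  # one column, returned as a Series
same = df.age                     # attribute access to the same column

by_name = df.set_index('name')    # use the name column as the row index

bob_age = by_name.loc['Bob', 'age']               # select by label
label_slice = by_name.loc['Ann':'Cid', ['age']]   # label slices include both ends
pos_slice = by_name.iloc[0:2, 0]                  # position slices exclude the end

older = df.loc[df['age'] > 30, 'name']            # boolean mask as row selector
in_x = df.query('age > 30 and city == "X"')       # same idea as a query string

sampled = df.sample(n=2, random_state=0)          # two random rows, reproducible
```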

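4. Grouping

df.groupby(grouping key)[data columns].operation applied to each group

agg(): aggregates each group to a single value

transform(): group-wise transformation that keeps the original shape

filter(): keeps or drops whole groups

apply(): group operations that span several columns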
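A sketch of the four group operations on a small table. The column names and the BMI-style calculation are assumptions chosen only to show an operation that needs two columns at once.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['a', 'a', 'b', 'b', 'b', 'c'],
    'height': [160.0, 170.0, 150.0, 165.0, 180.0, 175.0],
    'weight': [50.0, 60.0, 45.0, 55.0, 75.0, 70.0],
})
g = df.groupby('team')

# Pattern: groupby(key)[columns].operation
avg = g['height'].mean()

# agg(): one or more summary values per group
summary = g['height'].agg(['mean', 'max'])

# transform(): result has one value per original row
centered = g['height'].transform(lambda x: x - x.mean())

# filter(): keep only the groups for which the function returns True
big = g.filter(lambda grp: len(grp) >= 2)

# apply(): the function receives each group as a DataFrame,
# so it can combine several columns
bmi = g[['height', 'weight']].apply(
    lambda grp: (grp['weight'] / (grp['height'] / 100) ** 2).mean()
)
```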

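5. Reshaping

pivot(): long table to wide table

pivot_table(): like pivot(), but aggregates duplicate entries

melt(): wide table to long table

wide_to_long(): wide table to long table, splitting column names into a stub and a suffix

stack(), unstack(): move column labels into the row index and back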
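The reshaping functions can be chained into a round trip, as in the sketch below. The score table and its column names are made up for illustration.

```python
import pandas as pd

long = pd.DataFrame({
    'name': ['Ann', 'Ann', 'Bob', 'Bob'],
    'subject': ['math', 'art', 'math', 'art'],
    'score': [90, 80, 70, 60],
})

# Long to wide: one row per name, one column per subject
wide = long.pivot(index='name', columns='subject', values='score')

# pivot_table() does the same but aggregates if a cell has several values
table = long.pivot_table(index='name', columns='subject',
                         values='score', aggfunc='mean')

# Wide back to long
back = wide.reset_index().melt(id_vars='name', value_vars=['art', 'math'],
                               var_name='subject', value_name='score')

# stack() moves the column labels into the row index; unstack() reverses it
stacked = wide.stack()
unstacked = stacked.unstack()

# wide_to_long() splits names such as score_1 into stub 'score' and suffix 1
terms = pd.DataFrame({'name': ['Ann', 'Bob'],
                      'score_1': [90, 70], 'score_2': [80, 60]})
tall = pd.wide_to_long(terms, stubnames='score', i='name', j='term', sep='_')
```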

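6. Joining

merge(): joins on column values

join(): joins on the index

concat(): concatenates along an axis

append(): appends rows (removed in pandas 2.0; use concat() instead)

assign(): adds new columns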
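A sketch contrasting the joining functions on two tiny tables. The key and column names are assumptions for illustration; append() is left out because it is no longer available in pandas 2.0 and later.

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

# merge(): match rows on the values of a column
inner = left.merge(right, on='key', how='inner')   # keys present in both
outer = left.merge(right, on='key', how='outer')   # keys present in either

# join(): match rows on the index
joined = left.set_index('key').join(right.set_index('key'), how='left')

# concat(): stack tables along an axis
rows = pd.concat([left, left], axis=0, ignore_index=True)   # more rows
cols = pd.concat([left, right['y']], axis=1)                # more columns

# assign(): return a copy with an extra column
extended = left.assign(x2=lambda d: d['x'] * 2)
```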

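7. Handling missing data

df.isna(), df.isnull(): flag missing values

df.isna().mean(): proportion of missing values in each column

df[sub_set.isna().all(axis=1)]: rows in which every selected column is missing

df[sub_set.isna().any(axis=1)]: rows in which at least one selected column is missing

res = df.dropna(how='any', subset=['Height', 'Weight']): drop rows in which at least one of the height and weight columns is missing

res = df.dropna(axis=1, thresh=df.shape[0]-15): drop columns with more than 15 missing values

fillna(value=, method=, limit=): fill missing values

fillna(s.mean()): fill missing values with the mean

interpolate(): interpolation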
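The missing-data calls above can be checked on a four-row table. This is a sketch with invented values; sub_set is taken to be the two numeric columns, and the threshold is scaled down to fit the small table.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Height': [160.0, np.nan, 175.0, np.nan],
    'Weight': [50.0, 60.0, np.nan, np.nan],
    'Name': ['Ann', 'Bob', 'Cid', 'Dee'],
})
sub_set = df[['Height', 'Weight']]

ratio = df.isna().mean()                         # share of missing values per column
all_missing = df[sub_set.isna().all(axis=1)]     # both columns missing
any_missing = df[sub_set.isna().any(axis=1)]     # at least one column missing

# Drop rows with a missing Height or Weight
complete = df.dropna(how='any', subset=['Height', 'Weight'])

# Drop columns with more than 1 missing value
# (thresh is the minimum number of non-missing values a column must have)
few_gaps = df.dropna(axis=1, thresh=df.shape[0] - 1)

mean_filled = df['Height'].fillna(df['Height'].mean())
forward = df['Height'].ffill(limit=1)            # carry the last valid value forward
interp = pd.Series([1.0, np.nan, 3.0]).interpolate()   # linear by default
```

ffill() is used here for forward filling because recent pandas versions deprecate the method= argument of fillna().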

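8. Handling text data

split(): split strings

join(), cat(): concatenate strings

contains(), startswith(), endswith(), match(): pattern matching

replace(): replace substrings

extract(): extract substrings

upper(), lower(), title(), capitalize(), swapcase(): letter-case functions

strip(), rstrip(), lstrip(): remove leading and trailing whitespace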
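In pandas these text functions are reached through the .str accessor of a Series. A minimal sketch, with sample strings assumed for illustration:

```python
import pandas as pd

s = pd.Series(['  Ann_Lee ', 'bob_ray', 'Cid_Fox'])

clean = s.str.strip()                              # remove surrounding whitespace
parts = clean.str.split('_', expand=True)          # one column per piece
joined = parts[0].str.cat(parts[1], sep=' ')       # concatenate two Series
dashed = clean.str.split('_').str.join('-')        # join the pieces of each list

has_b = clean.str.contains('b', case=False)        # substring or regex anywhere
starts = clean.str.startswith('C')                 # literal prefix

swapped = clean.str.replace('_', '-', regex=False) # literal replacement
last = clean.str.extract(r'_(\w+)$')               # capture group becomes a column
titled = clean.str.title()                         # capitalize each word
```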

9. Categorical data

10. Handling time series data

date_range(start=, end=, freq=, periods=): generate evenly spaced timestamps

resample(): resampling
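A sketch of date_range() and resample() on six days of made-up data. Of the four date_range() parameters, three are enough to determine the result, so each call below omits at least one of them.

```python
import pandas as pd

# Six consecutive days starting on 2020-01-01
idx = pd.date_range(start='2020-01-01', periods=6, freq='D')
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Resample to two-day bins and sum the values in each bin
two_day = s.resample('2D').sum()

# start, end and freq together: every Sunday in January 2020
sundays = pd.date_range(start='2020-01-01', end='2020-01-31', freq='W')
```

resample() works like a groupby over time bins, so it is followed by an aggregation such as sum() or mean().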

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/langs/799511.html
