Python data analysis: computing common statistics and sorting with pandas

Computing common statistics

The describe() method:

college.describe()

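# count: number of non-null values in the column
# mean: arithmetic mean
# std: sample standard deviation
# min: smallest value
# 25%: first quartile (25th percentile)
# 50%: median (50th percentile)
# 75%: third quartile (75th percentile)
# max: largest value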
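The article's college data set is not included here, so the following is a minimal sketch on a small made-up DataFrame (the column names and values are assumptions chosen for illustration). It shows that describe() summarizes only numeric columns by default and that count excludes missing values.

```python
import pandas as pd

# Hypothetical stand-in for the article's `college` DataFrame
college = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "enrollment": [1000, 2000, 3000, 4000],
    "tuition": [10.0, 20.0, None, 40.0],  # one missing value
})

summary = college.describe()
print(summary)

# Only the numeric columns appear; "name" is skipped by default
print(list(summary.columns))

# count ignores the missing tuition value
print(summary.loc["count", "tuition"])
```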

Set the include parameter of describe() to choose which data types are summarized:

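# Summarize string (object) columns
college.describe(include=object).T
# count: number of non-null values
# unique: number of distinct values
# top: the most frequent value
# freq: how many times the most frequent value occurs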

# Summarize columns of every data type
college.describe(include='all').T
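As a rough illustration of the two calls above, the sketch below uses a made-up DataFrame (names and values are assumptions). The string column is created with an explicit object dtype so that include=object selects it regardless of which default string dtype the installed pandas version uses.

```python
import pandas as pd

# Hypothetical stand-in for the article's `college` DataFrame
college = pd.DataFrame({
    "state": pd.Series(["CA", "CA", "NY", "TX"], dtype=object),
    "enrollment": [1000, 2000, 3000, 4000],
})

# Object columns only: rows become columns after .T
text_summary = college.describe(include=object).T
print(text_summary)

# Every column: statistics that do not apply to a column are NaN
all_summary = college.describe(include="all").T
print(all_summary)
```

In the include='all' result, numeric statistics such as mean are NaN for the string column, and unique, top and freq are NaN for the numeric column.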

The info() method:
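The original gives no example for info(). As a minimal sketch on a made-up DataFrame (column names and values are assumptions), info() prints a concise overview of the frame: the index range, each column's non-null count and dtype, and the memory usage. It prints the report and returns None, so the text is captured through the buf parameter when it is needed as a string.

```python
import io

import pandas as pd

# Hypothetical stand-in for the article's `college` DataFrame
college = pd.DataFrame({
    "enrollment": [1000, 2000, 3000, 4000],
    "tuition": [10.0, 20.0, None, 40.0],  # one missing value
})

# Prints the summary to standard output
college.info()

# Capture the same report as a string instead of printing it
buffer = io.StringIO()
college.info(buf=buffer)
report = buffer.getvalue()
```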

Common sorting methods

The nlargest() method: largest values first
Select the 100 rows with the highest values, ordered from largest to smallest:

# Largest first: argument 1 is how many rows to select, argument 2 is the column to rank by
new_movie.nlargest(100,'imdb_score')
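To make the behavior concrete, here is a small sketch on made-up data (the movie_title column and all values are assumptions). For rows without ties, nlargest(n, column) selects the same rows as sorting in descending order and taking the first n.

```python
import pandas as pd

# Hypothetical stand-in for the article's `new_movie` DataFrame
new_movie = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D", "E"],
    "imdb_score": [7.1, 8.8, 6.5, 9.0, 8.2],
    "budget": [50, 10, 30, 20, 5],
})

# The 3 highest-rated movies, highest score first
top3 = new_movie.nlargest(3, "imdb_score")
print(top3)

# Same rows via a full sort followed by head()
top3_sorted = new_movie.sort_values("imdb_score", ascending=False).head(3)
print(top3_sorted)
```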

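The nsmallest() method: smallest values first
Chain nsmallest() onto the previous result to select the 5 lowest-budget rows among those 100: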

# Smallest first: argument 1 is how many rows to select, argument 2 is the column to rank by
new_movie.nlargest(100,'imdb_score').nsmallest(5,'budget')
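The order of the chained calls matters, because each call only sees the rows kept by the previous one. The sketch below, on made-up data (the movie_title column and all values are assumptions), compares both orders.

```python
import pandas as pd

# Hypothetical stand-in for the article's `new_movie` DataFrame
new_movie = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D", "E"],
    "imdb_score": [7.1, 8.8, 6.5, 9.0, 8.2],
    "budget": [50, 10, 30, 20, 5],
})

# Cheapest 2 among the 3 best-rated movies
best_then_cheap = new_movie.nlargest(3, "imdb_score").nsmallest(2, "budget")
print(best_then_cheap)

# Best-rated 2 among the 3 cheapest movies: a different question
cheap_then_best = new_movie.nsmallest(3, "budget").nlargest(2, "imdb_score")
print(cheap_then_best)
```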

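sort_values(): sort by value

The first argument is the column to sort by. The ascending parameter controls the direction: True (the default) sorts in ascending order, and False sorts in descending order.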

movie3.sort_values('title_year',ascending=False)
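A quick sketch of the default direction versus ascending=False, using a made-up frame (column values are assumptions). sort_values() returns a new DataFrame and leaves the original unchanged unless inplace=True is passed.

```python
import pandas as pd

# Hypothetical stand-in for the article's `movie3` DataFrame
movie3 = pd.DataFrame({
    "title_year": [2010, 2008, 2012, 2015],
    "imdb_score": [7.0, 8.5, 9.0, 6.0],
})

# Default: ascending order (oldest year first)
oldest_first = movie3.sort_values("title_year")
print(oldest_first)

# ascending=False: descending order (newest year first)
newest_first = movie3.sort_values("title_year", ascending=False)
print(newest_first)
```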

Pass a list of columns to sort by year first and then by score within each year:

movie3.sort_values(['title_year','imdb_score'],ascending=False)
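When a single boolean is passed, it applies to every column in the list. ascending also accepts a list with one boolean per column, which allows mixed directions. A minimal sketch on made-up data (the title column and all values are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for the article's `movie3` DataFrame
movie3 = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "title_year": [2010, 2008, 2010, 2015],
    "imdb_score": [7.0, 8.5, 9.0, 6.0],
})

# Both columns descending: newest year first, best score first within a year
both_desc = movie3.sort_values(["title_year", "imdb_score"], ascending=False)
print(both_desc)

# Mixed: newest year first, but lowest score first within a year
mixed = movie3.sort_values(["title_year", "imdb_score"], ascending=[False, True])
print(mixed)
```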
