The pandas Module Explained

1. What is pandas

pandas is a data analysis library built on top of NumPy.

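2. What pandas can do

Data structures with built-in data alignment: DataFrame and Series
Integrated time series functionality
A rich set of mathematical computations and operations
Flexible handling of missing data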

3. How to use pandas

Installation and import

pip install pandas

import pandas as pd

Series

A Series is a one-dimensional, array-like object made up of a set of values and an associated set of data labels (the index).

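# Ways to create a Series

pd.Series([1,2,3,4,5])   # default integer index; displayed with the index on the left and the values on the right

pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])   # explicit labels

pd.Series({'a':1,'b':2})   # from a dict: the keys become the index

pd.Series(0,index=['a','b','c'])   # from a scalar: the value is repeated for every label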
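The four creation forms above can be checked side by side. This is a minimal sketch; the variable names are only illustrative and do not come from the original article.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0..n-1.
s_default = pd.Series([1, 2, 3, 4, 5])

# From a list with explicit labels.
s_labeled = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# From a dict: the keys become the index.
s_from_dict = pd.Series({'a': 1, 'b': 2})

# From a scalar: the value is repeated for every label.
s_scalar = pd.Series(0, index=['a', 'b', 'c'])

print(list(s_default.index))   # [0, 1, 2, 3, 4]
print(s_labeled['c'])          # 3
print(s_from_dict['b'])        # 2
print(s_scalar.tolist())       # [0, 0, 0]
```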

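Missing data

dropna() drops the entries whose value is NaN
fillna() fills in missing values
isnull() returns a boolean array that is True where a value is missing
notnull() returns a boolean array that is False where a value is missing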
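A short sketch of the four missing-data methods on one small Series. The sample values and the fill value 0 are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan], index=['a', 'b', 'c', 'd'])

missing_mask = s.isnull()    # True where the value is missing
present_mask = s.notnull()   # True where the value is present
dropped = s.dropna()         # keeps only the entries 'a' and 'c'
filled = s.fillna(0)         # replaces every NaN with 0

print(missing_mask.tolist())   # [False, True, False, True]
print(list(dropped.index))     # ['a', 'c']
print(filled.tolist())         # [1.0, 0.0, 3.0, 0.0]
```

None of these calls modifies s; each returns a new object.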

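Series features

Creating a Series from an ndarray: Series(arr)

import numpy as np

arr=np.arange(10)

sr=pd.Series(arr)

Arithmetic with a scalar (a number)

srx=sr*2

Arithmetic between two Series

sr*srx

Boolean filtering

sr[sr>3]

Statistical functions: mean(), sum(), cumsum()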
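The array-like behaviour listed above, collected into one runnable sketch. It reuses the sr built from np.arange(10); the other names are illustrative.

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(10))   # values 0..9

doubled = sr * 2                # the scalar is applied to every element
product = sr * doubled          # element-wise, matched up by index
large = sr[sr > 3]              # keeps only the values greater than 3

print(large.tolist())           # [4, 5, 6, 7, 8, 9]
print(sr.mean())                # 4.5
print(sr.sum())                 # 45
print(sr.cumsum().tolist())     # [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
```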

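Dict-like features

Creating a Series from a dict: Series(dic)

dic={'a':1,'b':2,'c':3,'d':4,'e':5}

dic_arr=pd.Series(dic)

The in operator and iteration

'a' in dic_arr   # tests whether the label exists

for i in dic_arr:   # iterating yields the values, not the labels
    print(i)

Indexing by key

dic_arr[['a','b']]

Slicing by key

dic_arr['a':'c']   # a label slice includes the end label 'c'

Other functions

dic_arr.get('a',default=0)   # returns the default when the label is missing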
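The dict-like behaviour can be verified with a few lines. This sketch reuses dic_arr; the label 'z' is a hypothetical missing key added to show the default in get().

```python
import pandas as pd

dic_arr = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})

has_a = 'a' in dic_arr                   # membership tests the labels
has_1 = 1 in dic_arr                     # 1 is a value, not a label
values = [v for v in dic_arr]            # iteration yields the values
subset = dic_arr[['a', 'b']]             # a list of labels returns a Series
sliced = dic_arr['a':'c']                # a label slice includes 'c'
fallback = dic_arr.get('z', default=0)   # missing label, so the default is returned

print(has_a, has_1)       # True False
print(values)             # [1, 2, 3, 4, 5]
print(sliced.tolist())    # [1, 2, 3]
print(fallback)           # 0
```

Unlike a Python list slice, the label slice 'a':'c' keeps the end point.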

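Integer indexes

sr=pd.Series(np.arange(10))

sr1=sr[4:].copy()   # sr1 keeps the labels 4..9, so labels and positions no longer match

The loc attribute interprets the key as a label
The iloc attribute interprets the key as a position

sr1.iloc[1]   # position 1, which is the value 5

sr1.loc[5]   # label 5, which is the value 5 (label 3 is no longer in sr1, so sr1.loc[3] raises KeyError)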
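The difference between label and position lookups is easiest to see on the sliced Series. A minimal sketch; the try/except is added here only to show what happens for a label that the slice removed.

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(10))
sr1 = sr[4:].copy()          # labels 4..9, values 4..9

by_position = sr1.iloc[1]    # second element
by_label = sr1.loc[5]        # element whose label is 5

# Label 3 was removed by the slice, so a label lookup fails.
try:
    sr1.loc[3]
    missing_label = False
except KeyError:
    missing_label = True

print(by_position)     # 5
print(by_label)        # 5
print(missing_label)   # True
```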

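Series data alignment

sr1=pd.Series([10,20,30],index=['a','b','c'])

sr2=pd.Series([30,20,10],index=['c','b','a'])

sr1+sr2   # values are matched by label, not by position

# Add two Series, treating a value missing from either side as 0
# (the original listed only three values for four labels; 40 for 'd' is an assumed value)

sr1=pd.Series([10,20,30],index=['a','b','c'])

sr2=pd.Series([30,20,10,40],index=['c','b','a','d'])

sr1.add(sr2,fill_value=0)

# Flexible arithmetic methods: add, sub, div, mul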
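To make the effect of fill_value concrete, the sketch below compares the + operator with add(). The value 40 for label 'd' is an assumption, supplied so that the data and index lengths match.

```python
import pandas as pd

sr1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
sr2 = pd.Series([30, 20, 10, 40], index=['c', 'b', 'a', 'd'])  # 40 is an assumed value

plain = sr1 + sr2                     # 'd' has no partner in sr1, so the result is NaN
filled = sr1.add(sr2, fill_value=0)   # the missing side is treated as 0

print(plain['a'])             # 20.0
print(pd.isna(plain['d']))    # True
print(filled['d'])            # 40.0
```

sub, mul and div accept the same fill_value argument.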

DataFrame

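A DataFrame is a tabular data structure, comparable to a two-dimensional array, that holds an ordered set of columns.

It can be viewed as a dict of Series that all share one index.

Ways to create a DataFrame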

pd.DataFrame({'one':[1,2,3,4],'two':[4,3,2,1]})

data=pd.DataFrame({'one':[1,2,3,4],'two':[4,3,2,1]})

pd.DataFrame(data,columns=['one','two'])

pd.DataFrame({'one':pd.Series([1,2,3],index=['a','b','c']),'two':pd.Series([1,2,3],index=['b','a','c'])})
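The last form above is worth a closer look, because the two Series list their labels in different orders. A small sketch showing that the columns are aligned by label rather than by position:

```python
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3], index=['b', 'a', 'c']),
})

# In 'two', the value 1 belongs to label 'b' and the value 2 to label 'a'.
print(list(df.index))        # ['a', 'b', 'c']
print(df.loc['a', 'two'])    # 2
print(df.loc['b', 'two'])    # 1
```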

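Viewing data

Common attributes and methods

index: the row index
columns: the column index
T: the transpose
values: the data as a two-dimensional array
describe(): quick summary statistics

df.index    df.columns    df.T    (df stands for the name of your DataFrame)

df.values    df.describe()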
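The attributes above, applied to the small two-column frame used earlier. A minimal sketch; df is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [4, 3, 2, 1]})

row_labels = list(df.index)      # [0, 1, 2, 3]
col_labels = list(df.columns)    # ['one', 'two']
transposed = df.T                # rows and columns swapped
raw = df.values                  # the data as a 2-D NumPy array
stats = df.describe()            # count, mean, std, min, quartiles, max per column

print(transposed.shape)            # (2, 4)
print(raw.shape)                   # (4, 2)
print(stats.loc['mean', 'one'])    # 2.5
```

Note that index, columns, T and values are attributes, while describe() is a method and needs the parentheses.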

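Indexing and slicing

A DataFrame has both a row index and a column index
A DataFrame can be indexed and sliced in two ways: by label and by position

# Two pairs of square brackets

import tushare as ts

data =ts.get_k_data('000001')

data['open'][:10]  # select the column first, then the rows

data[:10]['open']  # select the rows first, then the column

# Using the loc and iloc attributes

data.loc[:10,'open':'low']  # select by label (both end points are included)

data.iloc[:10,1:5]   # select by position (the end point is excluded)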
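The same four access patterns can be tried without a network connection. The sketch below replaces the tushare download with a hypothetical hand-made table; its column names and numbers are assumptions for illustration only, not real market data.

```python
import pandas as pd

# Hypothetical price table; the values are made up for illustration.
data = pd.DataFrame({
    'date':  ['2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
    'open':  [10.0, 10.2, 10.1, 10.4],
    'close': [10.1, 10.1, 10.3, 10.5],
    'high':  [10.3, 10.3, 10.4, 10.6],
    'low':   [9.9, 10.0, 10.0, 10.3],
})

col_then_rows = data['open'][:2]        # column first, then the first two rows
rows_then_col = data[:2]['open']        # rows first, then the column
by_label = data.loc[:2, 'open':'low']   # labels 0..2 inclusive, so 3 rows
by_position = data.iloc[:2, 1:5]        # positions 0..1, so 2 rows

print(col_then_rows.tolist())   # [10.0, 10.2]
print(by_label.shape)           # (3, 4)
print(by_position.shape)        # (2, 4)
```

The different row counts come from loc including the end label while iloc excludes the end position.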

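Handling time objects

Handling time objects is probably the most common task in data analysis: we run into time series in all kinds of formats and need to be able to process each of them.

Types of time series data

Timestamp: a specific moment
Fixed period: for example, February 2017
Time interval: from a start time to an end time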
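One possible mapping of these three kinds onto pandas objects is sketched below. The article does not say which classes it has in mind, so the choice of pd.Timestamp, pd.Period and pd.Interval, as well as the sample dates, are assumptions.

```python
import pandas as pd

moment = pd.Timestamp('2017-02-15 09:30')     # a specific moment
period = pd.Period('2017-02', freq='M')       # the whole of February 2017
interval = pd.Interval(pd.Timestamp('2017-02-01'),
                       pd.Timestamp('2017-02-28'))   # from a start time to an end time

in_period = period.start_time <= moment <= period.end_time
in_interval = moment in interval
days = period.days_in_month

print(in_period)          # True
print(in_interval)        # True
print(days)               # 28
print(interval.length)    # 27 days 00:00:00
```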

Feel free to share; when reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/zaji/589289.html
