Python Machine Learning: Datasets (Inspecting the Data, Splitting into Training and Test Sets)

1. Obtaining a Dataset

Datasets can be downloaded from data-science competition websites or from the official scikit-learn site.

For example:

UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/index.php
Kaggle (machine learning and data science community): https://www.kaggle.com
Tianchi (Alibaba Cloud's big-data competition platform): https://tianchi.aliyun.com
scikit-learn: machine learning in Python (scikit-learn 1.0.2 documentation): https://scikit-learn.org/stable/

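There are others as well. Here I use the last of these, scikit-learn's bundled datasets, which can be loaded with a plain import. The code is as follows: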

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def datasets_demo():
    # Load the iris dataset and show its contents
    iris = load_iris()
    print(type(iris))  # a Bunch, which is a subclass of dict
    print("iris dataset:\n", iris)
    print("iris dataset description:", iris.DESCR)
    print("iris feature names:", iris["feature_names"])
    print("iris feature values:", iris.data, iris.data.shape)
    print("iris target values:", iris.target)
    print("iris target names:", iris.target_names)

    # Split the 150 samples into a training set and a test set.
    # x holds the feature values and y the target values; the features
    # come first in the returned tuple. test_size=0.2 randomly assigns
    # 80% to training and 20% to testing; random_state=22 fixes the seed.
    x_train, x_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=22
    )

    # Inspect the split: the test set holds 30 samples
    print("test-set feature values:", x_test, x_test.shape)


if __name__ == "__main__":
    datasets_demo()
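
To check that the 80/20 split behaves as described, the sketch below wraps the same train_test_split call in a small helper and looks at the resulting shapes and class counts. The helper name split_iris and its stratify flag are illustrative assumptions, not part of the original code; the stratify argument of train_test_split itself is a real scikit-learn parameter that keeps the class proportions the same in both subsets.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def split_iris(test_size=0.2, random_state=22, stratify=False):
    """Hypothetical helper: split iris, optionally preserving class ratios."""
    iris = load_iris()
    return train_test_split(
        iris.data,
        iris.target,
        test_size=test_size,
        random_state=random_state,
        # Passing the labels here asks for a stratified split
        stratify=iris.target if stratify else None,
    )


if __name__ == "__main__":
    x_train, x_test, y_train, y_test = split_iris()
    print(x_train.shape, x_test.shape)  # (120, 4) (30, 4)

    # Class counts in a plain random test set may be uneven
    print(Counter(y_test.tolist()))

    # With stratification, each of the 3 classes gets 10 test samples
    _, _, _, y_test_strat = split_iris(stratify=True)
    print(Counter(y_test_strat.tolist()))
```

Because random_state is fixed, calling the helper twice with the same arguments returns the same split, which makes later experiments reproducible. Whether stratification is worth using depends on the dataset; iris is balanced (50 samples per class), so it matters less here than it would with skewed classes.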

Feel free to share; when reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/zaji/5705640.html
