How do I print the description of a built-in dataset that ships with scikit-learn?

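In scikit-learn, built-in datasets are loaded with the `load_*` functions. The object they return carries a description of the dataset, which you can view by printing its `DESCR` attribute. The example below uses the `load_boston` dataset to show how to print that description (note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so this snippet needs an older release):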
```python
from sklearn.datasets import load_boston

# Load the dataset
boston = load_boston()

# Print the dataset's description
print(boston.DESCR)
```
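Since `load_boston` is not available in scikit-learn 1.2 and later, the same `DESCR` pattern can be tried with loaders that still ship with the library. The following is only a minimal sketch: the helper name `describe` and its `max_lines` parameter are illustrative assumptions, not part of scikit-learn.

```python
from sklearn.datasets import load_diabetes, load_iris, load_wine


def describe(loader, max_lines=5):
    """Return the first non-empty lines of a bundled dataset's DESCR text.

    `describe` and `max_lines` are hypothetical names used for illustration.
    """
    bunch = loader()
    lines = [line for line in bunch.DESCR.splitlines() if line.strip()]
    return lines[:max_lines]


# Print a short preview of the description for a few bundled datasets.
for loader in (load_iris, load_wine, load_diabetes):
    print(f"--- {loader.__name__} ---")
    print("\n".join(describe(loader)))
```

Printing only the first few lines keeps the preview short; use `print(loader().DESCR)` to see the full text.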
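`print(boston.DESCR)` outputs the following, which includes the dataset description, the feature descriptions, and details of the target variable: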
```
.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**

    :Number of Instances: 506

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
```
> Click the dataset you want to download and go to Download: Data Folder. Click through to the data description page, then right-click and save it.

> Take `load_iris` as an example.

```python
# The import is required
from sklearn.datasets import load_iris

iris = load_iris()
iris  # everything about iris: the data, the labels, the field names, etc.
```

That output is too long and messy, and its parts show up again below, so I won't paste it here.

```python
iris.keys()  # the dataset's keys
```

```
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])
```

```python
descr = iris['DESCR']
data = iris['data']
feature_names = iris['feature_names']
target = iris['target']
target_names = iris['target_names']
descr
```

```
'Iris Plants Database\n====================\n\nNotes\n-----\nData Set Characteristics:\n :Number of Instances: 150 (50 in each of three classes)\n :Number of Attributes: 4 numeric, predictive attributes and the class\n :Attribute Information:\n - sepal length in cm\n - sepal width in cm\n - petal length in cm\n - petal width in cm\n - class:\n - Iris-Setosa\n - Iris-Versicolour\n - Iris-Virginica\n :Summary Statistics:\n\n ============== ==== ==== ======= ===== ====================\n Min Max Mean SD Class Correlation\n ============== ==== ==== ======= ===== ====================\n sepal length: 4.3 7.9 5.84 0.83 0.7826\n sepal width: 2.0 4.4 3.05 0.43 -0.4194\n petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)\n petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)\n ============== ==== ==== ======= ===== ====================\n\n :Missing Attribute Values: None\n :Class Distribution: 33.3% for each of 3 classes.\n :Creator: R.A. Fisher\n :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n :Date: July, 1988\n\nThis is a copy of UCI ML iris datasets.\n...'
```
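The `descr` value above is shown as a raw string with `\n` escapes, which is hard to read. As a rough sketch of a more readable approach, the snippet below prints the text with `print()` and also shows that the returned object allows attribute access (`iris.DESCR`) alongside key access (`iris['DESCR']`). The variable name `header` is just an illustrative choice.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Key access and attribute access give the same description string.
same_text = iris["DESCR"] == iris.DESCR
print("key and attribute access match:", same_text)

# print() renders the newlines instead of showing "\n" escapes.
header = iris.DESCR.strip().splitlines()[:3]
print("\n".join(header))

# The other fields are available the same way.
print(list(iris.feature_names))
print(list(iris.target_names))
print(iris.data.shape)
```

Use `print(iris.DESCR)` without slicing to read the whole description.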

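The UCI repository is a collection of machine learning datasets maintained by the University of California, Irvine. At the time of writing it holds 187 datasets, and the number keeps growing. UCI datasets are widely used as standard benchmark datasets.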


Feel free to share; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/yw/12740747.html
