一、 K-means clustering is unsupervised learning and uses Euclidean distance by default. from sklearn.cluster import KMeans, k_means
1. cluster.KMeans([n_clusters, init, n_init, ...]) — the K-means clustering estimator
2. cluster.k_means(X, n_clusters[, init, ...]) — the K-means clustering algorithm as a function
(1) Judging from their purpose and source code, the two serve different goals:
No. 2: k_means is the K-means algorithm as a plain function: it only partitions a dataset into k clusters (i.e., it is a direct implementation of the K-means algorithm), which is why the dataset X is passed straight into k_means(X, n_clusters[, init, ...]).
No. 1: KMeans is the K-means estimator: you first fix the number of clusters in KMeans([n_clusters, init, n_init, ...]), and the class then provides methods such as fit (compute the k-means clustering) and predict (assign each sample in X to its nearest cluster). In that sense it is elevated into an unsupervised machine-learning model.
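To make the API difference concrete, here is a minimal sketch on synthetic data (generated with make_blobs as a stand-in, since the CSV used below is a local file): k_means returns a plain (centroids, labels, inertia) tuple, while KMeans is an estimator whose results live in attributes after fit.

```python
import numpy as np
from sklearn.cluster import KMeans, k_means
from sklearn.datasets import make_blobs

# Synthetic 2-D data with 3 well-separated clusters (stand-in for the CSV below)
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Function form: one call, returns a (centroids, labels, inertia) tuple
centroids, labels, inertia = k_means(X, n_clusters=3, random_state=0)
print(centroids.shape)   # (3, 2) -- one centroid per cluster
print(np.unique(labels)) # [0 1 2] -- each sample gets a cluster label

# Estimator form: configure first, then fit; results are stored as attributes
model = KMeans(n_clusters=3, random_state=0).fit(X)
print(model.cluster_centers_.shape)  # (3, 2), same role as `centroids` above
```

So the function gives you a one-shot clustering of X, while the fitted estimator can additionally call predict on new points, which is what the grid-based decision-region plot in the next section relies on.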
二、 Examples
The dataset is three_class_data.csv, with two feature columns x and y.
1.
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv("C:/Users/CWY/Desktop/deeplearn/Personalized-recommend-master/test/three_class_data.csv")
x = data[["x", "y"]]

# Fit a 3-cluster K-means model
model = KMeans(n_clusters=3)
model.fit(x)

# Build a fine grid over the data range and predict the cluster of every grid point
x_min, x_max = data['x'].min() - 1, data['x'].max() + 1
y_min, y_max = data['y'].min() - 1, data['y'].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, .01), np.arange(y_min, y_max, .01))
result = model.predict(np.c_[xx.ravel(), yy.ravel()])
result = result.reshape(xx.shape)

# Shade the decision regions, plot the samples colored by their assigned label,
# and mark the cluster centers
plt.contourf(xx, yy, result, cmap=plt.cm.Greens)
plt.scatter(data['x'], data['y'], c=model.labels_, s=15)
center = model.cluster_centers_
plt.scatter(center[:, 0], center[:, 1], marker='p', linewidths=2,
            color='b', edgecolors='w', zorder=20)
plt.show()
2.
from sklearn.cluster import k_means
from matplotlib import pyplot as plt
import pandas as pd

data = pd.read_csv("C:/Users/CWY/Desktop/deeplearn/Personalized-recommend-master/test/three_class_data.csv")
x = data[["x", "y"]]

# k_means returns a tuple: (cluster centers, labels, inertia)
model = k_means(x, n_clusters=3)

# model[1] holds the per-sample cluster labels
plt.scatter(data['x'], data['y'], c=model[1])
plt.show()