There are many ways to do cluster analysis. One simple approach is to look at the size of the gaps between successive data elements:
def cluster(data, maxgap):
    '''Arrange data into groups where successive elements
    differ by no more than *maxgap*

    >>> cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
    [[1, 6, 9], [100, 102, 105, 109], [134, 139]]

    >>> cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
    [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]

    '''
    data = sorted(data)  # sort a copy so the caller's list is not modified
    if not data:
        return []  # empty input: nothing to group
    groups = [[data[0]]]
    for x in data[1:]:
        # Extend the current group if x is within maxgap of its last element
        if abs(x - groups[-1][-1]) <= maxgap:
            groups[-1].append(x)
        else:
            # Gap too large: start a new group
            groups.append([x])
    return groups

if __name__ == '__main__':
    import doctest
    print(doctest.testmod())
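
The same gap-based idea can be applied to records rather than bare numbers. The following is only a sketch under stated assumptions: the name `cluster_by`, its `key` parameter, and the sample `events` data are hypothetical and do not come from the original snippet. It groups items whose successive key values differ by no more than `maxgap`.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def cluster_by(items: Iterable[T], maxgap: float,
               key: Callable[[T], float]) -> List[List[T]]:
    """Group items whose successive key values differ by at most maxgap.

    Hypothetical helper for illustration; not part of the original code.
    """
    ordered = sorted(items, key=key)  # sorted copy, input is left untouched
    groups: List[List[T]] = []
    for item in ordered:
        # Compare against the last item of the current group, if any
        if groups and key(item) - key(groups[-1][-1]) <= maxgap:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups


# Made-up (name, timestamp) pairs, split into sessions by a 10-unit gap
events = [("login", 3), ("click", 5), ("click", 9), ("login", 60), ("logout", 64)]
sessions = cluster_by(events, maxgap=10, key=lambda e: e[1])
print(sessions)
```

Because the comparison is always against the previous element, a long chain of closely spaced values ends up in one group even if its first and last members are far apart. Whether that is desirable depends on the data.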