Does scikit-learn really have no Apriori or FP-Growth API?

scikit-learn is an open-source Python toolkit for machine learning (its homepage describes the full scope). Its algorithms cover classification, regression, clustering, and dimensionality reduction, but it does not ship Apriori, FP-Growth, or any other association-rule mining API; frequent-pattern mining is simply outside its scope. On the separate question of Python-based graduation project topics, some examples are as follows:
1 Climate data analysis based on MapReduce
2 Design and implementation of a keyword-based text knowledge mining system
3 Protein function prediction based on probabilistic graphical models
4 Design and implementation of a face recognition system based on third-party libraries
5 Design and implementation of an HBase-based search engine
6 Design and implementation of a real-time blacklist filtering system based on Spark Streaming
7 Design and implementation of a customer potential-value assessment system
8 Design and implementation of neural-network-based text classification
9 Analysis and mining of product associations based on Apriori
10 Design and implementation of a Chinese word segmentation system based on word-frequency statistics
11 Application of the K-means algorithm to Weibo data mining
12 Research and application of an image object detection and analysis system
13 Mining potential e-commerce customers based on Apriori association rules
14 Design and implementation of a Spark-based e-commerce user behavior analysis system
15 Research and application of a music recommendation system
16 Research and application of a big-data-based system for monitoring and guiding online public opinion at universities
17 Analysis and research of tumor disease patterns based on medical big data
18 Spatial data mining based on support vector machines, applied to tourism geography and economics
19 Research on classifying and detecting diabetic retinopathy with deep residual networks
20 A portal information recommendation system based on big data analysis
21 Web data mining and its research and application in e-commerce
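Returning to the scikit-learn question: since the library offers no frequent-pattern API, the usual options are a third-party package such as mlxtend (which provides apriori and association_rules over a one-hot DataFrame) or a few lines of standard-library Python. The sketch below is a deliberately naive brute-force frequent-itemset counter, for illustration only; the basket contents and min_support value are made-up examples, and real Apriori implementations prune candidates level by level rather than enumerating every combination.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force frequent-itemset mining (illustrative only)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, k):
            # support = fraction of transactions containing the whole combo
            support = sum(1 for t in transactions if set(combo) <= t) / n
            if support >= min_support:
                result[frozenset(combo)] = support
                found_any = True
        if not found_any:   # no frequent k-itemset => no frequent (k+1)-itemset
            break
    return result

baskets = [{'milk', 'bread'}, {'milk', 'eggs'},
           {'milk', 'bread', 'eggs'}, {'bread'}]
freq = frequent_itemsets(baskets, min_support=0.5)
```

For realistic basket sizes this enumeration explodes combinatorially, which is exactly the problem the candidate-pruning step in the full Apriori implementations below addresses.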
from functools import reduce
from itertools import combinations
from operator import and_

class AprioriAssociationRule:
    def __init__(self, inputfile):
        # Read comma-separated transactions, one per line
        self.transactions = []
        self.itemSet = set()
        with open(inputfile, 'r') as inf:
            for line in inf:
                elements = set(filter(lambda entry: len(entry) > 0,
                                      line.strip().split(',')))
                if len(elements) > 0:
                    self.transactions.append(elements)
                    for element in elements:
                        self.itemSet.add(element)
        self.toRetItems = {}
        self.associationRules = []

    def getSupport(self, itemcomb):
        if type(itemcomb) != frozenset:
            itemcomb = frozenset([itemcomb])
        within_transaction = lambda transaction: reduce(
            and_, [(item in transaction) for item in itemcomb])
        count = len(list(filter(within_transaction, self.transactions)))
        return float(count) / float(len(self.transactions))

    def runApriori(self, minSupport=0.15, minConfidence=0.6):
        itemCombSupports = list(filter(
            lambda freqpair: freqpair[1] >= minSupport,
            map(lambda item: (frozenset([item]), self.getSupport(item)), self.itemSet)))
        currentLset = set(map(lambda freqpair: freqpair[0], itemCombSupports))
        k = 2
        while len(currentLset) > 0:
            # Candidate k-itemsets from unions of frequent (k-1)-itemsets
            currentCset = set([i.union(j) for i in currentLset for j in currentLset
                               if len(i.union(j)) == k])
            currentItemCombSupports = list(filter(
                lambda freqpair: freqpair[1] >= minSupport,
                map(lambda item: (item, self.getSupport(item)), currentCset)))
            currentLset = set(map(lambda freqpair: freqpair[0], currentItemCombSupports))
            itemCombSupports.extend(currentItemCombSupports)
            k += 1
        for key, supportVal in itemCombSupports:
            self.toRetItems[key] = supportVal
        self.calculateAssociationRules(minConfidence=minConfidence)

    def calculateAssociationRules(self, minConfidence=0.6):
        for key in self.toRetItems:
            subsets = [frozenset(item) for k in range(1, len(key))
                       for item in combinations(key, k)]
            for subset in subsets:
                confidence = self.toRetItems[key] / self.toRetItems[subset]
                if confidence > minConfidence:
                    self.associationRules.append([subset, key - subset, confidence])

The Scala version runs to roughly sixty lines as well:
import scala.io.Source
import scala.collection.immutable.List
import scala.collection.immutable.Set
import java.io.File
import scala.collection.mutable.Map

class AprioriAlgorithm(inputFile: File) {
  var transactions: List[Set[String]] = List()
  var itemSet: Set[String] = Set()
  for (line <- Source.fromFile(inputFile).getLines()) {
    val elementSet = line.trim.split(',').toSet
    if (elementSet.size > 0) {
      transactions = transactions :+ elementSet
      itemSet = itemSet ++ elementSet
    }
  }
  var toRetItems: Map[Set[String], Double] = Map()
  var associationRules: List[(Set[String], Set[String], Double)] = List()

  def getSupport(itemComb: Set[String]): Double = {
    def withinTransaction(transaction: Set[String]): Boolean =
      itemComb.map(x => transaction.contains(x))
              .reduceRight((x1, x2) => x1 && x2)
    val count = transactions.filter(withinTransaction).size
    count.toDouble / transactions.size.toDouble
  }

  def runApriori(minSupport: Double = 0.15, minConfidence: Double = 0.6) = {
    var itemCombs = itemSet.map(word => (Set(word), getSupport(Set(word))))
                           .filter(wordSupportPair => wordSupportPair._2 > minSupport)
    var currentLSet: Set[Set[String]] = itemCombs.map(wordSupportPair => wordSupportPair._1).toSet
    var k: Int = 2
    while (currentLSet.size > 0) {
      // Candidate k-itemsets from unions of frequent (k-1)-itemsets
      val currentCSet: Set[Set[String]] =
        currentLSet.map(wordSet => currentLSet.map(wordSet1 => wordSet | wordSet1))
                   .reduceRight((set1, set2) => set1 | set2)
                   .filter(wordSet => wordSet.size == k)
      val currentItemCombs = currentCSet.map(wordSet => (wordSet, getSupport(wordSet)))
                                        .filter(wordSupportPair => wordSupportPair._2 > minSupport)
      currentLSet = currentItemCombs.map(wordSupportPair => wordSupportPair._1).toSet
      itemCombs = itemCombs | currentItemCombs
      k += 1
    }
    for (itemComb <- itemCombs) {
      toRetItems += (itemComb._1 -> itemComb._2)
    }
    calculateAssociationRule(minConfidence)
  }

  def calculateAssociationRule(minConfidence: Double = 0.6) = {
    toRetItems.keys.foreach(item =>
      item.subsets.filter(wordSet => wordSet.size < item.size && wordSet.size > 0)
          .foreach { subset =>
            associationRules = associationRules :+
              ((subset, item diff subset, toRetItems(item) / toRetItems(subset)))
          }
    )
    associationRules = associationRules.filter(rule => rule._3 > minConfidence)
  }
}
For this task I would not recommend Java; a language like Python or Scala is a much better fit. The Python implementation above comes to roughly 50 lines and the Scala one to roughly 60, whereas an equivalent Java version would be considerably more verbose.
Association analysis rests on the idea that when two or more items tend to occur together, one can be predicted from the others; its goal is to uncover relationships hidden in the data. Association (and its ordered counterpart, sequential-pattern mining) is one of the basic tasks of data mining: it searches a transactional database for patterns or rules that recur with high frequency, typically quantified by two measures, support and confidence.
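The two measures can be made concrete with a tiny worked example. The baskets below are made up for illustration; support(X) is the fraction of transactions containing X, and confidence(X → Y) is support(X ∪ Y) divided by support(X).

```python
# Hypothetical toy baskets, for illustration only
transactions = [
    {'beer', 'diapers'},
    {'beer', 'diapers', 'chips'},
    {'diapers', 'milk'},
    {'beer', 'chips'},
]

def support(itemset):
    # fraction of transactions that contain every item in itemset
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs):
    # support of the combined itemset, relative to the antecedent's support
    return support(lhs | rhs) / support(lhs)

sup = support({'beer', 'diapers'})        # appears in 2 of 4 baskets -> 0.5
conf = confidence({'diapers'}, {'beer'})  # 0.5 / 0.75 = 2/3
```

With a minimum support of 0.15 and a minimum confidence of 0.6 (the defaults used in both implementations above), the rule {diapers} → {beer} would be reported, since 2/3 > 0.6.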
That covers the questions addressed above: whether scikit-learn lacks Apriori and FP-Growth APIs, Python-based graduation project topics, and how to implement the Apriori algorithm.
Source: 内存溢出