While practicing association rule mining in Jupyter I ran into a baffling problem. It's solved now, so I'm writing it down.
The example I was given:
from efficient_apriori import apriori
import pandas as pd
def data_generator(filename):
    """
    Data generator, needs to return a generator to be called several times.
    """
    def data_gen():
        with open(filename) as file:
            for line in file:
                yield tuple(k.strip() for k in line.split(','))
                #transactions.append(list(line.strip().split(',')))
    return data_gen
# file_path = "https://github.com/seratch/apriori.js/blob/master/dataset.csv"
transactions = data_generator("dataset.csv")
itemsets, rules = apriori(transactions, min_support=0.5, min_confidence=1)
itemsets
rules
transactions_2 = data_generator("store_data.csv")
itemsets_2, rules_2 = apriori(transactions_2, min_support=0.0045, min_confidence=0.2)
rules
for rule in rules[:10]:
    print(rule)
The fourth code cell then raised an error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in <module>()
      1 # file_path = "https://github.com/seratch/apriori.js/blob/master/dataset.csv"
      2 transactions = data_generator("dataset.csv")
----> 3 itemsets, rules = apriori(transactions, min_support=0.5, min_confidence=1)

C:\ProgramData\Anaconda3\lib\site-packages\efficient_apriori\apriori.py in apriori(transactions, min_support, min_confidence, max_length, verbosity, output_transaction_ids)
     61         max_length,
     62         verbosity,
---> 63         output_transaction_ids=True,
     64     )
     65 

C:\ProgramData\Anaconda3\lib\site-packages\efficient_apriori\itemsets.py in itemsets_from_transactions(transactions, min_support, max_length, verbosity, output_transaction_ids)
    276 
    277     # Store in transaction manager
--> 278     manager = TransactionManager(transactions)
    279 
    280     # STEP 1 - Generate all large itemsets of size 1

C:\ProgramData\Anaconda3\lib\site-packages\efficient_apriori\itemsets.py in __init__(self, transactions)
     29 
     30         # Populate
---> 31         for i, transaction in enumerate(transactions):
     32             for item in transaction:
     33                 self._indices_by_item[item].add(i)

TypeError: 'function' object is not iterable
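Looking into it, I found that the example comes from the PyPI page of efficient-apriori 1.0.0 (and a few other versions): efficient-apriori · PyPI
The page for the latest version no longer includes this example: efficient-apriori · PyPI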
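Since the example and the library's behavior differ between releases, it can help to check which version is actually installed before copying an example from PyPI. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); the helper name installed_version is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Compare this with the version of the PyPI page the example was copied from.
print(installed_version("efficient-apriori"))
```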
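After several days of digging, it turned out that all it needed was a pair of parentheses after return, i.e. changing return data_gen to return data_gen()... The reason: return data_gen hands apriori the inner function object itself, while in the installed version TransactionManager.__init__ iterates over transactions directly (for i, transaction in enumerate(transactions), as the last frame of the traceback shows), and a function object is not iterable. Calling data_gen() instead returns a generator, which is iterable.
The final version, with a few small changes (file paths and parameters):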
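The difference between the two return statements can be reproduced without efficient_apriori at all. The sketch below is an illustration, not library code: data_generator_broken and data_generator_fixed are hypothetical names, the three-line CSV is made up, and the loop only mimics the one in the traceback. It also shows a side effect of the fix: the docstring's "called several times" no longer holds, because a generator can be consumed only once.

```python
import os
import tempfile


def data_generator_broken(filename):
    def data_gen():
        with open(filename) as file:
            for line in file:
                yield tuple(k.strip() for k in line.split(','))
    return data_gen      # returns the function object itself


def data_generator_fixed(filename):
    def data_gen():
        with open(filename) as file:
            for line in file:
                yield tuple(k.strip() for k in line.split(','))
    return data_gen()    # calls it: returns a generator object


# A tiny hypothetical dataset so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("eggs,bacon,soup\neggs,bacon,apple\nsoup,bacon,banana\n")
    path = f.name

# Same loop shape as the last frame of the traceback (TransactionManager.__init__).
error_message = None
try:
    for i, transaction in enumerate(data_generator_broken(path)):
        pass
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'function' object is not iterable

gen = data_generator_fixed(path)
first_pass = sum(1 for _ in gen)
second_pass = sum(1 for _ in gen)
print(first_pass, second_pass)  # 3 0: a generator is exhausted after one pass

# If several passes are needed, materialise the transactions in a list instead.
transactions = list(data_generator_fixed(path))
print(sum(1 for _ in transactions), sum(1 for _ in transactions))  # 3 3

os.remove(path)
```

The list version costs memory proportional to the file, but unlike a generator it can be iterated as many times as needed.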
from efficient_apriori import apriori
import pandas as pd
def data_generator(filename):
    """
    Data generator. Returns a generator over the transactions
    (note: a generator can only be iterated over once).
    """
    def data_gen():
        with open(filename) as file:
            for line in file:
                yield tuple(k.strip() for k in line.split(','))
                #transactions.append(list(line.strip().split(',')))
    # https://pypi.org/project/efficient-apriori/1.0.0/ : with return data_gen the next cell raised an error;
    # changing it to return data_gen() fixed it
    return data_gen()
# file_path = "https://github.com/seratch/apriori.js/blob/master/dataset.csv"
transactions = data_generator(r"C:\Users\userab\Desktop\第二次实验\dataset.csv")
itemsets, rules = apriori(transactions, min_support=0.1, min_confidence=1)
itemsets
{1: {('Brooklyn',): 216, ('',): 1413, ('MBE',): 953, ('WBE',): 678, ('BLACK',): 427, ('ASIAN',): 287, ('New York',): 419, ('HISPANIC',): 233, ('NON-MINORITY',): 426},
 2: {('', 'ASIAN'): 287, ('', 'BLACK'): 423, ('', 'Brooklyn'): 215, ('', 'HISPANIC'): 231, ('', 'MBE'): 946, ('', 'NON-MINORITY'): 426, ('', 'New York'): 418, ('', 'WBE'): 671, ('ASIAN', 'MBE'): 284, ('BLACK', 'MBE'): 427, ('Brooklyn', 'MBE'): 160, ('HISPANIC', 'MBE'): 233, ('MBE', 'New York'): 242, ('MBE', 'WBE'): 240, ('NON-MINORITY', 'New York'): 168, ('NON-MINORITY', 'WBE'): 426, ('New York', 'WBE'): 249},
 3: {('', 'ASIAN', 'MBE'): 284, ('', 'BLACK', 'MBE'): 423, ('', 'Brooklyn', 'MBE'): 159, ('', 'HISPANIC', 'MBE'): 231, ('', 'MBE', 'New York'): 241, ('', 'MBE', 'WBE'): 233, ('', 'NON-MINORITY', 'New York'): 168, ('', 'NON-MINORITY', 'WBE'): 426, ('', 'New York', 'WBE'): 248, ('NON-MINORITY', 'New York', 'WBE'): 168},
 4: {('', 'NON-MINORITY', 'New York', 'WBE'): 168}}
rules
[{ASIAN} -> {}, {NON-MINORITY} -> {}, {BLACK} -> {MBE}, {HISPANIC} -> {MBE}, {NON-MINORITY} -> {WBE}, {ASIAN, MBE} -> {}, {, BLACK} -> {MBE}, {, HISPANIC} -> {MBE}, {NON-MINORITY, New York} -> {}, {NON-MINORITY, WBE} -> {}, {, NON-MINORITY} -> {WBE}, {NON-MINORITY} -> {, WBE}, {NON-MINORITY, New York} -> {WBE}, {NON-MINORITY, New York, WBE} -> {}, {, NON-MINORITY, New York} -> {WBE}, {NON-MINORITY, New York} -> {, WBE}]
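A side note on this output: the item '' (the empty string) appears with count 1413, and it is also why rules such as {ASIAN} -> {} and {, BLACK} -> {MBE} show an empty-looking item. My guess, not verified against the CSV, is that many lines of dataset.csv contain empty fields (consecutive commas or a trailing comma), which line.split(',') turns into ''. A minimal sketch of a generator that skips empty fields; data_generator_no_blanks is a hypothetical name and the two sample lines are made up:

```python
import os
import tempfile


def data_generator_no_blanks(filename):
    """Yield one tuple of items per line, skipping empty fields such as those from ',,'."""
    def data_gen():
        with open(filename) as file:
            for line in file:
                items = (k.strip() for k in line.split(','))
                yield tuple(k for k in items if k)  # drop '' so it never becomes an item
    return data_gen()


# Hypothetical sample lines with a blank field and a trailing comma.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("MBE,,BLACK\nWBE,NON-MINORITY,\n")
    path = f.name

rows = list(data_generator_no_blanks(path))
print(rows)  # [('MBE', 'BLACK'), ('WBE', 'NON-MINORITY')]
os.remove(path)
```

Whether blanks should be dropped depends on what an empty field means in the data; if it carries meaning, replacing it with an explicit label may be better than discarding it.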
transactions_2 = data_generator(r"C:\Users\userab\Desktop\第二次实验\store_data.csv")
itemsets_2, rules_2 = apriori(transactions_2, min_support=0.0045, min_confidence=0.2)
rules
[{ASIAN} -> {}, {NON-MINORITY} -> {}, {BLACK} -> {MBE}, {HISPANIC} -> {MBE}, {NON-MINORITY} -> {WBE}, {ASIAN, MBE} -> {}, {, BLACK} -> {MBE}, {, HISPANIC} -> {MBE}, {NON-MINORITY, New York} -> {}, {NON-MINORITY, WBE} -> {}, {, NON-MINORITY} -> {WBE}, {NON-MINORITY} -> {, WBE}, {NON-MINORITY, New York} -> {WBE}, {NON-MINORITY, New York, WBE} -> {}, {, NON-MINORITY, New York} -> {WBE}, {NON-MINORITY, New York} -> {, WBE}]
for rule in rules[:10]:
    print(rule)
{ASIAN} -> {} (conf: 1.000, supp: 0.202, lift: 1.005, conv: 4929577.465)
{NON-MINORITY} -> {} (conf: 1.000, supp: 0.300, lift: 1.005, conv: 4929577.465)
{BLACK} -> {MBE} (conf: 1.000, supp: 0.301, lift: 1.490, conv: 328873239.437)
{HISPANIC} -> {MBE} (conf: 1.000, supp: 0.164, lift: 1.490, conv: 328873239.437)
{NON-MINORITY} -> {WBE} (conf: 1.000, supp: 0.300, lift: 2.094, conv: 522535211.268)
{ASIAN, MBE} -> {} (conf: 1.000, supp: 0.200, lift: 1.005, conv: 4929577.465)
{, BLACK} -> {MBE} (conf: 1.000, supp: 0.298, lift: 1.490, conv: 328873239.437)
{, HISPANIC} -> {MBE} (conf: 1.000, supp: 0.163, lift: 1.490, conv: 328873239.437)
{NON-MINORITY, New York} -> {} (conf: 1.000, supp: 0.118, lift: 1.005, conv: 4929577.465)
{NON-MINORITY, WBE} -> {} (conf: 1.000, supp: 0.300, lift: 1.005, conv: 4929577.465)
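The numbers in parentheses can be reproduced from the itemset counts printed earlier. The post never prints the number of transactions N, but the printed values are consistent with N = 1420 (for example 427 / 1420 ≈ 0.301 and 1420 / 953 ≈ 1.490), so the sketch below uses that as an assumption. It applies the textbook definitions: confidence = count(X∪Y) / count(X), support = count(X∪Y) / N, lift = confidence / support(Y), conviction = (1 - support(Y)) / (1 - confidence). Conviction is infinite when confidence is 1, so a small epsilon is added to its denominator; the value 1e-9 is my assumption, chosen because it matches the printed conviction values. rule_metrics is a hypothetical helper, not part of efficient_apriori.

```python
def rule_metrics(count_xy, count_x, count_y, n, eps=1e-9):
    """Confidence, support, lift and conviction of X -> Y from raw itemset counts."""
    confidence = count_xy / count_x
    support = count_xy / n
    lift = confidence / (count_y / n)
    # Conviction is infinite when confidence == 1, so add a tiny eps (an assumption).
    conviction = (1 - count_y / n) / (1 - confidence + eps)
    return confidence, support, lift, conviction


N = 1420  # inferred from the printed values, not stated in the post

# {BLACK} -> {MBE}: count({BLACK, MBE}) = 427, count({BLACK}) = 427, count({MBE}) = 953
conf, supp, lift, conv = rule_metrics(427, 427, 953, N)
print(f"conf: {conf:.3f}, supp: {supp:.3f}, lift: {lift:.3f}, conv: {conv:.3f}")
# conf: 1.000, supp: 0.301, lift: 1.490, conv: 328873239.437
```

The huge conviction values in the output are therefore an artifact of dividing by an epsilon-sized number, not a sign of an exceptionally strong rule.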
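One last gripe: the third-from-last code cell doesn't seem to output anything, does it? (To be fair, that cell only assigns itemsets_2 and rules_2, so it isn't supposed to print anything. The real catch is that the cell after it displays rules rather than rules_2, which is why its output is identical to the earlier one and the results for store_data.csv never show up; displaying rules_2 would show them.)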
Feel free to share; when reposting, please credit the source: 内存溢出