Classifying into multiple categories with scikit-learn

What you want is called multi-label classification, and scikit-learn can do it. See here: http://scikit-learn.org/dev/modules/multiclass.html

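I'm not sure what is going wrong in your example; my version of sklearn apparently doesn't have WordNGramAnalyzer. Perhaps it is a matter of using more training examples or trying a different classifier? Note, though, that the multi-label classifier expects the target to be a list of tuples or lists of labels, one per training sample.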
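To make that target format concrete, here is a minimal sketch in plain Python with no scikit-learn calls. It shows how a list of label lists corresponds to one binary column per class, which is how a one-vs-rest scheme views the problem (one binary classifier per class). The helper name `to_indicator_rows` is hypothetical and is not part of any library.

```python
def to_indicator_rows(label_lists, n_classes):
    """Convert per-sample label lists into rows of 0/1 flags, one flag per class."""
    rows = []
    for labels in label_lists:
        row = [0] * n_classes
        for label in labels:
            row[label] = 1  # mark every class this sample belongs to
        rows.append(row)
    return rows


# Three samples: only class 0, only class 1, and both classes.
y_example = [[0], [1], [0, 1]]
indicator = to_indicator_rows(y_example, n_classes=2)
print(indicator)  # [[1, 0], [0, 1], [1, 1]]
```

Each column of the result is an ordinary binary target, so a sample that carries both labels simply counts as a positive example in both columns.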

The following works for me:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.multiclass import OneVsRestClassifier

    X_train = np.array(["new york is a hell of a town",
                        "new york was originally dutch",
                        "the big apple is great",
                        "new york is also called the big apple",
                        "nyc is nice",
                        "people abbreviate new york city as nyc",
                        "the capital of great britain is london",
                        "london is in the uk",
                        "london is in england",
                        "london is in great britain",
                        "it rains a lot in london",
                        "london hosts the british museum",
                        "new york is great and so is london",
                        "i like london better than new york"])
    # One list of labels per training sentence; the last two carry both labels.
    y_train = [[0], [0], [0], [0], [0], [0],
               [1], [1], [1], [1], [1], [1],
               [0, 1], [0, 1]]
    X_test = np.array(['nice day in nyc',
                       'welcome to london',
                       'hello welcome to new york. enjoy it here and london too'])
    target_names = ['New York', 'London']

    # min_n/max_n are the n-gram keywords of the older scikit-learn release
    # used here; later releases replaced them with ngram_range=(1, 2).
    classifier = Pipeline([
        ('vectorizer', CountVectorizer(min_n=1, max_n=2)),
        ('tfidf', TfidfTransformer()),
        ('clf', OneVsRestClassifier(LinearSVC()))])
    classifier.fit(X_train, y_train)
    predicted = classifier.predict(X_test)
    for item, labels in zip(X_test, predicted):
        print('%s => %s' % (item, ', '.join(target_names[x] for x in labels)))

For me, this produces the output:

    nice day in nyc => New York
    welcome to london => London
    hello welcome to new york. enjoy it here and london too => New York, London
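On recent scikit-learn releases the snippet above will probably need two adjustments, so here is a hedged sketch of an equivalent pipeline. It assumes a current release in which `CountVectorizer` takes `ngram_range` instead of `min_n`/`max_n`, and in which multi-label targets are passed as a binary indicator matrix (built here with `MultiLabelBinarizer`) rather than as a list of label lists. The variable names `y_train_labels`, `Y_train`, and `mlb` are mine, and the predictions are not guaranteed to match the output shown above.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

X_train = ["new york is a hell of a town",
           "new york was originally dutch",
           "the big apple is great",
           "new york is also called the big apple",
           "nyc is nice",
           "people abbreviate new york city as nyc",
           "the capital of great britain is london",
           "london is in the uk",
           "london is in england",
           "london is in great britain",
           "it rains a lot in london",
           "london hosts the british museum",
           "new york is great and so is london",
           "i like london better than new york"]
y_train_labels = [[0], [0], [0], [0], [0], [0],
                  [1], [1], [1], [1], [1], [1],
                  [0, 1], [0, 1]]
X_test = ['nice day in nyc',
          'welcome to london',
          'hello welcome to new york. enjoy it here and london too']
target_names = ['New York', 'London']

# Convert the label lists into a (n_samples, n_classes) matrix of 0/1 flags.
mlb = MultiLabelBinarizer(classes=[0, 1])
Y_train = mlb.fit_transform(y_train_labels)

classifier = Pipeline([
    ('vectorizer', CountVectorizer(ngram_range=(1, 2))),  # unigrams and bigrams
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC())),
])
classifier.fit(X_train, Y_train)

# predict() returns an indicator matrix; map it back to tuples of label ids.
predicted = classifier.predict(X_test)
for item, labels in zip(X_test, mlb.inverse_transform(predicted)):
    print('%s => %s' % (item, ', '.join(target_names[x] for x in labels)))
```

Keeping the binarizer around lets the same object translate predictions back into label ids, so the printing loop stays close to the original one.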

Hope this helps.

Original article: http://outofmemory.cn/zaji/5631434.html