Consider this vectorized approach using scikit-learn's `CountVectorizer`:
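```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer()
X = vect.fit_transform(df1.consumption)            # word counts per df1 row; learns the vocabulary
Y = vect.transform(df2.creature + ' ' + df2.food)  # same vocabulary for each df2 pair
# X.dot(Y.T)[i, j]: dot product of word counts, i.e. words shared by df1 row i and df2 row j
# a df1 row matches when it shares more than one word with any df2 row
res = np.ravel(np.any((X.dot(Y.T) > 1).todense(), axis=1))
```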
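To see what `X.dot(Y.T) > 1` computes, here is a pure-Python sketch of the same test without scikit-learn. It assumes `CountVectorizer`'s defaults (lowercasing and the token pattern `(?u)\b\w\w+\b`); `tokens` and `matches` are hypothetical helper names, and the sample data is taken from the outputs in this answer, with the `df2` creature/food pairs inferred from the `Y` matrix.

```python
import re
from collections import Counter

# Assumed to mirror CountVectorizer's defaults: lowercase, tokens of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokens(text):
    """Word counts for one string (hypothetical helper)."""
    return Counter(TOKEN_RE.findall(text.lower()))


def matches(consumption, pairs, threshold=1):
    """True for each consumption string whose word-count vector has a dot
    product greater than `threshold` with at least one (creature, food) pair."""
    pair_vecs = [tokens(creature + ' ' + food) for creature, food in pairs]
    result = []
    for text in consumption:
        x = tokens(text)
        # dot product of two sparse count vectors: only shared words contribute
        # (a Counter returns 0 for missing keys)
        result.append(any(sum(x[w] * y[w] for w in x) > threshold
                          for y in pair_vecs))
    return result


consumption = [
    'squirrel ate apple', 'monkey likes apple', 'monkey banana gets',
    'badger gets banana', 'giraffe eats grass', 'badger apple loves',
    'elephant is huge', 'elephant eats banana tree', 'squirrel digs in grass',
]
pairs = [('squirrel', 'apple'), ('badger', 'apple'),
         ('monkey', 'banana'), ('elephant', 'banana')]

print(matches(consumption, pairs))
# [True, False, True, False, False, True, False, True, False]
```

The sparse product in the answer scores every (`df1` row, `df2` row) pair in a single call, so it is likely much faster than this loop on large frames. Because the score multiplies counts, a word repeated within one string counts more than once; `CountVectorizer(binary=True)` would make the score the number of distinct shared words instead.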
Result:
```
In [67]: res
Out[67]: array([ True, False,  True, False, False,  True, False,  True, False], dtype=bool)
```
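Explanation: `X` holds the word counts of each `df1.consumption` string and `Y` those of each "creature food" pair from `df2`, both over the vocabulary learned from `df1`: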
```
In [68]: pd.DataFrame(X.toarray(), columns=vect.get_feature_names())  # get_feature_names_out() on scikit-learn >= 1.0
Out[68]:
   apple  ate  badger  banana  digs  eats  elephant  gets  giraffe  grass  huge  in  is  likes  loves  monkey  squirrel  tree
0      1    1       0       0     0     0         0     0        0      0     0   0   0      0      0       0         1     0
1      1    0       0       0     0     0         0     0        0      0     0   0   0      1      0       1         0     0
2      0    0       0       1     0     0         0     1        0      0     0   0   0      0      0       1         0     0
3      0    0       1       1     0     0         0     1        0      0     0   0   0      0      0       0         0     0
4      0    0       0       0     0     1         0     0        1      1     0   0   0      0      0       0         0     0
5      1    0       1       0     0     0         0     0        0      0     0   0   0      0      1       0         0     0
6      0    0       0       0     0     0         1     0        0      0     1   0   1      0      0       0         0     0
7      0    0       0       1     0     1         1     0        0      0     0   0   0      0      0       0         0     1
8      0    0       0       0     1     0         0     0        0      1     0   1   0      0      0       0         1     0

In [69]: pd.DataFrame(Y.toarray(), columns=vect.get_feature_names())
Out[69]:
   apple  ate  badger  banana  digs  eats  elephant  gets  giraffe  grass  huge  in  is  likes  loves  monkey  squirrel  tree
0      1    0       0       0     0     0         0     0        0      0     0   0   0      0      0       0         1     0
1      1    0       1       0     0     0         0     0        0      0     0   0   0      0      0       0         0     0
2      0    0       0       1     0     0         0     0        0      0     0   0   0      0      0       1         0     0
3      0    0       0       1     0     0         1     0        0      0     0   0   0      0      0       0         0     0
```
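Written out, entry $(i, j)$ of $XY^\top$ sums the products of the counts of the words that row $i$ of `X` and row $j$ of `Y` have in common. Worked out by hand from the two tables above (columns in `df2` order: squirrel apple, badger apple, monkey banana, elephant banana), a `df1` row matches when any entry in its row exceeds 1:

```latex
\begin{aligned}
(XY^\top)_{ij} &= \sum_k X_{ik}\,Y_{jk},
\qquad \mathrm{res}_i = \Bigl[\max_j\,(XY^\top)_{ij} > 1\Bigr],\\[4pt]
XY^\top &=
\begin{pmatrix}
2 & 1 & 0 & 0\\
1 & 1 & 1 & 0\\
0 & 0 & 2 & 1\\
0 & 1 & 1 & 1\\
0 & 0 & 0 & 0\\
1 & 2 & 0 & 0\\
0 & 0 & 0 & 1\\
0 & 0 & 1 & 2\\
1 & 0 & 0 & 0
\end{pmatrix}
\end{aligned}
```

Only rows 0, 2, 5 and 7 contain a 2, which reproduces `res`.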
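Update: the result can be stored as a new column of `df1`. Here `df1` has a tenth row, `squirrel.eats/apple`, so the vectorizer code above has to be re-run on the extended frame first: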
```
In [92]: df1['match'] = np.ravel(np.any((X.dot(Y.T) > 1).todense(), axis=1))

In [93]: df1
Out[93]:
                 consumption  match
0         squirrel ate apple   True
1         monkey likes apple  False
2         monkey banana gets   True
3         badger gets banana  False
4         giraffe eats grass  False
5         badger apple loves   True
6           elephant is huge  False
7  elephant eats banana tree   True
8     squirrel digs in grass  False
9        squirrel.eats/apple   True    # <----- NOTE
```

Row 9 still matches: `CountVectorizer`'s default token pattern treats punctuation as a word boundary, so `squirrel.eats/apple` is split into `squirrel`, `eats` and `apple`.