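Instead of pd.get_dummies(), use LabelEncoder + OneHotEncoder: once fitted, they store the original values (the categories seen during fitting) and can then be applied to new data.
Changing your code as shown below will give you the result you need.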
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

input_df = pd.DataFrame(dict(fruit=['Apple', 'Orange', 'Pine'],
                             color=['Red', 'Orange', 'Green'],
                             is_sweet=[0, 0, 1],
                             country=['USA', 'India', 'Asia']))

# Note: errors='ignore' is deprecated in recent pandas releases;
# input_df.copy() behaves the same for this data.
filtered_df = input_df.apply(pd.to_numeric, errors='ignore')

# This is what you need
le_dict = {}
for col in filtered_df.columns:
    # Keep one fitted LabelEncoder per column so it can be reused on new data
    le_dict[col] = LabelEncoder().fit(filtered_df[col])
    filtered_df[col] = le_dict[col].transform(filtered_df[col])

enc = OneHotEncoder()
enc.fit(filtered_df)
refreshed_df = enc.transform(filtered_df).toarray()

new_df = pd.DataFrame(dict(fruit=['Apple'],
                           color=['Red'],
                           is_sweet=[0],
                           country=['USA']))
for col in new_df.columns:
    # Reuse the encoders fitted on the original data
    new_df[col] = le_dict[col].transform(new_df[col])

new_refreshed_df = enc.transform(new_df).toarray()

print(filtered_df)
   color  country  fruit  is_sweet
0      2        2      0         0
1      1        1      1         0
2      0        0      2         1

print(refreshed_df)
[[ 0.  0.  1.  0.  0.  1.  1.  0.  0.  1.  0.]
 [ 0.  1.  0.  0.  1.  0.  0.  1.  0.  1.  0.]
 [ 1.  0.  0.  1.  0.  0.  0.  0.  1.  0.  1.]]

print(new_df)
   color  country  fruit  is_sweet
0      2        2      0         0

print(new_refreshed_df)
[[ 0.  0.  1.  0.  0.  1.  1.  0.  0.  1.  0.]]
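To see why the fitted encoder matters, here is a minimal sketch comparing the two approaches on the same data. It assumes scikit-learn 0.20 or later, where OneHotEncoder accepts string columns directly, so the LabelEncoder step can be skipped. The handle_unknown="ignore" setting and the names train_df, train_dummies, new_dummies, train_encoded and new_encoded are my own additions, not part of the code above.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train_df = pd.DataFrame({
    "fruit": ["Apple", "Orange", "Pine"],
    "color": ["Red", "Orange", "Green"],
    "is_sweet": [0, 0, 1],
    "country": ["USA", "India", "Asia"],
})
new_df = pd.DataFrame({
    "fruit": ["Apple"],
    "color": ["Red"],
    "is_sweet": [0],
    "country": ["USA"],
})

# get_dummies only sees the categories present in the frame it is given,
# so the single new row produces fewer columns than the training data.
train_dummies = pd.get_dummies(train_df, columns=list(train_df.columns))
new_dummies = pd.get_dummies(new_df, columns=list(new_df.columns))
print(train_dummies.shape[1], new_dummies.shape[1])  # 11 4

# A fitted OneHotEncoder remembers the training categories, so new data
# is mapped onto the same 11 columns. handle_unknown="ignore" (an assumption
# here) encodes categories never seen during fit as all zeros instead of
# raising an error.
enc = OneHotEncoder(handle_unknown="ignore")
train_encoded = enc.fit_transform(train_df).toarray()
new_encoded = enc.transform(new_df).toarray()
print(new_encoded.shape)  # (1, 11)
```

The column order here follows the DataFrame (fruit, color, is_sweet, country), so the one-hot columns are arranged differently from the printed output above, but the idea is the same: fit once, then call transform on anything new.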