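The task is to predict the speaker's gender for the voice samples in the test set; the test set contains 951 rows.
There are two classes: female and male.
SVM is the recommended classifier, but other methods such as decision trees may also be used.
In JupyterLab, write the predictions to the file /home/ilab/submission
Each output line is the id (the first column of the data) + '\t' + female or male (do not change the order of the test set).
For example: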
6 female
7 male
8 male
9 female
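The output format above can be sketched with the standard library alone. This is only an illustration: the helper name `write_submission` and the demo ids and labels are hypothetical, and the demo writes to a temporary directory rather than to /home/ilab/submission.

```python
import os
import tempfile


def write_submission(path, ids, labels):
    """Write one 'id<TAB>label' line per test row, keeping the given order."""
    allowed = {"female", "male"}
    if len(ids) != len(labels):
        raise ValueError("ids and labels must have the same length")
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for row_id, label in zip(ids, labels):
            if label not in allowed:
                raise ValueError(f"unexpected label: {label!r}")
            f.write(f"{row_id}\t{label}\n")


# Hypothetical demo data mirroring the four example lines above.
demo_path = os.path.join(tempfile.mkdtemp(), "submission")
write_submission(demo_path, [6, 7, 8, 9], ["female", "male", "male", "female"])

with open(demo_path, encoding="utf-8") as f:
    demo_text = f.read()
print(demo_text, end="")
```

Because the ids are written in the order they are passed in, passing the first column of the test set unchanged preserves the required row order.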
Code (score: 91.3):
import numpy as np
import pandas as pd
from sklearn import svm

# Load the training and test sets
train_data = pd.read_csv('/ilab/datasets/local/voice/train.csv')
test = pd.read_csv('/ilab/datasets/local/voice/test.csv')

# Encode the labels. Column 0 is the id, followed by twenty feature columns,
# so the label ('male' / 'female') sits at column index 21.
ss1 = {'male': 0, 'female': 1}
yy = train_data.iloc[:, 21].map(ss1).to_numpy()

# Use columns 1..18 as the feature vectors (plain arrays instead of np.mat,
# which recent scikit-learn versions no longer accept)
train2 = train_data.iloc[:, 1:19].to_numpy()
test1 = test.iloc[:, 1:19].to_numpy()


def classifier():
    clf = svm.SVC(C=0.5,                          # error penalty coefficient (default 1)
                  kernel='linear',                # linear kernel
                  decision_function_shape='ovr')  # decision function
    return clf


def train1(clf, x_train, y_train):
    clf.fit(x_train,          # training feature vectors
            y_train.ravel())  # training target values


# Train the SVM model and predict on the test set
clf = classifier()
train1(clf, train2, yy)
ha = clf.predict(test1)

# Take the id column from test.csv and map the predictions back to labels
test_sub = test.iloc[:, [0]]
gender = ['female' if label == 1 else 'male' for label in ha]
df_gender = pd.DataFrame(gender)

# Join the id column with the predicted gender column and write the submission file
df = pd.concat([test_sub, df_gender], axis=1)
df.to_csv('/home/ilab/submission', sep='\t', header=False, index=False)
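Before submitting, it may be worth checking that the file really follows the required format. The sketch below is one possible sanity check, not part of the original solution: `check_submission` is a hypothetical helper, and the two demo files are made up for illustration and written to a temporary directory.

```python
import os
import tempfile

ALLOWED_LABELS = {"female", "male"}


def check_submission(path, expected_ids):
    """Return a list of problems found in a tab-separated submission file."""
    problems = []
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    if len(lines) != len(expected_ids):
        problems.append(f"expected {len(expected_ids)} lines, found {len(lines)}")
    for line_no, (line, expected_id) in enumerate(zip(lines, expected_ids), start=1):
        parts = line.split("\t")
        if len(parts) != 2:
            problems.append(f"line {line_no}: expected 2 tab-separated fields")
            continue
        row_id, label = parts
        if row_id != str(expected_id):
            problems.append(f"line {line_no}: id {row_id} does not match {expected_id}")
        if label not in ALLOWED_LABELS:
            problems.append(f"line {line_no}: unexpected label {label!r}")
    return problems


# Hypothetical demo files: one well-formed, one with a reordered id
# and a space where the tab separator should be.
demo_dir = tempfile.mkdtemp()
good_path = os.path.join(demo_dir, "good")
bad_path = os.path.join(demo_dir, "bad")
with open(good_path, "w", encoding="utf-8") as f:
    f.write("6\tfemale\n7\tmale\n")
with open(bad_path, "w", encoding="utf-8") as f:
    f.write("7\tmale\n6 female\n")

good_problems = check_submission(good_path, [6, 7])
bad_problems = check_submission(bad_path, [6, 7])
print(good_problems)
print(bad_problems)
```

For the real file, `expected_ids` would be the first column of test.csv in its original order, so a correct submission should have 951 lines and produce an empty problem list.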