Logistic Regression in Practice (Hands-On)

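Problem: We will build a logistic regression model to predict whether a student is admitted to a university. Suppose you are the administrator of a university department and want to estimate each applicant's chance of admission from the results of two exams. You have historical data on previous applicants, which can serve as the training set for logistic regression. For each training example, you have the applicant's scores on the two exams and the admission decision. To do this, we will build a classification model that estimates the probability of admission from the exam scores.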
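
As a rough illustration of what "estimating the probability of admission" means, the sketch below applies the logistic (sigmoid) function to a weighted sum of the two exam scores. The helper names and the weight and bias values are hypothetical, chosen only to make the arithmetic easy to follow; they are not fitted to the data used in this post.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def admission_probability(scores, weights, bias):
    """Logistic model: P(admitted | scores) = sigmoid(w . x + b)."""
    return sigmoid(np.dot(scores, weights) + bias)

# Hypothetical parameters for illustration only (not learned from the data)
weights = np.array([0.25, 0.25])
bias = -25.0

# 0.25*40 + 0.25*60 - 25 = 0, so this applicant sits exactly on the decision boundary
p = admission_probability(np.array([40.0, 60.0]), weights, bias)
print(p)  # 0.5
```

Training a logistic regression model amounts to choosing the weights and bias so that these probabilities match the observed admission decisions as closely as possible.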

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.read_csv('LogiReg_data.txt',header=None,
               names=['Exam 1 score','Exam 2 score','Admitted'])
df.head()
   Exam 1 score  Exam 2 score  Admitted
0     34.623660     78.024693         0
1     30.286711     43.894998         0
2     35.847409     72.902198         0
3     60.182599     86.308552         1
4     79.032736     75.344376         1
X=df.iloc[:,0:2].values
X.shape
(100, 2)
y=df.iloc[:,-1].values
y.shape
(100,)
# Split the data into training and test sets
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=0,stratify=y)
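
The call above relies on two defaults worth spelling out: without an explicit test_size, train_test_split holds out 25% of the samples, and stratify=y keeps the class ratio the same in both parts. The sketch below checks this on synthetic labels (the arrays X_demo and y_demo are made up for the demonstration, not the exam data).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 100 samples, 60% positive
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = np.array([1] * 60 + [0] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, random_state=0, stratify=y_demo)

# Default test_size is 0.25, so 75 training and 25 test samples
print(X_tr.shape, X_te.shape)

# With stratification, the share of positives is preserved in both splits
print(y_tr.mean(), y_te.mean())
```

This is also why the test samples later occupy rows 75 to 99 once the training and test arrays are stacked back together.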
# Standardize the features (fit the scaler on the training set only)
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
X_train_std=sc.fit_transform(X_train)
X_test_std=sc.transform(X_test)
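
To make the standardization step concrete, here is a minimal sketch on toy numbers (not the exam data). It shows that StandardScaler subtracts the training mean and divides by the training standard deviation, and that the test set is transformed with those same training statistics rather than its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy scores standing in for the exam data
train = np.array([[30.0, 40.0], [50.0, 60.0], [70.0, 80.0]])
test = np.array([[50.0, 100.0]])

scaler = StandardScaler().fit(train)      # statistics come from the training set only
scaled = scaler.transform(test)

# Manual equivalent: population standard deviation (ddof=0), as StandardScaler uses
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(np.allclose(scaled, manual))  # True
```

Fitting the scaler on the training set alone keeps information about the test set from leaking into the model.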
# Train the model
from sklearn.linear_model import LogisticRegression
LR=LogisticRegression(C=1.1)
LR.fit(X_train_std,y_train)
LogisticRegression(C=1.1)
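
Since the stated goal is to estimate an admission probability, it may help to see predict_proba next to predict, and to see what the C parameter does. The sketch below uses synthetic data (X_demo, y_demo are assumptions, not the exam scores). In scikit-learn, C is the inverse of the regularization strength, so a smaller C shrinks the coefficients more.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: class 1 when the two features sum to a positive value
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo.sum(axis=1) > 0).astype(int)

clf = LogisticRegression(C=1.1).fit(X_demo, y_demo)

# One row per sample, one column per class, in the order of clf.classes_
proba = clf.predict_proba(X_demo[:3])
labels = clf.predict(X_demo[:3])

# predict() returns the class with the larger estimated probability
print(np.array_equal(labels, clf.classes_[proba.argmax(axis=1)]))  # True

# Stronger regularization (smaller C) gives smaller coefficients
strong_reg = LogisticRegression(C=0.01).fit(X_demo, y_demo)
print(np.linalg.norm(strong_reg.coef_) < np.linalg.norm(clf.coef_))  # True
```

For the admission problem, the second column of predict_proba would be the estimated probability of admission.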
# Evaluate the model
from sklearn.metrics import accuracy_score

# Training-set evaluation
pred1=LR.predict(X_train_std)
accuracy1=accuracy_score(y_train,pred1)
print('Training accuracy: '+str(accuracy1))

# Test-set evaluation
pred2=LR.predict(X_test_std)
accuracy2=accuracy_score(y_test,pred2)
print('Test accuracy: '+str(accuracy2))

Training accuracy: 0.8666666666666667
Test accuracy: 1.0
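
A single accuracy figure on 25 test samples can swing a lot with the particular split, so a cross-validated estimate and a confusion matrix are often useful complements. The following is a sketch on synthetic data (the arrays and the 130-point threshold are invented for the example), not a re-evaluation of the model above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the exam data (the real file is not loaded here)
rng = np.random.default_rng(0)
X_demo = rng.uniform(30, 100, size=(100, 2))
y_demo = (X_demo.sum(axis=1) + rng.normal(0, 10, size=100) > 130).astype(int)

# Keeping the scaler inside the pipeline refits it on each training fold
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.1))
scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring='accuracy')
print(scores.mean(), scores.std())

# Rows are true classes, columns are predicted classes
model.fit(X_demo, y_demo)
cm = confusion_matrix(y_demo, model.predict(X_demo))
print(cm)
```

The spread of the five fold scores gives a rough sense of how much the accuracy depends on the split.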
# Visualize the decision boundary

from matplotlib.colors import ListedColormap

def plot_decision_regions(X,y,classifier,test_idx=None,resolution=0.02):

    # Set up the marker generator and color map
    markers=('s','x','o','^','v')
    colors=('red','blue','lightgreen','gray','cyan')
    cmap=ListedColormap(colors[:len(np.unique(y))])

    # Plot the decision surface
    x1_min,x1_max=X[:,0].min()-1,X[:,0].max()+2
    x2_min,x2_max=X[:,1].min()-1,X[:,1].max()+2
    xx1,xx2=np.meshgrid(np.arange(x1_min,x1_max,resolution),
                       np.arange(x2_min,x2_max,resolution))
    z=classifier.predict(np.array([xx1.ravel(),xx2.ravel()]).T)
    z=z.reshape(xx1.shape)
    plt.contourf(xx1,xx2,z,alpha=0.2,cmap=cmap)
    plt.xlim(xx1.min(),xx1.max())
    plt.ylim(xx2.min(),xx2.max())

    # Plot the samples of each class
    for idx,c1 in enumerate(np.unique(y)):
        plt.scatter(x=X[y==c1,0],y=X[y==c1,1],
                   alpha=0.8,color=colors[idx],
                   marker=markers[idx],label=c1)

    # Circle the test samples with hollow markers
    if test_idx is not None:
        X_test,y_test=X[test_idx,:],y[test_idx]
        plt.scatter(X_test[:,0],X_test[:,1],facecolors='none',
                   alpha=1.0,linewidth=1,marker='o',label='test set',
                    edgecolors='black',s=150)
X=np.vstack([X_train_std,X_test_std])
y=np.hstack([y_train,y_test])
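import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif']=['SimHei'] # font that can render Chinese labels (only needed if labels contain Chinese text)
plt.rcParams['axes.unicode_minus']=False # render the minus sign correctly with this font
# In Python 3 string literals are already Unicode, so the u'...' prefix is not required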


plt.figure(figsize=(10,8))
plot_decision_regions(X,y,classifier=LR,test_idx=range(75,100))
plt.xlabel('Exam 1 score (standardized)')
plt.ylabel('Exam 2 score (standardized)')
plt.xlim([-2.5,3])
plt.legend(loc='best')
Note: matplotlib warns when the c argument of plt.scatter receives a single RGBA tuple such as cmap(idx), because it could be mistaken for values to be color-mapped; it recommends the color keyword instead. Passing the empty string c='' triggers a MatplotlibDeprecationWarning (string color sequences are deprecated since 3.2); facecolors='none' draws hollow markers without it.
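
The boundary drawn by the plot is a straight line, because logistic regression predicts class 1 exactly when w1*x1 + w2*x2 + b > 0. As a sketch of how that line could be recovered from a fitted model and mapped back to raw exam scores, the example below uses synthetic data (X_raw, y_demo and the 130-point threshold are assumptions for illustration).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the exam data
rng = np.random.default_rng(0)
X_raw = rng.uniform(30, 100, size=(100, 2))
y_demo = (X_raw.sum(axis=1) > 130).astype(int)

scaler = StandardScaler().fit(X_raw)
clf = LogisticRegression(C=1.1).fit(scaler.transform(X_raw), y_demo)

w1, w2 = clf.coef_[0]
b = clf.intercept_[0]

# Two points on the boundary in standardized coordinates: w1*x1 + w2*x2 + b = 0
x1 = np.array([-2.0, 2.0])
x2 = -(w1 * x1 + b) / w2
boundary_std = np.column_stack([x1, x2])

# Every point on the boundary gets probability 0.5 (up to rounding)
boundary_proba = clf.predict_proba(boundary_std)[:, 1]
print(boundary_proba)

# Map the same points back to raw exam scores for easier interpretation
boundary_raw = scaler.inverse_transform(boundary_std)
print(boundary_raw)
```

Plotting the two rows of boundary_std with plt.plot would overlay this line on the standardized decision-region figure.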







Original article: http://outofmemory.cn/langs/722915.html
