Logistic Regression in Practice (Hands-On), Python

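Problem: We will build a logistic regression model to predict whether a student is admitted to a university. Suppose you are an administrator of a university department and want to estimate each applicant's chance of admission from the results of two exams. You have historical data on previous applicants, which you can use as the training set for logistic regression. For each training example you have the applicant's scores on the two exams and the admission decision. To that end, we will build a classification model that estimates the probability of admission from the exam scores.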
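
As a rough illustration of what "estimating the probability of admission" means, logistic regression passes a weighted sum of the two exam scores through the sigmoid function. The sketch below uses only the standard library; the function names and the weights `w1`, `w2`, `b` are made up for illustration and are not the values the model learns from the data.

```python
import math

def sigmoid(z):
    """Map any real number to (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def admission_probability(exam1, exam2, w1, w2, b):
    """Probability of admission under a logistic model with given weights."""
    return sigmoid(w1 * exam1 + w2 * exam2 + b)

# Hypothetical weights, chosen only so that the example produces readable numbers.
w1, w2, b = 0.05, 0.05, -6.5

low = admission_probability(40.0, 45.0, w1, w2, b)
high = admission_probability(85.0, 90.0, w1, w2, b)
print(low, high)
```

With these made-up weights, the applicant with the higher scores gets the higher probability; the training step later in the article is what chooses the actual weights.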

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df=pd.read_csv('LogiReg_data.txt',header=None,
               names=['exam1_score','exam2_score','admitted'])
df.head()

	exam1_score	exam2_score	admitted
0	34.623660	78.024693	0
1	30.286711	43.894998	0
2	35.847409	72.902198	0
3	60.182599	86.308552	1
4	79.032736	75.344376	1

X=df.iloc[:,0:2].values
X.shape

(100, 2)

y=df.iloc[:,-1].values
y.shape

(100,)

# Split the data into training and test sets (default test_size=0.25, i.e. 75 / 25 samples)
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=0,stratify=y)
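
The `stratify=y` argument asks `train_test_split` to keep the class ratio the same in both parts. A small sketch of that effect, assuming toy labels (60 admitted, 40 rejected) rather than the real data file; the names `toy_X`, `toy_y`, `X_tr`, `X_te`, `y_tr`, `y_te` are illustrative only:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption): 100 samples, 60 of class 1 and 40 of class 0.
toy_y = np.array([1] * 60 + [0] * 40)
toy_X = np.arange(200, dtype=float).reshape(100, 2)

X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, random_state=0, stratify=toy_y)

# Sizes of the two parts, then the fraction of class 1 in each part.
print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

Without `stratify`, a small test set can end up with a noticeably different share of admitted applicants than the training set, which makes the test score harder to interpret.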

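# Standardize the features (fit on the training set only, then apply the same transform to the test set)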
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
X_train_std=sc.fit_transform(X_train)
X_test_std=sc.transform(X_test)
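
To make the standardization step concrete, here is a hand-rolled NumPy sketch of the same idea: compute the mean and standard deviation on the training rows, then reuse those statistics for the test rows. The arrays are tiny made-up values, not the exam data.

```python
import numpy as np

# Made-up scores (assumption): three training rows and one test row.
train = np.array([[30.0, 40.0], [50.0, 60.0], [70.0, 80.0]])
test = np.array([[50.0, 100.0]])

mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), the convention StandardScaler uses

train_std = (train - mean) / std
test_std = (test - mean) / std   # test rows reuse the training statistics

print(train_std)
print(test_std)
```

After the transform, each training column has mean 0 and standard deviation 1. The test row is expressed in the same units, so it does not generally have those properties itself.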

# Train the model (C is the inverse of the regularization strength)
from sklearn.linear_model import LogisticRegression
LR=LogisticRegression(C=1.1)
LR.fit(X_train_std,y_train)

LogisticRegression(C=1.1)
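
The problem statement asks for an admission probability, whereas `predict` only returns the 0/1 class. scikit-learn's `predict_proba` returns the class probabilities. The sketch below is self-contained and therefore uses synthetic scores with a made-up admission rule in place of `LogiReg_data.txt`; `scores`, `admitted`, and `new_applicants` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the exam data (assumption, not the real file).
scores = rng.uniform(30, 100, size=(200, 2))
admitted = (scores.sum(axis=1) > 130).astype(int)  # made-up admission rule

scaler = StandardScaler()
scores_std = scaler.fit_transform(scores)
clf = LogisticRegression(C=1.1)
clf.fit(scores_std, admitted)

# New applicants must go through the same scaler before prediction.
new_applicants = np.array([[45.0, 50.0], [85.0, 90.0]])
proba = clf.predict_proba(scaler.transform(new_applicants))[:, 1]
print(proba)  # column 1 is the probability of class 1 (admitted)
```

In the article's own variables, the equivalent call would be `LR.predict_proba(X_test_std)[:, 1]`.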

# Evaluate the model
from sklearn.metrics import accuracy_score

# Accuracy on the training set
pred1=LR.predict(X_train_std)
accuracy1=accuracy_score(y_train,pred1)
print('Training-set accuracy: '+str(accuracy1))

# Accuracy on the test set
pred2=LR.predict(X_test_std)
accuracy2=accuracy_score(y_test,pred2)
print('Test-set accuracy: '+str(accuracy2))

Training-set accuracy: 0.8666666666666667
Test-set accuracy: 1.0
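
`accuracy_score` reports accuracy, the fraction of all predictions that are correct, which is not the same thing as precision. With 75 training samples, 0.8667 corresponds to 65 correct predictions. The following pure-Python sketch shows how the two metrics differ on made-up labels; the helper functions are illustrative and are not scikit-learn's implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Among samples predicted positive, the fraction that really are positive."""
    truths_of_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_of_predicted_pos:
        return 0.0
    hits = sum(t == positive for t in truths_of_predicted_pos)
    return hits / len(truths_of_predicted_pos)

# Made-up labels (assumption), chosen so that the two metrics differ.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 1]

acc = accuracy(y_true, y_pred)    # 4 of 6 predictions are correct
prec = precision(y_true, y_pred)  # 3 of the 5 predicted positives are correct
print(acc, prec)
```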

# Visualize the decision boundary

from matplotlib.colors import ListedColormap

def plot_decision_regions(X,y,classifier,test_idx=None,resolution=0.02):

    # Set up the marker generator and the color map
    markers=('s','x','o','^','v')
    colors=('red','blue','lightgreen','gray','cyan')
    cmap=ListedColormap(colors[:len(np.unique(y))])

    # Plot the decision surface over a grid covering the data
    x1_min,x1_max=X[:,0].min()-1,X[:,0].max()+2
    x2_min,x2_max=X[:,1].min()-1,X[:,1].max()+2
    xx1,xx2=np.meshgrid(np.arange(x1_min,x1_max,resolution),
                       np.arange(x2_min,x2_max,resolution))
    z=classifier.predict(np.array([xx1.ravel(),xx2.ravel()]).T)
    z=z.reshape(xx1.shape)
    plt.contourf(xx1,xx2,z,alpha=0.2,cmap=cmap)
    plt.xlim(xx1.min(),xx1.max())
    plt.ylim(xx2.min(),xx2.max())

    # Plot the samples of each class
    # (color= takes a single color and avoids the ambiguous c= argument)
    for idx,c1 in enumerate(np.unique(y)):
        plt.scatter(x=X[y==c1,0],y=X[y==c1,1],
                   alpha=0.8,color=colors[idx],
                   marker=markers[idx],label=c1)

    # Circle the test samples (hollow markers via facecolors='none')
    if test_idx is not None:
        X_test,y_test=X[test_idx,:],y[test_idx]
        plt.scatter(X_test[:,0],X_test[:,1],facecolors='none',
                   alpha=0.1,linewidth=1,marker='o',label='test set',
                    edgecolors='black',s=150)

# Recombine the standardized training and test sets for plotting;
# rows 75-99 are the test samples
X=np.vstack([X_train_std,X_test_std])
y=np.hstack([y_train,y_test])

# The axis labels below are in English, so no font setup is needed.
# For Chinese labels, uncomment the following two lines:
# plt.rcParams['font.sans-serif']=['SimHei']  # render Chinese labels correctly
# plt.rcParams['axes.unicode_minus']=False    # render the minus sign correctly

plt.figure(figsize=(10,8))
plot_decision_regions(X,y,classifier=LR,test_idx=range(75,100))
plt.xlabel('Exam 1 score (standardized)')
plt.ylabel('Exam 2 score (standardized)')
plt.xlim([-2.5,3])
plt.legend(loc='best')
plt.show()
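
The boundary between the two shaded regions is a straight line, because logistic regression predicts class 1 exactly when `w1*x1 + w2*x2 + b > 0`. As a sketch, the line can be recovered from the fitted model's `coef_` and `intercept_`. The example trains on synthetic, already-standardized features (an assumption, not the exam data) and checks that points on the line get a probability of about 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Synthetic, already-standardized features and a made-up labeling rule.
features = rng.normal(size=(200, 2))
labels = (features[:, 0] + features[:, 1] > 0).astype(int)

clf = LogisticRegression(C=1.1).fit(features, labels)
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]

# On the boundary: w1*x1 + w2*x2 + b = 0  =>  x2 = -(w1*x1 + b) / w2
x1 = np.linspace(-2.5, 3.0, 5)
x2 = -(w1 * x1 + b) / w2
boundary_points = np.column_stack([x1, x2])

boundary_proba = clf.predict_proba(boundary_points)[:, 1]
print(boundary_proba)  # each value should be very close to 0.5
```

For the article's model, the same formula with `LR.coef_[0]` and `LR.intercept_[0]` gives the line in standardized score units, which could be drawn with `plt.plot(x1, x2)` on top of the scatter plot.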

Original article: http://outofmemory.cn/langs/722915.html