Andrew Ng Machine Learning Exercise 1: Predicting Food Truck Profit with Linear Regression and Gradient Descent (Python Implementation)

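This post covers the first task of Andrew Ng's Machine Learning Exercise 1: predicting food truck profit with a single-variable linear regression model. The second task, predicting house prices with a multivariate linear regression model, is covered in a separate blog post: link
That said, the cost function and gradient descent implementations in this post also work for multivariate linear regression.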

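Table of contents

- Andrew Ng Machine Learning Exercise 1: Predicting Food Truck Profit with Linear Regression and Gradient Descent (Python Implementation)
  - Task
    - Reading the data
    - Plotting the data to see its distribution
    - Setting the initial values
    - Computing the cost function
    - Running gradient descent
    - Plotting the result
    - Complete code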

Task

In this part of this exercise, you will implement linear regression with one
variable to predict profits for a food truck. Suppose you are the CEO of a
restaurant franchise and are considering different cities for opening a new
outlet. The chain already has trucks in various cities and you have data for
profits and populations from the cities.

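The goal is to predict food truck profit with a linear regression model. The data set gives, for each city, its population and the profit a food truck made there; from this data, we want to predict the profit for a given population. The first column of the data is the population of a city and the second column is the profit of a food truck in that city. A negative profit indicates a loss.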

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Reading the data

# Read the data; the file has no header row, so column names are supplied here
data=pd.read_csv("../code/ex1-linear regression/ex1data1.txt",delimiter=',',header=None,names=['population','profit'])
print(data)
data.head()
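
Before going further it can help to confirm that the file was parsed as two numeric columns. The sketch below uses a few made-up rows in the same comma-separated layout, read from an in-memory string rather than from `ex1data1.txt`; the values and the names `sample_text` and `sample` are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical rows in the same two-column, comma-separated layout
# (population, profit); these are not the real ex1data1.txt values.
sample_text = "6.0,17.5\n5.5,9.1\n8.5,13.6\n7.0,-1.2\n"

sample = pd.read_csv(io.StringIO(sample_text), delimiter=',', header=None,
                     names=['population', 'profit'])

print(sample.shape)       # (rows, columns)
print(sample.dtypes)      # both columns should be parsed as floats
print(sample.describe())  # quick summary: count, mean, min, max, ...
```

Running the same three `print` calls on `data` gives a quick sanity check on the real file.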

Plotting the data to see its distribution

# Plot the data to see how it is distributed
x=data['population']
y=data['profit']
plt.scatter(x,y)
plt.xlabel('population')
plt.ylabel('profit')
plt.show()

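Setting the initial values

Since x0 is always 1, a column of ones has to be inserted as the first column of the training data.
The learning rate is set to 0.01, theta is initialized to 0, and the number of iterations is set to 1000. m is the number of training examples (input rows).

# In the linear regression model x0=1, so add a column of ones to the training data
data.insert(0, 'ones',1)

# X holds (x0,x1,...,xn); Y holds the actual profit
col=data.shape[1]
X=np.array(data.iloc[:,0:col-1])
#print(X)
Y=np.array(data.iloc[:,col-1:col])
#print(Y)

# Set the initial values; m is computed only after X has been defined
alpha=0.01
theta=np.array([[0,0]])
iters=1000
m=len(X)
print(theta)
X.shape,Y.shape,theta.shape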
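
To see why the column of ones is needed, here is a minimal sketch of how the prediction for every row becomes a single matrix product once x0 = 1 is prepended. The three population values and the theta values are arbitrary numbers chosen for illustration, not data or results from the exercise.

```python
import numpy as np

# Three hypothetical population values (not taken from ex1data1.txt).
population = np.array([5.0, 7.5, 10.0])

# Prepend x0 = 1 to every row, giving a design matrix of shape (m, 2).
X = np.column_stack([np.ones_like(population), population])

# theta is stored as a row vector of shape (1, 2), as in the post.
theta = np.array([[-1.0, 2.0]])  # arbitrary illustrative values

# h(x) = theta0 * 1 + theta1 * x1 for every row at once, shape (m, 1).
h = np.dot(X, theta.T)

print(X.shape, theta.shape, h.shape)
print(h.ravel())
```

With theta kept as a (1, 2) row vector, `np.dot(X, theta.T)` has the same (m, 1) shape as `Y`, so the two can be subtracted element by element.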

Computing the cost function

def computeCost(X,Y,theta):
    d=np.power(np.dot(X,(theta.T))-Y,2)
    return np.sum(d)/(2*len(d))
computeCost(X,Y,theta) # the cost when theta is 0

32.072733877455676
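
The cost function can be checked by hand on a data set small enough to work out on paper. The sketch below repeats `computeCost` so that it runs on its own; the three data points are made up for illustration.

```python
import numpy as np

def computeCost(X, Y, theta):
    # Same computation as in the post: sum of squared errors over 2m.
    d = np.power(np.dot(X, theta.T) - Y, 2)
    return np.sum(d) / (2 * len(d))

# Tiny hypothetical data set; the first column is x0 = 1.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([[1.0], [2.0], [3.0]])

# With theta = 0 every prediction is 0, so the errors are 1, 2, 3:
# J = (1 + 4 + 9) / (2 * 3) = 14 / 6
cost_zero = computeCost(X, Y, np.array([[0.0, 0.0]]))

# theta = (0, 1) reproduces Y exactly, so the cost is 0.
cost_fit = computeCost(X, Y, np.array([[0.0, 1.0]]))

print(cost_zero, cost_fit)
```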

Running gradient descent

After 1000 iterations, gradient descent returns the updated theta values.
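
For reference, the quantities that the code computes can be written out as follows. This is the standard formulation of batch gradient descent for linear regression, restated here to match the row-vector layout of theta used in the code; the notation itself is not from the original post.

```latex
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1, \qquad x_0 = 1

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\quad \text{(all } j \text{ updated simultaneously)}

% Vectorized, with \theta stored as a 1 x (n+1) row vector as in the code:
\theta := \theta - \frac{\alpha}{m} \left( X \theta^{T} - Y \right)^{T} X
```

In the code, `error` corresponds to X theta^T - Y and `temp` to its transpose multiplied by X.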

def gradientDescent(X,Y,theta,iters,alpha):
    m=len(X) # number of training examples
    for i in range(iters):
        error=np.dot(X,(theta.T))-Y
        temp=np.dot(error.T,X)
        #print(temp)
        theta=theta-alpha*temp/m # theta is updated after every iteration
        #print(theta)
    return theta
    
# Get theta after the final iteration and compute the cost at that point
theta=gradientDescent(X,Y,theta,iters,alpha)
print(theta)
cost=computeCost(X,Y,theta)
print(cost)

[[-3.78806857 1.18221277]]
4.4780276098799705
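
One way to check that the learning rate is reasonable is to record the cost at every iteration and confirm that it keeps decreasing. The sketch below is a hypothetical variant, `gradientDescentWithHistory`, run on synthetic data; the function name, the data, and the learning rate of 0.005 are assumptions for illustration and are not part of the original exercise.

```python
import numpy as np

def computeCost(X, Y, theta):
    d = np.power(np.dot(X, theta.T) - Y, 2)
    return np.sum(d) / (2 * len(d))

def gradientDescentWithHistory(X, Y, theta, iters, alpha):
    # Hypothetical variant of gradientDescent that also records the cost.
    m = len(X)
    history = np.zeros(iters)
    for i in range(iters):
        error = np.dot(X, theta.T) - Y
        theta = theta - alpha * np.dot(error.T, X) / m
        history[i] = computeCost(X, Y, theta)
    return theta, history

# Synthetic data (an assumption for illustration): y = 2x - 3 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(5, 20, size=50)
y = 2 * x - 3 + rng.normal(0, 1, size=50)
X = np.column_stack([np.ones_like(x), x])
Y = y.reshape(-1, 1)

theta, history = gradientDescentWithHistory(X, Y, np.zeros((1, 2)), 1000, 0.005)
print(history[0], history[-1])  # cost after the first and the last iteration
```

Plotting `history` against the iteration number gives the usual convergence curve. If the curve rises instead of falling, the learning rate is probably too large.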

Plotting the result

# Plot the fitted linear regression model together with the population and profit data
def drawFigrue(data,theta):
    fig,ax=plt.subplots(figsize=(12,8))
    x1=data['population']
    y1=data['profit']
    ax.scatter(x1,y1,label='Training data')
    x2=np.linspace(np.min(data['population']),np.max(data['population']),100)
    y2=theta[0,0]+theta[0,1]*x2
    ax.plot(x2,y2,'r',label='Prediction')
    ax.legend(loc=2)
    ax.set_xlabel('population')
    ax.set_ylabel('profit')
    ax.set_title("Predicted Profit vs. Population Size")
    plt.show()

drawFigrue(data,theta)
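
The task asks for a profit prediction given a population, and the fitted line gives one directly. The sketch below is a minimal illustration: `predictProfit` is a hypothetical helper that is not in the original code, the theta values are the ones printed above, and the two population inputs are arbitrary examples in the same units as the `population` column.

```python
import numpy as np

def predictProfit(theta, population):
    # Hypothetical helper: evaluates theta0 + theta1 * population.
    # theta is a (1, 2) row vector, as returned by gradientDescent.
    x = np.array([[1.0, population]])  # prepend x0 = 1
    return float(np.dot(x, theta.T)[0, 0])

# theta as printed in the post after training.
theta = np.array([[-3.78806857, 1.18221277]])

for population in (3.5, 7.0):  # example inputs, chosen for illustration
    print(population, predictProfit(theta, population))
```

The result is in the same units as the `profit` column, and a negative value would mean a predicted loss.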

Complete code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Compute the cost function
def computeCost(X,Y,theta):
    d=np.power(np.dot(X,(theta.T))-Y,2)
    return np.sum(d)/(2*len(d))
    
# Gradient descent: theta changes with every iteration
def gradientDescent(X,Y,theta,iters,alpha):
    m=len(X) # number of training examples
    for i in range(iters):
        error=np.dot(X,(theta.T))-Y
        temp=np.dot(error.T,X)
        #print(temp)
        
        theta=theta-alpha*temp/m # theta is updated after every iteration
        #print(theta)
    return theta
    
# Plot the fitted linear regression model together with the population and profit data
def drawFigrue(data,theta):
    fig,ax=plt.subplots(figsize=(12,8))
    x1=data['population']
    y1=data['profit']
    ax.scatter(x1,y1,label='Training data')
    x2=np.linspace(np.min(data['population']),np.max(data['population']),100)
    y2=theta[0,0]+theta[0,1]*x2
    ax.plot(x2,y2,'r',label='Prediction')
    ax.legend(loc=2)
    ax.set_xlabel('population')
    ax.set_ylabel('profit')
    ax.set_title("Predicted Profit vs. Population Size")
    plt.show()

# Read the data
data=pd.read_csv("../code/ex1-linear regression/ex1data1.txt",delimiter=',',header=None,names=['population','profit'])

# Plot the data to see how it is distributed
x=data['population']
y=data['profit']
plt.scatter(x,y)
plt.xlabel('population')
plt.ylabel('profit')
plt.show()

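# In the linear regression model x0=1, so add a column of ones to the training data
data.insert(0, 'ones',1)
# X holds (x0,x1,...,xn); Y holds the actual profit
col=data.shape[1]
X=np.array(data.iloc[:,0:col-1])
Y=np.array(data.iloc[:,col-1:col])
# Set the initial values; m is computed only after X has been defined
alpha=0.01
theta=np.array([[0,0]])
iters=1000
m=len(X)

print(computeCost(X,Y,theta)) # the cost when theta is 0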

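# Get theta after the final iteration and compute the cost at that point
theta=gradientDescent(X,Y,theta,iters,alpha)
print(theta)
cost=computeCost(X,Y,theta)
print(cost)
# Plot the fitted linear regression model together with the population and profit data
drawFigrue(data,theta)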
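
As noted at the start, nothing in `computeCost` or `gradientDescent` is specific to a single variable. The sketch below runs the same two functions on synthetic data with two features and compares the result with NumPy's least-squares solver. The data, the learning rate of 0.1, and the 2000 iterations are assumptions chosen for illustration; this is not the house-price data set from the second task.

```python
import numpy as np

def computeCost(X, Y, theta):
    d = np.power(np.dot(X, theta.T) - Y, 2)
    return np.sum(d) / (2 * len(d))

def gradientDescent(X, Y, theta, iters, alpha):
    m = len(X)
    for i in range(iters):
        error = np.dot(X, theta.T) - Y
        theta = theta - alpha * np.dot(error.T, X) / m
    return theta

# Synthetic data with two features (an assumption for illustration):
# y = 1 + 2*x1 - 3*x2 plus a little noise.
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 2))
y = 1 + 2 * features[:, 0] - 3 * features[:, 1] + rng.normal(0, 0.1, size=200)

X = np.column_stack([np.ones(200), features])  # shape (200, 3)
Y = y.reshape(-1, 1)

theta_gd = gradientDescent(X, Y, np.zeros((1, 3)), 2000, 0.1)

# Least-squares solution from NumPy, used only as a cross-check.
theta_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(theta_gd.ravel())
print(theta_ls.ravel())
print(computeCost(X, Y, theta_gd))
```

The synthetic features here are already on comparable scales. With raw features of very different magnitudes, gradient descent usually needs feature scaling or a much smaller learning rate to converge.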

Feel free to share; when reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/langs/717420.html
