Forward propagation is the computation of the output value $\hat y = W^T x + b$ and the loss $Loss(\hat y, y)$.
Concretely, in a neural network it is the process of producing the output $\hat y$ from the input data $x$ and the network weights $w$ through the activation function, and then computing the loss from $\hat y$ and the target $y$.
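As a minimal sketch of this forward pass in plain NumPy (the names and dimensions here are illustrative, not from the original; the squared-error loss introduced later is used as a concrete example):

import numpy as np

# Illustrative shapes: 3 input features, scalar output.
W = np.random.randn(3)               # weights
b = np.random.randn(1)               # bias
x = np.random.randn(3)               # one input sample
y = np.array([1.0])                  # target

y_hat = np.dot(W, x) + b             # forward pass: y_hat = W^T x + b
loss = np.square(y_hat - y)          # loss: Loss(y_hat, y) = (y_hat - y)^2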
Backward propagation is the computation of the gradient of the loss, followed by the updates $\Delta w = \alpha\frac{\partial L}{\partial w}$, $\Delta b = \alpha\frac{\partial L}{\partial b}$, $w = w - \Delta w$, $b = b - \Delta b$, where $\alpha$ is the learning rate.
Since, in calculus, the gradient of a function points in the direction in which its value increases, updating the weights and bias in the direction opposite to the gradient drives the loss toward a minimum.
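A one-step sketch of this update rule, continuing the NumPy example above (the learning rate value is an assumption; the gradient expressions use the squared-error loss derived in the next paragraph):

alpha = 1e-2                         # learning rate (assumed value)
grad_w = 2 * (y_hat - y) * x         # dL/dw for the squared-error loss
grad_b = 2 * (y_hat - y)             # dL/db
W = W - alpha * grad_w               # step against the gradient
b = b - alpha * grad_b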
The following experiment implements a perceptron using the simple one-variable linear function $y = wx + b$.
The activation function is linear, and the loss is the squared error $Loss(\hat y, y) = (\hat y - y)^2$. Its partial derivative with respect to $w$ is $\frac{\partial L}{\partial w} = 2(\hat y - y)\times x$, and with respect to $b$ it is $\frac{\partial L}{\partial b} = 2(\hat y - y)$.
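These derivatives can be sanity-checked against finite differences; in the minimal sketch below (all values arbitrary, chosen only for illustration) the analytic and numerical gradients should agree to several decimal places:

import numpy as np

w, b, x, y = 0.5, -0.3, 1.7, 2.0     # arbitrary test point
loss = lambda w_, b_: (w_ * x + b_ - y) ** 2

eps = 1e-6
num_dw = (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps)
num_db = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)

ana_dw = 2 * (w * x + b - y) * x     # analytic dL/dw
ana_db = 2 * (w * x + b - y)         # analytic dL/db

print(num_dw, ana_dw)                # should agree closely
print(num_db, ana_db)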
The code below converges after roughly 30 epochs.
import numpy as np

# Generate training data from y = 3x + 1 and add random noise.
a = 3
b = 1
data_size = 1000
train_x = np.random.randn(1, data_size)
train_y = a * train_x + b + 0.1 * np.random.randn(1, data_size)

class Network:
    def __init__(self, size, learning_rate):
        self.w = np.random.randn(size)
        self.b = np.random.randn(1)
        self.learning_rate = learning_rate

    def y_hat(self, x):
        # Forward pass with a linear activation: y_hat = w.x + b.
        return np.dot(self.w, x) + self.b

    def loss(self, x, y):
        # Squared-error loss.
        return np.square(self.y_hat(x) - y)

    def update(self, x, y):
        # Gradient descent: dL/dw = 2(y_hat - y)x, dL/db = 2(y_hat - y).
        gradient = 2 * (self.y_hat(x) - y)
        self.w -= gradient * self.learning_rate * x
        self.b -= gradient * self.learning_rate

    def train(self, x, y):
        # Stochastic gradient descent: one update per sample.
        for index in range(y.shape[1]):
            self.update(x[:, index], y[:, index])

network = Network(1, 1e-4)
valid_x = np.random.randn(1, data_size // 3)
valid_y = a * valid_x + b + 0.1 * np.random.randn(1, data_size // 3)
epoch = 30
for i in range(epoch):
    print("epoch:{}/{}".format(i, epoch))
    network.train(train_x, train_y)
    print("w:{},b:{},train_loss:{},valid_loss:{}".format(
        network.w, network.b,
        np.sum(network.loss(train_x, train_y)) / data_size,
        np.sum(network.loss(valid_x, valid_y)) / (data_size // 3)))
MindSpore
The following refers to the simple linear function fitting example.
import numpy as np
import matplotlib.pyplot as plt
from mindspore import dataset as ds
from mindspore import nn
from mindspore import Tensor
from mindspore import Model

def generate_data(data_size, w=3.0, b=1.0):
    # Yield (x, y) pairs sampled from y = wx + b plus Gaussian noise.
    for _ in range(data_size):
        x = np.random.randn(1)
        y = w * x + b + 0.1 * np.random.randn(1)
        yield np.array([x]).astype(np.float32), np.array([y]).astype(np.float32)

def create_dataset(data_size, batch_size=16, repeat_size=1):
    input_data = ds.GeneratorDataset(list(generate_data(data_size)), column_names=['data', 'label'])
    input_data = input_data.batch(batch_size)
    input_data = input_data.repeat(repeat_size)
    return input_data

def model_display(net):
    # Print the current parameters and plot the fitted line against fresh data.
    model_params = net.trainable_params()
    for param in model_params:
        print(param, param.asnumpy())
    x_model_label = np.arange(-10, 10, 0.1)  # x values for plotting the fitted line
    y_model_label = (x_model_label * Tensor(model_params[0]).asnumpy()[0][0] +
                     Tensor(model_params[1]).asnumpy()[0])
    x_label, y_label = zip(*generate_data(data_number))
    plt.axis([-10, 10, -20, 25])
    plt.scatter(x_label, y_label, color="red", s=5)
    plt.plot(x_model_label, y_model_label, color="blue")
    plt.show()

class LinearNet(nn.Cell):
    def __init__(self):
        super(LinearNet, self).__init__()
        self.fc = nn.Dense(1, 1)  # one input feature, one output

    def construct(self, x):
        x = self.fc(x)
        return x

data_number = 100
batch_number = 16
repeat_number = 1
ds_train = create_dataset(data_number, batch_size=batch_number, repeat_size=repeat_number)
net = LinearNet()
model_display(net)  # fit before training
net_loss = nn.loss.MSELoss()
opt = nn.Momentum(net.trainable_params(), learning_rate=0.005, momentum=0.9)
model = Model(net, net_loss, opt)
epoch = 10
model.train(epoch, ds_train, dataset_sink_mode=False)
for net_param in net.trainable_params():
    print(net_param, net_param.asnumpy())
model_display(net)  # fit after training