The previous section on the Python implementation of the subgradient method covered the basic algorithm, but only for the model without a bias term.
In this section we add the bias term and derive the Python implementation of the complete proximal gradient descent algorithm. The objective function becomes:

$$\min_{\mathbf{x},\,b}\ \frac{1}{2}\left\|\mathbf{A}\mathbf{x} + \mathbf{1}b - \mathbf{y}\right\|_2^2 + \mu\left\|\mathbf{x}\right\|_1$$
Since we now estimate both the weights and the bias, the subdifferential of the objective with respect to the weights is:
$$g(\mathbf{x}) = \mathbf{A}^T(\mathbf{A}\mathbf{x} + \mathbf{1}b - \mathbf{y}) + \mu \left\{ \begin{array}{lr} 1, & x > 0 \\ \left[-1, 1\right], & x = 0 \\ -1, & x < 0 \end{array} \right.$$
The objective is smooth in the bias term, so its subdifferential reduces to the ordinary gradient:

$$g(b) = \mathbf{1}^T(\mathbf{A}\mathbf{x} + \mathbf{1}b - \mathbf{y}) = \mathbf{1}^T(\mathbf{A}\mathbf{x} - \mathbf{y}) + m\,b$$

where $m$ is the number of elements of $\mathbf{1}$, i.e., the number of samples used in one gradient step.
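In proximal gradient descent only the smooth least-squares part enters the gradient step; the non-smooth $\mu\|\mathbf{x}\|_1$ term is handled through its proximal operator, which for the $\ell_1$ norm is elementwise soft-thresholding (this is what the function S implements in the code below):

$$\mathrm{prox}_{t\mu\|\cdot\|_1}(\mathbf{z})_i = \mathrm{sign}(z_i)\,\max\left(|z_i| - t\mu,\ 0\right)$$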
Building on the code from the previous section, here is the complete Python implementation of the proximal gradient descent algorithm:
# -*- coding: utf-8 -*-
import numpy as np
import scipy as spy
from scipy.sparse import csc_matrix
import matplotlib.pyplot as plt
import time  # used to time the run
#======= Simulated data ======================
m = 512
n = 1024
# build the sparse ground-truth vector u: draw a random 10% support, then fill it with standard-normal values
u = spy.sparse.rand(n,1,density=0.1,format='csc',dtype=None)
u1 = u.nonzero()
row = u1[0]
col = u1[1]
data = np.random.randn(len(row))  # one N(0,1) value per nonzero position
u = csc_matrix((data, (row, col)), shape=(n,1)).toarray() #1024 * 1
a = np.random.randn(m,n) # 512 * 1024
bias = 2.632  # true bias
y0 = np.dot(a,u) + bias  # a * u + bias, 512 * 1
v = 1e-3  # v is the mu in the objective
def f(x0, b0):  # objective: 1/2*||A x + 1*b - y0||^2 + mu*||x||_1
    r = np.dot(a, x0) + b0 - y0
    return 1/2*np.dot(r.T, r) + v*sum(abs(x0))
def S(x1, v):  # soft-thresholding: the proximal operator of v*||.||_1
    for i in range(len(x1)):
        if np.abs(x1[i]) - v > 0:
            x1[i] = np.sign(x1[i]) * (np.abs(x1[i]) - v)
        else:
            x1[i] = 0
    return x1
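# An equivalent vectorized form of S (a hypothetical alternative, not part of
# the original post): it computes the same soft-threshold result and is much
# faster for large n than the elementwise loop above.
def S_vec(x1, v):
    return np.sign(x1) * np.maximum(np.abs(x1) - v, 0)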
#========== Initial values =============================
#x0 = np.zeros((n,1)) #1024 * 1
x0 = (2.0*np.random.random((n,1)) - 1.0) * 0.01
b0 = np.random.random((1,1))  # initial bias estimate
y = []
time1 = []
start = time.perf_counter()
print("begin to train!")
oneVecs = np.ones((m,1))
#========= Iterations ==========================
for i in range(3000):
    if i % 10 == 0 and len(y) > 0:
        print("step " + str(i) + " val: " + str(y[-1]))
    y.append(f(x0, b0)[0,0])  # record the objective value at each iteration
    #g0 = (np.dot(np.dot(a.T,a),x0)-np.dot(a.T,b) + v*np.sign(x0))  # subgradient version from the previous section
    # gradient of the smooth part w.r.t. x: A^T(Ax + 1*b - y0)
    # gradient w.r.t. b: 1^T(Ax - y0) + m*b
    g0 = np.dot(np.dot(a.T,a),x0) + np.dot(a.T,oneVecs)*b0 - np.dot(a.T,y0)
    gb0 = np.dot(oneVecs.T, np.dot(a, x0)) - np.dot(oneVecs.T, y0) + m * b0
    t = 0.025/np.sqrt(sum(np.dot(g0.T,g0)))  # normalized step size; 0.01 works much better than 0.1 here
    x1 = S(x0 - t[0]*g0, t[0]*v)  # proximal step: soft-threshold at t*mu
    b1 = b0 - t[0]*gb0            # plain gradient step for the bias
    x0 = x1
    b0 = b1
    end = time.perf_counter()
    time1.append(end)
y = np.array(y).reshape((3000,1))
time1 = np.array(time1)
time1 = time1 - start
time2 = time1[np.where(y - y[999] < 1e-3)[0][0]]  # time to first come within 1e-3 of the objective value at iteration 1000
plt.plot(y)
plt.show()
for val in y:
print(val)
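As a quick sanity check (a minimal sketch, not part of the original script; it assumes the variables from the run above are still in scope), one can compare the recovered bias with the true one and see how much of the support of u was recovered:

# hypothetical check, run right after the script above finishes
print("estimated bias: %.4f, true bias: %.4f" % (b0[0,0], bias))
print("nonzeros in estimate:", int(np.sum(np.abs(x0) > 1e-6)),
      "/ nonzeros in truth:", int(np.sum(np.abs(u) > 0)))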