A look through the Paddle API reference shows that Paddle's Layer objects support two kinds of hooks in the forward pass: a pre-hook and a post-hook.
The pre-hook can process a layer's input variables; the function's return value replaces the original input in the layer's computation.
The post-hook can process a layer's output variables; after further processing, the function's return value becomes the layer's output.
Paddle's Tensor objects only support backward hooks, which can be used to inspect gradient values or to return new ones.
1.1 Using forward_pre_hook on a layer
This hook runs before a layer's forward pass and can be used to modify or capture the input data. Register it with hook1 = layer.register_forward_pre_hook(func) and unregister it with hook1.remove().
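The snippets in sections 1.1 and 1.2 call model = Model(), but the article never shows that class. A minimal sketch that matches the usage and the printed output (an nn.Layer whose only sublayer is named flatten) might look like this; the exact definition is an assumption:

import paddle
import paddle.nn as nn

# Hypothetical Model definition (not shown in the original article).
# Any nn.Layer with a sublayer named `flatten` works with the snippets below.
class Model(nn.Layer):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()

    def forward(self, x):
        return self.flatten(x)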
def forward_pre_hook(layer, input):
    # `input` is a tuple holding the layer's positional inputs
    print(input)
    return input

x = paddle.ones([10, 1], 'float32')
model = Model()
# register the pre-hook on the flatten sublayer
forward_pre_hook_handle = model.flatten.register_forward_pre_hook(forward_pre_hook)
out = model(x)
forward_pre_hook_handle.remove()  # detach the hook when it is no longer needed
1.2 Using forward_post_hook on a layer
This hook runs after a layer's forward pass and can be used to modify or capture the output data. Register it with hook1 = layer.register_forward_post_hook(func) and unregister it with hook1.remove().
def forward_post_hook(layer, input, output):
    # double the layer's output before it is returned to the caller
    return 2 * output

x = paddle.ones([10, 1], 'float32')
model = Model()
forward_post_hook_handle = model.flatten.register_forward_post_hook(forward_post_hook)
out = model(x)
print(out)
forward_post_hook_handle.remove()
Output:
Tensor(shape=[10, 1], dtype=float32, place=CPUPlace, stop_gradient=True,
       [[2.],
        [2.],
        ...
1.3 Using register_hook on a tensor
tensor.register_hook(func) registers a backward hook on the current Tensor.
The registered hook function is called each time the gradient Tensor of the current Tensor finishes computing.
The hook does not modify the incoming gradient Tensor in place, but it may return a new temporary gradient Tensor that replaces the current Tensor's gradient for the rest of the backward pass.
A usage example follows:
import paddle

# hook function that returns None (only observes the gradient)
def print_hook_fn(grad):
    print(grad)

# hook function that returns a Tensor (replaces the gradient)
def double_hook_fn(grad):
    grad = grad * 2
    return grad

x = paddle.to_tensor([0., 1., 2., 3.], stop_gradient=False)
y = paddle.to_tensor([4., 5., 6., 7.], stop_gradient=False)
z = paddle.to_tensor([1., 2., 3., 4.])

# one Tensor can register multiple hooks
h = x.register_hook(print_hook_fn)
x.register_hook(double_hook_fn)
w = x + y
# register a hook with a lambda function
w.register_hook(lambda grad: grad * 2)
o = z.matmul(w)
o.backward()
# content printed by print_hook_fn during backward:
# Tensor(shape=[4], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
#        [2., 4., 6., 8.])
print("w.grad:", w.grad) # w.grad: [1. 2. 3. 4.]
print("x.grad:", x.grad) # x.grad: [ 4. 8. 12. 16.]
print("y.grad:", y.grad) # y.grad: [2. 4. 6. 8.]
# remove hook
h.remove()
2. Implementing plug-and-play dropout
2.1 Implementing DropHookModel
By wrapping a model in DropHookModel, dropout, dropblock, and similar modules can be attached at any position of any network, taking effect in train mode and staying inactive in eval mode.
import paddle
import paddle.nn as nn

class DropHookModel(nn.Layer):
    def __init__(self, model, hook_layer, hook_func, hook_type):
        super(DropHookModel, self).__init__()
        self.model = model
        self.hook_layer = hook_layer
        self.hook_func = hook_func
        self.hook_type = hook_type
        self.hooks = []
        self.train()

    # register the hooks when entering train mode
    def train(self):
        self.model.train()
        for layer, func, type_ in zip(self.hook_layer, self.hook_func, self.hook_type):
            if type_ == "forward_pre_hook":
                hook = layer.register_forward_pre_hook(func)
            elif type_ == "forward_post_hook":
                hook = layer.register_forward_post_hook(func)
            else:
                raise ValueError("type_ must be one of ['forward_pre_hook', 'forward_post_hook']")
            self.hooks.append(hook)

    # remove the hooks when entering eval mode
    def eval(self):
        self.model.eval()
        for hook in self.hooks:
            hook.remove()
        self.hooks = []  # clear removed handles so train() can re-register cleanly

    # forward pass
    def forward(self, x):
        return self.model(x)

    # return the trainable parameters
    def parameters(self):
        return self.model.parameters()

    # return the named trainable parameters
    def named_parameters(self):
        return self.model.named_parameters()

    # load model parameters
    def set_state_dict(self, model_dict):
        self.model.set_state_dict(model_dict)

    # return model parameters
    def state_dict(self):
        return self.model.state_dict()
# create hook function 1
dropout2d_1 = nn.Dropout2D(p=0.3)
dropout2d_1.train()  # keep the dropout layer in train mode; DropHookModel.eval() removes the hook instead
def forward_hook_1(module, input, output):
    output = dropout2d_1(output)
    return output

# create hook function 2
dropout2d_2 = nn.Dropout2D(p=0.3)
dropout2d_2.train()
def forward_hook_2(module, input, output):
    output = dropout2d_2(output)
    return output

model = paddle.vision.models.resnet18()
# layers to bind the hooks to
hook_layer = [model.layer4, model.layer3]
# hook functions to bind
hook_func = [forward_hook_1, forward_hook_2]
# hook types to bind
hook_type = ["forward_post_hook", "forward_post_hook"]
# bind the hook functions
drop_model = DropHookModel(model, hook_layer, hook_func, hook_type)
2.2 Using DropHookModel
Looking at the code and the output below: in train mode the output differs on every run (Dropout2D is active), while in eval mode the output is identical on every run (Dropout2D is inactive).
test_data = paddle.rand(shape=(1, 3, 224, 224))

print("drop_model in train mode:")
drop_model.train()
for i in range(5):
    out = drop_model(test_data)
    print(out.numpy().argmax(axis=1), out.numpy().max(axis=1))

print("\ndrop_model in eval mode:")
drop_model.eval()
for i in range(5):
    out = drop_model(test_data)
    print(out.numpy().argmax(axis=1), out.numpy().max(axis=1))
The output looks like this:

drop_model in train mode:
[873] [2.8585076]
[909] [2.964711]
[369] [3.7595296]
[145] [3.1055984]
[391] [3.1836662]

drop_model in eval mode:
[919] [4.50915]
[919] [4.50915]
[919] [4.50915]
[919] [4.50915]
[919] [4.50915]

2.3 Training DropHookModel
Because the wrapper forwards the train, eval, forward, and parameters methods, a DropHookModel object can be used and trained just like an ordinary model.
# set up the optimizer
optim = paddle.optimizer.Adam(parameters=drop_model.parameters())
# set up the loss function (softmax and one-hot encoding are built in)
loss_fn = paddle.nn.CrossEntropyLoss()

x_data = paddle.rand(shape=(10, 3, 224, 224))
y_data = paddle.randint(low=0, high=100, shape=[10, 1])

for i in range(10):
    predicts = drop_model(x_data)
    loss = loss_fn(predicts, y_data)
    # compute accuracy (equivalent to setting metrics in prepare)
    acc = paddle.metric.accuracy(predicts, y_data)
    # backward pass
    loss.backward()
    # update parameters
    optim.step()
    # clear gradients
    optim.clear_grad()
    print(i, loss.item())
The output looks like this:

0 12.406988143920898
1 6.672011852264404
2 6.359235763549805
3 6.101187229156494
4 5.717696666717529
5 5.145468711853027
6 4.492755889892578
7 4.362451076507568
8 3.6308979988098145
9 3.2801218032836914
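One thing to keep in mind when training for real: DropHookModel.eval() removes the hooks, so train() must be called again before the next training step to re-register them. A minimal validation sketch under that assumption (test_data reuses the tensor from section 2.2):

# hypothetical validation step between training epochs
drop_model.eval()                    # removes the hooks: dropout is off
with paddle.no_grad():
    val_out = drop_model(test_data)  # deterministic output in eval mode
drop_model.train()                   # re-registers the hooks for training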