3. RNN Models and NLP Applications: LSTM

Preface
LSTM Overview
LSTM Structure
LSTM Parameter Count
Sentiment Analysis with an LSTM
Summary

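Preface

LSTM (long short-term memory) is an improvement on the simple RNN. It mitigates the vanishing-gradient problem, so it can remember information over longer sequences.

LSTM Overview

Figure 1. A simple RNN has only one parameter matrix, whereas an LSTM has four.

LSTM Structure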

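Figure 2. The conveyor-belt structure of the LSTM. The conveyor-belt vector $c_t$ carries information from one time step to the next, which is how the LSTM avoids the vanishing-gradient problem.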

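Forget gate $f_t$: selectively forgets parts of the conveyor-belt vector $c$.

The forget-gate vector $f$ decides, element by element, how much of the conveyor-belt vector $c$ passes through:
– if an element of $f$ is 0, the corresponding element of $c$ is blocked;
– if an element of $f$ is 1, the corresponding element of $c$ passes through completely;
– if an element of $f$ lies in (0, 1), the corresponding element of $c$ passes through partially.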

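Figure 3. Concatenate $h_{t-1}$ and $x_t$, multiply the result by the forget gate's learnable matrix $W_f$ (a matrix-vector product), and apply the sigmoid activation to obtain the forget-gate vector: $f_t = \sigma(W_f [h_{t-1}; x_t])$.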
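
The gate computation described in Figure 3 can be sketched in NumPy. This is only an illustration under stated assumptions: the sizes, the random `W_f`, and the helper name `sigmoid` are made up for the example, and the bias term is left out to match the figure.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; maps each element into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
h_dim, x_dim = 4, 3                      # assumed sizes, for illustration only
h_prev = rng.standard_normal(h_dim)      # previous state h_{t-1}
x_t = rng.standard_normal(x_dim)         # current input x_t
W_f = rng.standard_normal((h_dim, h_dim + x_dim))  # forget-gate matrix (random stand-in)

concat = np.concatenate([h_prev, x_t])   # [h_{t-1}; x_t], length h_dim + x_dim
f_t = sigmoid(W_f @ concat)              # forget-gate vector, length h_dim
print(f_t.shape)
```

The input gate and the output gate follow the same pattern with their own matrices; only the candidate state swaps the sigmoid for tanh.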

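Input gate $i_t$: decides which elements of the conveyor-belt vector $c$ get updated.

Figure 4. Concatenate $h_{t-1}$ and $x_t$, multiply the result by the input gate's learnable matrix $W_i$, and apply the sigmoid activation to obtain the input-gate vector: $i_t = \sigma(W_i [h_{t-1}; x_t])$.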

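New candidate state $\tilde{c}_t$: the value to be added to the conveyor-belt vector $c_t$.

Figure 5. Concatenate $h_{t-1}$ and $x_t$, multiply the result by the learnable matrix $W_c$, and apply the tanh activation to obtain the candidate-state vector: $\tilde{c}_t = \tanh(W_c [h_{t-1}; x_t])$.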

Figure 6. Updating the conveyor-belt vector $c_t$.

With the forget gate $f_t$, the input gate $i_t$, and the candidate state $\tilde{c}_t$ computed, we can apply the gates to the old state $c_{t-1}$ to obtain $c_t$: $c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$, where $\circ$ denotes element-wise multiplication.
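
A small worked example may make the update rule concrete. The numbers below are invented for illustration; in particular, gate values of exactly 0 and 1 are idealized, since a sigmoid only approaches them.

```python
import numpy as np

c_prev = np.array([2.0, -1.0, 0.5])    # old conveyor-belt vector c_{t-1}
f_t = np.array([0.0, 1.0, 0.5])        # forget gate: block, pass, pass half
i_t = np.array([1.0, 0.0, 0.5])        # input gate: write, skip, write half
c_tilde = np.array([0.3, 0.9, -0.4])   # candidate state, values in (-1, 1)

# c_t = f_t ∘ c_{t-1} + i_t ∘ c~_t, all operations element-wise
c_t = f_t * c_prev + i_t * c_tilde
print(c_t)
```

The first element is fully overwritten by the candidate (0.3), the second is carried over unchanged (-1.0), and the third is a blend of old and new (0.5 × 0.5 + 0.5 × (-0.4) = 0.05).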

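Output gate $o_t$: decides how much flows from the conveyor-belt vector $c_t$ into the state $h_t$.

Figure 7. Concatenate $h_{t-1}$ and $x_t$, multiply the result by the output gate's learnable matrix $W_o$, and apply the sigmoid activation to obtain the output-gate vector: $o_t = \sigma(W_o [h_{t-1}; x_t])$.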

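Computing the output $h_t$:

Figure 8. Pass the conveyor-belt vector $c_t$ through tanh to map it into $(-1, 1)$, then multiply it element-wise by the output-gate vector $o_t$ to obtain the output vector: $h_t = o_t \circ \tanh(c_t)$. One copy of $h_t$ serves as the current output, and another copy is passed on to the next step.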
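
Putting the pieces together, one full LSTM time step can be sketched as follows. This is a minimal NumPy illustration, not the Keras implementation: `lstm_step` is a hypothetical helper, the weights are random, bias terms are omitted as in the figures, and all three gates use the sigmoid activation as in the standard LSTM formulation.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; maps each element into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o):
    """One LSTM time step without bias terms (hypothetical helper)."""
    concat = np.concatenate([h_prev, x_t])   # [h_{t-1}; x_t]
    f_t = sigmoid(W_f @ concat)              # forget gate
    i_t = sigmoid(W_i @ concat)              # input gate
    c_tilde = np.tanh(W_c @ concat)          # candidate state
    o_t = sigmoid(W_o @ concat)              # output gate
    c_t = f_t * c_prev + i_t * c_tilde       # conveyor-belt update
    h_t = o_t * np.tanh(c_t)                 # new state / output
    return h_t, c_t

rng = np.random.default_rng(0)
h_dim, x_dim, steps = 4, 3, 5                # assumed sizes, for illustration only
W_f, W_i, W_c, W_o = (rng.standard_normal((h_dim, h_dim + x_dim)) for _ in range(4))

h, c = np.zeros(h_dim), np.zeros(h_dim)      # initial state and conveyor belt
for x_t in rng.standard_normal((steps, x_dim)):
    h, c = lstm_step(x_t, h, c, W_f, W_i, W_c, W_o)
print(h)                                     # final state after the last step
```

Because $o_t$ lies in (0, 1) and $\tanh(c_t)$ lies in (-1, 1), every element of $h_t$ stays strictly between -1 and 1, while $c_t$ itself is not bounded in the same way.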

LSTM Parameter Count

Figure 9. LSTM: four gating units.

The LSTM has four parameter matrices, one each for $f_t$, $i_t$, $\tilde{c}_t$, and $o_t$. Each matrix has $shape(h)$ rows and $shape(h) + shape(x)$ columns, so, not counting bias vectors, the total number of parameters is $4 \times shape(h) \times [shape(h) + shape(x)]$.
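
The formula is easy to check in a few lines of Python. The function name `lstm_matrix_params` is made up for this sketch, and it counts only the four weight matrices, not any bias vectors.

```python
def lstm_matrix_params(h_dim, x_dim):
    """Weights in the four matrices, each h_dim x (h_dim + x_dim); biases excluded."""
    return 4 * h_dim * (h_dim + x_dim)

# With shape(h) = shape(x) = 32: 4 * 32 * 64
print(lstm_matrix_params(32, 32))
```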

Sentiment Analysis with an LSTM

Figure 10. Only the final state $h_t$ of the sentence is output.

from keras.models import Sequential  # Sequential stacks the network's layers in order
from keras.layers import LSTM, Dense, Embedding

vocabulary = 10000   # vocabulary size (number of distinct words)
embedding_dim = 32   # shape(x) = 32
word_num = 500       # sequence length
state_dim = 32       # shape(h) = 32

model = Sequential()
model.add(Embedding(vocabulary, embedding_dim, input_length=word_num))  # word index -> 32-dim vector
model.add(LSTM(state_dim, return_sequences=False))  # return only the final state h_t
model.add(Dense(1, activation='sigmoid'))  # takes only the final state h_t; outputs a value in (0, 1)
model.summary()

Figure 11. Total number of LSTM parameters: $8320 = 4 \times 2080$, where 4 is the number of parameter matrices and $2080 = shape(h) \times [shape(h) + shape(x)] + shape(h) = 32 \times (32 + 32) + 32$. The trailing $+32$ is the bias vector that Keras adds for each of the four matrices.
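
The 8320 reported by the summary can be reproduced by hand. As far as I know, the Keras LSTM layer stores its weights as three arrays: a `kernel` of shape (shape(x), 4·shape(h)), a `recurrent_kernel` of shape (shape(h), 4·shape(h)), and a `bias` of length 4·shape(h). Under that assumption, the arithmetic is:

```python
x_dim, h_dim = 32, 32                  # shape(x) and shape(h) from the model above

kernel = x_dim * 4 * h_dim             # input-to-hidden weights for all four blocks
recurrent_kernel = h_dim * 4 * h_dim   # hidden-to-hidden weights for all four blocks
bias = 4 * h_dim                       # one bias vector per block

total = kernel + recurrent_kernel + bias
per_block = h_dim * (h_dim + x_dim) + h_dim   # one matrix plus its bias
print(total, 4 * per_block)
```

Both ways of counting agree: the concatenated matrix $W [h_{t-1}; x_t]$ used in the figures is simply the `recurrent_kernel` and `kernel` parts placed side by side.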

Summary

Compared with a simple RNN, the LSTM adds a conveyor belt $c_t$, which makes it easier for past information to reach the next time step and so lengthens the memory.
The LSTM has four parameter matrices, one each for $f_t$, $i_t$, $\tilde{c}_t$, and $o_t$.
The total number of LSTM parameters, not counting bias vectors, is $4 \times shape(h) \times [shape(h) + shape(x)]$.

Feel free to share. If you repost this article, please credit the source: 内存溢出

Original article: https://outofmemory.cn/langs/716715.html
