A review of Dropout as applied to RNNs, part 2.

Photo: "Dropout Grind" by Tate Roskelley.
Fig. 1. Fastai implementation of AWD-LSTM. Sensitivity of validation loss to variation of each dropout parameter when the other dropout parameters are kept at default values. Final loss after 100 epochs.
Fig. 2. awd-lstm-lm original implementation of AWD-LSTM. Sensitivity of validation loss to variation of each dropout parameter when the other dropout parameters are kept at default values. Final loss after 100 epochs.
Fig. 3. Fastai implementation of AWD-LSTM. Sensitivity of validation loss to variation of each dropout parameter when the other dropout parameters are kept at zero. Final loss after 100 epochs.
Fig. 4. awd-lstm-lm original implementation of AWD-LSTM. Sensitivity of validation loss to variation of each dropout parameter when the other dropout parameters are kept at zero. Final loss after 100 epochs.
Fig. 5. awd-lstm-lm original implementation of AWD-LSTM on the PTB dataset. Sensitivity of validation loss to variation of each dropout parameter when the other dropout parameters are kept at zero.
Fig. 6. Fastai implementation of AWD-LSTM on the PTB dataset. Same analysis as above. Note that Weight Drop here was set to 0.001 for all tests except the wd tests.
Fig. 7. awd-lstm-lm original implementation of AWD-LSTM on the Wikitext-2 dataset. Sensitivity of validation loss to variation of each dropout parameter when the other dropout parameters are kept at zero.
Fig. 8. Fastai implementation of AWD-LSTM on the Wikitext-2 dataset. Same analysis as above. Note that Weight Drop here was set to 0.001 for all tests except the wd tests (the fastai code has since been modified to allow a weight drop of zero).
import torch

def embedded_dropout(embed, words, dropout=0.1, scale=None):
    if dropout:
        # Sample one Bernoulli mask per vocabulary row and broadcast it across
        # the embedding dimension, so entire word vectors are dropped together,
        # then rescale the survivors by 1/(1 - dropout).
        mask = embed.weight.data.new().resize_((embed.weight.size(0), 1)).bernoulli_(1 - dropout).expand_as(embed.weight) / (1 - dropout)
        masked_embed_weight = mask * embed.weight
    else:
        masked_embed_weight = embed.weight
    if scale:
        masked_embed_weight = scale.expand_as(masked_embed_weight) * masked_embed_weight

    padding_idx = embed.padding_idx
    if padding_idx is None:
        padding_idx = -1

    # torch.nn.functional.embedding replaces the deprecated
    # embed._backend.Embedding.apply call from older PyTorch versions.
    X = torch.nn.functional.embedding(
        words, masked_embed_weight,
        padding_idx, embed.max_norm, embed.norm_type,
        embed.scale_grad_by_freq, embed.sparse,
    )
    return X
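A quick way to see what embedded_dropout does is to run it on a toy embedding layer; the vocabulary size, dimensions and dropout rate below are illustrative only. Because the mask is drawn per vocabulary row, a dropped word comes back as an all-zero vector at every position where it occurs:

import torch
import torch.nn as nn

embed = nn.Embedding(100, 16)          # illustrative: 100-word vocab, 16-dim vectors
words = torch.randint(0, 100, (5, 4))  # (seq_len, batch) of token ids
out = embedded_dropout(embed, words, dropout=0.5)
print(out.shape)  # torch.Size([5, 4, 16]); dropped words are all-zero vectors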
Fig. 9. awd-lstm-lm original implementation of AWD-LSTM, varying the Embedding Dropout (de) parameter on a) the PTB dataset and b) the Wikitext-2 dataset. Other dropout parameters are kept at default values.
Fig. 10. awd-lstm-lm original implementation of AWD-LSTM, varying the Embedding Dropout (de) parameter on a) the PTB dataset and b) the Wikitext-2 dataset. Other dropout parameters are kept at zero.
Fig. 11. a) AWD-LSTM (original) varying the Embedding Dropout (de) parameter on the PTB dataset; other dropouts kept at default values. b) Fastai implementation of AWD-LSTM varying the Embedding Dropout (de) parameter on the PTB dataset; other dropouts kept at default values.
def forward(self, input, hidden, return_h=False):
    # Embedding dropout (dropoute) zeroes whole word vectors, then
    # LockedDropout applies a single input-dropout (dropouti) mask
    # reused at every timestep.
    emb = embedded_dropout(self.encoder, input, dropout=self.dropoute if self.training else 0)
    emb = self.lockdrop(emb, self.dropouti)

    raw_output = emb
    new_hidden = []
    raw_outputs = []
    outputs = []
    for l, rnn in enumerate(self.rnns):
        current_input = raw_output
        raw_output, new_h = rnn(raw_output, hidden[l])
        new_hidden.append(new_h)
        raw_outputs.append(raw_output)
        if l != self.nlayers - 1:
            # Hidden dropout (dropouth) between stacked LSTM layers.
            raw_output = self.lockdrop(raw_output, self.dropouth)
            outputs.append(raw_output)
    hidden = new_hidden

    # Output dropout (dropout) on the final layer's activations.
    output = self.lockdrop(raw_output, self.dropout)
    outputs.append(output)

    result = output.view(output.size(0)*output.size(1), output.size(2))
    if return_h:
        return result, hidden, raw_outputs, outputs
    return result, hidden
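The self.lockdrop calls above refer to the repository's LockedDropout module, which samples one dropout mask per sequence and reuses it at every timestep (variational dropout). A minimal sketch of that behaviour, written against current PyTorch rather than quoting the repository's exact code:

import torch
import torch.nn as nn

class LockedDropout(nn.Module):
    """Variational dropout: one mask per sequence, shared across timesteps."""
    def forward(self, x, dropout=0.5):
        if not self.training or not dropout:
            return x
        # x is (seq_len, batch, features); sample the mask for a single
        # timestep and let broadcasting reuse it along the time axis.
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - dropout)
        return x * mask / (1 - dropout)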
Fig. 12. awd-lstm-lm original implementation of AWD-LSTM language modelling on the Wikitext-2 dataset, varying a single dropout parameter between 0.1 and 0.9 while keeping all other parameters at zero. a) Input Dropout (di), b) Hidden Dropout (dh), and c) Output Dropout (d).
Fig. 13. awd-lstm-lm original implementation of AWD-LSTM language modelling on the Wikitext-2 dataset, varying a single dropout parameter between 0.1 and 0.9 while keeping all other parameters at default values. a) Input Dropout (di), b) Hidden Dropout (dh), and c) Output Dropout (d).
Fig. 14. a) AWD-LSTM (original) varying the Input Dropout (di) parameter on the PTB dataset; other dropouts kept at default values. b) Fastai implementation of AWD-LSTM varying the Input Dropout (di) parameter on the PTB dataset; other dropouts kept at default values.
Fig. 15. a) AWD-LSTM (original) varying the Hidden Dropout (dh) parameter on the PTB dataset; other dropouts kept at default values. b) Fastai implementation of AWD-LSTM varying the Hidden Dropout (dh) parameter on the PTB dataset; other dropouts kept at default values.
Fig. 16. a) AWD-LSTM (original) varying the Output Dropout (d) parameter on the PTB dataset; other dropouts kept at default values. b) Fastai implementation of AWD-LSTM varying the Output Dropout (d) parameter on the PTB dataset; other dropouts kept at default values.
self.rnns = [WeightDrop(rnn, ['weight_hh_l0'], dropout=wdrop) for rnn in self.rnns]
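WeightDrop applies DropConnect to each LSTM's hidden-to-hidden weight matrix, re-sampling a fresh weight mask on every forward pass. The class below is a simplified sketch of that idea, not the repository's exact code (which also works around cuDNN's flattened-weight handling); it should behave as expected on CPU, and recent PyTorch may warn about non-contiguous weights on GPU:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDrop(nn.Module):
    """DropConnect on named weight matrices of a wrapped module (sketch)."""
    def __init__(self, module, weights, dropout=0.5):
        super().__init__()
        self.module, self.weights, self.dropout = module, weights, dropout
        for name in self.weights:
            raw = getattr(self.module, name)
            # Re-register the raw weight; the dropped version is recomputed
            # from it on every forward pass.
            del self.module._parameters[name]
            self.module.register_parameter(name + '_raw', nn.Parameter(raw.data))

    def forward(self, *args):
        for name in self.weights:
            raw = getattr(self.module, name + '_raw')
            setattr(self.module, name,
                    F.dropout(raw, p=self.dropout, training=self.training))
        return self.module(*args)

# e.g. wrapping an LSTM's recurrent weights, as awd-lstm-lm does:
# rnn = WeightDrop(nn.LSTM(400, 1150), ['weight_hh_l0'], dropout=0.5)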
Fig. 17. a) AWD-LSTM (original) varying the Weight Drop (wd) parameter on the PTB dataset; other dropouts kept at default values. b) Fastai implementation of AWD-LSTM varying the Weight Drop (wd) parameter on the PTB dataset; other dropouts kept at default values.
Fig. 18. awd-lstm-lm original implementation of AWD-LSTM language modelling, varying the Weight Drop (wd) parameter on a) the Wikitext-2 dataset and b) the PTB dataset. All other dropout parameters are set to zero.
Fig. 19. awd-lstm-lm original implementation of AWD-LSTM language modelling, varying the Weight Drop (wd) parameter on a) the Wikitext-2 dataset and b) the PTB dataset. All other dropout parameters are set to default values.
Fig. 20. Fastai implementation of AWD-LSTM and variation of weight drop with other dropout parameters set to zero. a) Wikitext-2 language model (note y scale difference here to other plots). b) PTB language model.
Fig. 21. awd-lstm-lm original implementation of AWD-LSTM on the PTB dataset: a) varying hidden dropout (dh), b) embedding dropout (de). Other dropout parameters are kept at zero.
Fig. 21 cont. AWD-LSTM on the PTB dataset: c) varying weight drop (wd), d) output dropout (d). Other dropout parameters are kept at zero.
Fig. 21 cont. AWD-LSTM on the PTB dataset: e) varying input dropout (di). Other dropout parameters are kept at zero.
Fig. 22. Fastai implementation of AWD-LSTM on the PTB dataset: a) varying embedding dropout (de), b) varying weight drop (wd), c) varying output dropout (d). Other dropout parameters are kept at zero. Tests on the other parameters with and without default values are in progress.
Fig. 22 cont. a) varying hidden dropout (dh), b) varying input dropout (di). Other dropout parameters are kept at zero.
# Encoder: dropout between stacked GRU layers (rnn_enc_drop) and on the
# embedding output (emb_enc_drop).
self.rnn_enc = nn.GRU(em_sz_enc, nh, num_layers=nl, dropout=rnn_enc_drop)
self.rnn_dec = nn.GRU(em_sz_dec, em_sz_dec, num_layers=nl, dropout=rnn_dec_drop)
self.emb_enc_drop = nn.Dropout(emb_enc_drop)
...
def forward(...):
    ...
    emb = self.emb_enc_drop(self.emb_enc(inp))

# Decoder: dropout on the output of each decoding step (out_drop).
self.out_drop = nn.Dropout(out_drop)
...
def forward(...):
    ...
    for i in range(self.out_sl):
        ...
        outp = self.out(self.out_drop(outp[0]))
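Pulled together, the fragments are easier to read as one self-contained module. The class below is a toy sketch with invented sizes, not the actual seq2seq notebook code; the parameter labels used in the figures are assumed to map as red = rnn_enc_drop, rdd = rnn_dec_drop, eed = emb_enc_drop and od = out_drop:

import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Toy GRU encoder showing where each dropout parameter is applied."""
    def __init__(self, vocab=100, em_sz=16, nh=32, nl=2,
                 emb_enc_drop=0.15, rnn_enc_drop=0.25):
        super().__init__()
        self.emb_enc = nn.Embedding(vocab, em_sz)
        self.emb_enc_drop = nn.Dropout(emb_enc_drop)  # eed
        # nn.GRU's dropout argument acts between stacked layers only.
        self.rnn_enc = nn.GRU(em_sz, nh, num_layers=nl,
                              dropout=rnn_enc_drop)   # red

    def forward(self, inp):
        emb = self.emb_enc_drop(self.emb_enc(inp))
        return self.rnn_enc(emb)

# enc = ToyEncoder()
# out, h = enc(torch.randint(0, 100, (7, 2)))  # (seq_len, batch) token ids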
Fig. 23. Translation model using a Sequence to Sequence RNN: impact of the dropout parameters, showing the spread of values for each parameter after 12 epochs.
Fig. 24. Translation model using a Sequence to Sequence Attention RNN: impact of the dropout parameters, showing the spread of values for each parameter after 12 epochs.
Fig. 25. Translation model using a Sequence to Sequence Attention RNN. Effect of individual dropout parameters a) red and b) rdd on loss, with the other dropout parameters set to zero.
Fig. 26. Translation model using a Sequence to Sequence Attention RNN. Effect of individual dropout parameters a) od and b) eed on loss, with the other dropout parameters set to zero.
Fig. 27. Derivative of loss vs. epoch for different dropout types.
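A curve like Fig. 27 can be obtained by finite-differencing the per-epoch validation losses, for instance with numpy.gradient; this is just one plausible way to compute it, not necessarily how the figure was produced:

import numpy as np

# Example values only: one validation loss per epoch from a training run.
losses = np.array([5.2, 4.1, 3.6, 3.3, 3.1, 3.0])
dloss = np.gradient(losses)  # central finite-difference derivative vs. epoch
print(dloss)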

Adrian G
Geophysicist and Deep Learning Practitioner