Seq2Seq and Seq2Point modeling implementations using 1D convolution, LSTM, attention mechanisms, the Transformer, and the Temporal Fusion Transformer (TFT).
The repo implements the following:
- Basic implementations of convolution and LSTM layers
- LSTM encoder-decoder network with attention, by Bahdanau et al. (2014)
- Vanilla Transformer, by Vaswani et al. (2017). See `architectures.transformer.Transformer`.
- Temporal Fusion Transformer, by Lim et al. (2020). See `architectures.tft.TemporalFusionTransformer`.
Transformer-based classes always produce sequence-to-sequence outputs.
RNN-based classes can selectively produce sequence or point outputs:
- The difference between `rnn_seq2seq` and `rnn_seq2point` is the decoder: the former uses an autoregressive LSTM decoder to generate a sequence of vectors, while the latter uses an MLP decoder to generate a single vector, as sketched below.
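For instance, using the constructor signatures shown later in this README, the output shapes would differ roughly as follows (a sketch; the exact output shapes are assumptions based on the description above):

```python
import torch
from architectures.rnn_seq2seq import LSTMSeq2Seq
from architectures.rnn_seq2point import LSTMSeq2Point

x = torch.randn(32, 100, 6)  # (B, L_in, C_in)

seq_model = LSTMSeq2Seq(6, 50, 256, 3, True, 0.3, True, 'bahdanau')
point_model = LSTMSeq2Point(6, 50, 256, 3, True, 0.3, True)

seq_out = seq_model(x, trg_len=20)  # expected: a sequence of vectors, (32, 20, 50)
point_out = point_model(x)          # expected: a single vector per sample, (32, 50)
```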
All network parameters are initialized from `torch.zeros`. See `architectures.init`.
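As a rough illustration, zero-based creation followed by an explicit initializer could look like the following (a sketch of an assumed pattern, not the repo's actual code; see `architectures.init` for the real scheme):

```python
import torch
import torch.nn as nn

# Assumed pattern: the parameter tensor is created with torch.zeros
# first, then an explicit initializer overwrites it.
weight = nn.Parameter(torch.zeros(256, 6))
nn.init.xavier_uniform_(weight)
```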
See `Tutorial.ipynb` for details.
Supports:
- `hidden_size`: Hidden state size of the LSTM encoder.
- `num_layers`: Number of stacks in the CNN, LSTM encoder, LSTM decoder, and FC layers.
- `bidirectional`: Whether to use a bidirectional LSTM encoder.
- `dropout`: Dropout rate. Applies to:
  - the residual drop path in the 1D CNN;
  - hidden state dropout in the LSTM encoder/decoder (at every time step). Unlike `torch.nn.LSTM`, dropout is applied from the first LSTM layer.
- `layernorm`: Layer normalization in the LSTM encoder and decoder.
- `attention`: Attention in the LSTM decoder. Supports `'bahdanau'` for Bahdanau style, `'dotproduct'` for dot-product style, and `'none'` for a non-attended decoder.
```python
from architectures.rnn_seq2seq import *
from architectures.rnn_seq2point import *
from architectures.transformer import *
from architectures.tft import *

# Example hyperparameters (illustrative values; adjust for your data)
Cin, Cout = 6, 50
hidden_size, num_layers = 256, 3
bidirectional, layernorm = True, True
dropout = 0.3
attention = 'bahdanau'
n_heads, d_model, d_ff = 8, 512, 2048

# LSTM encoder - LSTM decoder - MLP
seq2seq_lstm = LSTMSeq2Seq(
    Cin, Cout, hidden_size, num_layers, bidirectional, dropout, layernorm, attention
)

# 1DCNN+LSTM encoder - LSTM decoder
seq2seq_cnnlstm = CNNLSTMSeq2Seq(
    Cin, Cout, hidden_size, num_layers, bidirectional, dropout, layernorm, attention
)

# LSTM encoder - MLP
seq2point_lstm = LSTMSeq2Point(
    Cin, Cout, hidden_size, num_layers, bidirectional, dropout, layernorm
)

# 1DCNN+LSTM encoder - MLP
seq2point_cnnlstm = CNNLSTMSeq2Point(
    Cin, Cout, hidden_size, num_layers, bidirectional, dropout, layernorm
)

# Transformer
seq2seq_transformer = Transformer(
    Cin, Cout, num_layers, n_heads, d_model, dropout, d_ff
)

# Temporal Fusion Transformer
seq2seq_tft = TemporalFusionTransformer(
    Cin, Cout, num_layers, n_heads, d_model, dropout
)
```
The `forward` method accepts:
- `x`: Input to the network. Supports $(B, L_{in}, C_{in})$ only.
- `y`: Output label for teacher forcing. Supports $(B, *, C_{out})$ only. Defaults to `None` (fully autoregressive).
- `teacher_forcing`: Teacher forcing ratio $\in [0, 1]$. Defaults to `-1` (fully autoregressive).
- `trg_len`: Target sequence length to generate. Defaults to `1`.

If only `x` and `trg_len` are given, the model autoregressively produces an output sequence of length `trg_len`.
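For example, a training call with teacher forcing and a fully autoregressive inference call might look like this (a sketch assuming the keyword names match the argument list above):

```python
import torch

B, L_in, trg_len = 32, 100, 20
x = torch.randn(B, L_in, Cin)      # (B, L_in, C_in)
y = torch.randn(B, trg_len, Cout)  # (B, trg_len, C_out) labels

# Training: feed ground-truth steps with probability 0.5
out = seq2seq_lstm(x, y=y, teacher_forcing=0.5, trg_len=trg_len)

# Inference: only x and trg_len -> fully autoregressive decoding
out = seq2seq_lstm(x, trg_len=trg_len)  # expected: (B, trg_len, C_out)
```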
By inheriting `architectures.skeleton.Skeleton`, model properties are automatically saved to attributes:
- Parameters can be counted with `model.count_params()`.
- Properties are accessed via the `model.model_info` attribute.
- An identical model instance can be created with `ModelClass(**model.model_init_args)`.
```python
seq2seq_lstm.count_params()
model_info = seq2seq_lstm.model_info
model_init_args = seq2seq_lstm.model_init_args
print(model_info)

another_model_instance = LSTMSeq2Seq(**model_init_args)
```
```
Number of trainable parameters: 10422835
{'attention': 'bahdanau', 'bidirectional': True, 'dropout': 0.3, 'hidden_size': 256, 'input_size': 6, 'layernorm': True, 'num_layers': 3, 'output_size': 50}
```
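One practical use of `model_init_args` is checkpointing: store the init arguments alongside the weights, then rebuild an identical model later (a sketch; the checkpoint layout here is hypothetical):

```python
import torch

# Save init arguments together with the trained weights
torch.save(
    {"init_args": seq2seq_lstm.model_init_args,
     "state_dict": seq2seq_lstm.state_dict()},
    "checkpoint.pt",
)

# Rebuild the identical model and restore its weights
ckpt = torch.load("checkpoint.pt")
restored = LSTMSeq2Seq(**ckpt["init_args"])
restored.load_state_dict(ckpt["state_dict"])
```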