tft_torch.tft module

This module contains the primary model implemented in this project.

class tft_torch.tft.TemporalFusionTransformer(config: omegaconf.dictconfig.DictConfig)

This class implements the Temporal Fusion Transformer model described in the paper Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting.

Parameters

config (DictConfig) – A mapping describing both the expected structure of the model's input and the architectural specification of the model. This mapping should include a key named data_props, under which the input dimensions and cardinalities (where the inputs are categorical) are specified. Moreover, the configuration mapping should contain a key named model, specifying attention_heads, dropout, lstm_layers, output_quantiles and state_size, which are required for creating the model.
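
A minimal sketch of constructing such a configuration and instantiating the model. The keys under model are the ones required by the constructor; the field names under data_props are illustrative assumptions and should be set to match the dataset at hand:

    from omegaconf import OmegaConf
    from tft_torch.tft import TemporalFusionTransformer

    config = OmegaConf.create({
        'data_props': {  # assumed field names, describing the input structure
            'num_historical_numeric': 4,
            'num_historical_categorical': 2,
            'historical_categorical_cardinalities': [10, 5],
            'num_static_numeric': 3,
            'num_static_categorical': 1,
            'static_categorical_cardinalities': [7],
            'num_future_numeric': 2,
            'num_future_categorical': 1,
            'future_categorical_cardinalities': [4],
        },
        'model': {  # keys required by the constructor
            'attention_heads': 4,
            'dropout': 0.05,
            'lstm_layers': 2,
            'output_quantiles': [0.1, 0.5, 0.9],
            'state_size': 64,
        },
    })

    model = TemporalFusionTransformer(config)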

class tft_torch.tft.InputChannelEmbedding(state_size: int, num_numeric: int, num_categorical: int, categorical_cardinalities: List[int], time_distribute: Optional[bool] = False)

A module to handle the transformation/embedding of an input channel composed of numeric tensors and categorical tensors. It holds a NumericInputTransformation module for handling the embedding of the numeric inputs, and a CategoricalInputTransformation module for handling the embedding of the categorical inputs.

Parameters
  • state_size (int) – The state size of the model, which determines the embedding dimension/width of each input variable.

  • num_numeric (int) – The quantity of numeric input variables associated with the input channel.

  • num_categorical (int) – The quantity of categorical input variables associated with the input channel.

  • categorical_cardinalities (List[int]) – The quantity of categories associated with each of the categorical input variables.

  • time_distribute (Optional[bool]) – A boolean indicating whether to wrap the composing transformations using the TimeDistributed module.
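
A usage sketch, assuming the forward pass accepts the numeric tensor and the categorical tensor of the same channel as two separate arguments (the forward signature and input layouts are assumptions):

    import torch
    from tft_torch.tft import InputChannelEmbedding

    channel_embed = InputChannelEmbedding(
        state_size=64,
        num_numeric=4,
        num_categorical=2,
        categorical_cardinalities=[10, 5],
        time_distribute=True,  # temporal channel: wrap the transformations with TimeDistributed
    )

    batch_size, num_time_steps = 32, 20
    x_numeric = torch.randn(batch_size, num_time_steps, 4)                # assumed layout
    x_categorical = torch.randint(0, 5, (batch_size, num_time_steps, 2))  # assumed layout; indices stay below each cardinality
    merged = channel_embed(x_numeric, x_categorical)                      # assumed forward signature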

class tft_torch.tft.NumericInputTransformation(num_inputs: int, state_size: int)

A module for transforming/embedding the set of numeric input variables from a single input channel. Each input variable is projected, using a dedicated linear layer, to a vector of width state_size. The result of applying this module is a list of length num_inputs, containing the embedding of each input variable for all the observations and time steps.

Parameters
  • num_inputs (int) – The quantity of numeric input variables associated with this module.

  • state_size (int) – The state size of the model, which determines the embedding dimension/width.
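
For illustration, a sketch of applying this module directly; the input layout in the forward call is an assumption:

    import torch
    from tft_torch.tft import NumericInputTransformation

    numeric_transform = NumericInputTransformation(num_inputs=4, state_size=64)

    x = torch.randn(32, 4)             # assumed layout: [num_observations, num_inputs]
    embeddings = numeric_transform(x)  # a list of length num_inputs
    # each element embeds one input variable, with width state_size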

class tft_torch.tft.CategoricalInputTransformation(num_inputs: int, state_size: int, cardinalities: List[int])

A module for transforming/embedding the set of categorical input variables from a single input channel. Each input variable is projected, using a dedicated embedding layer, to a vector of width state_size. The result of applying this module is a list of length num_inputs, containing the embedding of each input variable for all the observations and time steps.

Parameters
  • num_inputs (int) – The quantity of categorical input variables associated with this module.

  • state_size (int) – The state size of the model, which determines the embedding dimension/width.

  • cardinalities (List[int]) – The quantity of categories associated with each of the input variables.
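
A parallel sketch for the categorical case; the input layout in the forward call is an assumption:

    import torch
    from tft_torch.tft import CategoricalInputTransformation

    categorical_transform = CategoricalInputTransformation(
        num_inputs=2, state_size=64, cardinalities=[10, 5])

    # assumed layout: [num_observations, num_inputs], holding integer category indices
    x = torch.stack([torch.randint(0, 10, (32,)),
                     torch.randint(0, 5, (32,))], dim=-1)
    embeddings = categorical_transform(x)  # a list of length num_inputs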

class tft_torch.tft.VariableSelectionNetwork(input_dim: int, num_inputs: int, hidden_dim: int, dropout: float, context_dim: Optional[int] = None, batch_first: Optional[bool] = True)

This module is designed to handle the fact that the relevant and specific contribution of each input variable to the output is typically unknown. This module enables instance-wise variable selection, and is applied to both the static covariates and time-dependent covariates.

Beyond providing insights into which variables are the most significant ones for the prediction problem, variable selection also allows the model to remove any unnecessary noisy inputs which could negatively impact performance.

Parameters
  • input_dim (int) – The attribute/embedding dimension of the input, associated with the state_size of the model.

  • num_inputs (int) – The quantity of input variables, including both numeric and categorical inputs for the relevant channel.

  • hidden_dim (int) – The embedding width of the output.

  • dropout (float) – The dropout rate associated with GatedResidualNetwork objects composing this object.

  • context_dim (Optional[int]) – The embedding width of the context signal expected to be fed as an auxiliary input to this component.

  • batch_first (Optional[bool]) – A boolean indicating whether the batch dimension is expected to be the first dimension of the input or not.
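
A usage sketch; the forward signature, the flattened input layout, and the returned pair are assumptions:

    import torch
    from tft_torch.tft import VariableSelectionNetwork

    vsn = VariableSelectionNetwork(
        input_dim=64,    # the state_size of the model
        num_inputs=6,    # e.g. 4 numeric + 2 categorical variables in the channel
        hidden_dim=64,
        dropout=0.05,
        context_dim=64,  # set only when an auxiliary context signal is fed
    )

    batch_size = 32
    flattened = torch.randn(batch_size, 6 * 64)  # assumed: variable embeddings concatenated along the last axis
    context = torch.randn(batch_size, 64)
    outputs, weights = vsn(flattened, context=context)  # assumed: selected embedding and per-variable weights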

class tft_torch.tft.GatedLinearUnit(input_dim: int)

This module is also known as a GLU, formulated in: Dauphin, Yann N., et al. “Language modeling with gated convolutional networks.” International Conference on Machine Learning. PMLR, 2017.

The output of the layer is the linear projection (X * W + b) modulated by the gates sigmoid(X * V + c). These gates multiply each element of the matrix X * W + b and control the information passed along the hierarchy. This unit is a simplified gating mechanism that reduces the vanishing gradient problem: by coupling linear units to the gates, it retains the non-linear capabilities of the layer while allowing the gradient to propagate through the linear unit without scaling.

Parameters

input_dim (int) – The embedding size of the input.
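
The gating computation can be written out explicitly. Below, glu applies the module itself, while gated_manual spells out the formula with freshly initialized layers (a sketch of the computation, not the library's internal code; whether the projection preserves the input width is an assumption):

    import torch
    import torch.nn as nn
    from tft_torch.tft import GatedLinearUnit

    glu = GatedLinearUnit(input_dim=64)
    x = torch.randn(32, 64)
    gated = glu(x)  # elementwise: (X * W + b) * sigmoid(X * V + c)

    # the same formula, spelled out with separate linear layers:
    W, V = nn.Linear(64, 64), nn.Linear(64, 64)
    gated_manual = W(x) * torch.sigmoid(V(x))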

class tft_torch.tft.GatedResidualNetwork(input_dim: int, hidden_dim: int, output_dim: int, dropout: Optional[float] = 0.05, context_dim: Optional[int] = None, batch_first: Optional[bool] = True)

This module, known as a GRN, takes in a primary input (x) and an optional context vector (c). It uses a GatedLinearUnit to control the extent to which the module contributes to the original input (x), potentially skipping over the layer entirely when the GLU outputs are all close to zero, thereby suppressing the non-linear contribution. When no context vector is provided, the GRN simply treats the context input as zero. During training, dropout is applied before the gating layer.

Parameters
  • input_dim (int) – The embedding width/dimension of the input.

  • hidden_dim (int) – The intermediate embedding width.

  • output_dim (int) – The embedding width of the output tensors.

  • dropout (Optional[float]) – The dropout rate associated with the component.

  • context_dim (Optional[int]) – The embedding width of the context signal expected to be fed as an auxiliary input to this component.

  • batch_first (Optional[bool]) – A boolean indicating whether the batch dimension is expected to be the first dimension of the input or not.
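
A usage sketch; passing the context as a keyword argument is an assumption:

    import torch
    from tft_torch.tft import GatedResidualNetwork

    grn = GatedResidualNetwork(
        input_dim=64, hidden_dim=64, output_dim=64,
        dropout=0.05, context_dim=32)

    x = torch.randn(32, 64)   # primary input
    c = torch.randn(32, 32)   # optional context vector
    out = grn(x, context=c)   # assumed forward signature; context may be omitted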

class tft_torch.tft.GateAddNorm(input_dim: int, dropout: Optional[float] = None)

This module encapsulates an operation performed multiple times across the TemporalFusionTransformer model. The composite operation includes:
  a. A Dropout layer.
  b. Gating using a GatedLinearUnit.
  c. A residual connection to an “earlier” signal from the forward pass of the parent model.
  d. Layer normalization.

Parameters
  • input_dim (int) – The dimension associated with the expected input of this module.

  • dropout (Optional[float]) – The dropout rate associated with the component.
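
A usage sketch; the forward signature, taking the signal to be gated together with the residual, is an assumption:

    import torch
    from tft_torch.tft import GateAddNorm

    gate_add_norm = GateAddNorm(input_dim=64, dropout=0.05)

    x = torch.randn(32, 64)           # signal to be dropped out and gated
    residual = torch.randn(32, 64)    # "earlier" signal from the parent model's forward pass
    out = gate_add_norm(x, residual)  # assumed forward signature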

class tft_torch.tft.InterpretableMultiHeadAttention(embed_dim: int, num_heads: int)

The mechanism implemented in this module is used to learn long-term relationships across different time-steps. It is a modified version of multi-head attention, designed to enhance explainability. In this modification, as opposed to traditional versions of multi-head attention, the “values” signal is shared across all the heads, and additive aggregation is employed over the heads. According to the paper, each head can learn different temporal patterns while attending to a common set of input features. This can be interpreted as a simple ensemble over attention weights combined into a single matrix, which, compared to the original multi-head attention matrix, yields an increased representation capacity in an efficient way.

Parameters
  • embed_dim (int) – The dimension associated with the state_size of the model, corresponding to the input as well as the output.

  • num_heads (int) – The number of attention heads composing the Multi-head attention component.
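
A usage sketch; the forward signature and the returned triple (attended output, per-head outputs, attention scores) are assumptions:

    import torch
    from tft_torch.tft import InterpretableMultiHeadAttention

    attention = InterpretableMultiHeadAttention(embed_dim=64, num_heads=4)

    batch_size, num_time_steps = 32, 20
    q = torch.randn(batch_size, num_time_steps, 64)
    k = torch.randn(batch_size, num_time_steps, 64)
    v = torch.randn(batch_size, num_time_steps, 64)
    output, heads_output, scores = attention(q, k, v)  # assumed signature and return values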