transformer_implementation.blocks.layers package¶
Submodules¶
transformer_implementation.blocks.layers.FeedForward module¶
- class transformer_implementation.blocks.layers.FeedForward.FeedForward(config)¶
Bases: Module
A position-wise Feed Forward Neural Network (FFNN) class for transformer models.
The FFNN consists of two linear transformations with a GELU activation in between, followed by dropout for regularization; a minimal sketch is given after the Parameters list below.
Attributes¶
- c_fc : torch.nn.Linear
The first fully connected layer of the feed-forward network. It takes as input a tensor with n_embd features and returns a tensor with 4 * n_embd features.
- gelu : torch.nn.GELU
The Gaussian Error Linear Unit activation function.
- c_proj : torch.nn.Linear
The second fully connected layer of the feed-forward network. It takes as input a tensor with 4 * n_embd features and returns a tensor with n_embd features.
- dropout : torch.nn.Dropout
The dropout layer for regularization. The dropout rate is specified in the configuration.
Methods¶
- forward(x: torch.Tensor) -> torch.Tensor:
Computes the forward pass of the network.
Parameters¶
- config : object
A configuration object with the following attributes:
- n_embd (int): The size of the input and output feature vectors.
- bias (bool): If True, the linear layers will include a bias term.
- dropout (float): The dropout rate to use for regularization.
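The attributes and parameters above fully describe the layer layout, so a minimal sketch can be reconstructed from them. The class name FeedForwardSketch and the plain attribute access on config are illustrative assumptions; the actual module may differ in details.

```python
import torch
import torch.nn as nn


class FeedForwardSketch(nn.Module):
    """Position-wise FFN: Linear -> GELU -> Linear -> Dropout, mirroring the attributes above."""

    def __init__(self, config):
        super().__init__()
        # Expansion from n_embd to 4 * n_embd features (c_fc).
        self.c_fc = nn.Linear(config.n_embd, 4 * config.n_embd, bias=config.bias)
        self.gelu = nn.GELU()
        # Projection from 4 * n_embd back to n_embd features (c_proj).
        self.c_proj = nn.Linear(4 * config.n_embd, config.n_embd, bias=config.bias)
        self.dropout = nn.Dropout(config.dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dropout(self.c_proj(self.gelu(self.c_fc(x))))
```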
transformer_implementation.blocks.layers.LayerNorm module¶
- class transformer_implementation.blocks.layers.LayerNorm.LayerNorm(ndim: int, bias: bool)¶
Bases: Module
A Layer Normalization module with optional bias.
This implementation of Layer Normalization allows turning off the bias term, which is not directly supported by PyTorch’s layer normalization function.
Attributes¶
- weight : torch.nn.Parameter
A learnable scale factor initialized to one. This has the same shape as the input feature dimension.
- bias : torch.nn.Parameter
A learnable bias term initialized to zero if bias is True, else None. This has the same shape as the input feature dimension.
Methods¶
- forward(input: torch.Tensor) -> torch.Tensor:
Applies layer normalization to the input tensor.
Parameters¶
- ndim : int
The feature dimension size of the input tensor.
- bias : bool
If True, adds a learnable bias to the output.
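A minimal sketch of a layer normalization with optional bias, built only from the attributes and parameters listed above. The use of torch.nn.functional.layer_norm and the eps value of 1e-5 are assumptions, not taken from the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LayerNormSketch(nn.Module):
    """Layer normalization with an optional bias, mirroring the documented attributes."""

    def __init__(self, ndim: int, bias: bool):
        super().__init__()
        # Learnable scale initialized to one, shaped like the feature dimension.
        self.weight = nn.Parameter(torch.ones(ndim))
        # Learnable bias initialized to zero, or None when bias is disabled.
        self.bias = nn.Parameter(torch.zeros(ndim)) if bias else None

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        # eps=1e-5 is an assumed default, not taken from the source.
        return F.layer_norm(input, self.weight.shape, self.weight, self.bias, 1e-5)
```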
transformer_implementation.blocks.layers.MultiHeadAttention module¶
- class transformer_implementation.blocks.layers.MultiHeadAttention.MultiHeadAttention(config)¶
Bases: Module
Implements a multi-head attention module in PyTorch.
This class is a child of the PyTorch nn.Module class. It uses the scaled dot-product attention mechanism and includes dropout for regularization; a construction example follows the Parameters list below.
Attributes¶
- n_head : int
The number of attention heads.
- n_embd : int
The size of the input and output feature vectors.
- dropout : float
The dropout rate to use for regularization.
- block_size : int
The size of the block to use for the attention mask.
- q_attn : torch.nn.Linear
The query projection layer.
- k_attn : torch.nn.Linear
The key projection layer.
- v_attn : torch.nn.Linear
The value projection layer.
- c_proj : torch.nn.Linear
The output projection layer.
- attn_dropout : torch.nn.Dropout
The dropout layer for the attention mechanism.
- resid_dropout : torch.nn.Dropout
The dropout layer for the output.
- bias : torch.Tensor
The attention mask to ensure causal attention.
Methods¶
- scaled_dot_product_attention(q, k, v, mask: bool = None):
Computes the scaled dot product attention.
- forward(q_x, k_x, v_x, mask=None, is_masked=False):
Computes the forward pass of the multi-head attention.
Parameters¶
- config : object
A configuration object with the following attributes:
- n_head (int): The number of attention heads.
- n_embd (int): The size of the input and output feature vectors.
- bias (bool): If True, the linear layers will include a bias term.
- dropout (float): The dropout rate to use for regularization.
- block_size (int): The size of the block to use for the attention mask.
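A construction sketch built from the documented configuration attributes. The SimpleNamespace config and the specific values are hypothetical; any object exposing the five attributes above should work.

```python
from types import SimpleNamespace

from transformer_implementation.blocks.layers.MultiHeadAttention import MultiHeadAttention

# Hypothetical configuration carrying only the documented attributes; the values are arbitrary.
config = SimpleNamespace(
    n_head=8,
    n_embd=512,
    bias=True,
    dropout=0.1,
    block_size=128,
)

mha = MultiHeadAttention(config)
```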
- forward(q_x, k_x, v_x, mask=None, is_masked=False)¶
Implements the forward pass of the multi-head attention.
Parameters¶
- q_x : torch.Tensor
The input query tensor.
- k_x : torch.Tensor
The input key tensor.
- v_x : torch.Tensor
The input value tensor.
- mask : bool, optional
The attention mask. If None, no mask is applied. Default is None.
- is_masked : bool, optional
Whether this multi-head attention is masked, i.e., whether a causal (triangular) mask should be applied. Default is False.
Returns¶
- tuple
The output tensor and the attention weights.
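A usage sketch for the forward pass, assuming mha is the hypothetical instance constructed in the previous example. Self-attention passes the same tensor as queries, keys, and values; cross-attention passes encoder output as keys and values.

```python
import torch

x = torch.randn(2, 16, 512)  # (batch, seq_len, n_embd), matching the config above

# Self-attention: queries, keys, and values come from the same tensor.
# is_masked=True asks the module to apply its causal (triangular) mask.
out, attn_weights = mha(x, x, x, is_masked=True)

# Cross-attention (e.g. in a decoder): keys and values come from the encoder output.
# out, attn_weights = mha(decoder_x, encoder_out, encoder_out)
```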
- scaled_dot_product_attention(q, k, v, mask: bool = None)¶
Computes the scaled dot product attention.
Parameters¶
- q : torch.Tensor
The query tensor.
- k : torch.Tensor
The key tensor.
- v : torch.Tensor
The value tensor.
- mask : bool, optional
The attention mask. If None, no mask is applied. Default is None.
Returns¶
- tuple
The output tensor and the attention weights.
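For reference, a minimal sketch of the computation the method name implies: softmax(QK^T / sqrt(d_k)) V, with an optional mask and attention dropout. The standalone function below is an illustration under these assumptions, not the class method itself.

```python
import math

import torch
import torch.nn.functional as F


def scaled_dot_product_attention_sketch(q, k, v, mask=None, dropout_p=0.0):
    """Illustrative scaled dot-product attention; not the library's method."""
    d_k = q.size(-1)
    # Scores scaled by sqrt(d_k) to keep the softmax in a well-conditioned range.
    att = (q @ k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Positions where the mask is zero are excluded from attention.
        att = att.masked_fill(mask == 0, float("-inf"))
    att = F.softmax(att, dim=-1)
    att = F.dropout(att, p=dropout_p)
    # Return the attended values and the attention weights, matching the documented tuple.
    return att @ v, att
```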