transformer_implementation.blocks.layers package

Submodules

transformer_implementation.blocks.layers.FeedForward module

class transformer_implementation.blocks.layers.FeedForward.FeedForward(config)

Bases: Module

A position-wise Feed Forward Neural Network (FFNN) class for transformer models.

This class implements a position-wise FFNN consisting of two linear transformations with a GELU activation in between, followed by dropout for regularization.

Attributes

c_fc : torch.nn.Linear

The first fully connected layer of the feed-forward network. It takes as input a tensor with n_embd features and returns a tensor with 4 * n_embd features.

gelu : torch.nn.GELU

The Gaussian Error Linear Unit activation function.

c_proj : torch.nn.Linear

The second fully connected layer of the feed-forward network. It takes as input a tensor with 4 * n_embd features and returns a tensor with n_embd features.

dropout : torch.nn.Dropout

The dropout layer for regularization. The dropout rate is specified in the configuration.

Methods

forward(x: torch.Tensor) -> torch.Tensor:

Computes the forward pass of the network.

Parameters

config : object

A configuration object with the following attributes (a minimal example is sketched below):

n_embd (int): The size of the input and output feature vectors.
bias (bool): If True, the linear layers will include a bias term.
dropout (float): The dropout rate to use for regularization.
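As a minimal illustration of such a configuration object, assuming it is simply an object exposing these attributes (types.SimpleNamespace is used here purely for the sketch; the concrete values are illustrative only):

import torch
from types import SimpleNamespace

# Hypothetical configuration exposing the attributes documented above.
config = SimpleNamespace(n_embd=768, bias=True, dropout=0.1)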

forward(x) → Tensor

Implements the forward pass of the feed-forward network.

Parameters

x : torch.Tensor

The input tensor, with n_embd features in its last dimension.

Returns

torch.Tensor

The output tensor, post-processed by the feed-forward network.
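For reference, a minimal sketch of the forward computation described above (expand to 4 * n_embd features, apply GELU, project back to n_embd, then dropout). This is an illustrative reimplementation under the documented attribute names, not necessarily the package's exact source:

import torch
import torch.nn as nn

class FeedForward(nn.Module):
    # Illustrative sketch of the FeedForward layer described above.
    def __init__(self, config):
        super().__init__()
        self.c_fc = nn.Linear(config.n_embd, 4 * config.n_embd, bias=config.bias)
        self.gelu = nn.GELU()
        self.c_proj = nn.Linear(4 * config.n_embd, config.n_embd, bias=config.bias)
        self.dropout = nn.Dropout(config.dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # expand -> non-linearity -> project back -> regularize
        return self.dropout(self.c_proj(self.gelu(self.c_fc(x))))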

transformer_implementation.blocks.layers.LayerNorm module

class transformer_implementation.blocks.layers.LayerNorm.LayerNorm(ndim: int, bias: bool)

Bases: Module

A Layer Normalization module with optional bias.

This implementation of Layer Normalization allows turning off the bias term, which is not directly supported by PyTorch’s layer normalization function.

Attributes

weight : torch.nn.Parameter

A learnable scale factor initialized to one. This has the same shape as the input feature dimension.

bias : torch.nn.Parameter

A learnable bias term initialized to zero if bias is True, else None. This has the same shape as the input feature dimension.

Methods

forward(input: torch.Tensor) -> torch.Tensor:

Applies layer normalization to the input tensor.

Parameters

ndim : int

The feature dimension size of the input tensor.

bias : bool

If True, adds a learnable bias to the output.

forward(input)

Implements the forward pass of the LayerNorm module.

Parameters

input : torch.Tensor

The input tensor that will be normalized.

Returns

torch.Tensor

The normalized output tensor.
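A minimal sketch of how an optional-bias layer normalization can be realized with torch.nn.functional.layer_norm, which accepts bias=None. It mirrors the description above and is not necessarily the package's exact code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNorm(nn.Module):
    # Illustrative sketch: layer normalization with an optional bias term.
    def __init__(self, ndim: int, bias: bool):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(ndim))                   # learnable scale, initialized to one
        self.bias = nn.Parameter(torch.zeros(ndim)) if bias else None  # optional learnable shift

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        # F.layer_norm tolerates bias=None, which is how the optional bias is handled
        return F.layer_norm(input, self.weight.shape, self.weight, self.bias, 1e-5)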

transformer_implementation.blocks.layers.MultiHeadAttention module

class transformer_implementation.blocks.layers.MultiHeadAttention.MultiHeadAttention(config)

Bases: Module

Implements a multi-head attention module in PyTorch.

This class is a child of the PyTorch nn.Module class. It uses scaled dot product attention mechanism and includes dropout for regularization.

Attributes

n_head : int

The number of attention heads.

n_embd : int

The size of the input and output feature vectors.

dropout : float

The dropout rate to use for regularization.

block_size : int

The size of the block to use for the attention mask.

q_attn : torch.nn.Linear

The query projection layer.

k_attn : torch.nn.Linear

The key projection layer.

v_attn : torch.nn.Linear

The value projection layer.

c_proj : torch.nn.Linear

The output projection layer.

attn_dropout : torch.nn.Dropout

The dropout layer for the attention mechanism.

resid_dropout : torch.nn.Dropout

The dropout layer for the output.

bias : torch.Tensor

The attention mask to ensure causal attention.

Methods

scaled_dot_product_attention(q, k, v, mask: bool = None):

Computes the scaled dot product attention.

forward(q_x, k_x, v_x, mask = None, is_masked = False):

Computes the forward pass of the multi-head attention.

Parameters

config : object

A configuration object with the following attributes:

n_head (int): The number of attention heads.
n_embd (int): The size of the input and output feature vectors.
bias (bool): If True, the linear layers will include a bias term.
dropout (float): The dropout rate to use for regularization.
block_size (int): The size of the block to use for the attention mask.

forward(q_x, k_x, v_x, mask=None, is_masked=False)

Implements the forward pass of the multi-head attention.

Parameters

q_x : torch.Tensor

The input query tensor.

k_x : torch.Tensor

The input key tensor.

v_x : torch.Tensor

The input value tensor.

mask : bool, optional

The attention mask. If None, no mask is applied. Default is None.

is_masked : bool, optional

Whether this multi-head attention is masked (causal), i.e. whether a triangular mask should be applied. Default is False.

Returns

tuple

The output tensor and the attention weights.
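A hedged usage example: the import path, call signature, and returned tuple follow the documentation above, while the (batch, sequence, n_embd) input layout and the concrete configuration values are assumptions for illustration.

import torch
from types import SimpleNamespace
from transformer_implementation.blocks.layers.MultiHeadAttention import MultiHeadAttention

# Hypothetical configuration; attribute names follow the list documented above.
config = SimpleNamespace(n_head=8, n_embd=512, bias=True, dropout=0.1, block_size=128)
mha = MultiHeadAttention(config)

x = torch.randn(2, 128, 512)              # assumed (batch, sequence, n_embd) layout
out, attn = mha(x, x, x, is_masked=True)  # self-attention with a causal (triangular) mask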

scaled_dot_product_attention(q, k, v, mask: bool = None)

Computes the scaled dot product attention.

Parameters

q : torch.Tensor

The query tensor.

k : torch.Tensor

The key tensor.

v : torch.Tensor

The value tensor.

mask : bool, optional

The attention mask. If None, no mask is applied. Default is None.

Returns

tuple

The output tensor and the attention weights.
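For reference, scaled dot product attention computes softmax(q k^T / sqrt(d_k)) v. A minimal standalone sketch is shown below; the convention that positions where mask == 0 are blocked is an assumption, and the package's own masking convention may differ.

import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # Illustrative sketch of softmax(q @ k^T / sqrt(d_k)) @ v with an optional mask.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)          # (..., T_q, T_k) similarity scores
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # assumed convention: 0 means "blocked"
    weights = F.softmax(scores, dim=-1)                        # attention weights
    return weights @ v, weights                                # (output, attention weights), as documented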

Module contents