Activations API

Activation functions introduce non-linearity into the network, allowing it to learn complex patterns.

Base Activation

mpneuralnetwork.activations.Activation

Bases: Layer

Base class for activation functions.

Activations are treated as layers in this framework. They apply a non-linear transformation element-wise to the input.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| activation | Callable | The function to apply during the forward pass. |
| activation_prime | Callable | The derivative of the function for the backward pass. |
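
The forward/backward contract described above can be sketched without the library. The following is a minimal NumPy stand-in, not the framework's code: `SketchActivation` and the softplus pair are hypothetical names chosen for illustration, and NumPy is assumed in place of the library's `xp` backend.

```python
import numpy as np


class SketchActivation:
    """Hypothetical stand-in mirroring the documented Activation contract."""

    def __init__(self, activation, activation_prime):
        self.activation = activation
        self.activation_prime = activation_prime

    def forward(self, input_batch):
        # Cache the input; backward needs it to evaluate the derivative.
        self.input = input_batch
        return self.activation(input_batch)

    def backward(self, output_gradient_batch):
        # Chain rule, element-wise: upstream gradient times f'(input).
        return output_gradient_batch * self.activation_prime(self.input)


# Softplus, f(x) = log(1 + exp(x)), whose derivative is the sigmoid.
softplus = SketchActivation(
    lambda x: np.logaddexp(0.0, x),
    lambda x: 1.0 / (1.0 + np.exp(-x)),
)

x = np.array([[-2.0, 0.0, 3.0]])
y = softplus.forward(x)
grad_in = softplus.backward(np.ones_like(x))
```

Because both callables act element-wise, the output and the input gradient keep the shape of `x`.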

Source code in src/mpneuralnetwork/activations.py
class Activation(Layer):
    """Base class for activation functions.

    Activations are treated as layers in this framework. They apply a non-linear
    transformation element-wise to the input.

    Attributes:
        activation (Callable): The function to apply during the forward pass.
        activation_prime (Callable): The derivative of the function for the backward pass.
    """

    def __init__(self, activation: T, activation_prime: T) -> None:
        """Initializes the activation layer.

        Args:
            activation (Callable[[ArrayType], ArrayType]): The activation function.
            activation_prime (Callable[[ArrayType], ArrayType]): The derivative of the activation function.
        """
        self.activation: T = activation
        self.activation_prime: T = activation_prime

    def forward(self, input_batch: ArrayType, training: bool = True) -> ArrayType:
        """Applies the activation function to the input.

        Args:
            input_batch (ArrayType): Input data of any shape.
            training (bool, optional): Whether in training mode. Defaults to True.

        Returns:
            ArrayType: Activated output with the same shape as `input_batch`.
        """
        self.input = input_batch
        return self.activation(self.input)

    def backward(self, output_gradient_batch: ArrayType) -> ArrayType:
        """Computes the gradient of the activation function.

        Applies the chain rule: `grad = output_gradient * activation'(input)`.

        Args:
            output_gradient_batch (ArrayType): Gradient flowing from the next layer.

        Returns:
            ArrayType: Gradient with respect to the input.
        """
        res: ArrayType = xp.multiply(output_gradient_batch, self.activation_prime(self.input))
        return res

    @property
    def params(self) -> dict[str, tuple[ArrayType, ArrayType]]:
        """Activations usually have no trainable parameters.

        Returns:
            dict: Empty dictionary.
        """
        return {}

params property

Activations usually have no trainable parameters.

Returns:

| Type | Description |
| --- | --- |
| dict[str, tuple[ArrayType, ArrayType]] | Empty dictionary. |

__init__(activation, activation_prime)

Initializes the activation layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| activation | Callable[[ArrayType], ArrayType] | The activation function. | required |
| activation_prime | Callable[[ArrayType], ArrayType] | The derivative of the activation function. | required |

Source code in src/mpneuralnetwork/activations.py
def __init__(self, activation: T, activation_prime: T) -> None:
    """Initializes the activation layer.

    Args:
        activation (Callable[[ArrayType], ArrayType]): The activation function.
        activation_prime (Callable[[ArrayType], ArrayType]): The derivative of the activation function.
    """
    self.activation: T = activation
    self.activation_prime: T = activation_prime

backward(output_gradient_batch)

Computes the gradient of the activation function.

Applies the chain rule: grad = output_gradient * activation'(input).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| output_gradient_batch | ArrayType | Gradient flowing from the next layer. | required |

Returns:

| Type | Description |
| --- | --- |
| ArrayType | Gradient with respect to the input. |
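
One way to sanity-check a function/derivative pair before relying on this chain-rule step is a central finite-difference comparison. The sketch below assumes NumPy; `numerical_grad` is a hypothetical helper and not part of the library.

```python
import numpy as np


def numerical_grad(f, x, eps=1e-6):
    """Central finite difference of an element-wise function f at x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


f = np.tanh


def f_prime(x):
    return 1.0 - np.tanh(x) ** 2


x = np.linspace(-3.0, 3.0, 13)
upstream = np.full_like(x, 0.5)  # stand-in for output_gradient_batch

# Analytic backward pass: grad = output_gradient * activation'(input)
analytic = upstream * f_prime(x)
# Same quantity with the derivative estimated numerically.
numeric = upstream * numerical_grad(f, x)

max_error = float(np.max(np.abs(analytic - numeric)))
```

A small `max_error` suggests the analytic derivative is consistent with the forward function; a large one usually points to a sign or algebra mistake in the derivative.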

Source code in src/mpneuralnetwork/activations.py
def backward(self, output_gradient_batch: ArrayType) -> ArrayType:
    """Computes the gradient of the activation function.

    Applies the chain rule: `grad = output_gradient * activation'(input)`.

    Args:
        output_gradient_batch (ArrayType): Gradient flowing from the next layer.

    Returns:
        ArrayType: Gradient with respect to the input.
    """
    res: ArrayType = xp.multiply(output_gradient_batch, self.activation_prime(self.input))
    return res

forward(input_batch, training=True)

Applies the activation function to the input.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_batch | ArrayType | Input data of any shape. | required |
| training | bool | Whether in training mode. Defaults to True. | True |

Returns:

| Type | Description |
| --- | --- |
| ArrayType | Activated output with the same shape as input_batch. |

Source code in src/mpneuralnetwork/activations.py
def forward(self, input_batch: ArrayType, training: bool = True) -> ArrayType:
    """Applies the activation function to the input.

    Args:
        input_batch (ArrayType): Input data of any shape.
        training (bool, optional): Whether in training mode. Defaults to True.

    Returns:
        ArrayType: Activated output with the same shape as `input_batch`.
    """
    self.input = input_batch
    return self.activation(self.input)

Hidden Layers

These activations are typically used in intermediate layers.

mpneuralnetwork.activations.ReLU

Bases: Activation

Rectified Linear Unit activation function.

Formula

f(x) = max(0, x)

Range: [0, inf). ReLU is cheap to compute and, because its gradient is exactly 1 for every positive input, it mitigates the vanishing gradient problem. It is the most common activation for hidden layers in deep networks.
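
As a small illustration of the formula and its derivative, here is a NumPy sketch (NumPy is assumed in place of the library's `xp` backend). Like the source listing, it treats the derivative at exactly x = 0 as 0.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Forward: f(x) = max(0, x)
relu = np.maximum(0.0, x)

# Derivative: 1 where x > 0, otherwise 0 (including at x == 0).
relu_prime = (x > 0).astype(x.dtype)

# Backward: the upstream gradient passes through only for positive inputs.
upstream = np.ones_like(x)
grad_in = upstream * relu_prime
```

Negative inputs produce both a zero output and a zero gradient, which is the mechanism behind units that stop updating.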

Source code in src/mpneuralnetwork/activations.py
class ReLU(Activation):
    """Rectified Linear Unit activation function.

    Formula:
        `f(x) = max(0, x)`

    Range: [0, inf).
    Computationally efficient and mitigates the vanishing gradient problem.
    Most common activation for hidden layers in deep networks.
    """

    def __init__(self) -> None:
        super().__init__(lambda x: xp.maximum(0, x, dtype=DTYPE), lambda x: x > 0)

mpneuralnetwork.activations.PReLU

Bases: Activation

Parametric Rectified Linear Unit.

Formula

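f(x) = x          if x > 0
f(x) = alpha * x  if x <= 0

Here alpha is the negative slope. It is described as a learnable parameter, which lets the network adapt the negative slope and avoid the "dying ReLU" problem. Note that in the source shown here alpha is stored as a plain float and the inherited params property returns an empty dictionary, so any update to alpha would have to happen outside this class. The forward pass computes max(alpha * x, x), which matches the formula only for alpha <= 1.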

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| alpha | float | Initial value for the negative slope. Defaults to 0.01. | 0.01 |
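
The piecewise formula can be checked against the `max`-based form with a short NumPy sketch. This is illustrative only: the `d_alpha` line shows what a gradient with respect to alpha would look like if alpha were trained, which is an assumption and not something the listing is shown to compute. The sketch follows the formula at x = 0 (slope alpha).

```python
import numpy as np

alpha = 0.01
x = np.array([-3.0, -1.0, 0.0, 2.0])

# Piecewise definition taken from the formula.
piecewise = np.where(x > 0, x, alpha * x)
# Equivalent form, valid when alpha <= 1.
via_max = np.maximum(alpha * x, x)

# Gradient with respect to the input: 1 for x > 0, alpha otherwise.
d_input = np.where(x > 0, 1.0, alpha)

# Hypothetical gradient with respect to alpha: upstream * x summed over x <= 0.
upstream = np.ones_like(x)
d_alpha = float(np.sum(upstream * np.where(x > 0, 0.0, x)))
```

With alpha greater than 1 the two forward forms would disagree, since `max(alpha * x, x)` then selects `alpha * x` for positive inputs.
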
Source code in src/mpneuralnetwork/activations.py
class PReLU(Activation):
    """Parametric Rectified Linear Unit.

    Formula:
        `f(x) = x` if `x > 0`
        `f(x) = alpha * x` if `x <= 0`

    Where `alpha` is a learnable parameter updated during training.
    Allows the network to learn the negative slope, avoiding "dying ReLU" problems.

    Args:
        alpha (float, optional): Initial value for the negative slope. Defaults to 0.01.
    """

    def __init__(self, alpha: float = 0.01) -> None:
        super().__init__(
            lambda x: xp.maximum(alpha * x, x, dtype=DTYPE),
            lambda x: xp.where(x < 0, alpha, 1),
        )
        self.alpha: float = alpha

    def get_config(self) -> dict:
        config = super().get_config()
        config.update({"alpha": self.alpha})
        return config

mpneuralnetwork.activations.Swish

Bases: Activation

Swish activation function.

Formula

f(x) = x * sigmoid(x)

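Range: approximately [-0.28, inf), with the minimum near x = -1.28. Proposed by researchers at Google; the same function is also known as SiLU. It is smooth and non-monotonic, and is reported to match or outperform ReLU in many deep networks.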
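
The lower bound of the range and the non-monotonic dip can be located numerically. The following NumPy sketch is a stand-alone illustration (the helper names are not taken from the library); it writes the derivative with the product rule, which is algebraically the same as the closed form in the source listing.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def swish(x):
    return x * sigmoid(x)


def swish_prime(x):
    s = sigmoid(x)
    # Product rule: d/dx [x * s(x)] = s + x * s * (1 - s)
    return s + x * s * (1.0 - s)


# Evaluate on a grid with step 0.01 and locate the minimum.
x = np.linspace(-6.0, 6.0, 1201)
y = swish(x)

min_value = float(y.min())            # lower bound of the range
min_location = float(x[y.argmin()])   # input where the dip occurs
```

To the left of `min_location` the function decreases and to the right it increases, which is what makes Swish non-monotonic.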

Source code in src/mpneuralnetwork/activations.py
class Swish(Activation):
    """Swish activation function.

    Formula:
        `f(x) = x * sigmoid(x)`

    Range: (~-0.28, inf).
    Proposed by Google. A smooth, non-monotonic function that often outperforms ReLU
    on deep networks.
    """

    def __init__(self) -> None:
        super().__init__(
            lambda x: x / (1 + xp.exp(-x, dtype=DTYPE)),
            lambda x: (1 + xp.exp(-x, dtype=DTYPE) + x * xp.exp(-x, dtype=DTYPE)) / (1 + xp.exp(-x, dtype=DTYPE)) ** 2,
        )

mpneuralnetwork.activations.Tanh

Bases: Activation

Hyperbolic Tangent activation function.

Formula

f(x) = tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

Range: (-1, 1). Zero-centered, making it often preferable to Sigmoid for hidden layers.
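
A brief NumPy sketch of the zero-centered property and of the derivative identity tanh'(x) = 1 - tanh(x)^2, which is the expression the source listing uses. The input values are arbitrary examples.

```python
import numpy as np

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])

y = np.tanh(x)
# The derivative can be computed from the output alone.
y_prime = 1.0 - y ** 2

# For inputs symmetric around zero, the outputs average to zero.
mean_output = float(y.mean())
```

The derivative is 1 at x = 0 and falls toward 0 as |x| grows, so Tanh can still saturate for large inputs.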

Source code in src/mpneuralnetwork/activations.py
class Tanh(Activation):
    """Hyperbolic Tangent activation function.

    Formula:
        `f(x) = tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))`

    Range: (-1, 1).
    Zero-centered, making it often preferable to Sigmoid for hidden layers.
    """

    def __init__(self) -> None:
        super().__init__(
            lambda x: xp.tanh(x, dtype=DTYPE),
            lambda x: (1 - xp.tanh(x, dtype=DTYPE) ** 2),
        )

mpneuralnetwork.activations.Sigmoid

Bases: Activation

Sigmoid activation function.

Formula

f(x) = 1 / (1 + exp(-x))

Range: (0, 1). Used for binary classification (output layer) or gating mechanisms (like in LSTMs). Can suffer from vanishing gradients in deep networks.
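
The vanishing-gradient remark can be made concrete: the sigmoid derivative never exceeds 0.25, so a chain of sigmoid layers scales the gradient by at most 0.25 per layer. The NumPy sketch below ignores the weights between layers, which is a simplifying assumption.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative peaks at x = 0.
peak = float(sigmoid_prime(np.array(0.0)))

# Best-case gradient scale after n stacked sigmoids (weights ignored):
# it shrinks geometrically with depth.
depths = np.array([1, 5, 10])
best_case_scale = peak ** depths
```

Even in this best case the scale after ten layers is below one millionth, which is why sigmoid is usually reserved for output or gating roles.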

Source code in src/mpneuralnetwork/activations.py
class Sigmoid(Activation):
    """Sigmoid activation function.

    Formula:
        `f(x) = 1 / (1 + exp(-x))`

    Range: (0, 1).
    Used for binary classification (output layer) or gating mechanisms (like in LSTMs).
    Can suffer from vanishing gradients in deep networks.
    """

    def __init__(self) -> None:
        def sigmoid(x: ArrayType) -> ArrayType:
            return 1 / (1 + xp.exp(-x, dtype=DTYPE))  # type: ignore[no-any-return]

        super().__init__(lambda x: sigmoid(x), lambda x: sigmoid(x) * (1 - sigmoid(x)))

Output Layers

These activations are typically used in the final layer to produce probability distributions.

mpneuralnetwork.activations.Softmax

Bases: Layer

Softmax activation function.

Formula

f(x)_i = exp(x_i / T) / sum_j exp(x_j / T)

Typically used in the output layer for multi-class classification. Converts a vector of K real numbers into a probability distribution over K possible outcomes. The temperature parameter T scales the logits before the softmax is computed: T > 1 flattens the distribution and T < 1 sharpens it.
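
The effect of the temperature can be seen on a single row of logits. This is a minimal NumPy sketch, not the library's implementation: the `softmax` helper is defined here for illustration and omits the epsilon term that the library adds to the temperature.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits scaled by 1 / temperature."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scaled)
    return e / e.sum(axis=1, keepdims=True)


logits = np.array([[2.0, 1.0, 0.0]])

p_sharp = softmax(logits, temperature=0.5)  # low T: mass concentrates on the top class
p_base = softmax(logits, temperature=1.0)   # standard softmax
p_flat = softmax(logits, temperature=5.0)   # high T: closer to uniform
```

In all three cases each row sums to 1; only the spread of the probabilities changes.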

Source code in src/mpneuralnetwork/activations.py
class Softmax(Layer):
    """Softmax activation function.

    Formula:
        `f(x)_i = exp(x_i / T) / sum(exp(x_j / T))`

    Typically used in the output layer for multi-class classification.
    Converts a vector of K real numbers into a probability distribution of K possible outcomes.
    The temperature parameter T is used to scale the logits before computing the softmax.
    """

    def __init__(self, temperature: float = 1.0, epsilon: float = 1e-8) -> None:
        """Initializes the Softmax layer.

        Args:
            temperature (float, optional): Temperature parameter. Defaults to 1.0.
            epsilon (float, optional): Small float added to the temperature to avoid dividing by zero. Defaults to 1e-8.
        """
        self.temperature: float = temperature
        self.epsilon: float = epsilon

    def forward(self, input_batch: ArrayType, training: bool = True) -> ArrayType:
        """Applies Softmax function.

        Args:
            input_batch (ArrayType): Input logits of shape (batch_size, num_classes).
            training (bool, optional): Unused. Defaults to True.

        Returns:
            ArrayType: Probabilities of shape (batch_size, num_classes).
        """
        scaled_logits = input_batch / (self.temperature + self.epsilon)

        m = xp.max(scaled_logits, axis=1, keepdims=True)
        e = xp.exp(scaled_logits - m, dtype=DTYPE)

        self.output = e / xp.sum(e, axis=1, keepdims=True, dtype=DTYPE)
        return self.output

    def backward(self, output_gradient_batch: ArrayType) -> ArrayType:
        """Computes gradient for Softmax.

        Note: This is rarely used directly if using `CategoricalCrossEntropy` loss,
        as the framework optimizes the combined gradient calculation for numerical stability.

        Args:
            output_gradient_batch (ArrayType): Gradient from next layer.

        Returns:
            ArrayType: Gradient w.r.t input.
        """
        sum_s_times_g: ArrayType = xp.sum(self.output * output_gradient_batch, axis=1, keepdims=True, dtype=DTYPE)  # type: ignore[assignment]

        res: ArrayType = (self.output * (output_gradient_batch - sum_s_times_g)) / (self.temperature + self.epsilon)
        return res

    @property
    def params(self) -> dict[str, tuple[ArrayType, ArrayType]]:
        return {}

__init__(temperature=1.0, epsilon=1e-08)

Initializes the Softmax layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| temperature | float | Temperature parameter. Defaults to 1.0. | 1.0 |
| epsilon | float | Small float added to the temperature before the logits are divided by it, to avoid dividing by zero. Defaults to 1e-8. | 1e-08 |

Source code in src/mpneuralnetwork/activations.py
def __init__(self, temperature: float = 1.0, epsilon: float = 1e-8) -> None:
    """Initializes the Softmax layer.

    Args:
        temperature (float, optional): Temperature parameter. Defaults to 1.0.
        epsilon (float, optional): Small float added to the temperature to avoid dividing by zero. Defaults to 1e-8.
    """
    self.temperature: float = temperature
    self.epsilon: float = epsilon

backward(output_gradient_batch)

Computes gradient for Softmax.

Note: This method is rarely used directly when training with the CategoricalCrossEntropy loss, because the framework optimizes the combined gradient calculation for numerical stability.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| output_gradient_batch | ArrayType | Gradient from next layer. | required |

Returns:

| Type | Description |
| --- | --- |
| ArrayType | Gradient w.r.t input. |
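
The expression `s * (g - sum(s * g))` is a shortcut for multiplying the upstream gradient by the softmax Jacobian, `diag(s) - s s^T`. The NumPy sketch below checks that equivalence for T = 1 and also shows the standard result that, with a cross-entropy upstream gradient of `-y / s`, the product reduces to `s - y`. It is an independent illustration; how the framework fuses the two steps internally is not shown here.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
upstream = rng.normal(size=(4, 3))  # stand-in for output_gradient_batch

s = softmax(logits)

# Shortcut form (T = 1): s * (g - sum(s * g)) per row.
shortcut = s * (upstream - np.sum(s * upstream, axis=1, keepdims=True))

# Reference: explicit Jacobian J = diag(s) - s s^T, applied row by row.
explicit = np.stack([
    (np.diag(row) - np.outer(row, row)) @ g
    for row, g in zip(s, upstream)
])

# Cross-entropy case: upstream = -y / s collapses to s - y.
y = np.eye(3)[[0, 2, 1, 0]]  # one-hot targets, chosen arbitrarily
ce_upstream = -y / s
combined = s * (ce_upstream - np.sum(s * ce_upstream, axis=1, keepdims=True))
```

The shortcut avoids building a `(num_classes, num_classes)` Jacobian for every sample in the batch.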

Source code in src/mpneuralnetwork/activations.py
def backward(self, output_gradient_batch: ArrayType) -> ArrayType:
    """Computes gradient for Softmax.

    Note: This is rarely used directly if using `CategoricalCrossEntropy` loss,
    as the framework optimizes the combined gradient calculation for numerical stability.

    Args:
        output_gradient_batch (ArrayType): Gradient from next layer.

    Returns:
        ArrayType: Gradient w.r.t input.
    """
    sum_s_times_g: ArrayType = xp.sum(self.output * output_gradient_batch, axis=1, keepdims=True, dtype=DTYPE)  # type: ignore[assignment]

    res: ArrayType = (self.output * (output_gradient_batch - sum_s_times_g)) / (self.temperature + self.epsilon)
    return res

forward(input_batch, training=True)

Applies Softmax function.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_batch | ArrayType | Input logits of shape (batch_size, num_classes). | required |
| training | bool | Unused. Defaults to True. | True |

Returns:

| Type | Description |
| --- | --- |
| ArrayType | Probabilities of shape (batch_size, num_classes). |
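
The source listing subtracts the row maximum before exponentiating. A short NumPy sketch of why that matters, using deliberately large logits as an example input:

```python
import numpy as np

logits = np.array([[1000.0, 1001.0, 1002.0]])

# Naive evaluation overflows: exp(1000) is inf in float64, so inf / inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Subtracting the row maximum does not change the result mathematically,
# but keeps every exponent at or below zero.
shifted = logits - logits.max(axis=1, keepdims=True)
stable = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
```

The stable version returns the same probabilities as a softmax over `[0, 1, 2]`, because softmax is invariant to adding a constant to every logit in a row.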

Source code in src/mpneuralnetwork/activations.py
def forward(self, input_batch: ArrayType, training: bool = True) -> ArrayType:
    """Applies Softmax function.

    Args:
        input_batch (ArrayType): Input logits of shape (batch_size, num_classes).
        training (bool, optional): Unused. Defaults to True.

    Returns:
        ArrayType: Probabilities of shape (batch_size, num_classes).
    """
    scaled_logits = input_batch / (self.temperature + self.epsilon)

    m = xp.max(scaled_logits, axis=1, keepdims=True)
    e = xp.exp(scaled_logits - m, dtype=DTYPE)

    self.output = e / xp.sum(e, axis=1, keepdims=True, dtype=DTYPE)
    return self.output
