
ReLU Activation

Rectified Linear Unit

Function: max(0, x) · Sparsity: High · Learnable parameters: 0
📈 Activation Function

[Plot: the ReLU activation function, y = max(0, x)]
🔄 Compare Activations

ReLU:       max(0, x)
Leaky ReLU: x if x > 0, else αx
ELU:        x if x > 0, else α(eˣ − 1)
Sigmoid:    1/(1 + e⁻ˣ)
Tanh:       (eˣ − e⁻ˣ)/(eˣ + e⁻ˣ)
GELU:       x·Φ(x)
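The activations above can be sketched in a few lines of NumPy. This is a minimal illustration, not a framework implementation; the α defaults (0.01 for Leaky ReLU, 1.0 for ELU) are common choices, not mandated by the text, and GELU is shown via its widely used tanh approximation of x·Φ(x).

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):   # alpha = 0.01 is an assumed default
    return np.where(x < 0, alpha * x, x)

def elu(x, alpha=1.0):           # alpha = 1.0 is an assumed default
    return np.where(x < 0, alpha * (np.exp(x) - 1.0), x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gelu(x):
    # tanh approximation of x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))        # negatives clipped to 0, positives pass through
print(leaky_relu(x))  # negatives scaled by alpha instead of zeroed
```

Tanh is available directly as `np.tanh`, so it is not redefined here.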
Example batch (1,536 activations): 856 positive values, 680 zeroed (44% sparsity), mean output 0.42.
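Numbers like those above are easy to reproduce. The sketch below draws a hypothetical batch of 1,536 pre-activations from a standard normal (an assumption; the panel's exact distribution is unknown) and measures ReLU's sparsity and mean output. For zero-mean symmetric inputs, roughly half the outputs are zeroed, and for N(0, 1) inputs the mean output is E[max(0, X)] = 1/√(2π) ≈ 0.40, close to the 0.42 shown.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility
pre = rng.standard_normal(1536)         # hypothetical batch of pre-activations
post = np.maximum(0.0, pre)             # apply ReLU

positive = int((post > 0).sum())
zeroed = post.size - positive
sparsity = zeroed / post.size           # fraction of outputs forced to 0
mean_out = float(post.mean())           # ≈ 1/sqrt(2*pi) for N(0,1) inputs
print(positive, zeroed, round(sparsity, 2), round(mean_out, 2))
```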
ReLU Formula
f(x) = max(0, x)
f'(x) = 1 if x > 0, else 0 (undefined at x = 0; implementations conventionally use 0 there)
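The formula and its derivative translate directly to code. A minimal sketch of the forward pass and the (sub)gradient:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Subgradient of max(0, x): 1 where x > 0, 0 elsewhere
    # (0 at x == 0 by the usual convention)
    return (x > 0).astype(x.dtype)

x = np.array([-1.5, 0.0, 2.0])
print(relu(x))       # negatives and zero map to 0, positives unchanged
print(relu_grad(x))  # gradient is 0 for x <= 0 and 1 for x > 0
```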
📊 Distribution

[Histogram: activation distribution before vs. after ReLU]

🔥 Why ReLU?

ReLU introduces non-linearity at negligible cost: a single comparison per unit. Because its gradient is exactly 1 for positive inputs, it avoids the vanishing gradients caused by saturating activations such as sigmoid, which lets deep networks learn complex patterns.
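The vanishing-gradient point can be made concrete with a back-of-the-envelope bound. Since σ'(x) = σ(x)(1 − σ(x)) never exceeds 0.25, backpropagating through a chain of D sigmoid layers scales the gradient by at most 0.25^D, while ReLU contributes a factor of exactly 1 on active units. The depth of 20 below is an arbitrary illustrative choice:

```python
depth = 20
# sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) peaks at 0.25, so a depth-D
# chain of sigmoids scales the gradient by at most 0.25**D
sigmoid_bound = 0.25 ** depth
# relu'(x) = 1 on the active path, so the same chain scales gradients by 1
relu_factor = 1.0 ** depth
print(sigmoid_bound, relu_factor)
```

At depth 20 the sigmoid bound is already below 10⁻¹², while the ReLU factor is still 1.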

⚠️ Dead Neurons

Neurons whose pre-activations are consistently negative output zero and receive zero gradient, so they may "die" and stop learning permanently. Leaky ReLU and similar variants address this by allowing a small, nonzero gradient for negative inputs.
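The recovery mechanism can be seen by comparing gradients on an all-negative batch. A minimal sketch, assuming the common Leaky ReLU slope α = 0.01 and an illustrative batch of pre-activations:

```python
import numpy as np

def relu_grad(x):
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.01):   # alpha = 0.01 is an assumed default
    return np.where(x > 0, 1.0, alpha)

# A "dead" neuron: every pre-activation in the batch is negative
pre = np.array([-3.0, -1.5, -0.2])
print(relu_grad(pre))        # all zeros: no weight update, the neuron stays dead
print(leaky_relu_grad(pre))  # small nonzero gradient: the neuron can still learn
```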