Regularization Engine
During training, each neuron is retained with probability p and zeroed otherwise. This prevents units from co-adapting: no unit can rely on any particular other unit being present.
● Active: Signal propagates (Solid Core).
○ Dropped: Output forced to zero (Wireframe).
We scale active neurons by 1/p during training (inverted dropout), so the expected activation is unchanged and no scaling is needed at test time.
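The mechanism above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular framework's implementation; the function name `dropout` and the keep-probability parameter `p_keep` are chosen here for clarity.

```python
import numpy as np

def dropout(x, p_keep=0.8, train=True, rng=None):
    """Inverted dropout: keep each unit with probability p_keep and
    scale survivors by 1/p_keep so E[output] == E[input]."""
    if not train:
        return x  # test time: identity, no rescaling needed
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) < p_keep  # True = active, False = dropped
    return x * mask / p_keep             # dropped units output exactly zero

# With many units, the mean activation stays close to the original
# despite ~20% of units being zeroed, because survivors are scaled up.
x = np.ones(100_000)
y = dropout(x, p_keep=0.8, rng=np.random.default_rng(0))
```

Because the scaling happens during training, the same network can be run at test time with dropout simply switched off.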