Training deep neural networks efficiently is often challenging because of unstable gradients, slow convergence, and sensitivity to parameter initialisation. One commonly cited cause of these issues is internal covariate shift, a phenomenon in which the distribution of activations changes...