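I'd argue that deep learning is derived from formal principles of mathematical reasoning in a very concrete sense. Training a deep network means searching for a predictive function of the features that minimizes a loss function averaged over the training data (with some caveats, e.g. the optimization is non-convex, so gradient descent usually finds a good local minimum rather than the global one). If the loss function and training data are well chosen, meaning the data is representative of what the model will see later, that approximately minimizes the probability of being wrong on new data. For classification, for instance, the expected 0-1 loss is exactly the probability of a misclassification. The usual cross-entropy loss is a smooth surrogate whose minimizer, over all possible functions, is the true conditional probability of each class, and picking the most probable class then minimizes the error rate.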
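To make the chain from "minimize the loss" to "minimize the probability of being wrong" concrete, here is a minimal sketch in plain Python (no deep learning library) of the same principle on a tiny model: logistic regression trained by gradient descent on the average cross-entropy loss. The toy task (label 1 when x0 + x1 > 0), the data sizes, the learning rate and the step count are all hypothetical choices for illustration. A deep network replaces the linear model with a stack of layers but is trained against the same kind of objective. `error_rate` computes the average 0-1 loss, i.e. the empirical probability of being wrong, on held-out data.

```python
import math
import random


def make_data(n, seed):
    """Hypothetical toy task: 2-D points, label 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        y = 1 if x[0] + x[1] > 0 else 0
        data.append((x, y))
    return data


def predict_proba(w, b, x):
    """Model's estimate of P(y = 1 | x): a numerically stable sigmoid of a linear score."""
    z = w[0] * x[0] + w[1] * x[1] + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cross_entropy(w, b, data):
    """The training objective: average cross-entropy loss over the data."""
    eps = 1e-12  # clamp probabilities so log() never sees exactly 0
    total = 0.0
    for x, y in data:
        p = min(max(predict_proba(w, b, x), eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)


def error_rate(w, b, data):
    """Average 0-1 loss, i.e. the empirical probability of being wrong."""
    wrong = sum(1 for x, y in data if (predict_proba(w, b, x) >= 0.5) != (y == 1))
    return wrong / len(data)


def train(data, lr=0.5, steps=500):
    """Full-batch gradient descent on the average cross-entropy (illustrative hyperparameters)."""
    w, b = [0.0, 0.0], 0.0
    n = len(data)
    for _ in range(steps):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in data:
            err = predict_proba(w, b, x) - y  # d(loss)/d(score) for cross-entropy
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b


train_set = make_data(200, seed=0)
test_set = make_data(200, seed=1)  # held-out data stands in for "new data"
w0, b0 = [0.0, 0.0], 0.0           # untrained model, for comparison
w, b = train(train_set)

print("train loss:", cross_entropy(w0, b0, train_set), "->", cross_entropy(w, b, train_set))
print("test error:", error_rate(w0, b0, test_set), "->", error_rate(w, b, test_set))
```

The training loss starts at ln 2 (the untrained model outputs 0.5 everywhere) and falls, while the held-out error rate drops from roughly one half (the untrained model predicts class 1 for every point) to a small fraction. Cross-entropy is optimized instead of the 0-1 loss because the 0-1 loss has zero gradient almost everywhere, so gradient descent cannot make progress on it directly.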