Why Machines Learn by Anil Ananthaswamy
I highly recommend Anil Ananthaswamy’s Why Machines Learn as an introduction to the foundational concepts of deep learning. It touches on the math but keeps it accessible to a high school graduate, and grounds it with practitioner interviews (LeCun, Sutskever, Hinton, Hopfield, etc.). Ananthaswamy starts with perceptrons and ends with deep neural networks, stopping short of generative models.
Here are some of the ideas which were new to me:
- With enough neurons, a neural network with even just a single hidden layer can approximate any function (Cybenko 1989)
- Convolutional neural networks, like the early LeNet and AlexNet, learn the convolutional filters during training (LeNet 1989, AlexNet 2012)
- Kernel methods map input data to a higher dimensional space where there might be a separating hyperplane (Guyon 1991)
- Overparameterized deep learning models often generalize well instead of overfitting, even though they have enough capacity to shatter the training data (see implicit and explicit regularization, Occam’s Razor and the Lottery Ticket Hypothesis)
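To make the kernel-methods idea concrete, here is a minimal sketch of my own (not from the book). It uses an explicit, hypothetical feature map `lift` that adds the product `x1 * x2` as a third coordinate, then runs a plain perceptron on XOR-style points. Real kernel methods usually avoid computing the lifted coordinates directly, so treat this as an illustration of the "higher dimensional space" intuition only.

```python
# XOR-style points: no straight line in 2D separates the two classes.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [1, -1, -1, 1]  # +1 when the signs agree, -1 otherwise


def lift(p):
    """Hypothetical feature map (an assumption): (x1, x2) -> (x1, x2, x1*x2)."""
    x1, x2 = p
    return (x1, x2, x1 * x2)


def train_perceptron(xs, ys, epochs=20):
    """Classic perceptron update rule; returns (weights, bias, converged)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full clean pass means a separating hyperplane
            return w, b, True
    return w, b, False


# In the original 2D space the perceptron never gets a clean pass.
_, _, ok_2d = train_perceptron(points, labels)

# After lifting to 3D, the third coordinate alone separates the classes.
w3, b3, ok_3d = train_perceptron([lift(p) for p in points], labels)

print("separable in 2D:", ok_2d)
print("separable in 3D:", ok_3d, "weights:", w3, "bias:", b3)
```

In the lifted space the learned weights put all their mass on the `x1 * x2` coordinate, which is exactly the feature that distinguishes the two classes.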
“I definitely remember being perplexed by how simple the whole thing is… How can it be? You look at your undergrad classes in math or physics, and they’re so complicated. And then this stuff is so simple. You just read two papers and you understand such powerful concepts. How can it be that it’s so simple?”
- Ilya Sutskever