This is a blog about machine learning and deep learning fundamentals, built by the authors of the textbook Machine Learning Refined, published by Cambridge University Press. The posts, organized into short series, use careful writing and interactive coding widgets to provide an intuitive, playful way to learn core concepts in AI, from the most basic to the most advanced.

Every post here is a Python Jupyter notebook, polished for the web, that you can download and run on your own machine by cloning our GitHub repo.
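For example, a typical way to get the notebooks running locally looks like the following sketch. The repository URL and requirements file name are assumptions, not confirmed by this page; use the repository link on this site as the canonical source.

```shell
# Clone the repo (URL is an assumption; follow the link on this site instead)
git clone https://github.com/jermwatt/machine_learning_refined.git
cd machine_learning_refined

# Install the Python dependencies the notebooks rely on
# (requirements.txt is an assumed file name; check the repo's README)
pip install -r requirements.txt

# Launch Jupyter and open any post's notebook
jupyter notebook
```

Any recent Python 3 install with Jupyter should work; a virtual environment keeps the blog's dependencies isolated from your system packages.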

3.1. What are derivatives?   text   slides

3.2. Derivatives at a point and the numerical differentiator   text   slides

3.3. Derivative equations and hand computations   text   slides

3.4. Automatic differentiation - the forward mode   text   slides

3.5. Higher order derivatives   text   slides

3.6. Taylor series   text   slides

3.7. Derivatives of multi-input functions   text   slides

3.8. Effective gradient computation   text   slides

3.9. The Hessian and higher order derivatives   text   slides

3.10. Multi-input Taylor series   text   slides

3.11. Getting to know autograd: your professional-grade automatic differentiator   text   slides

5.0. Motivation for mathematical optimization   text   slides

5.1. The zero order condition for optimality   text   slides

5.2. Global optimization methods   text   slides

5.3. Local optimization methods   text   slides

5.4. Random search   text   slides

5.5. Coordinate search and descent   text   slides

6.0. Introduction   text   slides

6.1. The first order optimality condition   text   slides

6.2. The geometric anatomy of lines and hyperplanes   text   slides

6.3. The geometric anatomy of first order Taylor series approximations   text   slides

6.4. Gradient descent   text   slides

6.5. Conservative steplength rules   text   slides

6.6. First order coordinate descent methods   text   slides

7.1. Quadratic functions   text   slides

7.2. Second order derivatives and curvature   text   slides

7.3. Newton's method   text   slides

7.4. Regularization, Newton's method, and non-convex functions   text   slides

7.5. The first order derivation of Newton's method   text   slides

7.6. Quasi-Newton methods   text   slides

7.7. Coordinate descent   text   slides

7.8. The second order optimality condition   text   slides

8.1. Least squares linear regression   text   slides

8.2. The probabilistic perspective on least squares linear regression   text   slides

8.3. Least absolute deviations linear regression   text   slides

8.4. Feature scaling via standard normalization   text   slides

9.1. Logistic regression   text   slides

9.2. The perceptron   text   slides

9.3. Support vector machines   text   slides

9.4. Feature scaling via standard normalization   text   slides

10.1. One-versus-All classification   text   slides

10.2. Multiclass softmax classification   text   slides

10.3. Feature scaling via standard normalization   text   slides

11.1. Fixed spanning sets, orthonormality, and projections   text   slides

11.2. Principal Component Analysis and the Autoencoder   text   slides

11.3. Feature scaling via PCA sphering   text   slides

11.4. Recommender Systems   text   slides

11.5. K-means clustering   text   slides

11.6. General matrix factorization techniques   text   slides

12.1. Features, functions, and nonlinear regression   text   slides

12.2. Features, functions, and nonlinear classification   text   slides

12.3. Features, functions, and nonlinear unsupervised learning   text   slides

12.4. Automating nonlinear learning   text   slides

12.5. Universal approximation   text   slides

12.6. Validation error   text   slides

12.7. Model search via boosting   text   slides

12.8. Model search via regularization   text   slides

12.9. Ensembling   text   slides

13.1. Introduction to multi-layer perceptrons   text   slides

13.2. Batch normalization   text   slides

13.3. Normalized gradient descent   text   slides

13.4. Momentum methods   text   slides

13.5. Regularization   text   slides

13.6. Stochastic and mini-batch gradient descent   text   slides

13.7. General steepest descent   text   slides

13.8. Early stopping   text   slides

14.1. The convolution operation   text   slides

14.2. Edge histogram based features   text   slides

14.3. Single layer convolutional neural networks   text   slides

14.4. Deep convolutional neural networks   text   slides

14.5. Transfer learning   text   slides

14.6. Adversarial examples and the fragility of convolutional networks   text   slides

14.7. Data normalization schemes for images   text   slides

15.1. Introduction   text   slides

15.2. Fixed order dynamic systems   text   slides

15.3. Recurrence relations   text   slides

15.4. Variable order dynamic systems   text   slides

15.5. Autoregressive modeling   text   slides

15.6. Recurrent networks   text   slides

15.7. Optimization tricks for recurrent networks   text   slides

15.8. Advanced architectures   text   slides

18.1. Fundamentals of reinforcement learning   text   slides

18.2. Q-learning   text   slides

18.3. Q-learning enhancements   text   slides

18.4. On the generalizability of reinforcement learning   text   slides