Chapter 3: Derivatives and Automatic Differentiation

3.1 What are derivatives?

The derivative is by far the most important and far-reaching concept from calculus used in machine learning / deep learning, and for good reason: the derivative lies at the heart of many mathematical optimization schemes used today. These optimization schemes are what allow us to tune the parameters of machine learning models so that they can, e.g., forecast the price of crude oil, recognize human speech, detect human faces in images, or do a whole host of other things.

In [1]:
# imports used by the cells in this Section
# (the calclib module paths below are assumed from how they are called later)
import numpy as np
import calclib.taylor2d_viz
import calclib.taylor3d_viz

3.1.1 Derivatives at a point

The derivative is a simple tool for understanding a mathematical function locally - meaning at and around a single point. More specifically, the derivative at a point defines the best linear approximation - a line in two dimensions, a hyperplane in higher dimensions - that matches the given function at that point as well as a line / hyperplane can.
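Concretely, in two dimensions the tangent line defined by the derivative at a point - writing $w^0$ for the point and $h$ for the line, notation introduced here just for illustration - is

\begin{equation} h(w) = g(w^0) + \frac{\mathrm{d}}{\mathrm{d}w}g(w^0)\,(w - w^0) \end{equation}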

Why would someone need such an idea in the first place? Because most of the mathematical functions we deal with in machine learning, mathematical optimization, and science in general are too high dimensional for us to examine by eye. Because they live in higher dimensions we need tools (e.g., calculus) to help us understand and intuit their behavior.

First: just the pictures, please

Let's begin exploring this idea in pictures before jumping into the math. Let's examine a few candidate functions, beginning with the standard sinusoid

\begin{equation} g(w) = \text{sin}(w) \end{equation}

In the next Python cell we draw this function over a small range of its inputs, and then at each point draw the line defined by the function's derivative there on top.

The final result is an animated slider widget - at each increment of the slider the sinusoidal function is drawn in black, the point we are at in red, and the corresponding line produced using the derivative in green. Sliding from left to right moves the point - and its associated derivative-defined line - smoothly across the function.

In [2]:
# what function should we play with?  Defined in the next line.
g = lambda w: np.sin(w)

# create an instance of the visualizer with this function 
taylor_viz = calclib.taylor2d_viz.visualizer(g = g)

# run the visualizer for our chosen input function
taylor_viz.draw_it(first_order = True,num_frames = 200)
Out[2]: [animated slider widget: the function in black, the current point in red, and the tangent line given by the derivative in green]

As you adjust the slider, notice a few things.

First: the line produced by the derivative at the point is always tangent to the function. This holds more generally as well - for any differentiable function, the linear approximation given by the derivative is tangent to the function at every point.

Second: the slope of this tangent line hugs the function at every point - it matches the local steepness of the curve everywhere. This too is true in general: the slope of the tangent line given by the derivative is precisely the local steepness, or slope, of the function itself. The derivative naturally encodes this information.

Third: at each increment of the slider the tangent line defined by the derivative matches the function itself near the point in red. This is also true in general - the derivative at a point always defines a line that closely matches the underlying function near that point.

The derivative at a point defines a line that is always tangent to a function, encodes its steepness at that point, and generally matches the underlying function near the point locally. In short - the derivative at a point is the slope of the tangent line at that point.
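Before moving on, we can check these claims numerically without the widget. The following is a minimal standalone sketch (it does not use calclib; the helper names deriv and tangent are introduced here just for illustration) that approximates the derivative of the sinusoid with a centered finite difference, builds the tangent line at a point, and confirms that the gap between line and function shrinks as we approach that point.

import numpy as np

# approximate the derivative of g with a centered finite difference
g = lambda w: np.sin(w)
deriv = lambda w, h=1e-5: (g(w + h) - g(w - h)) / (2*h)

# tangent line at a chosen point w0: matches g exactly at w0, with slope deriv(w0)
w0 = 1.0
tangent = lambda w: g(w0) + deriv(w0)*(w - w0)

# the gap between function and tangent line shrinks rapidly as we approach w0
for delta in [0.5, 0.1, 0.01]:
    print(delta, abs(g(w0 + delta) - tangent(w0 + delta)))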

Let's examine another candidate function using the same widget

\begin{equation} g(w) = \text{sin}(4w) + 0.1w^2 \end{equation}
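For reference, differentiating term by term (applying the chain rule to the first term) gives the derivative

\begin{equation} \frac{\mathrm{d}}{\mathrm{d}w}g(w) = 4\,\text{cos}(4w) + 0.2w \end{equation}

so the slope of the tangent line you see at each increment of the slider is just this expression evaluated at the red point.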

You can use the widget to explore this animation for any function you wish - just swap out the first line in the Python cell below with your desired function.

In [3]:
# what function should we play with?  Defined in the next line.
g = lambda w: np.sin(4*w) + 0.1*w**2

# create an instance of the visualizer with this function 
taylor_viz = calclib.taylor2d_viz.visualizer(g = g)

# run the visualizer for our chosen input function
taylor_viz.draw_it(first_order = True,num_frames = 400)
Out[3]: [animated slider widget: the function in black, the current point in red, and the tangent line given by the derivative in green]

Again, as you slide from left to right you can see how the line defined by the derivative at each point stays tangent to the curve, hugs the function's shape everywhere, and matches the function closely near the point.

And as mentioned previously, this notion holds for functions of any dimension - the only difference being that our tangent line becomes a tangent hyperplane. For example, take the simple sinusoid in three dimensions given by

\begin{equation} g(w_0, w_1) = \text{sin}(w_0) \end{equation}
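In this two-input case the analog of the tangent line - writing $(w_0^0, w_1^0)$ for the point of tangency and $h$ for the hyperplane, notation introduced here just for illustration - is built from the function's partial derivatives:

\begin{equation} h(w_0, w_1) = g(w_0^0, w_1^0) + \frac{\partial}{\partial w_0}g(w_0^0, w_1^0)\,(w_0 - w_0^0) + \frac{\partial}{\partial w_1}g(w_0^0, w_1^0)\,(w_1 - w_1^0) \end{equation}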

The next Python cell produces a static plot illustrating this function (shown in gray) along with the tangent hyperplane (in green) defined by its derivative at a specific point (in red). Again the function, as well as the given point, can be adjusted at will in the Jupyter notebook version of this Section.

In [4]:
# define the function to plot, as well as a point at which to draw tangent hyperplane
g = lambda w: np.sin(w[0])
w_val = [-1.5,1]

# load in function to examine
taylor_viz = calclib.taylor3d_viz.visualizer(g = g)

# start examination
taylor_viz.draw_it(w_val = w_val,first_order = True,view = [20,110]);
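
As a standalone sanity check (independent of the calclib plot; the helper partial is introduced here just for illustration), we can approximate both partial derivatives at w_val with centered finite differences. Since g depends only on its first input, the second partial should be zero and the first should equal cos(-1.5).

import numpy as np

# the same two-input function and point used above
g = lambda w: np.sin(w[0])
w_val = np.array([-1.5, 1.0])

# centered finite-difference approximation of the i-th partial derivative
def partial(g, w, i, h=1e-5):
    e = np.zeros(len(w))
    e[i] = h
    return (g(w + e) - g(w - e)) / (2*h)

print([partial(g, w_val, i) for i in range(2)])   # approximately [cos(-1.5), 0.0]
print(np.cos(-1.5))                               # analytic value of the first partial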