6.3 The geometric anatomy of first order Taylor series approximations

Press the 'Toggle code' button below to toggle code on and off for this entire presentation.

In [1]:
from IPython.display import display
from IPython.display import HTML
import IPython.core.display as di # Example: di.display_html('<h3>%s:</h3>' % str, raw=True)

# This line will hide code by default when the notebook is exported as HTML
di.display_html('<script>jQuery(function() {if (jQuery("body.notebook_app").length == 0) { jQuery(".input_area").toggle(); jQuery(".prompt").toggle();}});</script>', raw=True)

# This line will add a button to toggle visibility of code blocks, for use with the HTML export version
di.display_html('''<button onclick="jQuery('.input_area').toggle(); jQuery('.prompt').toggle();">Toggle code</button>''', raw=True)

6.3.1 Single-input function derivatives and the direction of steepest ascent / descent

  • The derivative of a single-input function defines a tangent line at each point in its input domain - this is called its first order Taylor series approximation.
  • For a generic differentiable function $g(w)$ we can define this tangent line at each point $w^0$ as
\begin{equation} h(w) = g(w^0) + \frac{\mathrm{d}}{\mathrm{d}w}g(w^0)(w - w^0) \end{equation}

  • The steepest ascent direction is given explicitly by the slope of this line, which is the derivative itself
\begin{equation} \text{steepest ascent direction of tangent line} = \frac{\mathrm{d}}{\mathrm{d}w}g(w^0). \end{equation}
  • Likewise, the steepest descent direction is given by the negative slope of this line - which is the negative derivative
\begin{equation} \text{steepest descent direction of tangent line} = -\frac{\mathrm{d}}{\mathrm{d}w}g(w^0). \end{equation}
  • This particular (tangent) line is explicitly built to closely approximate its underlying function near $w^0$.

  • Because of this, its steepest ascent and descent directions tell us not just the directions we should travel in to increase / decrease the value of the tangent line locally, but also the directions we should travel in (at least locally around the point $w^0$ defining the tangent line) to increase / decrease the value of the underlying function itself - as the short numerical sketch below illustrates.
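
Before turning to the examples, here is a small numerical sketch of this point. It is not part of the text's callib visualizers, and the function, the point $w^0 = 1.5$, the finite-difference derivative, and the step length $\alpha$ are all arbitrary choices made here for illustration: a short step against the derivative lowers both the tangent line and the function itself.

In [ ]:
# a minimal numerical sketch: build the tangent line at w0 and take a short step
# in its steepest descent direction (g, w0, and alpha are arbitrary illustrative choices)
g = lambda w: 0.5*w**2 + 1

def derivative(g, w, eps=1e-6):
    # central finite-difference approximation of d/dw g(w)
    return (g(w + eps) - g(w - eps)) / (2*eps)

w0 = 1.5
slope = derivative(g, w0)            # approximately 1.5 for this g

# first order Taylor series (tangent line) at w0
h = lambda w: g(w0) + slope*(w - w0)

# step a short distance in the steepest descent direction -slope
alpha = 0.1
w1 = w0 - alpha*slope

print(g(w0), g(w1))                  # 2.125 -> roughly 1.911: g decreases
print(h(w0), h(w1))                  # 2.125 -> 1.9: the tangent line decreases too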

Example 1. The derivative as a direction of ascent / descent for a single-input quadratic

  • $g(w) = 0.5w^2 + 1$
  • The derivative / steepest ascent direction: black
  • The negative derivative / steepest descent direction: red
In [2]:
# NOTE: `callib` is the text's custom visualization library, assumed to have been imported
# beforehand (e.g. via something like `from mlrefined_libraries import calculus_library as callib`)

# what function should we play with?  Defined in the next line.
g = lambda w: 0.5*w**2 + 1

# run the visualizer for our chosen input function
callib.derivative_ascent_visualizer.animate_visualize2d(g=g,num_frames = 10,plot_descent = True)
Out[2]: [animated figure: tangent line with its steepest ascent (black) and descent (red) directions]

Example 2. The derivative as a direction of ascent / descent for a single-input wavy function

  • $g(w) = \text{sin}(3w) + 0.1w^2 + 1.5$
In [3]:
import numpy as np

# what function should we play with?  Defined in the next line.
g = lambda w: np.sin(3*w) + 0.1*w**2 + 1.5

# run the visualizer for our chosen input function
callib.derivative_ascent_visualizer.animate_visualize2d(g=g,num_frames = 100,plot_descent = True)
Out[3]: [animated figure: tangent line with its steepest ascent (black) and descent (red) directions for the wavy function]

6.3.2 Multi-input function derivatives and the direction of greatest ascent / descent

  • With an $N$-dimensional input function $g\left(\mathbf{w}\right)$, instead of one derivative we have $N$ partial derivatives - one in each input direction - stacked into a vector called the gradient
\begin{equation} \nabla g\left(\mathbf{w}\right) = \begin{bmatrix} \frac{\partial}{\partial w_1}g\left(\mathbf{w}\right) \\ \frac{\partial}{\partial w_2}g\left(\mathbf{w}\right) \\ \vdots \\ \frac{\partial}{\partial w_N}g\left(\mathbf{w}\right) \end{bmatrix}. \end{equation}
  • Likewise, the first order Taylor series is now a tangent hyperplane which, at a point $\mathbf{w}^0$, has the formula (analogous to the single-input case)
\begin{equation} h(\mathbf{w}) = g(\mathbf{w}^0) + \nabla g(\mathbf{w}^0)^T(\mathbf{w} - \mathbf{w}^0). \end{equation}
  • In complete analogy to the single-input case:
\begin{equation} \text{steepest ascent direction along $n^{th}$ axis} = \frac{\partial}{\partial w_n} g(\mathbf{w}^0) \end{equation}
\begin{equation} \text{steepest descent direction along $n^{th}$ axis} = - \frac{\partial}{\partial w_n} g(\mathbf{w}^0). \end{equation}
  • The steepest ascent direction itself - with respect to the entire $N$-dimensional input space - is then given by the entire gradient, and the steepest descent direction by its negative
\begin{equation} \text{steepest ascent direction of tangent hyperplane} = \nabla g(\mathbf{w}^0) \end{equation}
\begin{equation} \text{steepest descent direction of tangent hyperplane} = -\nabla g(\mathbf{w}^0). \end{equation}

The steepest ascent / descent directions of the first order Taylor series approximation tell us the direction we must travel in (at least locally, around where it most closely resembles its underlying function) in order to increase / decrease both the linear approximation and the underlying function itself. These directions are defined explicitly by the gradient of the function.
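
As a concrete check of this statement, the following is a small numerical sketch rather than the text's callib code: it approximates the gradient with central finite differences (the quadratic, the point $\mathbf{w}^0$, and the step length $0.1$ are arbitrary illustrative choices) and verifies that a short step along the negative gradient lowers the function's value.

In [ ]:
import numpy as np

# a minimal numerical sketch: stack N central finite differences to approximate the
# gradient, then take a short step along its negative
g = lambda w: w[0]**2 + w[1]**2 + 6

def numerical_gradient(g, w, eps=1e-6):
    w = np.asarray(w, dtype=float)
    grad = np.zeros_like(w)
    for n in range(w.size):
        e = np.zeros_like(w)
        e[n] = eps
        grad[n] = (g(w + e) - g(w - e)) / (2*eps)   # n-th partial derivative
    return grad

w0 = np.array([-1.0, 1.0])
grad = numerical_gradient(g, w0)     # approximately [-2, 2]
w1 = w0 - 0.1*grad                   # short step in the steepest descent direction

print(g(w0), g(w1))                  # 8.0 -> roughly 7.28: the function decreases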

Example 3. The gradient as a direction of ascent / descent for a multi-input quadratic function

  • $g(w_1,w_2) = w_1^2 + w_2^2 + 6$
  • $\mathbf{w}^0 = \begin{bmatrix} -1\\ 1 \end{bmatrix}$
In [14]:
# define function, and points at which to take derivative
func = lambda w:  w[0]**2 + w[1]**2 + 6
w0 = [-1,1];

# run the 3d visualizer for our chosen function and point
view = [33,30]
callib.derivative_ascent_visualizer.visualize3d(func=func,view = view,pt = w0,plot_descent = True)
  • Same function, different point
  • $\mathbf{w}^0 = \begin{bmatrix} -1\\ -1 \end{bmatrix}$
In [15]:
# define function, and points at which to take derivative
func = lambda w:  w[0]**2 + w[1]**2 + 6
w0 = [-1,-1];

# run the 3d visualizer for our chosen function and point
view = [33,-30]
callib.derivative_ascent_visualizer.visualize3d(func=func,view = view,pt = w0,plot_descent = True)

Example 4. The gradient as a direction of ascent / descent for a multi-input wavy function

  • $g(w_1,w_2) = \text{sin}\left(1.5w_1 - 2w_2\right) + 6$
  • $\mathbf{w}^0 = \begin{bmatrix} 0\\ 0 \end{bmatrix}$
In [16]:
# define function, and points at which to take derivative
func = lambda w:  np.sin(1.5*w[0] - 2*w[1]) + 6
w0 = [0,0];

# run the 3d visualizer for our chosen function and point
view = [33,50]
callib.derivative_ascent_visualizer.visualize3d(func=func,view = view,pt = w0,plot_descent = True)
  • Same function, different point
  • $\mathbf{w}^0 = \begin{bmatrix} 1\\ -1 \end{bmatrix}$
In [17]:
# define function, and points at which to take derivative
func = lambda w:  np.sin(1.5*w[0] - 2*w[1]) + 6
w0 = [1,-1];

# run the 3d visualizer for our chosen function and point
view = [33,40]
callib.derivative_ascent_visualizer.visualize3d(func=func,view = view,pt = w0,plot_descent = True)

6.3.3 Viewing the gradient descent direction in the input space

  • Remember, ascent and descent directions live in the input space of the function.
  • Removing the vertical dimension from our three-dimensional plots, we can still draw such a function via a contour plot.
  • The contour plot of a given function $g\left(\mathbf{w}\right)$ shows constant slices $g\left(\mathbf{w}\right) = c$ of the function as lines or curves projected onto the input space - a minimal sketch of this view is given just below.
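
The same kind of picture can be drawn with standard tools. The following is a minimal matplotlib sketch - it does not use the text's perp_gradient_viewer, and the quadratic, the sample points, and the arrow scaling are chosen purely for illustration - showing the contours of a simple quadratic with the normalized negative gradient attached at a few points.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

# a minimal sketch: contours of a simple quadratic with normalized negative gradient
# (steepest descent) arrows drawn at a few hand-picked input points
g    = lambda w1, w2: w1**2 + w2**2 + 2
grad = lambda w1, w2: np.array([2*w1, 2*w2])    # gradient of g, computed by hand

# contour plot over a square patch of the input space
s = np.linspace(-5, 5, 200)
W1, W2 = np.meshgrid(s, s)
plt.contour(W1, W2, g(W1, W2), levels=15, cmap='Blues')

# negative gradient arrows at a few input points
for w1, w2 in [(4.2, -0.5), (1.4, 3.2), (-3.8, -3.7)]:
    d = -grad(w1, w2)
    d = d / np.linalg.norm(d)                   # unit length, purely for display
    plt.arrow(w1, w2, d[0], d[1], head_width=0.2, color='red')

plt.gca().set_aspect('equal')
plt.show()

Each red arrow crosses the contour through its base point at a right angle - the observation formalized in the claim at the end of this subsection.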

Example 5. Gradient descent directions on the contour plot of a quadratic function

  • $g\left(\mathbf{w}\right) = w_0^2 + w_1^2 + 2$
In [8]:
# function to plot
g = lambda w: w[0]**2 + w[1]**2 + 2

# random points at which to compute the gradient
pts = np.array([[ 4.24698761,  1.39640246, -3.75877989],
               [-0.49560712,  3.22926095, -3.65478083]])

# produce contour plot with gradients
callib.perp_gradient_viewer.illustrate_gradients(g,pts)

NOTE: the gradient ascent / descent directions at an input $\mathbf{w}^{\star}$ are always perpendicular to the contour $g\left(\mathbf{w}\right) = c$ passing through that point, where $c = g\left(\mathbf{w}^{\star}\right)$ - as the short worked check below illustrates.
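
As a quick worked check of this note using the quadratic from the example above: a contour of $g\left(\mathbf{w}\right) = w_0^2 + w_1^2 + 2$ at level $c > 2$ is a circle of radius $r = \sqrt{c - 2}$, which we can parameterize as $\mathbf{w}(t) = r\begin{bmatrix} \text{cos}(t) \\ \text{sin}(t) \end{bmatrix}$. The direction along the contour is the tangent $\mathbf{w}'(t) = r\begin{bmatrix} -\text{sin}(t) \\ \text{cos}(t) \end{bmatrix}$, while the gradient there is $\nabla g\left(\mathbf{w}(t)\right) = 2r\begin{bmatrix} \text{cos}(t) \\ \text{sin}(t) \end{bmatrix}$, and indeed
\begin{equation} \nabla g\left(\mathbf{w}(t)\right)^T \mathbf{w}'(t) = 2r^2\left(-\text{cos}(t)\,\text{sin}(t) + \text{sin}(t)\,\text{cos}(t)\right) = 0. \end{equation}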

Example 6. Gradient descent directions on the contour plot of a wavy function

  • $g\left(\mathbf{w}\right) = w_0^2 + w_1^2 + 2\,\text{sin}\left(1.5\left(w_0 + w_1\right)\right) + 2$
In [9]:
# function to plot
g = lambda w: w[0]**2 + w[1]**2 + 2*np.sin(1.5*(w[0] + w[1])) + 2

# points at which to compute the gradient
pts = np.array([[ 4.24698761,  1.39640246, -3.75877989],
               [-0.49560712,  3.22926095, -3.65478083]])

# produce contour plot with gradients
callib.perp_gradient_viewer.illustrate_gradients(g,pts)

Example 7. Gradient descent directions on the contour plot of a standard non-convex test function

  • $g\left(\mathbf{w}\right) = \left(w_0^2 + w_1 - 11 \right)^2 + \left( w_0 + w_1^2 - 7 \right)^2$
In [10]:
# function to plot
g = lambda w: (w[0]**2 + w[1] - 11)**2 + (w[0] + w[1]**2 - 7)**2

# points at which to compute the gradient
pts = np.array([[ 2.2430266 , -1.06962305, -1.60668751],
               [-0.57717812,  1.38128471, -1.61134124]])

# produce contour plot with gradients
callib.perp_gradient_viewer.illustrate_gradients(g,pts)

Claim: The gradient-defined ascent/descent directions are always perpendicular to a function's contour.

Proof: Suppose $g\left(\mathbf{w}\right)$ is a differentiable function and $\mathbf{a}$ is some input point. Then $\mathbf{a}$ lies on the contour defined by all those points where $g\left(\mathbf{w}\right) = g\left(\mathbf{a}\right) = c$ for some constant $c$. If we take another point $\mathbf{b}$ on this contour very close to $\mathbf{a}$, then $g\left(\mathbf{b}\right) = g\left(\mathbf{a}\right)$, while the first order Taylor series gives $g\left(\mathbf{b}\right) \approx g\left(\mathbf{a}\right) + \nabla g\left(\mathbf{a}\right)^T\left(\mathbf{b} - \mathbf{a}\right)$; together these force $\nabla g\left(\mathbf{a}\right)^T\left(\mathbf{b} - \mathbf{a}\right) \approx 0$. In other words the vector $\mathbf{b} - \mathbf{a}$ - which, as $\mathbf{b}$ is drawn ever closer to $\mathbf{a}$, points along the contour - is perpendicular to the gradient $\nabla g\left(\mathbf{a}\right)$. So indeed both the ascent and descent directions defined by the gradient of $g$ at $\mathbf{a}$ are perpendicular to the contour there, and since $\mathbf{a}$ was an arbitrary input of $g$ the same argument holds at every point.
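
The same fact can also be checked numerically. The sketch below is a finite-difference check (the wavy function is the one from the contour example above, and the point $\mathbf{a}$ is an arbitrary choice): stepping a tiny distance perpendicular to the gradient leaves the function's value unchanged to first order, i.e. the step stays on the contour.

In [ ]:
import numpy as np

# numerical check of the claim: a tiny step perpendicular to the gradient at a point
# stays (to first order) on the contour passing through that point
g = lambda w: w[0]**2 + w[1]**2 + 2*np.sin(1.5*(w[0] + w[1])) + 2

def numerical_gradient(g, w, eps=1e-6):
    grad = np.zeros(2)
    for n in range(2):
        e = np.zeros(2)
        e[n] = eps
        grad[n] = (g(w + e) - g(w - e)) / (2*eps)
    return grad

a = np.array([1.0, -0.5])
grad_a = numerical_gradient(g, a)

# rotate the gradient by 90 degrees to get a direction along the contour at a
tangent = np.array([-grad_a[1], grad_a[0]])
b = a + 1e-4*tangent/np.linalg.norm(tangent)

print(np.dot(grad_a, b - a))    # zero (up to round-off) by construction of the rotation
print(g(b) - g(a))              # on the order of 1e-8: b sits essentially on the same contour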