Chapter 7: Second order methods

7.8 The second order optimality condition

In this brief Section we discuss the final calculus-based descriptor of a function's minima and, more broadly, its stationary points: the second order condition for optimality. As we saw in the previous Chapter, the first order condition defines all stationary points (minima, maxima, and saddle points) via a single condition - the first order system of equations. In theory, if we could compute all solutions to this system for a given function we could determine its global minima by simply evaluating each solution and declaring those with the smallest evaluation the global minima. Outside of these global minima, however, nothing about the first order condition reveals the identity of the remaining stationary points - whether they are local minima, local maxima, or saddle points. Second order information (i.e., the second derivatives of a function) can be leveraged to determine - in most cases - the identity of a stationary point. This is precisely the second order optimality condition: a filter that can determine the identity of a given stationary point.

In other words, the second order condition does not directly help us with our main pursuit of finding (approximate) global minima of a function. In fact, as we will see later, second order algorithms are explicitly built to satisfy the first order optimality condition, not the second. So while the second order condition provides further characterization of a function's stationary points, it is largely of theoretical interest.

In [1]:

7.8.1 Intuiting the condition by example

Studying a few simple examples, it is easy to come to some far-reaching conclusions about how the second derivative helps unveil the identity of stationary points. In the next Python cell we plot three single-input functions along with their first and second derivatives. In each panel we mark in green the evaluation of every stationary point - by the function itself (where we also draw the tangent line), and by its first and second derivatives. In the first and second derivative panels we draw the horizontal zero axis as a dashed black line.

In [2]:
# define three single-input test functions (numpy is assumed imported as np,
# and calib is the book's custom plotting library, imported in an earlier cell)
func1 = lambda w: np.sin(2*w)
func2 = lambda w: w**3
func3 = lambda w: np.sin(3*w) + 0.1*w**2

# use custom plotter to show all three functions with their first and second derivatives
calib.derivative_visualizer.show_stationary_v2(func1 = func1,func2 = func2,func3 = func3)

Studying these simple examples we can see consistent behavior at certain stationary points. In particular, the value of a function's second derivative at a stationary point $w^0$ helps us identify whether that point is a local minimum, local maximum, or saddle point. In the examples above we see that a stationary point $w^0$ is

  • a local or global minimum if $\frac{\partial^2}{\partial w^2}g(w^0) > 0$ (since it occurs at a convex portion of the function)
  • a local or global maximum if $\frac{\partial^2}{\partial w^2}g(w^0) < 0$ (since it occurs at a concave portion of the function)
  • a saddle point if $\frac{\partial^2}{\partial w^2}g(w^0) = 0$ and $\frac{\partial^2}{\partial w^2}g(w)$ changes sign at $w^0$ (since it occurs at an inflection point of the function, i.e., where the function goes from concave to convex or vice versa)

These second order characteristics hold for any single-input function, and taken together they constitute the second order condition for optimality for single-input functions.
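
As a quick sanity check of these rules - a minimal sketch that stands apart from the notebook's custom plotting code and assumes only numpy - we can numerically approximate the second derivative of the first example function $g(w) = \mathrm{sin}(2w)$ at two of its stationary points, $w^0 = \frac{\pi}{4}$ and $w^0 = \frac{3\pi}{4}$, and classify each by the sign of the result (the helper second_derivative and the step size h are illustrative choices, not anything defined in the text).

import numpy as np

# the first example function from the cell above
g = lambda w: np.sin(2*w)

# central finite-difference approximation of the second derivative g''(w)
def second_derivative(g, w, h=1e-4):
    return (g(w + h) - 2*g(w) + g(w - h)) / h**2

# two stationary points of g(w) = sin(2w): a peak at pi/4 and a valley at 3*pi/4
for w0 in [np.pi/4, 3*np.pi/4]:
    curvature = second_derivative(g, w0)
    kind = 'local maximum' if curvature < 0 else 'local minimum'
    print(f"w0 = {w0:.3f}, second derivative = {curvature:.2f} --> {kind}")

Here the second derivative evaluates to roughly $-4$ at $w^0 = \frac{\pi}{4}$ (a local maximum) and roughly $+4$ at $w^0 = \frac{3\pi}{4}$ (a local minimum), in agreement with the rules above.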

With multi-input functions the analogous second order condition holds. As with all things having to do with convexity/concavity, the rule is phrased in terms of the eigenvalues of the matrix of second derivatives (a.k.a. the Hessian). More specifically, a stationary point $\mathbf{w}^0$ of a multi-input function $g(\mathbf{w})$ is

  • a local minimum if all eigenvalues of $\nabla^2 g(\mathbf{w}^0)$ are positive (since it occurs at convex portions of a function)
  • a local maximum if all eigenvalues of $\nabla^2 g(\mathbf{w}^0)$ are negative (since it occurs at concave portions of a function)
  • a saddle point if the eigenvalues of $\nabla^2 g(\mathbf{w}^0)$ are of mixed sign (i.e., some are negative, some are positive) (since the function is concave along some input directions and convex along others there)

These rules reduce to those just stated for single-input functions when $N=1$ since then the Hessian matrix collapses into a single second order derivative.
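
To see the multi-input rule in action, here is another minimal sketch - again assuming only numpy rather than the book's own library - that classifies the stationary point at the origin $\mathbf{w}^0 = (0,0)$ of two hand-built quadratics by inspecting the eigenvalues of their (in this case constant) Hessians.

import numpy as np

# Hessians of two simple quadratics, both with a stationary point at the origin:
#   g(w) = w_1^2 + w_2^2  has Hessian [[2, 0], [0,  2]]
#   g(w) = w_1^2 - w_2^2  has Hessian [[2, 0], [0, -2]]
hessians = {
    'g(w) = w_1^2 + w_2^2': np.array([[2., 0.], [0., 2.]]),
    'g(w) = w_1^2 - w_2^2': np.array([[2., 0.], [0., -2.]]),
}

for name, hessian in hessians.items():
    eigenvalues = np.linalg.eigvalsh(hessian)  # eigenvalues of the symmetric Hessian
    if np.all(eigenvalues > 0):
        kind = 'local minimum'
    elif np.all(eigenvalues < 0):
        kind = 'local maximum'
    else:
        kind = 'saddle point'
    print(f"{name}: eigenvalues {eigenvalues} --> {kind} at the origin")

The first quadratic is classified as a local (indeed global) minimum and the second as a saddle point, exactly as the eigenvalue rule prescribes.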

Second order condition for optimality: the second derivative(s) of a function can determine whether a stationary point is a local minimum, local maximum, or saddle point. Generally speaking a stationary point $\mathbf{w}^0$ is a local minimum, local maximum, or saddle point if the eigenvalues of $\nabla^2 g(\mathbf{w}^0)$ are all positive, negative, or mixed, respectively.