Principles of Optimization¶

Now that you have been introduced to the foundational concepts of optimization, this post will describe optimization from a more mathematical perspective. From the previous section, you may recall that the primary goal of optimization is to minimize or maximize an objective function. But what does it mean for something to be a minimizer?

Concept of a Minimizer¶

To explain the concept of a minimizer, let’s consider a simple example of a ball that can roll along a curved track, as shown below. The ball is constrained to always be on the surface of the track, and it must remain within the two walls on either side of the track. The ball starts at the top of the hill, as shown with the blue, solid circle. Then, it might roll and settle at either one of the positions shown with the red, hatched circles. If you consider both, the global minimizer is the position on the left, as this is when the ball reaches its lowest point across all options. However, the position on the right is an example of a local minimizer, as this position is the lowest point within a small region around the point. For optimization problems, we want to find the global minimizer, as this corresponds to the smallest value of the objective function. However, it is possible that we conclude at a local minimizer, such as when the ball moves to the position on the right.

example of a ball on a curved track, demonstrating the idea of global and local minimizers — Fig. 21 Global and Local Minimizers for a Ball on a Curved Track¶

Concept of a Gradient¶

While the position of the ball allows us to determine its height, another important piece of information is the slope of the surface at a given position. At each point along the surface, the gradient is equivalent to the slope of the surface that the ball can move along. The gradient is mathematically represented by \(\nabla f(x)\), and it always points uphill, which is opposite to the red arrows shown in the previous image.

To gain an intuition for the gradient, experiment with the two GeoGebra applets embedded below. For this first applet, the scenario of the ball moving on the curved track is mimicked. The orange point can be dragged along the curved surface between the two boundaries, and, as the point moves, the red arrow representing the gradient of the function changes size and direction. You can also toggle to view the descent direction, which will be introduced later in this section.

This second applet increases the dimension of the problem by one, as now the point can be moved both horizontally and vertically. The applet is divided into two separate view panes. On the right, you can see the surface of the objective function, which in this case does not have a well-defined meaning. However, for the purposes of this example, you can again consider the objective function as the altitude, which is the value in the \(Z\) direction. The view on the left shows the contour plot of the function on the right. Here, the curves drawn indicate the locations where the function has a constant value, and the values are shown next to the curves. The benefit of a contour plot is that it allows for viewing a three-dimensional surface in a two-dimensional space. Again, the orange point can be dragged throughout the two-dimensional space, and the gradient will update as the point moves. The position of the orange point will also update in the right pane to help with understanding the two visuals.

In both of these applets, it is important to observe that the gradient of the function always points up a hill or in the direction of increasing function value. Now, how does information about the gradient get utilized for optimization?

The Gradient’s Role in Optimization¶

In the context of optimization, the gradient plays an important role in determining what direction the optimizer will move our current design point.

Consider again the example where we want to minimize the ball’s height. Recalling that the gradient tells us the direction of maximum increase and points uphill, this tells us a direction that we do not want to go! Therefore, during optimization, we want to move in descent directions, which are directions that will point us towards minimizers.

Now, in all three of the examples that have been explored so far, particularly in the two applets, you may have noticed something interesting that happens to the gradient at minimizers (or maximizers). If you have not noticed this yet, feel free to scroll back and up and experiment before continuing!

example of a ball on a curved track with a prompt to consider the relationship between minimizers and the gradient value — Fig. 22 What Happens to the Gradient when the Ball is at a Minimizer?¶

For all the examples above, and for all functions in general, the value of the gradient at a minimizer will be zero. Thinking about this intuitively, if the ball is already at the bottom of the hill, then there is nowhere else that it can move such that the function value decreases further!

So, from a high-level perspective, when optimizing, we want to move the design point in descent directions until we reach a minimizer, where the value of the gradient at that point is zero.

Now that you are familiar with the concepts of gradients, minimizers, and descent directions, the next sections will dive into a few specific examples of optimization to help introduce these ideas in action for real problems. When thinking through the examples, try to connect the ideas that you learned in this section with the practical task of solving the problems to solidify the underlying theory!

Principles of Optimization¶

Concept of a Minimizer¶

Concept of a Gradient¶

The Gradient’s Role in Optimization¶

Test your Knowledge¶