These four constrained optimization methods look similar when first encountered:
- Lagrange Multipliers
- Penalty Methods
- Augmented Lagrangian Methods
- Merit Function Methods
Here is a comprehensive explanation of these four methods, written by Brian Borchers.
No, they’re not all the same, and it’s important to understand the differences between them.
Start with a simple optimization problem $$ \min_{\bf x}\, f({\bf x}) $$ subject to $$ {\bf g}({\bf x})={\bf 0} $$ where we can assume for simplicity that $f$ and ${\bf g}$ are smooth (at least twice continuously differentiable).
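To make the later sketches concrete, here is a small hypothetical instance of this problem in Python: a convex quadratic objective with one linear equality constraint. The names `f` and `g` are our own and are reused in the sketches below.

```python
import numpy as np

# Hypothetical running example: minimize f(x) = (x1 - 1)^2 + (x2 - 2)^2
# subject to the single equality constraint g(x) = x1 + x2 - 1 = 0.
# Projecting (1, 2) onto the line x1 + x2 = 1 gives the minimizer (0, 1).

def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

def g(x):
    # Returned as an array so the same code handles several constraints.
    return np.array([x[0] + x[1] - 1.0])
```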
The Lagrangian function is $$ L({\bf x},\boldsymbol{\lambda})=f({\bf x})+\boldsymbol{\lambda}^\top {\bf g}({\bf x}) $$ Note that $L$ is a function of both ${\bf x}$ and $\boldsymbol{\lambda}$. The first-order necessary condition for a point ${\bf x}^{*}$ to be a minimizer (assuming a suitable constraint qualification holds) is that there is a $\boldsymbol{\lambda}^{*}$ such that $({\bf x}^{*},\boldsymbol{\lambda}^{*})$ is a stationary point of $L$. In the method of multipliers, we try to solve the nonlinear system of equations $$ \nabla_{\bf x}L={\bf 0}, \qquad \nabla_{\boldsymbol{\lambda}}L={\bf 0} $$ This is typically done by alternately minimizing with respect to ${\bf x}$ and updating $\boldsymbol{\lambda}$. Given a Lagrange multiplier estimate $\boldsymbol{\lambda}^{(k)}$, we minimize $L({\bf x},\boldsymbol{\lambda}^{(k)})$ with respect to ${\bf x}$ to get ${\bf x}^{(k)}$. Then we update $\boldsymbol{\lambda}$ with $$ \boldsymbol\lambda^{(k+1)}=\boldsymbol\lambda^{(k)}+\alpha_k\, {\bf g}({\bf x}^{(k)}) $$ where $\alpha_k$ is a step size parameter that can be set in various ways.
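As a minimal sketch of this alternating scheme, the following continues the running example, using `scipy.optimize.minimize` for the inner minimization and a fixed step size $\alpha_k = \alpha$; both choices are simplifications.

```python
from scipy.optimize import minimize

# A sketch of the method of multipliers with a fixed step size alpha.
# For the plain Lagrangian, the inner minimization is only well posed when
# L(., lam) is bounded below, as it is for the convex example above.

def method_of_multipliers(f, g, x0, lam0, alpha=1.0, iters=50, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    lam = np.asarray(lam0, dtype=float)
    for _ in range(iters):
        # Inner step: minimize L(x, lam) = f(x) + lam^T g(x) over x,
        # warm-started at the previous iterate.
        x = minimize(lambda x: f(x) + lam @ g(x), x).x
        # Dual ascent step: lam <- lam + alpha * g(x).
        lam = lam + alpha * g(x)
        if np.linalg.norm(g(x)) < tol:
            break
    return x, lam

x_hat, lam_hat = method_of_multipliers(f, g, x0=[0.0, 0.0], lam0=[0.0])
print(x_hat)  # approaches (0, 1) on the example problem
```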
A penalty function for our problem is a function that is $0$ when ${\bf g}({\bf x})={\bf 0}$ and greater than $0$ when ${\bf g}({\bf x}) \neq {\bf 0}$. A commonly used penalty function is the quadratic penalty $$ \phi({\bf g}({\bf x}))={\bf g}({\bf x})^2 $$ applied componentwise. In the penalty function method, we solve an unconstrained problem of the form $$ \min_{\bf x}\, f({\bf x})+\boldsymbol\rho^\top\phi({\bf g}({\bf x})) $$ where $\boldsymbol \rho$ is a penalty parameter that is increased until the solution of the penalized problem comes close to satisfying ${\bf g}({\bf x})={\bf 0}$. Note that $\boldsymbol \rho$ is not a Lagrange multiplier in this case.
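A sketch of the quadratic penalty loop, continuing the running example under the simplifying assumption of a single scalar penalty parameter `rho`:

```python
# A sketch of the quadratic penalty method with a scalar penalty parameter
# rho that is grown by a fixed factor until the constraint violation is
# small. Reuses the hypothetical f and g defined earlier.

def penalty_method(f, g, x0, rho=1.0, grow=10.0, tol=1e-6, max_outer=20):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_outer):
        # Unconstrained subproblem: min f(x) + rho * sum(g(x)^2),
        # warm-started at the previous solution.
        x = minimize(lambda x, r=rho: f(x) + r * np.sum(g(x) ** 2), x).x
        if np.linalg.norm(g(x)) < tol:
            break
        rho *= grow  # tighten the penalty and re-solve
    return x

print(penalty_method(f, g, x0=[0.0, 0.0]))  # approaches (0, 1) as rho grows
```

Note that the penalized minimizer is only exactly feasible in the limit $\boldsymbol\rho \to \infty$, and very large penalty parameters make the subproblems ill-conditioned; this is part of the motivation for the augmented Lagrangian approach below.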
For problems with inequality constraints, a commonly used penalty function is $$ \phi({\bf g}({\bf x})) = \left(\max\left\{ {\bf 0}, {\bf g}({\bf x}) \right\}\right)^2 $$ An augmented Lagrangian function combines the penalty function idea with the Lagrangian: $$ \hat L({\bf x},\boldsymbol\lambda, \boldsymbol\rho)=f({\bf x})+\boldsymbol\lambda^\top{\bf g}({\bf x})+\boldsymbol\rho^\top\phi({\bf g}({\bf x})) $$ Augmented Lagrangian methods minimize $\hat L$ with respect to ${\bf x}$, update the Lagrange multiplier estimate $\boldsymbol \lambda$, and then (if necessary) update the penalty parameter $\boldsymbol\rho$ in each iteration. In practice, augmented Lagrangian methods typically outperform simple penalty methods and the plain method of multipliers.
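A sketch of one such loop for the equality-constrained running example. With the penalty term written as $\rho\,{\bf g}({\bf x})^2$ (rather than the also common $\tfrac{\rho}{2}\|{\bf g}({\bf x})\|^2$), the matching multiplier update is $\boldsymbol\lambda \leftarrow \boldsymbol\lambda + 2\rho\,{\bf g}({\bf x})$, since $\nabla_{\bf x}\hat L = \nabla f + \nabla {\bf g}^\top(\boldsymbol\lambda + 2\rho\,{\bf g})$ vanishes at the inner minimizer.

```python
# A sketch of an augmented Lagrangian loop with a fixed scalar rho; a real
# implementation would also increase rho when the constraint violation
# shrinks too slowly. Reuses the hypothetical f and g from above.

def augmented_lagrangian(f, g, x0, lam0, rho=10.0, tol=1e-8, max_outer=30):
    x = np.asarray(x0, dtype=float)
    lam = np.asarray(lam0, dtype=float)
    for _ in range(max_outer):
        # Minimize L_hat(x, lam, rho) = f + lam^T g + rho * sum(g^2) over x.
        x = minimize(lambda x: f(x) + lam @ g(x) + rho * np.sum(g(x) ** 2), x).x
        if np.linalg.norm(g(x)) < tol:
            break
        # Multiplier update consistent with the penalty term rho * g^2:
        lam = lam + 2.0 * rho * g(x)
    return x, lam
```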
Merit functions are used in a variety of nonlinear programming algorithms. You’ll most commonly see them used in sequential quadratic programming (SQP) methods. In these methods, a search direction ${\bf d}^{(k)}$ is computed at each iteration. The step is from ${\bf x}^{(k)}$ to $$ {\bf x}^{(k+1)}={\bf x}^{(k)}+\alpha_k\, {\bf d}^{(k)} $$ where the step size parameter $\alpha_k$ is determined by minimizing a merit function $$ \alpha_k = \arg\min_\alpha\, M\left({\bf x}^{(k)}+\alpha\,{\bf d}^{(k)}\right) $$ The merit function is typically something like a penalized objective function or an augmented Lagrangian, but there’s a great deal of freedom in its form.
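A sketch of such a line search, using a hypothetical $\ell_1$-penalty merit function $M({\bf x}) = f({\bf x}) + \mu\,\|{\bf g}({\bf x})\|_1$ with weight `mu`; practical SQP codes usually only require sufficient decrease in $M$ rather than an exact one-dimensional minimization.

```python
from scipy.optimize import minimize_scalar

# A sketch of a merit-function line search along a given direction d.
# M(x) = f(x) + mu * ||g(x)||_1 is one common choice of merit function;
# mu must be large enough to make constraint violations unattractive.

def merit_line_search(f, g, x, d, mu=10.0):
    M = lambda a: f(x + a * d) + mu * np.sum(np.abs(g(x + a * d)))
    # Exact 1-D minimization of the merit function over the step size.
    return minimize_scalar(M, bounds=(0.0, 1.0), method="bounded").x

# Usage inside a hypothetical SQP loop: given x_k and direction d_k,
#   alpha_k = merit_line_search(f, g, x_k, d_k)
#   x_next = x_k + alpha_k * d_k
```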
These functions and the associated methods are described in many textbooks on nonlinear optimization. A good discussion can be found in *Numerical Optimization* by Nocedal and Wright.