Good NLP Formulations

In this tutorial we offer some advice and guidance on how to set up or formulate an NLP model so that a solver will be able compute a good solution and do so quickly, reliably, and predictably. Much of this applies to other model classes allowing nonlinear functions, but for ease and simplicity of exposition we focus on the NLP case here.

A good formulation for an NLP model typically involves several things, including specifying sensible initial values, setting variable bounds, and scaling variables and equations. Other factors to consider are techniques for blocking degenerate cycling and the potential benefits of avoiding expressions in nonlinear functions. Finally we look at reformulations and approximations for discontinuous functions like abs, max and min in section Reformulating DNLP Models.

Specifying Initial Values

The variable levels and equation marginals in GAMS are typically passed to a solver to be used as the initial point. The initial values specified are especially important for NLP models for several reasons:

  • Non-convex models may have multiple local solutions. Local NLP solvers search for a local solution and return it when it is found. An initial point in the neighborhood of a desired solution is more likely to return that solution.
  • Initial values that (almost) satisfy many of the constraints reduce the work involved in finding a first feasible solution.
  • Initial values that are close to a local optimum reduce the work required to find that local optimum, therefore reducing solution time.
  • The progress of the optimization algorithm is based on good directional information, i.e on good derivatives. The derivatives in a nonlinear model depend on the current point, so an improved initial point can improve solver performance.

Variable levels and equation marginals are specified by setting the variable attribute .L and the equation attribute .m before solution. This is often done with assignments that occur before the solve statement, e.g.:

domPrice.L(i,region,t) = domPrice0(i,region,t);
flowLim.m(arcs) = 1;
Note
The default value for the variable levels and equation marginals is 0.

The variable bounds also play a role in determining the initial point passed to the NLP solver. When a solve occurs, the levels for all variables in the model are first projected onto the set defined by the variable bounds. Thus, setting a variable's lower bound to a positive value ensures that the initial value of this variable will never be zero. This is very useful, since in many cases zero is an unsuitable initial value for nonlinear variables. For example, based on the product term \(x \cdot y\), an initial value of zero for \(x\) will lead to an initial derivative value of zero wrt \(y\), so it will appear as if the function does not depend on \(y\). Variables at zero can also cause numerical difficulties with logarithms, real powers, or divisions. These difficulties occur not just at zero but also for very small values (i.e. values very close to zero) as well.

We recommend to specify as many sensible initial values for the nonlinear variables as possible. It may be desirable to initialize all variables to 1 or to the scale factor if the GAMS scaling option is used. A better alternative is to first select reasonable values for some variables that are known from context or experience to be important and then to use some of the equations of the model to compute the values for other variables. For example, consider the following equation, where pm, pwm and er are variables and tm is a parameter:

pmDef(i)..   pm(i) =e= pwm(i)*er*(1+tm(i));

The following assignment statements use the equation to derive consistent initial values for the variable pm from sensible initial values for the variables pwm and er:

pwm.L(i) = 1;
er.L = 1;
pm.L(i) = pwm.L(i)*er.L*(1+tm(i));

It would be a mistake to assign only to pwm and er and assume that the solver will choose to adjust the variable pm to make the equation feasible: it could choose to adjust pwm or er instead. With all the assignments above made, we can be assured that the equation pmDef will be satisfied at the initial point.

Setting the initial point by loading a prior solution that has been saved via the savepoint mechanism is also an effective strategy that is very easy to implement.

Setting Variable Bounds

Lower and upper bounds on variables are set by assigning values to the variable attributes .lo and .up in the following way:

price.lo(i,region,t) = 1e-4;
flow.up(arcs) = arcCap(arcs);

Lower and upper bounds on variables in nonlinear models serve multiple purposes. Some bounds represent constraints based on the reality that is being modeled. For example, a certain production level must be non-negative or an arc in a network has a flow capacity of at most ten. These bounds are called model bounds. Other bounds help the algorithm by preventing it from moving far away from any optimal solution and/or into regions with singularities in the nonlinear functions or unreasonably large function or derivative values. These bounds are called algorithmic bounds. Solver performance can be improved and execution errors (see domLim and domUsd) avoided when algorithmic bounds on variables are introduced.

Model bounds are determined by the reality being modeled and do not cause any problems. However, algorithmic bounds must be carefully chosen by the modeler. We recommend to pay particular attention if a variable is the argument in log(x), log10(x) or exp(x) and if a variable occurs in the denominator of a division. If log(x) or log10(x) appears in a model, where x is a variable, we recommend a lower bound of 1.e-3 for x since log(x) gets very small as x approaches zero and is undefined for negative values of x. In addition, the first derivative gets very large as x approaches zero. If exp(x) features in a model, where x is a variable, we recommend an upper bound between 20 and 25 for x. If a variable x appears in the denominator, we recommend a lower bound of 1.e-2 for x, since 1/x is extremenly nonlinear for small arguments. Small values for variables used with negative exponents are also not desirable. Solver performance can be improved and execution errors avoided when one introduces algorithmic bounds on variables.

Note that lower and upper bounds facilitate finding a feasible solution as most solvers will honor bounds at all times, but inequalities are not necessarily satisfied at intermediate points. A further advantage of variable bounds compared to inequalities is improved presolve performance: NLP solver preprocessors will typically incur little or no computational overhead due to variable bounds.

Avoiding Expressions in Nonlinear Functions

It is often useful to avoid nonlinear functions of expressions (e.g. a division by the sum of many variables). Instead, an intermediate variable can be used for the expression. This applies in particular if the expressions depend on many variables. Consider the following example:

variable x(i), y;
equation ydef;
ydef..   y =e= 1 / sum(i, x(i));

This example could be reformulated via the intermediate variable xsum and its defining equation xsumdef in the following way:

variable x(i), y, xsum;
equation xsumdef, ydef;
xsumdef..   xsum =e= sum(i, x(i));
ydef   ..   y =e= 1/xsum;
xsum.lo = 1.e-2;

In the equation ydef, the intermediate variable xsum appears in the denominator instead of the original summation. This allows us to impose a lower bound on the variable xsum to avoid dividing by zero. Of course, the model will contain more rows and columns if intermediate variables are introduced, but this increase in size is offset by a decrease in complexity and, in many cases, by an increase in sparsity as well.

Scaling Variables and Equations

Recall that nonlinear programming algorithms use the derivatives of the objective function and the constraints to find good search directions and they use function values to determine if constraints are satisfied or not. The relative size of the derivatives and the function values is influenced by the units of measurement that are used for the variables and constraints, and will have an effect on the performance of the solver and the result computed. Therefore, a proper, consistent scaling of the model is important to the success of the solution algorithm and the quality of the answer returned.

For example, assume that two goods are equally costly: both cost $1 per kg. However, the first is specified in grams and the second in metric tons, so that their coefficients in the cost function will be vastly different: $1000 per gram and $0.001 per ton respectively. If cost is measured in $1000 units, then the coefficients will be 1 and 1.e-6 respectively. This discrepency in size may cause the algorithm to ignore the variable with the smaller coefficient, since the coefficient is comparable to some of the zero tolerances. To avoid such problems, the units of measurements need to be carefully chosen, that is, variables and constraints need to be properly scaled.

We recommend scaling with the following goals in mind:

  • Solution level values of variables should fall into a range around 1, e.g. from 0.01 to 100.
  • The magnitude of the nonzero constraint marginals at solution should fall into a range around 1, e.g. from 0.01 to 100.
  • The magnitude of the derivatives of the nonlinear terms (i.e. the Jacobian elements) should fall into a range around 1, e.g. from 0.01 to 100, both at the initial point and at the solution.
  • The constants in the equations should have absolute values around 1, e.g. from 0.01 to 100.

Well-scaled variables are measured in appropriate units. In most cases users should select the unit of measurement for the variables such that their expected value is around unity. Of course, there will always be some variation. For example, if x(i) is the production at location i, one could select the same unit of measurement for all components of x, say, a value around the average capacity.

In well-scaled equations the individual terms are measured in appropriate units. After choosing units for the variables users should choose the unit of measurement for the equations such that the expected values of the individual terms are around 1. For example, if these rules are followed, material balance equations will usually have coefficients of 1 and -1.

Usually well-scaled variables and equations result in well-scaled derivatives. To check whether the derivatives are well-scaled, we recommend running the model with a positive value for the option limrow and inspecting the coefficients in the equation listing of the GAMS list file.

For more about scaling in GAMS, see section Model Scaling - The Scale Option. Note that while many solvers have internal scaling procedures, a better result can generally be achieved by a judicious choice of units by the model developer.

Blocking Degenerate Cycling

Most commercial linear programming solvers use a perturbation technique to avoid degenerate cycling during the solution process: they temporarily add small numbers to the right-hand sides of equations. In general, NLP solvers do not have such an internal feature. Sometimes the success and performance of an NLP solver can be enhanced by a manual perturbation formulation.

In particular, if users observe that the NLP solution process has a large number of iterations where the solver does not make significant progress in altering the objective function value, we recommend to modify the equations in the model by replacing the value of zero on the right-hand side with a small number. This may accelerate the solution process. Assume that we have the following equation:

\[ f(x) \leq 0 \]

This could be reformulated as:

\[ f(x) \leq \delta*0.001 \]

Here we set \( \delta \) to 1 if we wish to keep the addition and to zero otherwise. The value of 0.001 is just an example and needs to be adjusted based on the model context. The number should be chosen such that it does not introduce significant distortions into the problem solution. Such an addition quite frequently reduces solution time by helping the solver avoid degenerate cycling. If it is done correctly, the resulting model solution is not qualitatively different from the original model solution. Users may also first solve the model with \( \delta = 1\) and subsequently with \( \delta = 0\) to get rid of the effects of the small numbers.

We recommend that user avoid using the same number but instead use some systematically varying number or a random number. The technique of adding a small number on the right-hand side may also be used in problems where many equations have the same nonzero value on the right-hand side.

Reformulating DNLP Models

Nonlinear models in GAMS belong to one of the following two classes: smooth and discontinuous. Typically, all functions with endogenous arguments contained in the model are smooth functions (i.e. functions with continuous derivatives) like sin, exp and log. These models can be solved using NLP. If any of the endogenous functions in the model are not smooth (i.e. are discontinuous), the model cannot be solved as an NLP: the DNLP model type must be used instead. Examples of non-smooth functions include ceil and sign, where the function itself is not continuous, and max, min, and abs, where the derivatives are not continuous. Typically, NLP solvers are designed to work with continuous derivatives, and much of the convergence theory behind them assumes this continuity. Discontinuous functions or derivatives may cause numerical problems, poor performance, spurious (i.e. wrong) solutions, and other issues, so they should be used with special care and only if necessary.

N.B.: to avoid a proliferation of model types, nonlinear programming is the only model type split into smooth (NLP) and nonsmooth (DNLP) variants. All other model types allowing nonlinear functions (e.g. MINLP, MCP, CNS) include both smooth and nonsmooth functions. This is not because nonsmooth functions are less problematic in these contexts. It simply became too unwieldy to maintain this distinction across all types of nonlinear models.

A powerful and effective way to model discontinuous functions is with binary variables, which results in a model of type MINLP. The model [ABSMIP] demonstrates this formulation technique for the functions abs, min, max and sign. Alternatively, reformulations or approximations may be used to model discontinuous functions such that the resulting model is of type NLP. Here we offer some guidance on how to reformulate or approximate the discontinuous functions abs, max and min using only smooth functions and continuous variables and thus transform a DNLP model into an NLP. This transformation is generally more reliable than solving the original as a DNLP.

Note that some of the reformulations suggested below enlarge the feasible space. They rely on the objective function to choose a solution that is contained in the original feasible space, i.e. where the relationship defined by the nonsmooth function holds. If the objective cannot be relied on to do this, it is also possible to use one of the smooth approximations for the nonsmooth functions defined below.

Reformulating and Approximating the Function ABS

The function abs(x) returns the absolute value of the argument x. If we are minimizing an absolute value, then we can split this value into its positive and negative parts (both represented by positive variables) and minimize the sum of these variables. This formulation enlarges the feasible region but the optimal solution will be the one where the sum of the positive and negative parts is equal to the absolute value.

variables x, y, z;
equations
  obj  '1-norm'
  f ;
obj.. abs(x) + abs(y) =E= z;
f  .. sqr(x-3) + sqr(y+5) =L= 1;

model nonsmooth  / obj, f /;
solve nonsmooth using dnlp min z;

positive variables xPlus, xMinus, yPlus, yMinus;
equations
  obj2 'smooth version of 1-norm'
  xDef
  yDef
  ;
obj2.. xPlus + xMinus + yPlus + yMinus =E= z;
xDef.. x =E= xPlus - xMinus;
yDef.. y =E= yPlus - yMinus;

model smooth / obj2, xDef, yDef, f /;
solve smooth using nlp min z;

Note that the discontinuity in the derivative of the function abs has been converted into lower bounds on the new variables xPlus, xMinus, etc: these bounds are handled routinely by any NLP solver. Note too that the feasible space is larger than before. For example, many pairs xPlus and xMinus satisfy our equation x =E= xPlus - xMinus;. However, our objective ensures that one of xPlus and xMinus will be zero at the solution so that the sum of xPlus and xMinus wil be the absolute value of x.

In case the objective function does not contain a term that tries to minimize the absolute value, a smooth approximation can be used instead of the reformulation described above. This approximation should be close to the absolute value and also have smooth derivatives. Such an approximation for abs(f(x)) is:

sqrt(sqr(f(x)) + sqr(delta))

Here delta is a small scalar. The value of delta controls the accuracy of the approximation and the curvature around f(x)=0. The approximation error is largest when f(x)=0 and decreases as f(x) gets farther from zero. A value for delta ranging between 1.e-3 and 1.e-4 should be appropriate in most cases. Users could also use a larger value in an initial optimization, reduce it and solve the model again. If delta is reduced below 1.e-4, then large second order terms might lead to slow convergence or even prevent convergence. An example of this approximation used for the previous example is below:

$macro MYABS(t,d) [sqrt(sqr(t)+sqr(d))]
equation obj3;
obj3.. MYABS(x,1e-4) + MYABS(y,1e-4) =E= z;

model approx / obj3, f /;
solve approx using nlp min z;

Note the use of the macro facility to encapsulate the smooth reformulation. As mentioned above, this approximation has its largest error where f(x)=0. If it is important to get accurate values at this point, then we recommend the following alternative approximation:

sqrt(sqr(f(x)) + sqr(delta)) - delta

Note that the only difference is the subtraction of the constant term delta. In this case, the error will equal zero at f(x)=0 and it will increase to -delta as f(x) moves away from zero.

Reformulating and Approximating the Function MAX

The function max(x1,x2,x3,...) returns the maximum value of the arguments, where the number of the arguments may vary. Typically, the equation

t >= max(f(x),g(y))

is replaced by two inequalities:

t >= f(x)
t >= g(y)

Here x, y and t are variables and f(x) and g(y) are some functions depending x and y respectively. Provided the objective function has some term that tries to minimize t, one of the constraints will become binding at solution and t will equal the maximum of the two terms. The extension to more than two arguments in the function max should be obvious. A simple example follows:

variables
  x  / LO 0, L 0.2, UP [pi/2] /
  mx 'max[sin(x),cos(x)]'
  z  'objective var'
  ;
equation oDef;
oDef.. x / 100 + max[sin(x),cos(x)] =E= z;

model nonsmooth  / oDef /;
solve nonsmooth using dnlp min z;

equations oDef2, sinBnd, cosBnd;
oDef2.. x / 100 + mx =E= z;
sinBnd.. mx =G= sin(x);
cosBnd.. mx =G= cos(x);

model smooth / oDef2, sinBnd, cosBnd /;
solve smooth using nlp min z;

In case the objective function does not force the max-term to be minimized, a smooth approximation for max(f(x),g(y)) can be used, as in the following example code:

[f(x) + g(y) + sqrt(sqr(f(x)-g(y)) + sqr(delta))] /2
$macro MYMAX(t1,t2,d) [0.5 * [t1 + t2 + sqrt(sqr(t1-t2) + sqr(d))] ]
equation oDef3;
oDef3.. x / 100 + MYMAX(sin(x),cos(x),1e-4) =E= z;

model approx / oDef3 /;
solve approx using nlp min z;

Here delta is a small scalar, preferably ranging from 1.e-2 to 1.e-4. The approximation error takes its maximum of delta/2 when f(x)=g(y) and decreases as one moves away from this point. To shift the error away from the point of discontinuity, the following approximation can be used:

[f(x) + g(y) + sqrt(sqr(f(x)-g(y)) + sqr(delta)) - delta] /2

Reformulating and Approximating the Function MIN

The reformulation of and approximation to the min function is similar to the max case above and will not be repeated in full here. Briefly,

t =e= min(f(x),g(y))

is replaced by:

t =l= f(x)
t =l= g(y)

and is effective as long as the objective maximizes t. If not, this smooth approximation for min(f(x),g(y)) can be used:

[f(x) + g(y) - sqrt(sqr(f(x)-g(y)) + sqr(delta))] /2