Good Coding Practices

The GAMS language is quite flexible regarding the syntax and format of the code it accepts, offering users considerable latitude in how they organize and format their GAMS code. Most modelers develop their own style as they gain experience with the GAMS system. This tutorial reflects the coding preferences of Bruce A. McCarl (currently professor of Agricultural Economics at Texas A&M University). Note that Bruce has extensive experience with GAMS, both as a modeler and an educator, and many GAMS users know, use, and benefit from his work. The goal of this tutorial is not to present a rigid set of rules to follow arbitrarily, but rather to help users develop their own coding preferences and style. The larger goal is to build self-documenting models that are easy to read and understand, to edit, and to debug: both for the developer working in the present, and for a larger group of colleagues and consultants working with the model over a span of months or years.

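We will cover the following topics:

  • Using Longer Names and Descriptive Text
  • Including Comments on Procedures and the Nature and Sources of Data
  • Choosing Raw Data Instead Of Computed Data
  • Avoiding the Universal Set in the Context of Data Input
  • Defining Sets and Subsets Wisely
  • Structuring and Formatting Files to Improve Readability
  • Other Suggestions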

Using Longer Names and Descriptive Text

The readability of GAMS code may be significantly improved by using longer, self-explanatory names for identifiers (e.g. names of sets, parameters, and variables). Consider the following lines of code from the production and inventory model [ROBERT]:

Sets  p     'products'       / low, medium, high /
      r     'raw materials'  / scrap, new /
      tt    'long horizon'   / 1*4 /
      t(tt) 'short horizon'  / 1*3 /;

Table  a(r,p)  'input coefficients'

          low  medium  high
 scrap      5     3      1
 new        1     2      3;

Table  c(p,t)  'expected profits'

            1    2    3
 low       25   20   10
 medium    50   50   50
 high      75   80  100;

Variables  x(p,tt)  'production and sales'
           s(r,tt)  'opening stocks'
           profit;

 Positive variables x, s;

 Equations  cc(t)    'capacity constraint'
            sb(r,tt) 'stock balance'
            pd       'profit definition' ;

 cc(t)..       sum(p, x(p,t)) =l= m;

 sb(r,tt+1)..  s(r,tt+1) =e= s(r,tt) - sum(p, a(r,p)*x(p,tt));

 pd..          profit =e= sum(t, sum(p, c(p,t)*x(p,t))
                        - sum(r, misc("storage-c",r)*s(r,t)))
                        + sum(r, misc("res-value",r)*s(r,"4"));

 s.up(r,"1") = misc("max-stock",r);

These lines may be reformatted in the following way (see good.gms):

Sets process          'available production process'
       / low    'uses a low amount of new materials',
         medium 'uses a medium amount of new materials',
         high   'uses a high amount of new materials' /
     rawmateral        'source of raw materials' / scrap, new /
     Quarters          'long horizon'            / spring, summer, fall, winter /
     quarter(Quarters) 'short horizon'           / spring, summer, fall / ;

 Table  usage(rawmateral,process)  'input coefficients'
          low  medium  high
 scrap      5     3      1
 new        1     2      3;

 Table  expectprof(process,quarters)  'expected profits'
         spring summer fall
 low        25    20    10
 medium     50    50    50
 high       75    80   100;

 Variables  production(process,Quarters)    'production and sales'
            openstock(rawmateral,Quarters)  'opening stocks'
            profit ;

 Positive variables production, openstock;

 Equations  capacity(quarter)               'capacity constraint'
            stockbalan(rawmateral,Quarters) 'stock balance'
            profitacct                      'profit definition' ;

 capacity(quarter)..
        sum(process, production(process,quarter)) =l= mxcapacity;

 stockbalan(rawmateral,Quarters+1)..
      openstock(rawmateral,Quarters+1) =e=
      openstock(rawmateral,Quarters)
      - sum(process, usage(rawmateral,process)
                   *production(process,Quarters));

 profitacct.. profit =e=
   sum(quarter,
       sum(process, expectprof(process,quarter)
                    *production(process,quarter))
     - sum(rawmateral, miscdata("store-cost",rawmateral)*
                       openstock(rawmateral,quarter)))
     + sum(rawmateral, miscdata("endinv-value",rawmateral)
                      *openstock(rawmateral,"winter"));

 openstock.up(rawmateral,"spring") = miscdata("max-stock",rawmateral);

The two formulations are equivalent in their effect, but the second uses longer, more descriptive names for the sets, tables, variables and equations. It also uses longer names for the set elements, and the elements of the set process carry additional explanatory text. As a result, the second formulation is easier to understand, which will be particularly useful if and when the code is revisited in 5 years' time.

Note
  • Recall that GAMS allows long names for identifiers and labels (set elements). Users may exploit this feature to introduce long descriptive names. However, note that names for labels that are longer than 10 characters do not work well in multi-column displays. See the paragraph on customizing display width for details.
  • Use explanatory text for identifiers to indicate units, sources, descriptions, etc. It's not that hard to do and it pays dividends later.
  • Similarly, use explanatory text for set elements as appropriate.

For example, the descriptive text in the second line of the following code snippet is much more informative than the text in the first line:

Parameter vehsales(r)   'regional vehicle sales';
Parameter vehsales(r)   'regional vehicle sales ($ millions/yr)';

Note that the descriptive text will be displayed whenever the respective identifier is displayed. Hence, including units in the text will save time if results will have to be interpreted later.

Including Comments on Procedures and the Nature and Sources of Data

We recommend that the documentation of the code answer the following questions:

  • What are the units of the variables and parameters?
  • Where did the data come from?
  • What are the characteristics of the data such as units and year of applicability?
  • Why was a constraint set up in the way it is implemented?

In addition, it is often helpful to add comments that describe assumptions, the intent of equation terms, data sources, including document name, page number, table number, year of applicability, units, URL etc.

Consider the following example where various forms of comments are illustrated:

* this is a one line comment that could describe data
$ontext
My data could be described in this multi-line comment
This is the second line
$offtext

* The following dollar control option activates end-of-line comments
* and redefines the symbol for end-of-line comments
$eolCom #
x = sum(i, z(i)) ; # this is an end-of-line comment

* The following dollar control option activates in-line comments
* and redefines the symbols for in-line comments
$inLineCom (*  *)
x = sum(i, z(i)) ; (* this is an in-line comment *) r = sum(i, z(i)) ;

For more information on comments in GAMS, see section Comments.

Choosing Raw Data Instead Of Computed Data

Modelers often have a choice of how to enter data: they can either enter raw data and transform it as needed inside GAMS, or process the data externally and enter only the final results into GAMS.

The second choice may be attractive if the raw data is available in a spreadsheet where it can be manipulated before it is introduced to GAMS. However, over time spreadsheets and other data manipulation programs change or get lost, and often these programs are not well documented. Therefore, we recommend entering data into GAMS in a form that is as close as possible to the data as actually collected and then manipulating the data with GAMS to obtain the desired form. This will make it much easier to update models later or to work out implicit assumptions.
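As a minimal sketch of this recommendation, the following snippet enters raw figures as collected and derives the quantity of interest inside GAMS. The set crop, the parameters acres, yield, price and revenue, and all numbers are hypothetical and serve only as an illustration:

```gams
* Hypothetical raw data, entered as collected (names and numbers are illustrative)
Set crop 'crops surveyed' / corn, wheat /;

Parameter
   acres(crop)    'harvested area (acres)'       / corn 120, wheat 80 /
   yield(crop)    'yield as reported (bu/acre)'  / corn 150, wheat 45 /
   price(crop)    'price as reported ($/bu)'     / corn 4.2, wheat 6.1 /
   revenue(crop)  'gross revenue ($)';

* The transformation is done in GAMS, so the assumptions behind it stay visible
revenue(crop) = acres(crop) * yield(crop) * price(crop);

display revenue;
```

With this layout, updating the model for a new survey year only requires replacing the raw figures; the derived data follows automatically.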

Avoiding the Universal Set in the Context of Data Input

While GAMS permits using the universal set * as an index in a parameter or table statement, in most cases it is not advisable to do so. Consider the following example from the production and inventory model [ROBERT]:

Sets   r          'raw materials'  / scrap, new / ;

Table  misc(*,r)  'other data'
            scrap  new
 max-stock    400  275
 storage-c     .5    2
 res-value     15   25   ;
...

pd.. profit =e= sum(t, sum(p, c(p,t)*x(p,t))
              - sum(r, misc("storage-c",r)*s(r,t)))
              + sum(r, misc("res-value",r)*s(r,"4"));

Note that the definition of the table misc indicates that any entry in the first index position is allowed, so there is no domain checking for that position. Consequently, if the label res-value is misspelled as res-val in the equation pd, GAMS will compile and execute the program without signaling an error, but instead of the expected values (i.e. misc("res-value",r)), the values of misc("res-val",r) will be used in the equation. Since no data was entered for res-val, these values are zero; they will lead to faulty results, and the modeler will not be alerted to this fact. To ensure that the results of a GAMS run are reliable and trustworthy, we strongly recommend using domain checking by introducing a new set for the labels in the first index position of the table misc:

Sets   r         'raw materials'
         / scrap, new /
       miscitem  'misc input items'
         / max-stock, storage-c, res-value /;

Table  misc(miscitem,r)  'other data'
            scrap  new
 max-stock    400  275
 storage-c     .5    2
 res-value     15   25   ;

Observe that the new set miscitem contains exactly the labels that appear in the rows of the table misc. Hence the set miscitem may be used in the first index position of the definition of misc without loss of generality, but with the benefit of domain checking.
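The effect of domain checking can be tried out with a small sketch like the following. The parameter resval is a hypothetical helper introduced only for this illustration; the sets and the table are the ones defined above:

```gams
Sets   r         'raw materials'     / scrap, new /
       miscitem  'misc input items'  / max-stock, storage-c, res-value /;

Table  misc(miscitem,r)  'other data'
            scrap  new
 max-stock    400  275
 storage-c     .5    2
 res-value     15   25   ;

* resval is a hypothetical parameter used only for this illustration
Parameter resval(r) 'residual value of stock';

* Correct label: accepted by the compiler
resval(r) = misc("res-value",r);

* Misspelled label: with the domain miscitem in place, uncommenting the
* next line should make GAMS report a domain violation at compile time
* resval(r) = misc("res-val",r);

display resval;
```

With misc(*,r) instead of misc(miscitem,r), the misspelled reference would be accepted silently, which is exactly the situation the domain set is meant to prevent.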

Defining Sets and Subsets Wisely

Generally, the elements of a set have a feature in common or they are similar in some way. In this section we will give some guidance on how to partition the labels in the data into sets. In addition, we will discuss in which contexts it is useful to introduce subsets. For an introduction to sets in GAMS, see chapter Set Definition.

For example, suppose we have three grades of oil and three processes to crack it. The question arises whether we should introduce one set with nine elements or two sets with three elements and a two-dimensional set. We recommend the second alternative.
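A minimal sketch of the second alternative might look as follows. The labels for the grades and processes, the set names, and the assumption that one combination is infeasible are all hypothetical:

```gams
* Hypothetical labels for three oil grades and three cracking processes
Sets grade    'grades of crude oil'   / light, medium, heavy /
     cracker  'cracking processes'    / thermal, catalytic, hydro /
     feasible(grade,cracker) 'grade/process combinations that are allowed';

* Start from all nine combinations, then remove one that is assumed infeasible
feasible(grade,cracker)      = yes;
feasible("heavy","thermal")  = no;

display feasible;
```

Keeping grades and processes in separate sets means that data or equations that depend on only one of the two dimensions can be indexed by that set alone.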

In another example, we consider a budget for farm spending: we have annual (i.e. cumulative yearly) spending for fertilizer and for seed and also monthly spending for labor and for water. There are 26 decisions or items in the budget. We could introduce a set with 26 elements or we could use the following formulation:

Sets resources  / fertilizer, seed, labor, water /
     periods    / jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec, annual /
     use(resources,periods)
                / (fertilizer,seed).annual,
                  (labor,water).(jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec) /;

We recommend the formulation above. In general, it is better to err on the side of being more extensive or exact with set definitions.

Occasionally it is necessary to group some labels into one set for a certain purpose and then single out some of them for another purpose. Subsets facilitate modeling such a case. For example, a set of all cities in a model may be needed to enter distances and compute related transportation costs. In addition, a subset can be used to specify the cities that are hubs for some activity, since some equations should be restricted to these hubs.
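The city and hub case could be sketched as follows. The city names, the parameter hubcap, the variable handled, the equation hublimit and all numbers are hypothetical:

```gams
Sets city       'all cities in the model'   / boston, chicago, dallas, denver /
     hub(city)  'cities that act as hubs'   / chicago, dallas /;

* Data is entered for every city (hypothetical numbers)
Parameter hubcap(city) 'handling capacity (shipments/day)'
     / boston 10, chicago 40, dallas 35, denver 15 /;

Positive Variable handled(city) 'volume handled at each city';

* The equation is declared over the subset, so it is generated for hubs only
Equation hublimit(hub) 'handling limit at hub cities';

hublimit(hub).. handled(hub) =l= hubcap(hub);
```

Because hub is declared as a subset of city, GAMS checks that every hub is also a city, and hub may be used wherever an index over city is expected.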

Structuring and Formatting Files to Improve Readability

In this section we will offer some guidelines on structuring and formatting the GAMS code to make it easy to read.

There are several ways to structure the GAMS code. Two styles are outlined in section Organization of GAMS Programs. The following recommendation to enter the sections of the code in a fixed order is an extended version of the first style:

  1. Set definitions for sets that are data related
  2. Parameter, scalar and table definitions, possibly intermixed with calculations
  3. Variable definitions
  4. Equation declarations
  5. Equation definitions (algebraic specification of equations)
  6. Model and solve statement(s)
  7. Definitions of sets and parameters for report writing
  8. Calculations for report writing
  9. Display statement(s) for reports

Note that the code will be easiest to navigate if each section of the code contains only one type of statement. For example, interspersing set definitions with parameter definitions will make the code unnecessarily difficult to read.

In addition to following a fixed structure, it is also essential to properly format the code. Of course, formatting is in many respects a matter of taste. The following list offers some ideas:

  • Align the names of identifiers and descriptive text, as demonstrated in the examples in this tutorial and in the GAMS User's Guide in general.
  • Use spacing and indents.
  • Use blank lines to highlight something and to mark sections of the code.
  • Ensure that variables and all their index positions are on one line in equation definitions.
  • Indent in indexed operations like sums and programming flow control structures like loops and if statements to delineate terms. The structure of a long and complex statement may be revealed through careful indentation and thoughtful placement of closing parentheses.

We will demonstrate the effect of proper formatting with the following two examples. The first example contains valid GAMS code, but is deliberately poorly formatted:

Sets  products  'available production process' / low 'uses low new materials'
medium 'uses medium new materials', high 'uses high new materials'/
rawmateral    'source of raw materials'  / scrap, new /
Quarters    'long horizon'   / spring, summer, fall ,winter /
quarter(Quarters) 'short horizon'  / spring, summer, fall / ;
Variables  production(products,Quarters)  'production and sales' ,
openstock(rawmateral,Quarters)  'opening stocks', profit ;
Positive variables production, openstock;
Equations  capacity(quarter)  'capacity constraint',
stockbalan(rawmateral,Quarters) 'stock balance',
profitacct  profit definition ;
profitacct.. profit =e= sum(quarter, sum(products, expectprof(
products,quarter) *production(products,quarter))-sum(
rawmateral,miscdata("store-cost",rawmateral)*openstock(rawmateral
  ,quarter)))+ sum(rawmateral, miscdata("endinv-value",rawmateral)  *openstock(rawmateral,"winter"));

The second example contains the same code as the first example, but is carefully formatted:

Sets products          'available production process'
       / low    'uses a low amount of new materials',
         medium 'uses a medium amount of new materials',
         high   'uses a high amount of new materials' /
     rawmateral        'source of raw materials' / scrap, new /
     Quarters          'long horizon'            / spring, summer, fall, winter /
     quarter(Quarters) 'short horizon'           / spring, summer, fall / ;

Variables  production(products,Quarters)   'production and sales'
           openstock(rawmateral,Quarters)  'opening stocks'
           profit ;
Positive Variables production, openstock ;

Equations  capacity(quarter)               'capacity constraint'
           stockbalan(rawmateral,Quarters) 'stock balance'
           profitacct                      'profit definition' ;

profitacct..
    profit =e=
    sum(quarter,
       sum(products, expectprof(products,quarter)
                    *production(products,quarter)
           )
      - sum(rawmateral, miscdata("store-cost",rawmateral)*
                       openstock(rawmateral,quarter)
           )
       )
   + sum(rawmateral, miscdata("endinv-value",rawmateral)
                   *openstock(rawmateral,"winter")
       )
  ;

Observe that inserting blank lines, aligning the names of identifiers and descriptive text, and indenting and carefully placing the closing parentheses in the sums make the code much easier to read and understand, both now and in the future. These habits are well worth adopting as standard practice when writing GAMS code.

Other Suggestions

We will complete this tutorial by offering some other useful suggestions that may help modelers develop their own conventions.

Even though GAMS is case insensitive, it is advisable to establish some convention on the use of upper and lower case letters. For example, Paul N. Leiby (currently at Oak Ridge National Laboratory) uses lower case for texts and comments, and upper case for GAMS reserved words and variable and parameter names. The casing used when an identifier or label is first encountered in a GAMS program is the casing stored by GAMS and used in subsequent outputs like the listing file or a GDX file. Any casing can be used (so noWhere is equivalent to nowHere) but the casing stored is determined by first use.

A similar situation holds for label quoting: the type of quotes stored (if any) is determined by first use.

Note that the dollar control option $onSymList will cause a list of all identifier names to be displayed in the compilation output of the output file. This list may be used to review the spelling and casing of the identifiers as they will appear in output files. Similarly, the dollar control option $onUELList will cause an ordered list of all labels to be displayed in the compilation output of the output file. This is useful for checking both the case and order of the labels used in the GAMS program. For more on issues related to label ordering, see section Ordered and Unordered Sets.
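The following sketch illustrates how these two options might be combined with the first-use rule for casing. The set Region, the parameter Sales and the data are hypothetical:

```gams
$onSymList
$onUELList

* The casing of first use is the casing GAMS stores and reports
Set Region 'sales regions' / North, South /;

* Later uses may differ in case without changing the stored casing
Parameter Sales(Region) 'sales by region (hypothetical data)'
     / north 10, SOUTH 20 /;

display sales;
```

Reviewing the symbol and label lists in the listing file after a run like this is a quick way to spot identifiers or labels whose stored casing is not the intended one.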

To keep track of the data types of identifier names, some modelers always start set names with s_, names of parameters with d_ (for "data"), variable names with v_ and equation names with e_.
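A small sketch of this prefix convention, with hypothetical identifiers and data, might read:

```gams
* s_ = set, d_ = data (parameter), v_ = variable, e_ = equation
Set               s_plant          'production plants'    / p1, p2 /;
Parameter         d_cap(s_plant)   'capacity (units/yr)'  / p1 100, p2 150 /;
Positive Variable v_out(s_plant)   'output (units/yr)';
Equation          e_cap(s_plant)   'capacity limit';

e_cap(s_plant).. v_out(s_plant) =l= d_cap(s_plant);
```

The prefix makes the type of each symbol visible in the equation itself, at the cost of slightly longer names.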

Some experienced GAMS users always surround explanatory text with quotes: this makes the text stand out, prevents it from being interpreted as a label or identifier, and allows special characters like $, - and & to be used.

If a file is used by several modelers and is updated occasionally, a file modification log at the top of the file is advisable. It should contain the following information: the modification date, the version number, the modification(s) made, and who made them. In addition, a set called version may be used to keep track of the dates on which the input files were modified:

Set version(*,*,*,*);
. . .
version("my_file","May","19","2016") = yes;
version("my_include_file","Sep","30","2016") = yes;
. . .
display version;

Note that the display statement will generate a display of all elements of the set version, each indicating on which day a component of the model was modified.