Set Definition

Introduction

Sets are fundamental building blocks in any GAMS model. They allow the model to be succinctly stated and easily read. In this chapter we will introduce how sets are declared and initialized. More advanced set concepts, such as assignments to sets, and lag and lead operations are covered in the chapters Dynamic Sets and Sets as Sequences. The topics discussed in this chapter will be enough to provide a good start on most models. We will introduce simple sets, subsets, multi-dimensional sets, singleton sets and the universal set. The chapter concludes with a discussion of domain checking, a very important feature of GAMS, and a section on Domain Defining Symbol Declarations.

Simple Sets

Using common mathematical notation, a set $$S$$ that contains the elements $$a$$, $$b$$ and $$c$$ is written as:

$S = \{a,b,c\}$

Using GAMS notation, the same set is defined in the following way:

Set S / a, b, c /;


The set statement begins with the keyword set, S is the name of the set, and its members are a, b, and c. They are labels, but are often referred to as elements or members.

Defining a Simple Set: The Syntax

In general, the syntax for simple sets in GAMS is as follows:

set[s] set_name ["text"] [/element [text] {,element [text]} /]
{,set_name ["text"] [/element [text] {,element [text]} /] } ;


Set[s] is the keyword that indicates that this is a set statement. Set_name is the internal name of the set in GAMS; it is an identifier. The optional explanatory text may be used to describe the set or a specific set element for future reference and to ease readability. The list of set elements is delimited by forward slashes. Element is the name of the set element(s). Note that each element in a set must be separated from other elements by a comma or by an end-of-line, and each element is separated from any associated text by a blank.

Consider the following example from the Egyptian fertilizer model [FERTS], where the set of fertilizer nutrients could be written as

Set cq "nutrients" / N,  P2O5 /;


or as

Set cq "nutrients" / N
P2O5  /;


The order in which the set members are listed is usually not important. However, if the members represent, for example, time periods, then it may be useful to refer to the next or previous member. There are special operations to do this, and they are discussed in chapter Sets as Sequences: Ordered Sets. For now, it is enough to remember that the order in which set elements are specified is not relevant, unless and until some operation implying order is used. At that time, the rules change, and the set becomes what we will later call an ordered set.

Note
• The data statement, i.e. specification of set elements in forward slashes can be omitted. In such cases a set is declared without being defined.
• More than one set may be declared and defined in one set statement. Examples are given in the subsection on declaring multiple sets below.

Illustrative Examples

Consider the following example based on the model [SHALE]:

Set cf "final products" / syncrude "refined crude (mil bbls)"
lpg      "liquefied petroleum gas (million bbls)"
ammonia  "ammonia (mil tons)"
coke     "coke (mil tons)"
sulfur   "sulfur (mil tons)" /;


The set statement is introduced with the keyword set, the name of the set is cf and the explanatory text "final products" describes the set. The set has five elements with explanatory texts that contain details of the units of measurement.

Usually sets are declared and defined once and then referenced in the model. There are two exceptions: the dollar control option onMulti, which allows adding more elements later, and dynamic sets. For details on dynamic sets, see chapter Dynamic Sets. The following code slightly varies the previous example to demonstrate the option $onMulti:

Set cf "final products" / syncrude "refined crude (mil bbls)"
                          lpg      "liquefied petroleum gas (million bbls)"
                          ammonia  "ammonia (mil tons)" /;
$onmulti
Set cf "more final products" / coke     "coke (mil tons)"
                               sulfur   "sulfur (mil tons)" /;

After $onmulti, additional elements are added to the set cf. Without the dollar control option $onMulti, the second data statement would generate an error, because by default a symbol may have at most one data statement.

Sequences as Set Elements

The asterisk '*' plays a special role in set definitions. It is used to relieve the tedium of typing a sequence of elements for a set, and to make intent clearer. For example, in a simulation model there might be ten annual time periods from 1991 to 2000. Instead of typing ten years, the elements of this set can be written as:

Set t "time" / 1991 * 2000 /;


This means that the set includes the ten elements 1991, 1992, ..., 2000. GAMS builds up these label lists by looking at the differences between the two labels. If the only characters that differ are digits, with the number L formed by these digits in the left and R in the right, then a label is constructed for every integer in the sequence L to R. Any non-numeric differences or other inconsistencies cause errors.

The following example illustrates the most general form of the 'asterisked' definition:

Set g1 / a1bc * a20bc /;


Note that this is not the same as:

Set g2 / a01bc * a20bc /;


Both sets have 20 members, but they have only 11 members in common.
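The overlap between the two lists can be checked directly. The following is a minimal sketch; the scalar name nCommon is an assumption introduced only for this illustration.

```gams
* g1 uses unpadded numbers, g2 uses zero-padded two-digit numbers
Set g1 / a1bc * a20bc /
    g2 / a01bc * a20bc /;

* count the labels of g1 that are also members of g2
Scalar nCommon "number of labels shared by g1 and g2";
nCommon = sum(g1$g2(g1), 1);

display nCommon;
```

Only the labels a10bc to a20bc are spelled identically in both lists, so nCommon should be reported as 11.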

Lists in decreasing order are also possible:

Set y "years in decreasing order" / 2000 * 1991 /;


As a last example, the following set definitions are both illegal because they are not consistent with the rule given above for making lists:

Set illegal1 / a1x1 * a9x9 /
illegal2 / a1 * b9 /;


Declaring Multiple Sets

The keyword set does not need to be repeated for each set; it is needed only at the beginning of a group of sets. It is often convenient to put a group of set declarations (and definitions) together at the beginning of the program. When this is done, the set keyword needs to be used only once. Those who prefer to intermingle set declarations with other statements have to use a new set statement for each additional group of sets. Note that the keywords set and sets are equivalent. The following example shows how two sets can be declared together. Note that the semicolon is used only after the last set is declared.

Sets s "Sectors" / manuf, agri, services, government /
r "Regions" / north, eastcoast, midwest, sunbelt / ;


Using Previously Defined Sets in Set Definitions

The following notation allows previously defined sets to be used in a new set definition:

Set i / i1 * i4 /
j / j6 * j9 /
k / #i, set.j /;


The set k contains all elements of the sets i and j. Note that the hash sign '#' followed by a set name is a shorthand for referring to all the elements in a set. The notation set.set_name works identically and is just a different way to refer to all elements in a previously defined set.
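As a quick check of the two notations, the size of the combined set can be computed. This is a sketch only; the scalar nK is a hypothetical name that does not appear in the original example.

```gams
Set i / i1 * i4 /
    j / j6 * j9 /
    k / #i, set.j /;

* k should hold the four labels of i followed by the four labels of j
Scalar nK "number of elements in k";
nK = card(k);

display k, nK;
```

Since i and j have four elements each and share no labels, nK should be 8.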

The Alias Statement: Multiple Names for a Set

Sometimes it is necessary to have more than one name for the same set. In input-output models for example, each commodity may be used in the production of all other commodities and it is necessary to have two names for the set of commodities to specify the problem without ambiguity. In the general equilibrium model [ORANI], the set of commodities c is written as

Set c "Commodities" / food, clothing /;


A second name for the set c is established with either of the following statements:

Alias (c, cp) ;
Alias (cp, c) ;


Here cp is the new set name that can be used instead of the original set name c.

Note
The newly introduced set name may be used as an alternative name for the original set; the associated set will always contain the same elements as the original set.

With the alias statement more than one new name may be introduced for the original set:

Alias (c,cp, cpp, cppp);


Here cp, cpp, cppp are all new names for the original set c.

Note
The order of the set names in the alias statement does not matter. The only restriction is that exactly one of the sets in the statement must be defined earlier. All the other sets are introduced by the alias statement.

Typical examples for the usage of aliases are problems where transportation costs between members of one set have to be modeled. The following code snippet is adapted from the Andean fertilizer model [ANDEAN]:

Set i "plant locations" / palmasola, pto-suarez, potosi, baranquill, cartagena /;
Alias(i,ip);

Table tran(i,i) "transport cost for interplant shipments (us$ per ton)"
              palmasola  pto-suarez  potosi  baranquill
pto-suarez        87.22
potosi            31.25       55.97
baranquill        89.80      114.56   70.68
cartagena         89.80      114.56   70.68        5.00 ;

Parameter mui(i,ip) "transport cost: interplant shipments (us$ per ton)";
mui(i,ip) = (tran(i,ip) + tran(ip,i));


The alias statement introduces ip as another name for the set i. The table tran is two-dimensional and both indices are the set i. The data for the transport cost between the plants is given in this table; note that the transport costs are given only for one direction here, i.e. the costs from pto-suarez to palmasola are explicitly specified in the table while the costs in the opposite direction are not given at all. The parameter mui is also two-dimensional and both indices refer to the set i, but this time the alias ip is used in the second position. The parameter mui is defined with the assignment statement in the next line: mui contains the transport costs from one plant location to the other, in both directions. Note that if mui were defined without the alias, then all its entries would have been zero. For other examples where aliases are used, see sections The Universal Set and Finding Sets from Data below.
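The effect of the alias on which index combinations an assignment visits can be seen in a smaller setting. The following sketch reuses the commodity set from above; the parameter and scalar names (diagOnly, allPairs, nDiag, nAll) are assumptions made for this illustration.

```gams
Set c "commodities" / food, clothing /;
Alias (c, cp);

Parameter diagOnly(c,c) "assigned with the same index in both positions"
          allPairs(c,c) "assigned with the alias in the second position";

* the same index in both positions only visits food.food and clothing.clothing
diagOnly(c,c)  = 1;
* the alias lets the second position vary independently of the first
allPairs(c,cp) = 1;

* card of a parameter counts its nonzero entries
Scalar nDiag, nAll;
nDiag = card(diagOnly);
nAll  = card(allPairs);

display diagOnly, allPairs, nDiag, nAll;
```

With two commodities, the first assignment should fill two entries and the second should fill all four, which is why an expression such as the one for mui needs the alias to reach off-diagonal pairs.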

Subsets

It is often necessary to define sets whose members must all be members of some larger set. The syntax is:

set set_ident1(set_ident2) ;


Here set is the keyword indicating that this is a set statement, and set_ident1 is a subset of the larger set set_ident2. The larger set is also called the superset.

For instance, we may wish to define the sectors in an economic model following the style in [CHENERY].

Set i    "all sectors"        / light-ind, food+agr, heavy-ind, services /
t(i) "traded sectors"     / light-ind, food+agr, heavy-ind /
nt   "non-traded sectors" / services /;


Some types of economic activity, for example exporting and importing, may be logically restricted to a subset of all sectors. In order to model the trade balance we need to know which sectors are traded, and one obvious way is to list them explicitly, as in the definition of the set t above. The specification t(i) means that each member of the set t must also be a member of the set i. GAMS will enforce this relationship, which is called domain checking. Obviously, the order of declaration and definition is important: the membership of i must be known before t is defined, otherwise checking cannot be done.

Note
All elements of the subset must also be elements of the superset.

It is legal but unwise to define a subset without reference to the larger set, as is done above for the set nt. In this case domain checking cannot be performed: if services were misspelled no error would be marked, but the model may give incorrect results. Hence, it is recommended to use domain checking whenever possible. It catches errors and makes it possible to write models that are conceptually cleaner, because logical relationships are made explicit.

An alternative way to define elements of a subset is with assignments:

Set i    "all sectors"    / light-ind, food+agr, heavy-ind, services /
t(i) "traded sectors" / light-ind,  heavy-ind /;
t('food+agr') = yes;


In the last line the element food+agr of the set i is assigned to the subset t. Assignments may also be used to remove an element from a subset:

t('light-ind') = no;


Note that yes and no are reserved words in GAMS. Note further that if a subset is assigned to, it then becomes a dynamic set. For more on assignments in GAMS in general, see section The Assignment Statement.
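Assignments also make it possible to derive one subset from another while keeping domain checking in place. The sketch below declares the non-traded sectors as a subset of i and computes them as the complement of t; this is one possible variation on the example, not the formulation used in [CHENERY].

```gams
Set i     "all sectors"        / light-ind, food+agr, heavy-ind, services /
    t(i)  "traded sectors"     / light-ind, food+agr, heavy-ind /
    nt(i) "non-traded sectors";

* every sector that is not traded is non-traded
nt(i) = not t(i);

display nt;
```

Here nt should contain only services. Because nt is assigned to, it becomes a dynamic set.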

Attention
A subset can be used as a domain in the declaration of other sets, variables, parameters and in equations as long as it is not a dynamic set.

This completes the discussion of sets in which the elements are simple. This is sufficient for many GAMS applications. However, there are a variety of problems for which it is useful to have sets that are defined in terms of two or more other sets.

Multi-Dimensional Sets

It is often necessary to provide mappings between elements of different sets. For this purpose, GAMS allows the use of multi-dimensional sets. For the current maximum number of permitted dimensions, see Dimensions. The next two subsections explain how to express one-to-one and many-to-many mappings between sets.

One-to-one Mapping

Consider a set whose elements are pairs: $$A = \{(b,d),\; (a,c),\; (c,e)\}$$. In this set there are three elements and each element consists of a pair of letters. This kind of set is useful in many types of modeling. For example, in the world aluminum model [ALUM] a port has to be associated with a nearby mining region:

Set i       "mining regions"     / china, ghana, ee+ussr, s-leone /
n       "ports"              / accra, freetown, leningrad, shanghai /
in(i,n) "mines to ports map" / china  .shanghai
ghana  .accra
ee+ussr.leningrad
s-leone.freetown /;

Here i is the set of mining regions, n is the set of ports and in is a two dimensional set that associates each port with a mining region. The dot between china and shanghai is used to create one such pair. Blanks may be used freely around the dot for readability. The set in has four elements, and each element consists of a region-port pair. The notation (i,n) after the set name in indicates that the first member of each pair must be a member of the set i of mining regions, and that the second must be in the set n of ports. GAMS will domain check the set elements to ensure that all members belong to the appropriate sets.

Many-to-Many Mapping

A many-to-many mapping is needed in certain cases. Consider the following sets:

Set  i         / a, b /
j         / c, d, e /
ij1(i,j)  / a.c, a.d /
ij2(i,j)  / a.c, b.c /
ij3(i,j)  / a.c, b.c, a.d, b.d /;


Here the set ij1 represents a one-to-many mapping where one element of the set i maps onto many elements of the set j. The set ij2 represents a many-to-one mapping where many elements of the set i map onto one element of the set j. The set ij3 is the most general case: a many-to-many mapping where many elements of the set i map to many elements of the set j.

These sets may be written compactly as:

Set  i        / a, b /
j        / c, d, e /
ij1(i,j) / a.(c,d) /
ij2(i,j) / (a,b).c /
ij3(i,j) / (a,b).(c,d) /;


The parentheses provide a list of elements that is expanded when creating pairs. Note that the dot '.', if used as above, acts as a product operator and supports building the Cartesian product of sets.

Attention
When complex sets like this are created, it is important to check that the desired set has been obtained. The checking can, for example, be done with a display statement.

GAMS provides more notation to define multi-dimensional sets in a succinct way. As introduced above, the hash sign '#' followed by a set name is a shorthand for referring to all the elements in a set. The matching operator ':' may be used to map ordered sets. This operator is similar to the product operator '.'. However, in this case elements are matched pairwise by mapping elements with the same order number. The examples below demonstrate these concepts.

Set
i         / a, b /
j         / c, d, e /

ij4a(i,j) / a.#j /
ij4b(i,j) / a.c, a.d, a.e /

ij5a(i,j) / #i.#j /
ij5b(i,j) / a.c, a.d, a.e, b.c, b.d, b.e /

ij6a(i,j) / #i:#j /
ij6b(i,j) / a.c, b.d /;


Note that set names that differ only by the last letter denote identical sets. For example, set ij4a is identical to set ij4b. Observe that set i has two elements and set j has three elements, where e is the element with the highest order. Set ij6a is an ordered mapping of all elements of set i to all elements of set j. However, since there is a mismatch in the number of elements, element e is not mapped to.
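The difference between the product operator and the matching operator shows up in the number of pairs that are generated. The following sketch counts them; the scalars n5 and n6 are hypothetical names used only here.

```gams
Set i         / a, b /
    j         / c, d, e /
    ij5a(i,j) / #i.#j /
    ij6a(i,j) / #i:#j /;

* the product builds every combination, the matching pairs elements by position
Scalar n5 "pairs from the product operator"
       n6 "pairs from the matching operator";
n5 = card(ij5a);
n6 = card(ij6a);

display ij5a, ij6a, n5, n6;
```

The product of two and three elements gives six pairs, while the matching stops after the two elements of i are used up, so n5 should be 6 and n6 should be 2.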

These concepts may be generalized to sets with higher dimensions. Mathematically, these are called $$3$$-tuples, $$4$$-tuples, or more generally, $$n$$-tuples. Some examples for the compact representation of sets of $$n$$-tuples using combinations of dots, parentheses, and commas are shown in Table 1.

Compact Notation      Result
(a,b).c.d             a.c.d, b.c.d
(a,b).(c,d).e         a.c.e, b.c.e, a.d.e, b.d.e
(a.1*3).c             (a.1, a.2, a.3).c or a.1.c, a.2.c, a.3.c
1*3. 1*3. 1*3         1.1.1, 1.1.2, 1.1.3, ..., 3.3.3

Table 1: Examples for compact representation of multi-dimensional sets

Note that the asterisk may also be used in conjunction with the dot. Recall that the elements of the list 1*4 are $$\{1,2,3,4\}$$.
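The expansions in Table 1 can be verified by counting the resulting tuples. The sketch below uses the universal set as the domain in each position and alphabetic labels instead of plain numbers; the set names p and q and the scalars nP and nQ are assumptions for this illustration.

```gams
Set p(*,*,*) "lists combined with dots"  / (a,b).(c,d).e /
    q(*,*,*) "ranges combined with dots" / (x1*x3).(y1*y3).(z1*z3) /;

* each dot multiplies the number of tuples by the length of the next list
Scalar nP, nQ;
nP = card(p);
nQ = card(q);

display p, nP, nQ;
```

The first definition should yield 2 x 2 x 1 = 4 tuples and the second 3 x 3 x 3 = 27 tuples.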

A powerful and very compact way to define multi-dimensional sets is with a special option that takes an identifier as value and carries out identifier operations like index matching using the matching operator ':'. The following example illustrates the method.

Set i / i1*i4 /
j / j1*j5 /
k / k1,k2 /
h / h1*h3 /;

Set b(i,j,k), c(i,j,k,h);

Option b(i:j,k), c(b:h);
display b, c;


The set b is a three-dimensional set. The option statement specifies which combinations of the elements of i, j, and k are elements of b. The matching operator ':' is between i and j, so we must first match the elements of the sets i and j. That gives us the first two positions. For the third position we cycle through all elements of the set k. This results in the following elements for the set b:

i1.j1.k1, i1.j1.k2, i2.j2.k1, ... , i4.j4.k2


The set c is a four-dimensional set. Note that the first three dimensions are identical to the domain of the set b. The option statement specifies that in the first three positions we will have elements of the set b, and these are matched with the elements of the set h, which are in the fourth position. Now, the set h has only three elements, so only the first three elements of the set b are matched with the members of the set h. This results in the following set:

i1.j1.k1.h1, i1.j1.k2.h2, i2.j2.k1.h3


As recommended above, it is important to always check whether the multi-dimensional sets generated with compact statements like these are indeed the sets that were intended.

For more sophisticated examples of how to use the matching operator within an option statement, please see section Index Matching.

The Table Format for Multi-Dimensional Sets

An alternative way to declare multi-dimensional sets is with tables. We show by example how tables may be used in the context of set definitions:

Set origins      / Berlin, Paris /
    destinations / London, Chicago, Budapest /
    linked_1(origins,destinations) / Berlin.London, Berlin.Budapest,
                                     Paris .London, Paris .Budapest /;

Set Table linked_2(origins,destinations)
            London       Chicago     Budapest
Berlin       yes          no           yes
Paris        yes                       yes  ;


The set linked_1 is a two-dimensional set that is defined with the dot notation introduced above. The set linked_2 is the same set defined using the table notation: the keyword set is followed by the keyword table and the name of the set with its domain. The table itself consists of the elements of the first index in the first column, the elements of the second index in the first row, and the data in the grid positions. Note that the keyword yes indicates that a label combination is part of the two-dimensional set and the keyword no or a blank indicates that the label combination is not contained in the new set. Please see section Tables for detailed requirements for inputting data in the table format.

Alternatively, the multi-dimensional set may be declared first without any elements, and the elements are added later in a separate table statement:

Set   origins      / Berlin, Paris /
      destinations / London, Chicago, Budapest /
      linked_2(origins,destinations);

Table linked_2(origins,destinations)
            London       Chicago     Budapest
Berlin       yes          no           yes
Paris        yes                       yes;


Instead of the keywords yes and no users may also use numbers to specify membership in the two-dimensional set: nonzero numeric entries mean that a label combination is part of the set and zero or a blank indicates that the label combination is not contained in the set.
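A numeric version of the same table might look as follows. This is a sketch under the rule just stated; the set name linkedNum and the scalar nLinks are hypothetical.

```gams
Set origins      / Berlin, Paris /
    destinations / London, Chicago, Budapest /;

* nonzero entries mark membership, zero or blank entries mark absence
Set Table linkedNum(origins,destinations)
            London   Chicago   Budapest
Berlin         1        0          1
Paris          1                   1   ;

Scalar nLinks "number of origin-destination pairs in the set";
nLinks = card(linkedNum);

display linkedNum, nLinks;
```

The four nonzero entries should produce the same four pairs as the yes/no table above.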

Projection and Aggregation of Sets

In GAMS, projection and aggregation operations on sets can be performed in two different ways: with an option statement and with an assignment. For a detailed discussion, see section Projection and Aggregation of Sets and Parameters.

Singleton Sets

A singleton set in GAMS is a special set that has at most one element (zero elements are allowed as well). Like other sets, singleton sets may have a domain with several dimensions. For the current maximum number of permitted dimensions, see Dimensions. Singleton sets are declared and defined with the keyword singleton that acts as a modifier to the keyword set:

Set            i      / a, b, c /;
Singleton Set  j      / d       /
k(i)   / b       /
l(i,i) / b.c     /;


The sets j, k and l are declared as singleton sets, each of them has just one element. The set k is a subset of the set i and the set l is a two-dimensional set.

Note that a data statement for a singleton set with more than one element will create a compilation error:

   1  Singleton Set s / s1*s3 /;
****                          $844
   2  display s;

Error Messages

844  Singleton with more than one entry (see $onStrictSingleton)


It is also possible to assign an element to a singleton set. In this case the singleton set is automatically cleared of the previous element first. For example, adding the following line to the code above will result in set k containing only element a after execution:

k('a') = yes;


The dollar control option offStrictSingleton may be used to allow sets that are declared as singleton sets to have more than one element in compile time definitions. However, in this case only the first listed element is a valid element of the set. Note that the value of zero for the command line parameter strictSingleton has the same effect for execution time definitions of singleton sets via assignment statements.
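The relaxed compile-time behavior can be sketched as follows. The scalar nS is a hypothetical name; the expected result follows from the rule stated above rather than from a verified run.

```gams
* relax the compile-time check for singleton data statements
$offStrictSingleton
Singleton Set s / s1*s3 /;

* only the first listed element is expected to remain in s
Scalar nS "number of elements kept in s";
nS = card(s);

display s, nS;
```

If the option behaves as described, the data statement compiles without error 844 and nS should be 1, with s1 as the only element.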

For more on dollar control options, see chapter Dollar Control Options. For more on GAMS command line parameters, see chapter The GAMS Call and Command Line Parameters. For more on compilation errors, see section Compilation Errors.

Singleton sets can be especially useful in assignment statements since they do not need to be controlled by a controlling index or an indexed operator like other sets. Consider the following examples:

Set            i      / a, b, c /;
Singleton Set  k(i)   / b /
h(i)   / a /;
Parameter      n(i)   / a 2, b 3, c 5 /;
Scalar z1, z2;

z1 = n(k);
z2 = n(k) + 100*n(h);


The singleton sets k and h are both subsets of the set i. The parameter n is defined over the set i. The scalar z1 is assigned a value of the parameter n without naming the respective label explicitly in the assignment. It is already specified in the definition of the singleton set k. The assignment statement for the scalar z2 contains an expression where the singleton sets k and h are referenced without a controlling index or an indexed operation.

Note
Singleton sets cannot be used as domains.

The Universal Set: * as Set Identifier

GAMS provides the universal set denoted by '*' for cases where the user wishes not to specify an index but have only a placeholder for it. The following examples show two ways how the universal set is introduced in a model. We will discuss the advantages and disadvantages of using the universal set later. The first example is from the production and inventory model [ROBERT]:

Sets   r          "raw materials"  / scrap, new /;
Table  misc(*,r)  "other data"
               scrap  new
max-stock        400  275
storage-c         .5    2
res-value         15   25;


A table is an input format for the data type parameter and has at least two dimensions. For details see section Tables. In our example, the first index is the universal set '*' and the second index is the previously defined set r. Since the first index is the universal set any entry whatsoever is allowed in this position. In the second position elements of the set r must appear, they are domain checked, as usual.

The second example illustrates how the universal set is introduced in a model with an alias statement.

Alias (new_universe,*);
Set k(new_universe)    / Chicago / ;


The alias statement links the universal set with the set name new_universe. Set k is a subset of the universal set and Chicago is declared to be an element of k. Any item may be added freely to k.
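The absence of domain checking for k can be illustrated with assignments that use labels not mentioned anywhere else. The labels Boston and Madison and the scalar nCities below are assumptions chosen for this sketch.

```gams
Alias (new_universe, *);
Set k(new_universe) / Chicago /;

* labels that appear nowhere else can still be added, since no domain check applies
k('Boston')  = yes;
k('Madison') = yes;

Scalar nCities "number of elements in k";
nCities = card(k);

display k, nCities;
```

After the two assignments, k should contain three elements. The same freedom means that a misspelled label would be accepted silently.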

The universal set is particularly useful for generating reports, since it allows the use of any labels without having to define special sets for them. For an example, see section Set Attributes below. For more on report writing, see chapter The Put Writing Facility.

Attention
It is recommended to not use the universal set for data input, since there is no domain checking and thus typos will not be detected and data that the user intends to be in the model might actually not be part of it.

Observe that in GAMS a simple set is always regarded as a subset of the universal set. Thus the set definition

Set i / i1*i10 /;


is the same as

Set i(*)  / i1*i10 /;


GAMS organizes domains in a domain tree. A set and its subset are regarded as nodes that are connected by an arc. Now consider the following one-dimensional subsets:

Set i, ii(i), j(i), jj(j), jjj(jj);


These subsets are connected with arcs to the set i and thus form a domain tree that is rooted in the universe node '*'. This particular domain tree may be represented as follows:

* - i - ii
      |
      - j - jj - jjj


Note that with the construct i(jjj) we may access ii iterating through the members of jjj. For an example, see domain tree in the loop statement.

Observe that the universal set is assumed to be ordered, and operators for ordered sets such as ord, lag and lead may be applied to any sets aliased with the universal set.

Set and Set Element Referencing

Sets or set elements are referenced in many contexts, including assignments, calculations, equation definitions and loops. Usually GAMS statements refer to the whole set or a single set element. In addition, GAMS provides several ways to refer to more than one, but not all elements of a set. In the following subsections we will show by example how this is done. GAMS also has set functions that specifically reference sets; they are introduced in the chapter about logical conditions.

Referencing the Whole Set

Most commonly whole sets are referenced as in the following examples:

Set i / i1*i100 /;
Parameter k(i);
k(i) = 4;
Scalar z;
z = sum(i, k(i));


The parameter k is declared over the set i. The assignment statement in the next line assigns the value 4 to k for every element of the set i. The scalar z is defined to be the sum of all values of the parameter k(i).

Referencing a Single Element

Sometimes it is necessary to refer to specific set elements. This is done by using single or double quotes around the label(s). We may add the following line to the example above:

k('i77') = 15;


This statement changes the value of k('i77') to 15, all the other values of k remain 4.

Referencing a Part of a Set

There are multiple ways to restrict the domain to more than one element, e.g. subsets, conditionals and tuples. Suppose we want the parameter k from the example above to be assigned the value 10 for the first 8 elements of the set i. The following two lines of code illustrate how easily this may be accomplished with a subset:

Set j(i) / i1*i8 /;
k(j) = 10;


First we define the set j to be a subset of the set i with exactly the elements we are interested in. Then we assign the new value to the elements of this subset. The other values of the parameter k remain unchanged. For examples using conditionals and tuples, see sections Restricting the Domain: Conditionals and Restricting the Domain: Tuples respectively.
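For comparison, the same restriction can be expressed with a condition on the element position instead of a subset. The following sketch places both variants side by side; the names k2, z and z2 are assumptions, and the conditional form relies on i being an ordered set.

```gams
Set i    / i1*i100 /;
Set j(i) / i1*i8 /;

Parameter k(i), k2(i);
k(i)  = 4;
k2(i) = 4;

* restrict the assignment with a subset
k(j) = 10;
* restrict the assignment with a condition on the element position
k2(i)$(ord(i) <= 8) = 10;

Scalar z, z2;
z  = sum(i, k(i));
z2 = sum(i, k2(i));

display z, z2;
```

Both variants change exactly the first eight entries, so both sums should equal 8 x 10 + 92 x 4 = 448.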

Set Attributes

A GAMS set element has several numbers attached to it. These values are called attributes; they may be recovered during execution. The attributes are listed in Table 3.

Set Attribute         Symbol   Description
Position              .pos     Element position in the current set (set does not have to be ordered), starting with 1.
Ord                   .ord     Same as .pos but for ordered sets only.
Offset                .off     Element position in the current set minus 1. So .off = .pos - 1 (set does not have to be ordered).
Reverse               .rev     Reverse element position in the current set, so the value for the last element is 0, the value for the penultimate is 1, etc. (set does not have to be ordered).
Unique Element List   .uel     Element position in the unique element list. For details see section Ordered and Unordered Sets.
Length                .len     Length of the set element name (a count of the number of characters).
Value                 .val     If a set element is a number, this attribute gives the value of the number. For extended range arithmetic symbols, the symbols are reproduced. If a set element is a string that is not a number, then this attribute is not defined and trying to use it results in an error.
First set element     .first   Returns 1 for the first set element, otherwise 0.
Last set element      .last    Returns 1 for the last set element, otherwise 0.

Table 3: Set Attributes

The attributes may be accessed with an assignment statement:

data(set_name) = set_name.attribute ;


Here data is a parameter, set_name is the name of the set and .attribute is one of the attributes listed above. The following example serves as illustration:

Set       id            "example set"            / Madison, tea-time, '-inf', '-7', '13.14'/;
Parameter report(id,*)  "set attribute values";

report(id,'position') = id.pos ;
report(id,'reverse')  = id.rev ;
report(id,'offset')   = id.off ;
report(id,'length')   = id.len ;
report(id,'first')    = id.first;
report(id,'last')     = id.last ;

display report;


The parameter report is declared to have two dimensions with the set id in the first position and the universal set in the second position. In the following six statements the values of report are defined for six entries of the universal set. Note how the flexibility of the universal set is used here to make reporting easy. The display statement generates the output that follows.

----     11 PARAMETER report  set attribute values

            position     reverse      offset      length       first        last

Madison        1.000       4.000                   7.000       1.000
tea-time       2.000       3.000       1.000       8.000
-inf           3.000       2.000       2.000       4.000
-7             4.000       1.000       3.000       2.000
13.14          5.000                   4.000       5.000                   1.000
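The attribute .val, which is not used in the report above, is convenient when labels are numbers such as years. A minimal sketch, assuming a set of years like the one used earlier in this chapter; the names yr and span are hypothetical.

```gams
Set t "time" / 1991*2000 /;
Parameter yr(t) "numeric value of each label";

* .val converts a numeric label into its number
yr(t) = t.val;

Scalar span "years between the first and the last period";
span = smax(t, yr(t)) - smin(t, yr(t));

display yr, span;
```

With labels from 1991 to 2000, span should be 9.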

Finding Sets from Data

Sometimes it is desirable to find a set from the available data in order to use it later in the model. We will show by example how this may be accomplished using the alias statement, the universal set and conditionals. Suppose we have only the data related to the transportation model [TRNSPORT] and we want to identify the sets. We can tell from the data that there are two sets that we are interested in. First, we define these two sets as aliases of the universal set, which means that no elements are specified:

Alias(sources, places, *);


Then we enter the data that contain an indicator of which set elements are valid entries in the sets to be computed. We use the table format.

Table trandata (sources,places)    "data from spreadsheet"
              Newyork   Chicago   totalsupply
Seattle         2.5       1.7         350
Sandiego        2.5       1.8         300
totalneed       325        75              ;


Next we define subsets that we will need in the calculations that follow:

Set  source(sources)       "sources in spreadsheet data"
     destination(places)   "destinations in spreadsheet data";


Now we have everything that we need to do the calculation using the data on hand. In our case, a label qualifies as an element of the set source if it has an entry for totalsupply in the table above, and a label is an element of the set destination if it has an entry for totalneed in the table trandata:

source(sources)$(trandata(sources,"totalsupply")) = yes;
destination(places)$(trandata("totalneed",places)) = yes;


These conditional assignments define the elements of the sets source and destination. From this point on these sets may be used in the model. However, note that the resulting sets are dynamic sets. Hence they cannot be used as domains in declaration statements of other sets, parameters, variables and equations. But they may be referenced and used in equation definitions.
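
As a brief illustration of such a reference, the sketch below sums the supply over the computed set source and displays both sets. The scalar name totsupply is a hypothetical addition that is not part of the original example, and the sketch assumes that the declarations and assignments above have already been processed.

Scalar totsupply "total supply over all computed sources";
totsupply = sum(source, trandata(source,"totalsupply"));
Display source, destination, totsupply;
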

Such computations may for example be useful if the user gets a data table from elsewhere and needs to specify the sets. Alternatively, if the data is available in GDX format, the dollar control option $load provides functionality to project sets from data contained in a GDX file.

Domain Checking

The GAMS compiler performs a check to ensure that each label quoted as a member of a set is indeed an element of the respective set, and each element defined in a subset is in fact a member of the superset. This screening for consistency is called domain checking. It is done whenever a domain is referenced, be it in set, variable, parameter or equation declarations and definitions, or in assignments. The following examples serve as illustration.

Set   i     "all cities"      / Lima, Toronto, Wuhan, Shanghai /
      as(i) "Asian cities"    / Wuhan, Shanhai, Calcutta /
      am    "American cities" / Lima, Toront /;


The set as is declared to be a subset of the set i, therefore domain checking will test every label for inconsistencies. It will catch two errors: there is a typo in Shanhai and Calcutta is not a member of the set i, so it cannot legally be a member of a subset.

   1  Set   i     "all cities"      / Lima, Toronto, Wuhan, Shanghai /
   2        as(i) "Asian cities"    / Wuhan, Shanhai, Calcutta /
****                                               $170      $170
**** 170  Domain violation for element
   3        am    "American cities" / Lima, Toront /;


The user can rectify the spelling error, and either delete Calcutta from the subset as or add it to the superset i. The following line will pass domain checking:

Set   as(i) "Asian cities"    / Wuhan, Shanghai /;


Note that am is not declared as a subset of the set i even though it apparently should contain cities contained in i. Hence, am cannot be domain checked and the typo in Toront will go undetected. This has consequences for the next line:

Parameter pam(am) "population in millions"  / Lima 8.9, Toronto 5.6 /;


In this parameter definition the domain of the parameter pam is the set am. GAMS will report an error here because domain checking does not find the label Toronto in the set am. The misspelled label Toront, as specified in the definition of the set am above, would be accepted.
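
One way to have the typo caught is to declare am as a subset of i, so that its labels are domain checked as well. The lines below are a minimal sketch of that idea, assuming they replace the earlier declarations of am and pam and that the spelling has been corrected:

* am is now a subset of i, so a label such as Toront would be flagged
Set   am(i) "American cities" / Lima, Toronto /;
Parameter pam(am) "population in millions"  / Lima 8.9, Toronto 5.6 /;
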

A further example for domain checking concerns multi-dimensional domains where the user accidentally switches the positions of the indices:

Parameter  h(as,am) / Wuhan.Lima  10, Wuhan.Toronto 12, Shanghai.Lima 7/;
Parameter  d(as,am);
d(as,am) = 5*h(am,as) + 78;


Observe that we assume that the typo in the label Toronto has been rectified. The parameter h is defined over the domain (as,am). However, in the assignment statement in the last line above, it is referenced with the domain (am,as). This mistake is caught by domain checking and an error is reported.
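
Referencing h with its indices in the declared order removes the error. A minimal corrected form of the assignment, assuming the declarations of h and d shown above:

d(as,am) = 5*h(as,am) + 78;
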

As we have seen in the definition of the set am above, domain checking is not compulsory. If the following statement is entered, GAMS makes no assumptions about rho until further information is provided.

set       t   years         / 1988 * 1995 /;
Parameter rho discount rate ;


The modeler may later choose to domain check rho by continuing the definition with the following line:

Parameter rho(t)  / 1988 0.07, 1989*1994 0.10, 1995 0.09 /;


Alternatively, the modeler may choose not to domain check the parameter rho, as is shown in the deliberately nonsensical (but legal) statement that follows:

Parameter rho / 1988.January 0.07, strategy-1.cost 44, cat.capacity 99 /;


If a parameter is not domain-checked, the only restriction is that the dimensionality must be constant. Once the number of labels per data item has been established it is frozen; to refer to the parameter differently is an error.
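
To illustrate, the sketch below assumes the nonsensical two-dimensional definition of rho above. The scalar r is a hypothetical name introduced only for this illustration. A reference with two labels is accepted, while the commented-out reference with a single label would be reported as an error.

Scalar r;
r = rho("1988","January");
* r = rho("1988");   one label only: rejected, since rho has two labels per entry
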

Note
Domain checking is automatic; it is only suppressed in two cases:
1. The index is the universal set or a set aliased to the universal set, see the examples above.
2. The dollar control option $onWarning is used. It has the effect that warnings rather than errors are reported for domain violations.

We urge modelers to use domain checking whenever possible. It catches errors and allows users to write models that are conceptually cleaner because logical relationships are made explicit.

Note that the dollar control option $load is available in several variations to enable domain checking when loading data from a GDX file. For details, see $loadDC, $loadDCM and $loadDCR and chapter GAMS Data eXchange (GDX).

Implicit Set Definition (or: Domain Defining Symbol Declarations)

As seen above, sets can be defined through data statements in the declaration. Alternatively, sets can be defined implicitly through data statements of other symbols which use these sets as domains. This is illustrated in the following example, which is derived from the [TRNSPORT] model:

Set
i 'canning plants'
j 'markets';

Table d(i<,j<) 'distance in thousands of miles'
              new-york  chicago  topeka
seattle         2.5      1.7     1.8
san-diego       2.5      1.8     1.4;

Display i,j;


Note the < signs in the domain list of the declaration of d (d(i<,j<)). These signal that the set i will contain all elements which define the first dimension of symbol d and that the set j will contain all elements which define the second dimension of symbol d, respectively. So, this is the output of the Display statement at the end:

----     10 SET i  canning plants

seattle  ,    san-diego


----     10 SET j  markets

new-york,    chicago ,    topeka


This syntax is not limited to the table statement, but can be used with any symbol declaration. Also, one domain set can be defined through multiple symbols using the same domain, when using the dollar control option onMulti:

$onMulti
Set
food
fruits(food<)    / apple, orange       /
vegetable(food<) / carrot, cauliflower /
meat(food<)      / beef, pork          /;

Display food;


This is the output of the Display statement:

----      8 SET food

apple      ,    orange     ,    carrot     ,    cauliflower,    beef       ,    pork

Note
If the < sign is used to mark a declaration as "domain defining", this attribute is not limited to the data statement following this declaration, but also influences other ways to define data at compile time like the dollar control option load, as shown in the following example:
Set
i 'canning plants'
j 'markets';

Parameter d(i<,j<) 'distance in thousands of miles';

$gdxIn data.gdx
$load d

Attention
Only non-zero elements in a symbol will add elements to an implicitly defined set. This is illustrated in the following two examples.
Set
i 'canning plants'
j 'markets';

Table d(i<,j<) 'distance in thousands of miles'
              new-york  chicago  topeka
seattle         2.5              1.8
san-diego       2.5              1.4;

Display i,j;


Note the empty column for chicago. Since there is no data, chicago will not end up in the set j, which can be seen in the output of the Display statement:

----     10 SET j  markets

new-york,    topeka


Also, an explicit 0 in a data statement does not add elements to an implicitly defined set (in contrast to an eps). This is shown in the following GAMS code and output:

Set
j 'markets';

Parameter
b(j<) 'demand at market j in cases'
/ new-york   325
chicago      0
topeka     eps /;

Display j;

----     10 SET j  markets

new-york,    topeka


In GAMS, a simple set consists of a set name and the elements of the set. Both the name and the elements may have associated text that explains the name or the elements in more detail. More complex sets have elements that are pairs or even $$n$$-tuples. These sets with pairs and $$n$$-tuples are ideal for establishing relationships between the elements in different sets. GAMS also uses a domain checking capability to help catch labeling inconsistencies and typographical errors made during the definition of related sets.