Concept
GAMS Connect is a framework inspired by the concept of a so-called ETL (extract, transform, load) procedure and allows integrating data from various data sources. The GAMS Connect framework consists of the Connect database and the Connect agents that operate on the Connect database. Via the available Connect interfaces, the user passes instructions to call Connect agents for reading data from various file types into the Connect database, for transforming data in the Connect database, and for writing data from the Connect database to various file types. Instructions are passed in YAML syntax. Note that, in contrast to a typical ETL procedure, read, transform, and write operations do not need to be strictly separated.

Usage
GAMS Connect is available via the GAMS command line parameters ConnectIn and ConnectOut, via embedded code Connect, and as a standalone command line utility gamsconnect.
Instructions processed by the GAMS Connect interfaces need to be passed in YAML syntax as follows:
- <agent name1>:
    <root option1>: <value>
    <root option2>: <value>
    .
    .
    <root option3>:
      - <option1>: <value>
        <option2>: <value>
        .
        .
      - <option1>: <value>
        <option2>: <value>
        .
        .
- <agent name2>:
    .
    .
The user lists the tasks to be performed successively. All tasks begin at the same indentation level starting with a - (a dash and a space) followed by the Connect agent name and a : (a colon). Connect agent options are represented in a simple <option>: <value> form. Please check the documentation of Connect Agents for available options. Options at the first indentation level are called root options and typically define general settings, e.g. the file name. While some agents only have root options, others have a more complex options structure, where a root option may be a list of dictionaries containing other options. A common example is the root option symbols (see e.g. GDXReader). Via symbols, many agents allow defining symbol-specific options, e.g. the name of the symbol. The option tables of agents with a more complex options structure provide a Scope to reflect this structure: options may be allowed at the first indentation level (root) and/or be assigned to other root options (e.g. symbols).
Note that YAML syntax also supports an abbreviated form for lists and dictionaries, e.g. <root option3>: [ {<option1>: <value>, <option2>: <value>}, {<option1>: <value>, <option2>: <value>} ].
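For illustration, the following two snippets are equivalent; the first uses the block form, the second the abbreviated flow form (the file name and symbol names are placeholders):

```yaml
- GDXReader:
    file: in.gdx
    symbols:
      - name: p
      - name: q

- GDXReader: {file: in.gdx, symbols: [{name: p}, {name: q}]}
```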
Here is an example that uses embedded Connect code to process instructions:
$onecho > distance.csv
i;j;distance in miles
seattle;new-york;2,5
seattle;chicago;1,7
seattle;topeka;1,8
san-diego;new-york;2,5
san-diego;chicago;1,8
san-diego;topeka;1,4
$offecho
$onecho > capacity.csv
i,capacity in cases
seattle,350.0
san-diego,600.0
$offecho
Set i 'Suppliers', j 'Markets';
Parameter d(i<,j<) 'Distance', a(i) 'Capacity';
$onEmbeddedCode Connect:
- CSVReader:
    file: distance.csv
    name: distance
    indexColumns: [1, 2]
    valueColumns: [3]
    fieldSeparator: ';'
    decimalSeparator: ','
- CSVReader:
    file: capacity.csv
    name: capacity
    indexColumns: [1]
    valueColumns: [2]
- GAMSWriter:
    symbols:
      - name: distance
        newName: d
      - name: capacity
        newName: a
$offEmbeddedCode
display i, j, d, a;
In this example, we are reading the two CSV files distance.csv and capacity.csv using the CSVReader. Then we directly write to symbols in GAMS using the GAMSWriter.
Note that even though GAMS is case insensitive, GAMS Connect is case sensitive, i.e., YAML instructions are treated case sensitively. This also applies, e.g., to indices in CSV files. Consider the following example where the index j2 should be substituted by ABC when reading the CSV file y.csv:
$onecho > y.csv
i1,j1,2.5
i1,J2,1.7
i2,j1,1.8
i2,j2,1.4
$offecho
set i,j;
parameter p(i<,j<);
$onEmbeddedCode Connect:
- CSVReader:
    file: y.csv
    name: p
    indexColumns: [1,2]
    valueColumns: [3]
    header: false
    indexSubstitutions: { j2: ABC }
- GAMSWriter:
    writeAll: true
$offEmbeddedCode
display i,j,p;
Since the YAML instructions are treated case sensitively, the index J2 will not be substituted:
          j1      J2     ABC
i1     2.500   1.700
i2     1.800           1.400
An exception are symbol names in the Connect database: creating or accessing symbols, e.g. via the name option of many agents, is case insensitive. All instructions provided to the Connect framework are read using UTF-8 encoding (utf-8-sig). This can be customized by adding a comment in the format # coding=<encoding name> or # -*- coding: <encoding name> -*- as the first line of the YAML code. Note that UTF-16 encoding is not supported.
Connect Agents Summary
Current Connect agents support the following data source formats: CSV, Excel, GDX and SQL. The following Connect agents are available:
Connect agent | Description | Supported symbol types |
---|---|---|
Concatenate | Allows concatenating multiple symbols in the Connect database. | Sets and parameters |
CSVReader | Allows reading a symbol from a specified CSV file into the Connect database. | Sets and parameters |
CSVWriter | Allows writing a symbol in the Connect database to a specified CSV file. | Sets and parameters |
DomainWriter | Allows rewriting the domain information of an existing Connect symbol. | Sets, parameters, variables, and equations |
Filter | Allows reducing symbol data by applying filters on labels and numerical values. | Sets, parameters, variables, and equations |
GAMSReader | Allows reading symbols from the GAMS database into the Connect database. | Sets, parameters, variables, and equations |
GAMSWriter | Allows writing symbols in the Connect database to the GAMS database. | Sets, parameters, variables, and equations |
GDXReader | Allows reading symbols from a specified GDX file into the Connect database. | Sets, parameters, variables, and equations |
GDXWriter | Allows writing symbols in the Connect database to a specified GDX file. | Sets, parameters, variables, and equations |
LabelManipulator | Allows modifying labels of symbols in the Connect database. | Sets, parameters, variables, and equations |
Options | Allows setting more general options that can affect the Connect database and other Connect agents. | - |
PandasExcelReader | Allows reading symbols from a specified Excel file into the Connect database. | Sets and parameters |
PandasExcelWriter | Allows writing symbols in the Connect database to a specified Excel file. | Sets and parameters |
Projection | Allows index reordering and projection onto a reduced index space of a GAMS symbol. | Sets, parameters, variables, and equations |
PythonCode | Allows executing arbitrary Python code. | - |
RawCSVReader | Allows reading unstructured data from a specified CSV file into the Connect database. | - |
RawExcelReader | Allows reading unstructured data from a specified Excel file into the Connect database. | - |
SQLReader | Allows reading symbols from a specified SQL database into the Connect database. | Sets and parameters |
SQLWriter | Allows writing symbols in the Connect database to a specified SQL database. | Sets and parameters |
Getting Started
We introduce the basic functionality of GAMS Connect agents with some simple examples. For more examples see section Examples.
Simple Connect Example for CSV
The following example (a modified version of the trnsport model) shows how to read and write CSV files. The full example is part of DataLib as model connect03. Here is a code snippet of the first lines:
$onEcho > distance.csv
i,new-york,chicago,topeka
seattle,2.5,1.7,1.8
san-diego,2.5,1.8,1.4
$offEcho
$onEcho > capacity.csv
i,capacity
seattle,350
san-diego,600
$offEcho
$onEcho > demand.csv
j,demand
new-york,325
chicago,300
topeka,275
$offEcho
Set i 'canning plants', j 'markets';
Parameter d(i<,j<) 'Distance', a(i) 'Capacity', b(j) 'Demand';
$onEmbeddedCode Connect:
- CSVReader:
    file: distance.csv
    name: d
    indexColumns: 1
    valueColumns: "2:lastCol"
- CSVReader:
    file: capacity.csv
    name: a
    indexColumns: 1
    valueColumns: 2
- CSVReader:
    file: demand.csv
    name: b
    indexColumns: 1
    valueColumns: 2
- GAMSWriter:
    writeAll: True
$offEmbeddedCode
[...]
It starts out with the declaration of sets and parameters. With compile-time embedded Connect code, data for the parameters are read from CSV files using the Connect agent CSVReader. The CSVReader agent, for example, reads the CSV file distance.csv and creates the parameter d in the Connect database. The name of the parameter must be given by the option name. Column number 1 is specified as the first domain set using option indexColumns. The valueColumns option is used to specify the column numbers 2, 3 and 4 containing the values. Per default, the first row of the columns specified via valueColumns will be used as the second domain set. The symbolic constant lastCol can be used if the number of index or value columns is unknown. As a last step, all symbols from the Connect database are written to the GAMS database using the Connect agent GAMSWriter. The GAMSWriter agent makes the parameters d, a and b available outside the embedded Connect code. Note that the sets i and j are defined implicitly through parameter d.
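The wide-to-long reading described above can be sketched in plain Python. This is only an illustration, not the agent's implementation (which is based on pandas.read_csv):

```python
import csv
import io

# Plain-Python sketch: column 1 supplies the first index, the header
# entries of the value columns supply the second index.
data = """i,new-york,chicago,topeka
seattle,2.5,1.7,1.8
san-diego,2.5,1.8,1.4
"""

rows = list(csv.reader(io.StringIO(data)))
header, body = rows[0], rows[1:]

records = {}
for row in body:
    i = row[0]
    for j, value in zip(header[1:], row[1:]):
        records[(i, j)] = float(value)

print(records[("seattle", "chicago")])  # 1.7
```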
Finally, after solving the transport model, Connect can be used to export results to a CSV file:
[...]
Model transport / all /;
solve transport using lp minimizing z;
EmbeddedCode Connect:
- GAMSReader:
    symbols:
      - name: x
- Projection:
    name: x.l(i,j)
    newName: x_level(i,j)
- CSVWriter:
    file: shipment_quantities.csv
    name: x_level
    unstack: True
endEmbeddedCode
This time, we need to use execution-time embedded Connect code. The Connect agent GAMSReader imports variable x into the Connect database. With the Connect agent CSVWriter we write the variable level to the CSV file shipment_quantities.csv:
i_0,new-york,chicago,topeka
seattle,50.0,300.0,0.0
san-diego,275.0,0.0,275.0
Setting the option unstack to True allows using the last dimension as the header row.
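What unstack: True does can be sketched in plain Python, using the records from the example above (an illustration, not the agent's actual implementation):

```python
# The last index position of each record becomes a column header.
records = {
    ("seattle", "new-york"): 50.0, ("seattle", "chicago"): 300.0,
    ("seattle", "topeka"): 0.0, ("san-diego", "new-york"): 275.0,
    ("san-diego", "chicago"): 0.0, ("san-diego", "topeka"): 275.0,
}

rows, cols = [], []
for i, j in records:              # dicts preserve insertion order
    if i not in rows:
        rows.append(i)
    if j not in cols:
        cols.append(j)

table = [["i_0"] + cols]          # "i_0" is the auto-generated index header
for i in rows:
    table.append([i] + [records[(i, j)] for j in cols])

for row in table:
    print(",".join(str(c) for c in row))
```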
Simple Connect Example for Excel
The following example is part of the GAMS Model Library as model cta and shows how to read and write Excel spreadsheets. Here is a code snippet of the first lines:
Set
i 'rows'
j 'columns'
k 'planes';
Parameter
dat(k<,i<,j<) 'unprotected data table'
pro(k,i,j) 'information sensitive cells';
* extract data from Excel workbook
$onEmbeddedCode Connect:
- PandasExcelReader:
    file: cox3.xlsx
    symbols:
      - name: dat
        range: Sheet1!A1
        rowDimension: 2
        columnDimension: 1
      - name: pro
        range: Sheet2!A1
        rowDimension: 2
        columnDimension: 1
- GAMSWriter:
    writeAll: True
$offEmbeddedCode
[...]
It starts out with the declaration of sets and parameters. With compile-time embedded Connect code, data for the parameters are read from the Excel file cox3.xlsx using the Connect agent PandasExcelReader. The PandasExcelReader agent allows reading data for multiple symbols that are listed under the keyword symbols, here, the parameters dat and pro. For each symbol, the symbol name is given by option name and the Excel range by option range. The option rowDimension defines that the first 2 columns of the data range will be used for the labels. In addition, the option columnDimension defines that the first row of the data range will be used for the labels. As a last step, all symbols from the Connect database are written to the GAMS database using the Connect agent GAMSWriter. The GAMSWriter agent makes the parameters dat and pro available outside the embedded Connect code. Note that the sets i, j and k are defined implicitly through parameter dat.
Finally, after solving the cox3c model with alternative solutions, Connect can be used to export results to Excel:
[...]
loop(l$((obj.l - best)/best <= 0.01),
ll(l) = yes;
binrep(s,l) = round(b.l(s));
binrep('','','Obj',l) = obj.l;
binrep('','','mSec',l) = cox3c.resUsd*1000;
binrep('','','nodes',l) = cox3c.nodUsd;
binrep('Comp','Cells','Adjusted',l) = sum((i,j,k)$(not s(i,j,k)), 1$round(adjn.l(i,j,k) + adjp.l(i,j,k)));
solve cox3c min obj using mip;
);
embeddedCode Connect:
- GAMSReader:
    symbols:
      - name: binrep
- PandasExcelWriter:
    file: results.xlsx
    symbols:
      - name: binrep
        range: binrep!A1
endEmbeddedCode
This time, we need to use execution-time embedded Connect code. The Connect agent GAMSReader imports the reporting parameter binrep into the Connect database. With the Connect agent PandasExcelWriter we write the parameter into the binrep sheet of the Excel file results.xlsx.
Simple Connect Example for SQL
The following example (a modified version of the whouse model) shows how to read from and write to an SQL database (SQLite). The full example is part of DataLib as model connect04. Here is a code snippet of the first lines:
[...]
Set t 'time in quarters';
Parameter
price(t) 'selling price ($ per unit)'
istock(t) 'initial stock (units)';
Scalar
storecost 'storage cost ($ per quarter per unit)'
storecap 'stocking capacity of warehouse (units)';
$onEmbeddedCode Connect:
- SQLReader:
    connection: {"database": "whouse.db"}
    symbols:
      - name: t
        query: "SELECT * FROM timeTable;"
        type: set
      - name: price
        query: "SELECT * FROM priceTable;"
      - name: istock
        query: "SELECT * FROM iniStockTable;"
      - name: storecost
        query: "SELECT * FROM storeCostTable;"
      - name: storecap
        query: "SELECT * FROM storeCapTable;"
- GAMSWriter:
    writeAll: True
$offEmbeddedCode
[...]
It starts out with the declaration of sets and parameters. With compile-time embedded Connect code, data for all the symbols are read from the SQLite database whouse.db using the Connect agent SQLReader by passing the connection URL through the option connection. The SQLReader agent, for example, queries the table priceTable for data and creates the parameter price in the Connect database. The SQLReader allows reading data for multiple symbols that are listed under the keyword symbols and are fetched through the same connection. For each symbol the name must be given by the option name. The SQL query statement is passed through the option query. The symbol type can be specified using the option type. By default, every symbol is treated as a GAMS parameter. As a last step, all symbols from the Connect database are written to the GAMS database using the Connect agent GAMSWriter. The GAMSWriter agent makes all imported symbols available outside the embedded Connect code.
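What the SQLReader does for each symbol can be approximated with Python's built-in sqlite3 module. The table and column names below are illustrative, not taken from whouse.db:

```python
import sqlite3

# A small in-memory stand-in for whouse.db: one index column and one
# value column, as expected for a 1-dimensional parameter.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE priceTable (t TEXT, value REAL)")
con.executemany("INSERT INTO priceTable VALUES (?, ?)",
                [("q-1", 10.0), ("q-2", 12.0), ("q-3", 8.0), ("q-4", 9.0)])

# The agent issues the configured query; the last column holds the
# values, the preceding columns the index labels.
records = con.execute("SELECT * FROM priceTable;").fetchall()
print(records[0])  # ('q-1', 10.0)
```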
Further, after solving the warehouse model, Connect can be used to export the results to tables in the SQL database.
[...]
Model swp 'simple warehouse problem' / all /;
solve swp minimizing cost using lp;
EmbeddedCode Connect:
- GAMSReader:
    readAll: True
- Projection:
    name: stock.l(t)
    newName: stock_level(t)
- Projection:
    name: sell.l(t)
    newName: sell_level(t)
- Projection:
    name: buy.l(t)
    newName: buy_level(t)
- SQLWriter:
    connection: {"database": "whouse.db"}
    ifExists: replace
    symbols:
      - name: stock_level
        tableName: stock_level
      - name: sell_level
        tableName: sell_level
      - name: buy_level
        tableName: buy_level
endEmbeddedCode
Here, we need to use execution-time embedded Connect code. The Connect agent GAMSReader imports all the variables into the Connect database. The SQLWriter agent then writes each symbol to a respective table in the SQL database whouse.db. For example, the stock level:
| t_0 | level |
|:----|:------|
| q-1 | 100.0 |
| q-2 | 0.0   |
| q-3 | 0.0   |
| q-4 | 0.0   |
The ifExists option allows either appending to an existing table or replacing it with new data. By default, the value for ifExists is set to fail.
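The effect of ifExists can be sketched with Python's sqlite3 module. This is a simplified stand-in for the SQLWriter, with an assumed two-column table layout:

```python
import sqlite3

# "fail" raises if the table exists, "replace" drops and recreates it,
# "append" inserts into the existing table.
def write_table(con, table, rows, if_exists="fail"):
    exists = con.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table,)).fetchone() is not None
    if exists and if_exists == "fail":
        raise ValueError(f"table {table} already exists")
    if exists and if_exists == "replace":
        con.execute(f"DROP TABLE {table}")
        exists = False
    if not exists:
        con.execute(f"CREATE TABLE {table} (t TEXT, level REAL)")
    con.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

con = sqlite3.connect(":memory:")
write_table(con, "stock_level", [("q-1", 100.0)])
write_table(con, "stock_level", [("q-2", 0.0)], if_exists="append")
write_table(con, "stock_level", [("q-3", 0.0)], if_exists="replace")
print(con.execute("SELECT COUNT(*) FROM stock_level").fetchone()[0])  # 1
```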
Connect Agents
Concatenate
The Concatenate agent allows concatenating multiple symbols (sets or parameters) in the Connect database into a single symbol of the same type. It takes the union of domain sets of all concatenated symbols and uses that as the domain for the output symbol. There are several options to guide this domain finding process which are explained below. The general idea is best explained with an example. Consider three parameters p1(i,j), p2(k,i), and p3(k,l). The union of all domain sets is i, j, k, and l and, hence, the output symbol will be parameterOutput(symbols,i,j,k,l). The very first index of parameterOutput contains the name of the concatenated symbol followed by the domain sets. If a domain set is not used by a concatenated symbol, the corresponding records in parameterOutput will feature the emptyUel, a - (dash) by default, as the following figures show:


The Concatenate agent is especially useful in combination with UI components that provide a pivot table, like GAMS MIRO, to represent many individual output symbols in a single powerful and configurable table format.
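The domain-union idea can be sketched in plain Python. This is a simplified illustration of the default behavior that ignores duplicate domain names and universal domains:

```python
# p1(i,j), p2(k,i) and p3(k,l) yield parameterOutput(symbols,i,j,k,l);
# index positions a symbol does not use are filled with the emptyUel "-".
symbols = {
    "p1": (("i", "j"), {("i1", "j1"): 1.0}),
    "p2": (("k", "i"), {("k1", "i1"): 2.0}),
    "p3": (("k", "l"), {("k1", "l1"): 3.0}),
}

output_dims = []                      # union of domains, in order seen
for dims, _ in symbols.values():
    for d in dims:
        if d not in output_dims:
            output_dims.append(d)

output = {}
for name, (dims, recs) in symbols.items():
    for key, value in recs.items():
        label = dict(zip(dims, key))  # map domain name -> label
        output[(name,) + tuple(label.get(d, "-") for d in output_dims)] = value

print(output_dims)  # ['i', 'j', 'k', 'l']
```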
Obviously, there are more complex situations with respect to the domain of the resulting parameterOutput. For example, only a subset of domain sets may be relevant and the remaining ones should be combined in as few index positions as possible. For this, assume only domain sets i and k from the above example are relevant and j and l can be combined in a single index position - a so-called universal domain. The resulting parameterOutput would look as follows:

Moreover, the Concatenate agent needs to deal with universe domain sets * and domain sets that are used multiple times in a concatenated symbol. In addition to the symbols index (always the first index position of the output symbol), by default the union of domain sets of the concatenated symbols determines the domain of the output symbol. If a domain set (including the universe *) appears multiple times in a concatenated symbol domain, these duplicates will be part of the output symbol domain. For example, q1(*,i,j,*) and q2(*,i,i) will result in the output symbol parameterOutput(symbols,*,i,j,*,i) by default, mapping index positions 1 to 4 of q1 to positions 2 to 5 of parameterOutput and index positions 1 to 3 of q2 to positions 2, 3, and 6.
All the described situations can be configured with a few options of the agent. The option outputDimensions allows controlling the domain of the output symbol. The default behavior (outputDimensions: all) gets the domain sets from the concatenated symbols and builds the union with duplicates if required. Alternatively, outputDimensions can be a list of the relevant domain sets (including an empty list). In any case, the agent iterates through the concatenated symbols and maps the index positions of a concatenated symbol into the index positions of the output symbol using the domain set names. Names not present in outputDimensions will be added as universal domains. Per default, the domain set names of a concatenated symbol are the original domain set names as stored in the Connect database. There are two ways to adjust the domain set names of concatenated symbols: dimensionMap and an explicitly given domain for a symbol in the name option. The dimensionMap, which is given once and holds for all symbols, allows mapping original domain names of concatenated symbols to the desired domain names. The name option provides such a map by symbol and via the index position rather than the domain names of the concatenated symbol. In the above example with p1(i,j), p2(k,i), and p3(k,l), we could put indices i and l as well as j and k together, resulting in the following output symbol:

This can be accomplished in two ways: either we use dimensionMap: {i: il, l: il, j: jk, k: jk} or we use name: p1(il,jk), name: p2(jk,il), and name: p3(jk,il) to explicitly define the domain names for each symbol. Note that it is not required to set outputDimensions: [il,jk] since per default the union of domain sets is built using the mapped domain names. In case a domain set is used more than once in the domain of a concatenated symbol, the mapping goes from left to right to find the corresponding output domain. If this is not desired, the Projection agent can be used to reorder index positions in symbols, or explicit index naming can be used. In the example with q1(*,i,j,*) and q2(*,i,i), the second index position of q2 will be put together with the second index position of q1. If one wants to map the second i of q2 (in the third index position) together with the i of q1 (in the second index position), one can, e.g., use name: q1(*,i,j,*) and name: q2(*,i2,i).
- Note
  - The Concatenate agent creates result symbols parameterOutput and setOutput for parameters and sets separately. Both have the same output domain. If you want different output domains for parameterOutput and setOutput, use two instantiations of the Concatenate agent.
  - Variables and equations need to be turned into parameters with the Projection agent before they can be concatenated.
  - If option name is given without an explicit domain for the concatenated symbol, the domain names stored in the Connect container are used and mapped via the dimensionMap option, if provided.
  - A domain set of a concatenated symbol that cannot be assigned to an index in outputDimensions will be mapped to a so-called universal domain. The Concatenate agent automatically adds as many universal domains as required to the output symbols.
Here is a complete example that uses the Concatenate agent:
Sets
i(i) / i0*i3 "i_text" /
j(j) / j0*j3 "j_text" /
k(k) / k0*k3 "k_text" /;
Parameters
p1(i) / i1 1 /
p2(k,j) / k1.j0 2, k1.j1 3, k1.j3 4 /
p3(j,j) / j1.j2 5, j2.j0 6, j3.j1 7, j3.j2 8 /
s / 5 /;
Positive Variable x(i,j);
x.l(i,j)$(uniform(0,1)>0.8) = uniformint(0,10);
EmbeddedCode Connect:
- GAMSReader:
    readAll: True
- Projection:
    name: x.l(i,j)
    newName: x_level(i,j)
- Concatenate:
    outputDimensions: [j,i]
- GDXWriter:
    file: concat_output.gdx
    symbols:
      - name: setOutput
      - name: parameterOutput
endEmbeddedCode
The resulting set and parameter outputs look as follows:

The following options are available for the Concatenate agent.
Option | Scope | Default | Description |
---|---|---|---|
concatenateAll | root | auto | Indicate if all sets and parameters in the Connect database will be concatenated. |
outputDimensions | root | all | Define the dimensions of the output symbols. |
dimensionMap | root | None | Define a mapping for the domain names of concatenated symbols as stored in the Connect database to the desired domain names. |
name | symbols | None | Specify the name of the symbol with potentially index space. |
newName | symbols | None | Specify a new name for the symbol in the symbols column of the output symbol. |
parameterName | root | parameterOutput | Name of the parameter output symbol. |
emptyUel | root | - | Define a character to use for empty uels. |
setName | root | setOutput | Name of the set output symbol. |
symbols | root | None | Specify symbol specific options. |
trace | root | inherited | Specify the trace level for debugging output. |
universalDimension | root | uni | Specify the base name of universal dimensions. |
Detailed description of the options:
concatenateAll = boolean or string (default=auto)
If True, all sets and parameters in the given database are concatenated and the symbols option is ignored. The default auto makes the Concatenate agent use symbols if specified and otherwise concatenate all sets and parameters in the Connect database.
outputDimensions = list or string (default=all)
Define the dimensions of the output symbols explicitly using a list, e.g., outputDimensions: [i,j]. The default all gets the domain sets from the concatenated symbols and builds the union with duplicates if required.
dimensionMap = dict (optional)
Define a mapping for domain names of concatenated symbols as stored in the Connect database to the desired domain names. For example, dimensionMap: {i: ij, j: ij} will map both symbol domains i and j to ij.
name = string (optional)
Specify the name of the symbol with potentially index space. Requires the format symname(i1,i2,...,iN). The index space may be specified to establish a mapping for the domain names of the symbol as stored in the Connect database to the desired domain names. If no index space is provided, the domain names stored in the Connect database are used and mapped via the dimensionMap option if provided.
newName = string (optional)
Specify a new name for the symbol in the symbols column of the output symbol.
parameterName = string (default=parameterOutput)
Name of the parameter output symbol.
emptyUel = string (default=-)
Define a character to use for empty uels.
setName = string (default=setOutput)
Name of the set output symbol.
symbols = list (optional)
A list containing symbol specific options. Allows concatenating a subset of symbols.
trace = integer (optional)
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
universalDimension = string (default=uni)
Specify the base name of universal dimensions.
CSVReader
The CSVReader allows reading a symbol (set or parameter) from a specified CSV file into the Connect database. Its implementation is based on the pandas.DataFrame class and its I/O API method read_csv. See Simple Connect Example for CSV for a simple example that uses the CSVReader.
Option | Default | Description |
---|---|---|
autoColumn | None | Generate automatic column names. |
autoRow | None | Generate automatic row labels. |
decimalSeparator | . (period) | Specify a decimal separator. |
fieldSeparator | , (comma) | Specify a field separator. |
file | None | Specify a CSV file path. |
header | inferred | Specify the header(s) used as the column names. |
indexColumns | None | Specify columns to use as the row labels. |
indexSubstitutions | None | Dictionary used for substitutions in the index columns. |
name | None | Specify a symbol name for the Connect database. |
names | None | List of column names to use. |
quoting | 0 | Control field quoting behavior. |
readCSVArguments | None | Dictionary containing keyword arguments for the pandas.read_csv method. |
skipRows | None | Specify the rows to skip or the number of rows to skip. |
stack | inferred | Stacks the column names to index. |
textColumns | None | Specify columns to get the set element text from. |
textSubstitutions | None | Dictionary used for substitutions in the text columns. |
thousandsSeparator | None | Specify a thousands separator. |
trace | inherited | Specify the trace level for debugging output. |
valueColumns | None | Specify columns to get the values from. |
valueSubstitutions | None | Dictionary used for substitutions in the value columns. |
Detailed description of the options:
autoColumn = string (optional)
Generate automatic column names. The autoColumn string is used as the prefix for the column label numbers. This option overrides the use of a header or names. However, if there is a header row, one must skip it by enabling header or using skipRows.
autoRow = string (optional)
Generate automatic row labels. The autoRow string is used as the prefix for the row label numbers. The generated unique elements will be used in the first index position, shifting other elements to the right. Using autoRow can be helpful when there are no labels that can be used as unique elements, but also to store entries that would be duplicate entries without a unique row label.
decimalSeparator = string (default=.)
Specify a decimal separator: . (period) or , (comma).
fieldSeparator = string (default=,)
Specify a field separator: , (comma), ; (semicolon) or \t (tab).
file = string (required)
Specify a CSV file path.
header = boolean or list (optional)
Specify the header(s) used as the column names. Default behavior is to infer the column names: if no names are passed, the behavior is identical to header=True and column names are inferred from the first line of data; if column names are passed explicitly, then the behavior is identical to header=False. Explicitly pass header=True to be able to replace existing names. Note that missing column names are filled with Unnamed: n (where n is the nth column (zero based) in the DataFrame). Hence, reading the CSV file:

,j1,
i1,1,2
i2,3,4
,5,6

results in the following 2-dimensional parameter:

        j1  Unnamed: 2
i1   1.000       2.000
i2   3.000       4.000

For a multi-row header, a list of integers can be passed providing the positions of the header rows in the data. Note that reading a multi-row header is only supported for parameters. Moreover, the CSVReader can only read all columns and not a subset of columns, wherefore only indexColumns can be specified and all other columns will automatically be read as valueColumns. Note that indexColumns can not be provided as column names together with a multi-row header. autoRow and autoColumn will be ignored in case of a multi-row header. Here is an example how to read data with a multi-row header:

$onecho > multirow_header.csv
j,,j1,j1,j1,j2,j2,j2
k,,k1,k2,k3,k1,k2,k3
h,i,,,,,,
h1,i1,1,2,,4,5,6
h1,i2,,,3,4,5,
$offEcho
$onEmbeddedCode Connect:
- CSVReader:
    file: multirow_header.csv
    name: p
    header: [1,2]
    indexColumns: [1,2]
- PythonCode:
    code: |
      print(connect.container["p"].records)
$offEmbeddedCode
The same can be achieved if the data has no index column names:

j,,j1,j1,j1,j2,j2,j2
k,,k1,k2,k3,k1,k2,k3
h1,i1,1,2,,4,5,6
h1,i2,,,3,4,5,

If the first line of data after the multi-row header has no data in the valueColumns, the CSVReader will interpret this line as index column names.
indexColumns = list or string (optional)
Specify columns to use as the row labels. The columns can either be given as column positions or column names. Column positions can be represented as an integer, a list of integers or a string. For example: indexColumns: 1, indexColumns: [1, 2, 3, 4, 6] or indexColumns: "1:4, 6". The symbolic constant lastCol can be used with the string representation: "2:lastCol". If no header or names is provided, lastCol will be determined by the first line of data. Column names can be represented as a list of strings. For example: indexColumns: ["i1","i2"]. Note that indexColumns and valueColumns/textColumns must either be given as positions or names, not both. Further note that indexColumns as column names are not supported together with a multi-row header.

By default the pandas.read_csv method interprets the following indices as NaN: "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null". The default can be changed by specifying the pandas.read_csv arguments keep_default_na and na_values via readCSVArguments. Rows with indices that are interpreted as NaN will be dropped automatically. The indexSubstitutions option allows remapping NaN entries in the index columns.
indexSubstitutions = dictionary (optional)
Dictionary used for substitutions in the index columns. Each key in indexSubstitutions is replaced by its corresponding value. This option allows arbitrary replacements in the index columns of the DataFrame including stacked column names. Consider the following CSV file:

i1,j1,2.5
i1,,1.7
i2,j1,1.8
i2,,1.4

Reading this data into a 2-dimensional parameter results in a parameter with NaN entries dropped:

        j1
i1   2.500
i2   1.800

By specifying indexSubstitutions: { .nan: j2 } we can substitute NaN entries by j2:

        j1      j2
i1   2.500   1.700
i2   1.800   1.400
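The effect of the substitution can be sketched in plain Python (an illustration only; the agent performs the replacement on the pandas DataFrame):

```python
import csv
import io

# Empty index fields, which pandas would read as NaN and drop,
# are remapped to "j2" instead.
data = "i1,j1,2.5\ni1,,1.7\ni2,j1,1.8\ni2,,1.4\n"

records = {}
for i, j, value in csv.reader(io.StringIO(data)):
    j = j if j else "j2"  # the substitution step
    records[(i, j)] = float(value)

print(records[("i1", "j2")])  # 1.7
```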
name = string (required)
Specify a symbol name for the Connect database. Note that each symbol in the Connect database must have a unique name.
names = list (optional)
List of column names to use. If the file contains a header row, then you should explicitly pass header=True to override the column names. Duplicates in this list are not allowed.
quoting = integer (default=0)
Control field quoting behavior. Use QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3). QUOTE_NONNUMERIC (2) instructs the reader to convert all non-quoted fields to type float. QUOTE_NONE (3) instructs the reader to perform no special processing of quote characters.
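These values correspond to the quoting constants of Python's csv module, which pandas.read_csv accepts for its quoting argument:

```python
import csv

# The four quoting modes map to the csv module constants 0..3.
print(csv.QUOTE_MINIMAL, csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC, csv.QUOTE_NONE)
# 0 1 2 3

# QUOTE_NONNUMERIC converts unquoted fields to float on reading:
row = next(csv.reader(['"i1",1,2'], quoting=csv.QUOTE_NONNUMERIC))
print(row)  # ['i1', 1.0, 2.0]
```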
readCSVArguments = dictionary (optional)
Dictionary containing keyword arguments for the pandas.read_csv method. Not all arguments of that method are exposed through the YAML interface of the CSVReader agent. By specifying readCSVArguments, it is possible to pass arguments directly to the pandas.read_csv method that is used by the CSVReader agent. For example, readCSVArguments: {keep_default_na: False, skip_blank_lines: False}.
skipRows = list or integer (optional)
Specify the rows to skip (list) or the number of rows to skip (integer). For example: skipRows: [1, 3] or skipRows: 5.
Stacks the column names to index. Defaults to True if there is more than one value/text column, otherwise False. Note that missing column names are filled with Unnamed: n (where n is the nth column (zero based) in the DataFrame).
textColumns = list or string (optional)
Specify columns to get the set element text from. The columns can be given as column positions or column names. Column positions can be represented as an integer, a list of integers or a string. For example: textColumns: 1, textColumns: [1, 2, 3, 4, 6] or textColumns: "1:4, 6". The symbolic constant lastCol can be used with the string representation: "2:lastCol". If no header or names is provided, lastCol will be determined by the first line of data. Column names can be represented as a list of strings. For example: textColumns: ["i1","i2"]. Note that textColumns and indexColumns must be given either as positions or as names, not both.

By default the pandas.read_csv method interprets the following text as NaN: "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null". The default can be changed by specifying the pandas.read_csv arguments keep_default_na and na_values via readCSVArguments. Rows with texts that are interpreted as NaN will be dropped automatically. The textSubstitutions option allows to remap NaN entries in the text columns.
textSubstitutions = dictionary (optional)
Dictionary used for substitutions in the text columns. Each key in textSubstitutions is replaced by its corresponding value. While it is possible to make arbitrary replacements, this is especially useful for controlling sparse/dense reading. The default reading behavior is sparse since rows with text that is interpreted as NaN are dropped automatically. Consider the following CSV file:

i1,text1
i2,
i3,text3

Reading this data into a 1-dimensional set results in a sparse set in which all NaN entries (those that do not have any set element text) are removed:

'i1' 'text1',
'i3' 'text3'

By specifying textSubstitutions: { .nan: '' } we get a dense representation:

'i1' 'text1',
'i2',
'i3' 'text3'

It is also possible to use textSubstitutions in order to interpret the set element text. Let's assume we have the following CSV file:

,j1,j2,j3
i1,Y,Y,Y
i2,Y,Y,N
i3,0,Y,Y

Reading this data into a 2-dimensional set results in a dense set:

'i1'.'j1' Y, 'i1'.'j2' Y, 'i1'.'j3' Y,
'i2'.'j1' Y, 'i2'.'j2' Y, 'i2'.'j3' N,
'i3'.'j1' 0, 'i3'.'j2' Y, 'i3'.'j3' Y

By specifying textSubstitutions: { 'N': .nan, '0': .nan } we replace all occurrences of N and 0 by NaN, which gets dropped automatically:

'i1'.'j1' Y, 'i1'.'j2' Y, 'i1'.'j3' Y,
'i2'.'j1' Y, 'i2'.'j2' Y,
'i3'.'j2' Y, 'i3'.'j3' Y

thousandsSeparator = string (optional)
Specify a thousands separator.

Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.

valueColumns = list or string (optional)
Specify columns to get the values from. The columns can be given as column positions or column names. Column positions can be represented as an integer, a list of integers or a string. For example: valueColumns: 1, valueColumns: [1, 2, 3, 4, 6] or valueColumns: "1:4, 6". The symbolic constant lastCol can be used with the string representation: "2:lastCol". If no header or names is provided, lastCol will be determined by the first line of data. Column names can be represented as a list of strings. For example: valueColumns: ["i1","i2"]. Note that valueColumns and indexColumns must be given either as positions or as names, not both.

By default the pandas.read_csv method interprets the following values as NaN: "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null". The default can be changed by specifying the pandas.read_csv arguments keep_default_na and na_values via readCSVArguments. Rows with values that are interpreted as NaN will be dropped automatically. Changing the default of values that are interpreted as NaN is useful if, e.g., "NA" values should not be dropped but interpreted as the GAMS special value NA. Moreover, the valueSubstitutions option allows to remap NaN entries in the value columns.

valueSubstitutions = dictionary (optional)
Dictionary used for substitutions in the value columns. Each key in valueSubstitutions is replaced by its corresponding value. While it is possible to make arbitrary replacements, this is especially useful for controlling sparse/dense reading. All NaN entries are removed automatically by default, which results in a sparse reading behavior. Consider the following CSV file:

i1,j1,
i1,j2,1.7
i2,j1,
i2,j2,1.4

Reading this data into a 2-dimensional parameter results in a sparse parameter with all NaN entries removed:

        j2
i1   1.700
i2   1.400

By specifying valueSubstitutions: { .nan: eps } we get a dense representation where all NaN entries are replaced by the GAMS special value EPS:

        j1      j2
i1     EPS   1.700
i2     EPS   1.400

Besides eps there are the following other GAMS special values that can be used by specifying their string representation: inf, -inf, eps, na, and undef. See the GAMS Transfer documentation for more information.
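To show how the options above combine in a single task, here is a minimal sketch of a CSVReader instruction. The file name data.csv and the symbol name p are assumptions chosen for illustration, not taken from the original text:

```yaml
- CSVReader:
    file: data.csv                       # hypothetical input file
    name: p                              # hypothetical symbol name in the Connect database
    indexColumns: [1, 2]                 # first two columns form the index
    valueColumns: "3:lastCol"            # remaining columns hold values
    valueSubstitutions: { .nan: eps }    # dense reading: NaN becomes GAMS EPS
    readCSVArguments: { keep_default_na: False }
```

Passing keep_default_na through readCSVArguments keeps strings such as "NA" from being dropped, while valueSubstitutions controls what happens to the remaining NaN entries.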
CSVWriter
The CSVWriter allows writing a symbol (set or parameter) in the Connect database to a specified CSV file. Variables and equations need to be turned into parameters with the Projection agent before they can be written. See Simple Connect Example for CSV for a simple example that uses the CSVWriter.
Option | Default | Description |
---|---|---|
decimalSeparator | . (period) | Specify a decimal separator. |
file | None | Specify a CSV file path. |
fieldSeparator | , (comma) | Specify a field separator. |
header | True | Indicate if the header will be written. |
name | None | Specify the name of the symbol in the Connect database. |
quoting | 0 | Control field quoting behavior. |
setHeader | None | Specify a string that will be used as the header. |
skipElementText | False | Indicate if the set element text will be skipped. |
toCSVArguments | None | Dictionary containing keyword arguments for the pandas.to_csv method. |
trace | inherited | Specify the trace level for debugging output. |
unstack | False | Specify the dimensions to be unstacked to the header row(s). |
Detailed description of the options:
decimalSeparator = string (default=.)
Specify a decimal separator. [. (period), , (comma)]
Specify a CSV file path.
fieldSeparator = string (default=,)
Specify a field separator. [, (comma), ; (semicolon), \t (tab)]
header = boolean (default=True)
Indicate if the header will be written.
Specify the name of the symbol in the Connect database.
Control field quoting behavior. Use QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3). QUOTE_MINIMAL (0) instructs the writer to only quote those fields which contain special characters such as the fieldSeparator. QUOTE_ALL (1) instructs the writer to quote all fields. QUOTE_NONNUMERIC (2) instructs the writer to quote all non-numeric fields. QUOTE_NONE (3) instructs the writer to never quote fields.
Specify a string that will be used as the header. If an empty header is desired, the string can be empty.
skipElementText = boolean (default=False)
Indicate if the set element text will be skipped. If False, the set element text will be written in the last column of the CSV file.
toCSVArguments = dictionary (optional)
Dictionary containing keyword arguments for the pandas.to_csv method. Not all arguments of that method are exposed through the YAML interface of the CSVWriter agent. By specifying toCSVArguments, it is possible to pass arguments directly to the pandas.to_csv method that is used by the CSVWriter agent. For example, toCSVArguments: {index_label: ["index1", "index2", "index3"]}.
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
unstack = boolean or list (default=False)
Specify the dimensions to be unstacked to the header row(s). If False (default) no dimension will be unstacked to the header row. If True the last dimension will be unstacked to the header row. If multiple dimensions should be unstacked to header rows, a list of integers providing the dimension numbers to unstack can be specified.
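As a sketch of how these options fit together (the file name results.csv and symbol name p are assumptions for illustration), a CSVWriter task that unstacks the last dimension into the header row could look like:

```yaml
- CSVWriter:
    file: results.csv        # hypothetical output file
    name: p                  # hypothetical symbol in the Connect database
    fieldSeparator: ";"
    decimalSeparator: ","
    unstack: True            # unstack the last dimension to the header row
    quoting: 2               # QUOTE_NONNUMERIC: quote all non-numeric fields
```

Using ";" together with "," as decimal separator is a common combination for locales where the comma is the decimal mark.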
DomainWriter
The DomainWriter agent allows to rewrite domain information for existing Connect symbols and helps dealing with domain violations.
Option | Scope | Default | Description |
---|---|---|---|
dropDomainViolations | symbols | False | Indicate how to deal with domain violations. |
name | symbols | None | Specify the name of the symbol in the Connect database. |
symbols | root | None | Specify symbol specific options. |
trace | root | inherited | Specify the trace level for debugging output. |
writeAll | root | auto | Indicate if all symbols in the Connect database will be treated with the root dropDomainViolations setting. |
Detailed description of the options:
dropDomainViolations = boolean (root)/boolean or string (symbols) (default=False)
The Connect symbols might have some domain violations. This agent allows to drop these domain violations so a write to GAMS or GDX works properly. Setting the root option dropDomainViolations: True together with writeAll: True will drop domain violations from all symbols in the Connect database. The symbols option also allows to drop domain violations. In the symbols section the dropDomainViolations attribute can be of type boolean (True or False) or of type string (before and after). If the attribute has not been set for the symbol, the attribute is inherited from the root attribute. The value before means that domain violations are dropped before a new domain is applied, see attribute name. The value after means that domain violations are dropped after a new domain is applied. The value True means that domain violations are dropped before and after a new domain is applied. False means to not drop domain violations.
Specify a symbol name with index space for the Connect database. name requires the format symname(i1,i2,...,iN). The list of indices needs to coincide with the names of the actual GAMS domain sets for a regular domain. A relaxed domain is set if the index is quoted. For example, name: x(i,'j') means that for the first index a regular domain with domain set i is established, while for the second index the universal domain * is used and a relaxed domain name j is set.
A list containing symbol specific options. Allows to process a subset of symbols.
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
writeAll = boolean or auto (default=auto)
Indicate if all symbols in the Connect database will be treated according to the root attribute dropDomainViolations. If True, treat all symbols according to dropDomainViolations and ignore the symbols option. The default auto becomes True if there are no symbol options specified, otherwise False.
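The name format above can be sketched in a small DomainWriter task. The symbol x and the domain sets i and j are assumptions for illustration; the first index gets the regular domain i, the quoted second index gets the relaxed domain j, and domain violations are dropped after the new domain is applied:

```yaml
- DomainWriter:
    symbols:
      - name: x(i,'j')                  # regular domain i, relaxed domain j
        dropDomainViolations: after     # drop violations after applying the new domain
```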
Filter
The Filter agent allows to reduce symbol data by applying filters on labels and numerical values. Here is a complete example that uses the Filter agent:
Set i / seattle, san-diego /
j / new-york, chicago, topeka /;
Parameter d(i,j) /
seattle.new-york 2.5
seattle.chicago 1.7
seattle.topeka 1.8
san-diego.new-york 2.5
san-diego.chicago 1.8
san-diego.topeka 1.4
/;
$onEmbeddedCode Connect:
- GAMSReader:
symbols:
- name: d
- Filter:
name: d
newName: d_new
labelFilters:
- column: 1
keep: ['seattle']
- column: 2
reject: ['topeka']
valueFilters:
- column: value
rule: x<2.5
- GDXWriter:
file: report.gdx
symbols:
- name: d_new
$offEmbeddedCode
The records of the parameter d
are filtered and stored in a new parameter called d_new
. Two label filters remove all labels except seattle
from the first dimension and remove the label topeka
from the second one. The remaining records are filtered by value where only values less than 2.5 are kept in the data. The resulting parameter d_new
which is exported into report.gdx
has only one record (seattle.chicago 1.7
) left.
The following options are available for the Filter agent:
Option | Scope | Default | Description |
---|---|---|---|
column | labelFilters/valueFilters | None | Specify the column to which a filter is applied. |
eps | valueFilters | True | Used to keep or reject special value EPS . |
infinity | valueFilters | True | Used to keep or reject special value +INF . |
keep | labelFilters | None | Specify a list of labels to keep. |
labelFilters | root | None | Specify filters for index columns of a symbol. |
na | valueFilters | True | Used to keep or reject special value NA . |
name | root | None | Specify a symbol name for the Connect database. |
negativeInfinity | valueFilters | True | Used to keep or reject special value -INF . |
newName | root | None | Specify a new name for the symbol in the Connect database. |
regex | labelFilters | None | Specify a regular expression to be used for filtering labels. |
reject | labelFilters | None | Specify a list of labels to reject. |
rule | valueFilters | None | Specify a boolean expression to be used for filtering on numerical columns. |
ruleIdentifier | valueFilters | x | The identifier used for the value filter rule. |
trace | root | inherited | Specify the trace level for debugging output. |
undf | valueFilters | True | Used to keep or reject special value UNDF . |
valueFilters | root | None | Specify filters for numerical columns of a symbol. |
Detailed description of the options:
column = integer or string (optional)
Used to specify the column on which a label filter or a value filter is applied. For label filters the index position can be specified using an integer. For value filters, the following strings are allowed depending on the symbol type:
- Set: not allowed
- Parameter: value, all
- Variable and Equation: level, marginal, upper, lower, scale, all
Specifying the string all will apply the filter to all columns covered by the current filter (all index columns for labelFilters and all numerical columns for valueFilters).
Used to keep (True) or reject (False) special value EPS.
infinity = boolean (default=True)
Used to keep (True) or reject (False) special value +INF.
A list of labels to be kept when applying the label filter. For each label filter it is only allowed to specify either keep, reject, or regex at a time.
labelFilters = list (optional)
A list containing label filters.
Used to keep (True) or reject (False) special value NA.
Specify the name of the symbol from the Connect database on whose data the filters will be applied.
negativeInfinity = boolean (default=True)
Used to keep (True) or reject (False) special value -INF.
Specify a new name for the symbol in the Connect database which will get the data after all filters have been applied. Each symbol in the Connect database must have a unique name.
A string containing a regular expression that needs to match in order to keep the corresponding label. Uses a full match paradigm which means that the whole label needs to match the specified regular expression. For each label filter it is only allowed to specify either keep, reject, or regex at a time.
A list of labels to be rejected when applying the label filter. For each label filter it is only allowed to specify either keep, reject, or regex at a time.
Used to specify a boolean expression for a value filter. Each numerical value of the specified column is tested and the corresponding record is only kept if the expression evaluates to true. The string needs to contain Python syntax that is valid for pandas.Series. Comparison operators like >, >=, <, <=, ==, or != can be used in combination with boolean operators like & or |, but not and or or. Note that using & or | requires the operands to be enclosed in round brackets in order to form a valid expression. As an example, the expression ((x<=10) & (x>=0)) | (x>20) would keep only those values that are between 0 and 10 (included) or greater than 20.
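The rule syntax above can be sketched as a valueFilters entry. The symbol name d is taken from the example at the top of this section; the new name d_filtered is an assumption for illustration:

```yaml
- Filter:
    name: d
    newName: d_filtered                        # hypothetical new symbol name
    valueFilters:
      - column: value
        rule: "((x<=10) & (x>=0)) | (x>20)"    # keep 0..10 (inclusive) or values above 20
```

Quoting the rule avoids any ambiguity with YAML's special characters.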
ruleIdentifier = string (default=x)
Specifies the identifier that is used in the rule of a value filter.
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
Used to keep (True) or reject (False) special value UNDF.
valueFilters = list (optional)
A list containing value filters.
GAMSReader
The GAMSReader allows reading symbols from the GAMS database into the Connect database. Without GAMS context (e.g. when running the gamsconnect
script from the command line) this agent is not available and its execution will result in an exception.
Option | Scope | Default | Description |
---|---|---|---|
name | symbols | None | Specify the name of the symbol in the GAMS database. |
newName | symbols | None | Specify a new name for the symbol in the Connect database. |
readAll | root | auto | Indicate if all symbols in the GAMS database will be read into the Connect database. |
symbols | root | None | Specify symbol specific options. |
trace | root | inherited | Specify the trace level for debugging output. |
Detailed description of the options:
Specify the name of the symbol in the GAMS database.
Specify a new name for the symbol in the Connect database. Each symbol in the Connect database must have a unique name.
readAll = boolean or auto (default=auto)
Indicate if all symbols in the GAMS database will be read into the Connect database. If True, read all symbols into the Connect database and ignore the symbols option. The default auto becomes True if there are no symbol options specified, otherwise False.
A list containing symbol specific options. Allows to read a subset of symbols.
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
GAMSWriter
The GAMSWriter allows writing symbols in the Connect database to the GAMS database. Without GAMS context (e.g. when running the gamsconnect script from the command line) and as part of the connectOut command line option, this agent is not available and its execution will result in an exception.
Option | Scope | Default | Description |
---|---|---|---|
domainCheckType | root/symbols | default | Specify if domain checking is applied or if records that would cause a domain violation are filtered. |
duplicateRecords | root/symbols | all | Specify how to deal with duplicate records. |
mergeType | root/symbols | default | Specify if data in a GAMS symbol is merged or replaced. |
name | symbols | None | Specify the name of the symbol in the Connect database. |
newName | symbols | None | Specify a new name for the symbol in the GAMS database. |
symbols | root | None | Specify symbol specific options. |
trace | root | inherited | Specify the trace level for debugging output. |
writeAll | root | auto | Indicate if all symbols in the Connect database will be written to the GAMS database. |
Detailed description of the options:
domainCheckType = string (default=default)
Specify if Domain Checking is applied (checked) or if records that would cause a domain violation are filtered (filtered). If left at default, it depends on the setting of $on/offFiltered if GAMS does a filtered load or checks the domains during compile time. During execution time, default is the same as filtered.
duplicateRecords = string (default=all)
The Connect container can hold multiple records even for the same indexes. This is only a problem when exchanging the data with GAMS (and GDX). The attribute determines how the agent deals with duplicate records. With the default of all, the GAMSWriter will fail in case duplicate records exist. With first, the first record will be written to GAMS; with last, the last record. With none, none of the duplicate records will be written to GAMS. Note that the agent currently deals with duplicate records in a case sensitive way.
mergeType = string (default=default)
Specify if data in a GAMS symbol is merged (merge) or replaced (replace). If left at default, it depends on the setting of $on/offMulti[R] if GAMS does a merge, replace, or triggers an error during compile time. During execution time, default is the same as merge.
Specify the name of the symbol in the Connect database.
Specify a new name for the symbol in the GAMS database. Note, each symbol in the GAMS database must have a unique name.
A list containing symbol specific options. Allows to write a subset of symbols.
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
writeAll = boolean or auto (default=auto)
Indicate if all symbols in the Connect database will be written to the GAMS database. If True, write all symbols to the GAMS database and ignore the symbols option. The default auto becomes True if there are no symbol options specified, otherwise False.
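The root/symbols scoping of these options can be sketched in a GAMSWriter task. The symbol names p and q are assumptions for illustration; the root mergeType applies to every listed symbol unless a symbol overrides it:

```yaml
- GAMSWriter:
    mergeType: merge                # root setting, inherited by all listed symbols
    symbols:
      - name: p
      - name: q
        mergeType: replace          # overrides the root setting for q only
        domainCheckType: filtered   # filter records that would cause domain violations
```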
GDXReader
The GDXReader allows reading symbols from a specified GDX file into the Connect database.
Option | Scope | Default | Description |
---|---|---|---|
file | root | None | Specify a GDX file path. |
name | symbols | None | Specify the name of the symbol in the GDX file. |
newName | symbols | None | Specify a new name for the symbol in the Connect database. |
readAll | root | auto | Indicate if all symbols in the GDX file will be read into the Connect database. |
symbols | root | None | Specify symbol specific options. |
trace | root | inherited | Specify the trace level for debugging output. |
Detailed description of the options:
Specify a GDX file path.
Specify the name of the symbol in the GDX file.
Specify a new name for the symbol in the Connect database. Each symbol in the Connect database must have a unique name.
readAll = boolean or auto (default=auto)
Indicate if all symbols in the GDX file will be read into the Connect database. If True, read all symbols into the Connect database and ignore the symbols option. The default auto becomes True if there are no symbol options specified, otherwise False.
A list containing symbol specific options. Allows to read a subset of symbols.
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
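A minimal sketch of a GDXReader task reading a subset of symbols; the file name input.gdx and the symbol names are assumptions for illustration:

```yaml
- GDXReader:
    file: input.gdx            # hypothetical GDX file
    symbols:
      - name: d                # read d under its own name
      - name: p
        newName: p_in          # p is renamed in the Connect database
```

Omitting the symbols list (or setting readAll: True) would read every symbol in the file instead.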
GDXWriter
The GDXWriter allows writing symbols in the Connect database to a specified GDX file.
Option | Scope | Default | Description |
---|---|---|---|
duplicateRecords | root/symbols | all | Specify how to deal with duplicate records. |
file | root | None | Specify a GDX file path. |
name | symbols | None | Specify the name of the symbol in the Connect database. |
newName | symbols | None | Specify a new name for the symbol in the GDX file. |
symbols | root | None | Specify symbol specific options. |
trace | root | inherited | Specify the trace level for debugging output. |
writeAll | root | auto | Indicate if all symbols in the Connect database will be written to the GDX file. |
Detailed description of the options:
duplicateRecords = string (default=all)
The Connect container can hold multiple records even for the same indexes. This is only a problem when exchanging the data with GDX (and GAMS). The attribute determines how the agent deals with duplicate records. With the default of all, the GDXWriter will fail in case duplicate records exist. With first, the first record will be written to GDX; with last, the last record. With none, none of the duplicate records will be written to GDX. Note that the agent currently deals with duplicate records in a case sensitive way.
Specify a GDX file path.
Specify the name of the symbol in the Connect database.
Specify a new name for the symbol in the GDX file. Note, each symbol in the GDX file must have a unique name.
A list containing symbol specific options. Allows to write a subset of symbols.
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
writeAll = boolean or auto (default=auto)
Indicate if all symbols in the Connect database will be written to the GDX file. If True, write all symbols to the GDX file and ignore the symbols option. The default auto becomes True if there are no symbol options specified, otherwise False.
LabelManipulator
The LabelManipulator agent allows to modify labels of symbols in the Connect database. Four different modes are available:
- case: Applies either upper, lower, or capitalize casing to labels.
- code: Replaces labels using Python code.
- map: Uses a 1-dimensional GAMS set to perform an explicit mapping of labels.
- regex: Performs a replacement based on a regular expression.
Here is a complete example that uses the LabelManipulator agent in all four modes:
Set i / seattle, san-diego /
j / new-york, chicago, topeka /
map / chicago 'Berlin', san-diego 'Oslo' /;
Parameter d(i,j) /
seattle.new-york 2.5
seattle.chicago 1.7
seattle.topeka 1.8
san-diego.new-york 2.5
san-diego.chicago 1.8
san-diego.topeka 1.4
/;
$onEmbeddedCode Connect:
- GAMSReader:
readAll: true
- LabelManipulator:
map:
setName: map
- LabelManipulator:
case:
rule: upper
- LabelManipulator:
symbols:
- name: d
code:
rule: x.split('-')[-1]
- LabelManipulator:
symbols:
- name: d
regex:
pattern: '[^O]$'
replace: '\g<0>X'
- PythonCode:
code: |
print("Set i:\n", connect.container["i"].records)
print("Set j:\n",connect.container["j"].records)
print("Parameter d:\n",connect.container["d"].records)
$offEmbeddedCode
The first LabelManipulator applies a mapping provided by the set map
to the labels of all symbols in the Connect database. This maps chicago
to Berlin
and san-diego
to Oslo
. The second LabelManipulator changes the labels of all symbols to upper case. The third LabelManipulator is only applied on symbol d
and splits labels at -
into a list and keeps the last entry. This changes NEW-YORK
to YORK
. The last LabelManipulator is also only applied on symbol d
and adds an X
to the end of all labels that do not end with an O
. The resulting symbols look as follows:
Set i:
        uni element_text
0   SEATTLE
1      OSLO

Set j:
         uni element_text
0   NEW-YORK
1     BERLIN
2     TOPEKA

Parameter d:
          i        j  value
0  SEATTLEX    YORKX    2.5
1  SEATTLEX  BERLINX    1.7
2  SEATTLEX  TOPEKAX    1.8
3      OSLO    YORKX    2.5
4      OSLO  BERLINX    1.8
5      OSLO  TOPEKAX    1.4
The following options are available for the LabelManipulator agent:
Option | Scope | Default | Description |
---|---|---|---|
case | root | None | Apply specified casing to labels. |
code | root | None | Replace labels using Python code. |
invert | map | False | Used to invert the mapping direction. |
map | root | None | Replace labels using a 1-dimensional GAMS set containing an explicit key-value mapping. |
name | symbols | None | Specify a symbol name for the Connect database. |
newName | symbols | None | Specify a new name for the symbol in the Connect database. |
outputSet | regex/case/code | None | Name of the output set that contains the applied mapping. |
pattern | regex | None | The regular expression that needs to match. |
regex | root | None | Replace labels using a regular expression. |
replace | regex | None | The rule used for replacing labels that match the given pattern. |
rule | case/code | None | case: The type of casing to be applied. code: Python function that defines the mapping behavior. |
ruleIdentifier | code | x | The identifier used for labels in the rule. |
setName | map | None | The name of the GAMS set used in map mode. |
symbols | root | None | Specify symbol specific options. |
trace | root | inherited | Specify the trace level for debugging output. |
writeAll | root | auto | Indicate if all symbols in the Connect database will be affected. |
Detailed description of the options:
Used for executing the LabelManipulator in case mode to apply a specified casing to labels. It is only allowed to specify either case, code, map or regex.
Used for executing the LabelManipulator in code mode to replace labels using a Python function. The given Python function is executed with each label and its return value is used as the replacement. It is only allowed to specify either case, code, map or regex.
invert = boolean (default=False)
Used for controlling the mapping direction in map mode. If set to False (default), the labels that match a label of the provided 1-dimensional GAMS set are replaced by the corresponding set element text. If set to True, the direction is inverted, meaning all labels that match a set element text of the GAMS set are replaced by the corresponding label.
Used for executing the LabelManipulator in map mode to replace labels using a 1-dimensional GAMS set containing an explicit key-value mapping. It is only allowed to specify either case, code, map or regex.
Specify the name of the symbol in the Connect database. Data of the symbol gets replaced if no newName is specified.
Specify a new name for the symbol in the Connect database. The original symbol specified under name remains unchanged. Each symbol in the Connect database must have a unique name.
Name of the output set that contains mappings that were actually applied on the symbol labels. Per default no output set is written. Providing a name for the output set indicates that an output set should be written to the Connect database. Note that each symbol in the Connect database must have a unique name. Supported by case, code and regex mode.
The regular expression that needs to match for a label to be replaced.
Used for executing the LabelManipulator in regex mode to replace labels using a regular expression. It is only allowed to specify either case, code, map or regex.
A string that specifies the replacement for all labels for which a given pattern in regex mode matches.
Can be specified in case and code mode.
case: The type of casing to be applied. Allowed values are:
- lower: Change all labels to lower case.
- upper: Change all labels to upper case.
- capitalize: Change all labels to a capitalized casing - first letter becomes upper case, all others become lower case.
code: A Python function that defines the mapping behavior.
ruleIdentifier = string (default=x)
Specifies the identifier that is used in the rule of the code mode.
A 1-dimensional GAMS set that contains an explicit key-value mapping to replace labels. If invert: False (default), all labels that match a label of the GAMS set will be replaced by the corresponding set element text.
symbols = list (optional)
A list containing symbol specific options. Allows to execute the LabelManipulator on a subset of GAMS symbols in the Connect database.
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
writeAll = boolean or auto (default=auto)
Indicate if the LabelManipulator is applied to all symbols of the Connect database or only to a subset of GAMS symbols. The default auto becomes True if there are no symbol options specified, otherwise False.
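To tie the options above together, here is a minimal sketch of a LabelManipulator task in case mode; the symbol name p and the output set name upperMap are hypothetical:

```yaml
- LabelManipulator:
    case: upper          # change all labels to upper case
    outputSet: upperMap  # record the mappings that were actually applied
    symbols:
      - name: p          # only manipulate symbol p (writeAll defaults to False here)
```

Since symbol options are specified, writeAll at its default auto becomes False and only symbol p is manipulated.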
Options
The Options agent allows to set more general options that can affect the Connect database and other Connect agents. More specifically, the value of an option set via the Options agent can be inherited as a default value by Connect agents that utilize that option.
Option | Default | Description |
---|---|---|
trace | 0 | Specify the trace level for debugging output. |
Detailed description of the options:
Specify the trace level for debugging output. A trace level of 0 (default) means no debugging output. For trace > 0 the Connect database will write some scalar debugging output to the log. The debugging output of Connect agents depends on their implementation of trace; please refer to the corresponding documentation.
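As a sketch, the trace level can be set once via the Options agent and is then inherited by subsequent agents that do not set trace themselves; the GDXReader task and the file name myfile.gdx are illustrative assumptions:

```yaml
- Options:
    trace: 2           # inherited as default by the agents below
- GDXReader:
    file: myfile.gdx   # this agent now runs with trace: 2 unless it sets its own
    readAll: True
```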
PandasExcelReader
The PandasExcelReader agent allows to read symbols (sets and parameters) from an Excel file into the Connect database. Its implementation is based on the pandas.DataFrame class and its I/O API method read_excel. The PandasExcelReader primarily aims to read Excel files that have been written by its counterpart - the PandasExcelWriter. See Simple Connect Example for Excel for a simple example that uses the PandasExcelReader.
- Note
- The PandasExcelReader supports .xlsx and .xlsm files. For other file formats, such as .xls or .ods files, it might be required to install additional Python packages. Please check the pandas.read_excel documentation for more information about supported file formats and required Python packages.
Option | Scope | Default | Description |
---|---|---|---|
columnDimension | root/symbols | 1 | Column dimension of the symbol. |
drop | symbols | None | Specify a string for dropping each row containing it in one of its labels. |
excelFileArguments | root | None | Dictionary containing keyword arguments for the pandas.ExcelFile constructor. |
file | root | None | Specify an Excel file path. |
multiColumnBlankRow | root/symbols | True | Indicator for existence of blank row after the column indexes (for columnDimension>1). |
name | symbols | None | Specify the name of the symbol in the Connect database. |
range | symbols | None | Specify the Excel range of a symbol. |
readExcelArguments | symbols | None | Dictionary containing keyword arguments for the pandas.read_excel method. |
rowDimension | root/symbols | 1 | Row dimension of the symbol. |
symbols | root | None | Specify symbol specific options. |
trace | root | inherited | Specify the trace level for debugging output. |
type | root/symbols | par | Control the symbol type. |
valueSubstitutions | symbols | None | Dictionary used for mapping in the value column of the DataFrame . |
Detailed description of the options:
columnDimension = integer (default=1)
Column dimension: the number of rows in the data range that will be used to define the labels for the columns. The first columnDimension rows of the data range will be used for labels.
Specify a string for dropping each row containing it in one of its labels. The specified string is interpreted as a regular expression.
excelFileArguments = dictionary (optional)
Dictionary containing keyword arguments for the pandas.ExcelFile constructor.
Specify an Excel file path.
multiColumnBlankRow = boolean (default=True)
For symbols where more than one dimension is in the columns, i.e. columnDimension>1, the PandasExcelReader expects a blank row before the data starts. This is also the shape the PandasExcelWriter writes:

[Figure: Blank row between column headers and data]

If multiColumnBlankRow is set to False, the PandasExcelReader expects for tables with columnDimension>1 that this blank line is missing. This works properly with the exception of the following corner case:

[Figure: Blank row between column headers and data missing and first data row is entirely blank]

together with the following Connect instructions

- PandasExcelReader:
    file: myfile.xlsx
    symbols:
      - name: s
        rowDimension: 1
        columnDimension: 2
        range: B2:E6
        type: set
        multiColumnBlankRow: False
        valueSubstitutions: { .nan: '' } # read dense

one would expect that the Connect database contains a set with the following elements k1*k3.(i1.j1,i2.j2,i3.j3), but pandas interprets row 4 (because it is entirely blank) as the row with the index name. Hence the data starts in row 5 and the Connect database is missing the k1 records: k2*k3.(i1.j1,i2.j2,i3.j3). In such a case one either needs the blank row between column indexes and data or manages to have this row not entirely empty (which can even be done outside the specified Excel range).
Specify a symbol name for the Connect database. Note that each symbol in the Connect database must have a unique name.
Specify the Excel range of a symbol using the format sheet!range. range can be either a single cell, also known as open range (north-west corner like B2), or a full range (north-west and south-east corner like B2:D4). For symbols with columnDimension=0 and/or rowDimension=0, the ending row and/or ending column of the open range can be deduced and is used to restrict the data area.
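For illustration, a sketch showing both range variants; sheet and symbol names are hypothetical:

```yaml
- PandasExcelReader:
    file: myfile.xlsx
    symbols:
      - name: p
        range: Sheet1!B2       # open range: north-west corner only
      - name: q
        range: Sheet2!B2:D10   # full range: north-west and south-east corner
```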
readExcelArguments = dictionary (optional)
Dictionary containing keyword arguments for the pandas.read_excel method. Not all arguments of that method are exposed through the YAML interface of the PandasExcelReader agent. By specifying readExcelArguments, it is possible to pass arguments directly to the pandas.read_excel method that is used by the PandasExcelReader agent. For example, readExcelArguments: {keep_default_na: False}.
rowDimension = integer (default=1)
Row dimension: the number of columns in the data range that will be used to define the labels for the rows. The first rowDimension columns of the data range will be used for the labels.
A list containing symbol specific options.
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
Control the symbol type. Supported symbol types are par for parameters and set for sets.
valueSubstitutions = dictionary (optional)
Dictionary used for mapping in the value column of the DataFrame. Each key in valueSubstitutions is replaced by its corresponding value. The replacement is only performed on the value column of the DataFrame, which is the numerical value in case of a GAMS parameter and the set element text in case of a GAMS set. While it is possible to make arbitrary replacements, this is especially useful for controlling sparse/dense reading. All NaN entries are removed automatically by default which results in a sparse reading behavior. Let's assume we have the following spreadsheet:

[Figure: Two dimensional data containing NaN entries]

Reading this data into a 2-dimensional parameter results in a sparse parameter in which all NaN entries are removed:

'i1'.'j1' 2.5, 'i1'.'j2' 1.7, 'i2'.'j2' 1.8, 'i2'.'j3' 1.4

By specifying valueSubstitutions: { .nan: eps } we get a dense representation in which all NaN entries are replaced by the GAMS special value EPS:

'i1'.'j1' 2.5, 'i1'.'j2' 1.7, 'i1'.'j3' Eps, 'i2'.'j1' Eps, 'i2'.'j2' 1.8, 'i2'.'j3' 1.4

Beside eps there are the following other GAMS special values that can be used by specifying their string representation: inf, -inf, eps, na, and undef. See the GAMS Transfer documentation for more information.

Let's assume we have data representing a GAMS set:

[Figure: Data representing a GAMS set]

Reading this data into a 1-dimensional set results in a sparse set in which all NaN entries (those that do not have any set element text) are removed:

'i1' 'text 1', 'i3' 'text 3'

By specifying valueSubstitutions: { .nan: '' } we get a dense representation:

'i1' 'text 1', 'i2', 'i3' 'text 3'

It is also possible to use valueSubstitutions in order to interpret the set element text. Let's assume we have the following Excel data:

[Figure: Data representing a GAMS set]

Reading this data into a 2-dimensional set results in a dense set:

'i1'.'j1' No, 'i1'.'j2' Y, 'i1'.'j3' Y, 'i2'.'j1' Y, 'i2'.'j2' Y, 'i2'.'j3' Y, 'i3'.'j1' Y, 'i3'.'j2' Y, 'i3'.'j3' N

By specifying valueSubstitutions: { 'N': .nan, 'No': .nan } we replace all occurrences of N and No by NaN which gets dropped automatically. Note that No has to be quoted in order to not be interpreted as False by the YAML parser:

'i1'.'j2' Y, 'i1'.'j3' Y, 'i2'.'j1' Y, 'i2'.'j2' Y, 'i2'.'j3' Y, 'i3'.'j1' Y, 'i3'.'j2' Y
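Combining the options above, a sketch for dense reading of a 2-dimensional parameter, with blank cells mapped to the GAMS special value EPS; file, sheet and symbol names are hypothetical:

```yaml
- PandasExcelReader:
    file: data.xlsx
    symbols:
      - name: p
        range: Sheet1!A1:D4
        rowDimension: 1
        columnDimension: 1
        valueSubstitutions: { .nan: eps }  # read dense: blank cells become EPS
```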
Fundamentals of Reading Data with PandasExcelReader
As mentioned at the start of this section, the PandasExcelReader works best with tables written by the PandasExcelWriter with a full range (north-west and south-east corner) specification. Nevertheless, the PandasExcelReader can also process tables not precisely in the format and shape given by PandasExcelWriter and also works with an open range (north-west corner only) specification. While the PandasExcelReader shares some functionality with the tool gdxxrw there are also significant differences and this section explains some of the perhaps unexpected behavior of this Connect agent.
- Symbols with rowDimension=0 and/or columnDimension=0 have an artificial index in the 0-dimensional index position, and the range specification needs to include this artificial index. For example, if one wants to read a scalar, there are two artificial indexes rval and cval and the north-west corner starts in the artificial index row and column, i.e. the range is B2 or B2:C3 in the following example:

[figure]

The names of the artificial indexes are irrelevant, they can even be blank. But as a consequence of the artificial indexes, PandasExcelReader cannot read a scalar that is located in row 1 or column A. Similarly, symbols with dim>0 but either rowDimension or columnDimension equal to 0 need an artificial index:

[figure]

[figure]

Again, the name of the artificial index is irrelevant, it can even be blank. In both examples, the range starts in the north-west corner B2. For symbols with rowDimension>0 and columnDimension>0 there is no artificial index.
- Tables with more than one index in the columns can be read best if the column headers and the data are separated by a blank line (this is the way PandasExcelWriter writes such tables). The PandasExcelReader attribute multiColumnBlankRow allows some control. See the option descriptions for details.
- Blank data is read as nan (not a number) and such records are dropped before the data is written to the Connect database. The valueSubstitutions attribute of PandasExcelReader allows to remap nan to other values. Blank index positions are treated very differently; the behavior depends on multiple factors. If PandasExcelReader encounters a blank index at the beginning of the rows it fills the index with nan. With a blank row index in the middle of the table (i.e. if there was a good label before in this column), PandasExcelReader repeats the previous index of this column. In case of columnDimension=1 blank column indexes are filled with Unnamed: n (where n is the nth column (zero based) in the sheet). Hence the following table

[figure]

is transformed into the following pandas.DataFrame (use PandasExcelReader attribute trace>2 to print intermediate data frames):

          j1  Unnamed: 4    j3
NaN NaN  NaN         6.0   NaN
i2  i2   2.0         7.0  12.0
i2  i3   3.0         NaN  13.0
i2  i3   NaN         NaN   NaN
i5  i3   5.0        10.0  15.0
i6  i6   6.0        11.0  16.0

which arrives after dropping the nan values and indexes in the Connect database (here a display as a GAMS parameter) as:

       j1  Unnamed: 4  j3
i2.i2   2           7  12
i2.i3   3              13
i5.i3   5          10  15
i6.i6   6          11  16
In case of columnDimension>1 blank column indexes at the beginning of the column index are filled with Unnamed: n_level_k (where n is the nth column (zero based) in the sheet and k is the column dimension level). With a blank column index in the middle of the index columns (i.e. if there was a good label before in this row), PandasExcelReader repeats the previous index but might add a suffix .1, .2, ... to disambiguate the column names. Hence the following table

[figure]

is transformed into the following pandas.DataFrame (use PandasExcelReader attribute trace>2 to print intermediate data frames):

         Unnamed: 3_level_0   k1    k1    k1    k4
         Unnamed: 3_level_1   j1    j2  j2.1    j4
NaN NaN                 NaN  NaN   6.0   NaN  17.0
i2  i2                  NaN  2.0   7.0  12.0  18.0
i2  i3                  NaN  3.0   NaN  13.0  19.0
i2  i3                  NaN  NaN   NaN   NaN   NaN
i5  i3                  NaN  5.0  10.0  15.0  20.0
i6  i6                  NaN  6.0  11.0  16.0  21.0

which arrives after dropping the nan values and indexes in the Connect database (here a display as a GAMS parameter) as:

        k1.j1  k1.j2  k1.j2.1  k4.j4
i2.i2       2      7       12     18
i2.i3       3              13     19
i5.i3       5     10       15     20
i6.i6       6     11       16     21
The PandasExcelReader attribute drop helps to get rid of unwanted indexes (e.g. drop: Unnamed or drop: "\."). Removing row indexes that result from the continuation of previous indexes is significantly harder. Hence such rows should be entirely empty (then the row is dropped because it has only nan values). It is best to avoid empty index rows and columns altogether.
- When one specifies an open range (north-west corner only) the PandasExcelReader will read from this north-west corner all the way to the end of the sheet (press Ctrl-End in Excel to locate the cursor in the last cell of a sheet). It does not stop at blank row or column indexes as the tool gdxxrw does. This aggravates the situation with empty index cells. Hence the following table

[figure]

will result in the following pandas.DataFrame (use PandasExcelReader attribute trace>2 to print intermediate data frames):

     j1   j2   j3  Unnamed: 5  Unnamed: 6  Unnamed: 7  Unnamed: 8
i1  1.0  2.0  NaN         NaN         NaN         NaN         NaN
i2  NaN  1.0  NaN         NaN         NaN         NaN         NaN
i3  NaN  NaN  1.0         NaN         NaN         NaN         NaN
i3  NaN  NaN  NaN         NaN         NaN         NaN         NaN
i3  NaN  NaN  NaN         NaN         NaN         NaN         NaN
i3  NaN  NaN  NaN         NaN         NaN         NaN         NaN
i3  NaN  NaN  NaN         NaN         NaN         NaN       999.0

which arrives after dropping the nan values and indexes in the Connect database (here a display as a GAMS parameter) as:

    j1  j2  j3  Unnamed: 8
i1   1   2
i2       1
i3           1         999
PandasExcelWriter
The PandasExcelWriter agent allows to write symbols (sets and parameters) from the Connect database to an Excel file. Variables and equations need to be turned into parameters with the Projection agent before they can be written. Its implementation is based on the pandas.DataFrame class and its I/O API method to_excel. If the Excel file exists, the PandasExcelWriter appends to the existing file and if sheets exist the PandasExcelWriter overlays, i.e. writes contents to the existing sheets without removing the old contents. This behavior can be changed via the option excelWriterArguments. See Simple Connect Example for Excel for a simple example that uses the PandasExcelWriter.
- Note
- The PandasExcelWriter supports .xlsx and .xlsm files. .xls files are not supported. For other file formats, such as .ods files, it might be required to install additional Python packages and adjust the writer's behavior, e.g. change the mode from append to write via excelWriterArguments: {mode: w}. Please check the pandas.ExcelWriter documentation for more information about supported file formats and required Python packages.
- The PandasExcelWriter per default writes merged cells. To deactivate writing merged cells, use the option toExcelArguments as follows: toExcelArguments: {merge_cells: False}. Please be aware that writing merged cells may cause problems in append mode and with sheet overlaying. The user may change the writer's behavior, e.g. from overlaying to replacing the sheet if it exists via excelWriterArguments: {if_sheet_exists: replace}.
- Attention
- Please be aware of the following limitation when appending to an Excel file with formulas using the PandasExcelWriter: Whereas Excel stores both formulas and the corresponding values, the Pandas I/O API methods read_excel and to_excel read/store either formulas or values, not both. As a consequence, when appending to an Excel file with formulas using the PandasExcelWriter, all cells with formulas will not have values anymore and a subsequent read by the PandasExcelReader results in NaN for cells with formulas. To avoid this, one way is to have a "read-only" input Excel file and write to a separate output Excel file. On Windows one can merge both Excel files at the end using the tool win32.ExcelMerge (see Connect Example for Excel (executeTool win32.ExcelMerge)).
Option | Scope | Default | Description |
---|---|---|---|
excelWriterArguments | root | None | Dictionary containing keyword arguments for the pandas.ExcelWriter constructor. |
file | root | None | Specify an Excel file path. |
name | symbols | None | Specify the name of the symbol in the Connect database. |
range | symbols | None | Specify the Excel range of a symbol. |
rowDimension | root/symbols | None | Row dimension of the symbol. |
symbols | root | None | Specify symbol specific options. |
toExcelArguments | symbols | None | Dictionary containing keyword arguments for the pandas.to_excel method. |
trace | root | inherited | Specify the trace level for debugging output. |
valueSubstitutions | symbols | None | Dictionary used for mapping in the value column of the DataFrame . |
Detailed description of the options:
excelWriterArguments = dictionary (optional)
Dictionary containing keyword arguments for the pandas.ExcelWriter constructor. For example, excelWriterArguments: {if_sheet_exists: replace} for deleting contents of an existing sheet before writing to it. To change the mode from append to write: excelWriterArguments: {mode: w}.
Specify an Excel file path.
Specify a symbol name for the Connect database.
Specify the Excel range of a symbol using the format sheet!range. range can be either a single cell (north-west corner like B2) or a full range (north-west and south-east corner like B2:D4). For writing purposes, the south-east corner is ignored. Please note that PandasExcelWriter always writes an index in the row and column. So even for a scalar or an indexed symbol with rowDimension=0 or dim-rowDimension=0 there will be some index information in the 0 index. For example, the following Connect script

parameter x0 / 3.14 /, x1 / i1 1, i2 2, i3 3 /;
$onEmbeddedCode Connect:
- GAMSReader:
    readAll: True
- PandasExcelWriter:
    file: x.xlsx
    symbols:
      - name: x0
        range: Sheet1!a1
      - name: x1
        rowDimension: 1
        range: Sheet1!d1
      - name: x1
        rowDimension: 0
        range: Sheet1!g1
$offEmbeddedCode

creates the following Excel output:

[Figure: Value index for 0 index dimension]
rowDimension = integer (optional)
Row dimension: the first rowDimension index positions of the symbol to be written will be written to the rows. The remaining dim-rowDimension index positions will be written into the column headers.
A list containing symbol specific options.
toExcelArguments = dictionary (optional)
Dictionary containing keyword arguments for the pandas.DataFrame.to_excel method. Not all arguments of that method are exposed through the YAML interface of the PandasExcelWriter agent. By specifying toExcelArguments, it is possible to pass arguments directly to the pandas.to_excel method that is used by the PandasExcelWriter agent. For example, toExcelArguments: {merge_cells: True}.
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate data frames will be written abbreviated to the log. For trace > 3 the intermediate data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
valueSubstitutions = dictionary (optional)
Dictionary used for mapping in the value column of the DataFrame. Each key in valueSubstitutions is replaced by its corresponding value. The replacement is only performed on the value column of the DataFrame, which is the numerical value in case of a GAMS parameter, variable or equation and the set element text in case of a GAMS set.
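As a sketch of valueSubstitutions on the writing side, the following maps records stored as the GAMS special value EPS to 0 before writing, assuming the special value is matched by its string representation EPS; file, sheet and symbol names are hypothetical:

```yaml
- PandasExcelWriter:
    file: out.xlsx
    symbols:
      - name: p
        range: Sheet1!A1
        valueSubstitutions: { EPS: 0 }  # write EPS records as 0
```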
Projection
The Projection agent allows index reordering and projection onto a reduced index space of a GAMS symbol. For variables and equations a suffix (.l, .m, .lo, .up, or .scale) can be extracted and is written to a parameter. Otherwise, the type of the source symbol determines the type of the new symbol, unless asSet is set to True. Variables and equations can be turned into parameters with an extra index containing the labels level, marginal, lower, upper, and scale if the attribute asParameter is set to True. Moreover, if name is a list of scalar symbols of the same type (parameters, variables, or equations), they can be stored in a one-dimensional symbol of the same type with the index label being the name of the scalar symbol.
Option | Default | Description |
---|---|---|
aggregationMethod | first | Specify the aggregation method for the projection. |
asParameter | False | Indicate that variable or equation symbols will be turned into a parameter with an extra index that contains the suffix. |
asSet | False | Indicate that the new symbol is a set independent of the type of the source symbol. |
name | None | Specify a symbol name with index space and potentially suffix for the Connect database. |
newName | None | Specify a new name with index space for the symbol in the Connect database. |
text | None | Element text for resulting sets. |
trace | inherited | Specify the trace level for debugging output. |
Detailed description of the options:
aggregationMethod = string (default=first)
Specify the method to aggregate when at least one index position is projected out. The default is first, meaning that the first record will be stored in the new symbol. For sets, variables, and equations (without a suffix specified) only first and last are meaningful. For parameters, variables, and equations with suffix many other aggregation methods are available and meaningful: max, mean, median, min, prod, sem (unbiased standard error of the mean), sum, std (standard deviation), nunique (number of distinct elements), first, last. The Projection agent is based on pandas DataFrames and more detailed explanations of the aggregation methods can be found at the pandas website.
asParameter = boolean (default=False)
If the symbol specified by name is a variable or equation and asParameter is set to True, the new symbol (after a potential aggregationMethod is applied) will be a parameter with an additional index (at the end of the index list) that contains the suffix labels level, marginal, lower, upper, and scale and the corresponding suffix values.
asSet = boolean (default=False)
Usually the type of the source symbol and the use of a suffix with variables and equations determine the type of the target symbol. With asSet set to True the target symbol will be a set.
name = string or list of strings (required)
One either specifies a single symbol name with index space and potentially a suffix for the Connect database, or a list of names of scalar symbols. In the former case, name requires the format symname[.suffix](i1,i2,...,iN). The suffix is only allowed on variable and equation symbols and needs to be either l, m, lo, up, or scale. The list of indices does not need to coincide with the names of the actual GAMS domain sets. This index list together with the index list specified for newName is solely intended to establish the index order in the symbol specified by newName. In the latter case (a list of symbol names) the symbols need to be scalar symbols of the same type (parameter, variable, or equation) and a new one-dimensional symbol (of the same type) is created (using newName) that holds the symbol names as labels.
Specify a new name with index space for the projected or reordered symbol in the Connect database. Note that each symbol in the Connect database must have a unique name. newName is given as symname(i1,i2,...,iN). The list of indices does not need to coincide with the names of the actual GAMS domain sets. This index list together with the index list specified for name is solely intended to establish the index order. Hence, the names in the index list need to be unique and only names that are part of the index list specified for name can be used. For example: name: p(i,j,k) and newName: q(k,i).
Control the handling of element text if the resulting symbol is a set. If set to "", the text will be dropped. When left at default (None) and the projected symbol is a set, the element text of the original set will be used. For other symbol types, the text will be dropped. If text is a string, this string will be assigned to all elements. The string can contain place holders {i1} that will be replaced with the content of the matching index position. For example, text: "{i2} - {i1}: {element_text}", where {i1} and {i2} should be index space names in the symbol name with index space (attribute name). {element_text} refers to the original element text (set) or the string representation of a numerical value (parameter or variable/equation with a given suffix) of the source symbol.
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate arrays and data frames will be written abbreviated to the log. For trace > 3 the intermediate arrays and data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
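The options above can be sketched as follows; the symbols x, xlevel, p and q are hypothetical:

```yaml
- Projection:
    name: x.l(i,j)          # extract the level values of variable x
    newName: xlevel(j,i)    # store them as a parameter with reversed index order
- Projection:
    name: p(i,j,k)
    newName: q(k,i)         # project out index position j
    aggregationMethod: sum  # aggregate the projected-out records by summing
```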
PythonCode
The PythonCode agent allows to execute arbitrary Python code. From within the code, it is possible to access the GAMS database via gams.db (if the PythonCode agent is running in a GAMS context) and the Connect database via connect.container. The GAMS database is an instance of GamsDatabase, whereas the Connect database is a GAMS Transfer Container object. Furthermore there is a predefined instructions list that can be filled with tasks that are automatically executed.
Option | Default | Description |
---|---|---|
code | None | Python code to be executed. |
Detailed description of the options:
Python code to be executed. The YAML syntax offers the pipe character (|) for specifying multi-line strings:

- PythonCode:
    code: |
      print("Print from Python")
      # insert more Python code here

It is possible to generate instructions by appending tasks to the Python instructions list. A task is specified by using Python data structures that match the schema of a specific Connect agent. At the end of the Python code, all tasks in the instructions list are automatically generated and executed. The following example shows how to fill the instructions list with three PandasExcelWriter tasks that write different parameters (p0, p1, p2) into separate Excel workbooks (data_p0.xlsx, data_p1.xlsx, data_p2.xlsx):

- GAMSReader:
    readAll: True
- PythonCode:
    code: |
      symbols = [ 'p0', 'p1', 'p2' ]
      for s in symbols:
          instructions.append(
            { 'PandasExcelWriter':
              { 'file': 'data_{}.xlsx'.format(s),
                'symbols': [{'name': s,
                             'rowDimension': connect.container.data[s].dimension,
                             'range': s+'!A1'}]
              }
            })

Using connect.container allows to access the Connect database directly in the Python code. The connect.container is a GAMS Transfer Container object and data within the container is stored as pandas DataFrames. Please refer to the documentation of GAMS Transfer to learn more about GAMS Transfer and its functionalities.

The following complete example shows how to access and modify Connect container data and manually add a new symbol with the modified data to the Connect container:
Set i /1, 2, 3/ i_mod;
$onEmbeddedCode Connect:
- GAMSReader:
    readAll: True
- PythonCode:
    code: |
      i_mod_records = [ 'i'+n for n in connect.container.data["i"].records.iloc[:,0] ]
      connect.container.addSet("i_mod", ["*"], records=i_mod_records)
- GAMSWriter:
    symbols:
      - name: i_mod
$offEmbeddedCode
display i, i_mod;
The first line takes the data of set i and adds an i at the beginning of each uel in the first column of the dataframe. The last line writes the modified data as set i_mod to the Connect database.

Here is another complete example modifying Connect container data and adding a new symbol with the modified data:
Parameter p;
$onEcho > p_raw.csv
i,j,2000,2001
i1,j1,1,2
i2,j2,3,4
i3,j3,5,6
$offEcho
$onEmbeddedCode Connect:
- CSVReader:
    file: p_raw.csv
    name: p_raw
    header: True
    indexColumns: "1,2"
    valueColumns: "2:lastCol"
- PythonCode:
    code: |
      p_records = [ [r[0] + '_' + r[1]] + list(r) for i,r in connect.container.data["p_raw"].records.iterrows() ]
      connect.container.addParameter("p", ["*"]*4, records=p_records)
- GAMSWriter:
    symbols:
      - name: p
$offEmbeddedCode
display p;
In this example, we take the data of parameter p_raw and insert a column of the concatenated row dimensions into the first column of the dataframe. The modified dataframe is then added to the Connect container as records for the new symbol p. Here is a display of GAMS parameter p:

INDEX 1 = i1_j1
          2000   2001
i1.j1    1.000  2.000

INDEX 1 = i2_j2
          2000   2001
i2.j2    3.000  4.000

INDEX 1 = i3_j3
          2000   2001
i3.j3    5.000  6.000
RawCSVReader
The RawCSVReader allows reading of unstructured data from a specified CSV file into the Connect database. Due to performance issues this agent is recommended for small to medium sized unstructured CSV files only. This reader works similarly to the RawExcelReader agent. It reads the entire CSV file and represents its content in a couple of GAMS sets:

r / r1, r2, ... / (rows)
c / c1, c2, ... / (columns)
vs(r,c) / s1.r1.c2 "cell text", ... / (cells with explanatory text)
vu(r,c,*) / s1.r1.c1."cell text" "cell text", ... / (cells with potential GAMS label)

and a parameter vf(r,c) / r2.c2 3.14, ... / (cells with numerical values). Unlike RawExcelReader, cells with a date will not be interpreted and are stored in vs and vu. Cells with a string value will be stored in vs. If the string length exceeds the maximum length allowed for element text, it will be truncated. RawCSVReader will try to represent the cell value as a number and if this succeeds stores the number in vf. Strings of the GAMS special value names INF, EPS, NA, and UNDEF as well as TRUE and FALSE will also be converted to their numerical counterparts. It will also try to represent the cell value as a string and stores this as a label in the third position of vu. GAMS labels have a length limitation and hence RawCSVReader automatically shortens the label to fit this limit. RawCSVReader will provide a unique label (ending in ~n, where n is an integer, for strings exceeding the label length limit) for each string in the CSV file. The full string (if it fits) will be available as the element text of the vu record.
Option | Default | Description |
---|---|---|
cName | c | Symbol name for columns. |
columnLabel | C | Label for columns. |
file | None | Specify a CSV file path. |
readAsString | True | Control the automatic pandas type conversion of cells. |
readCSVArguments | None | Dictionary containing keyword arguments for the pandas.read_csv method. |
rName | r | Symbol name for rows. |
rowLabel | R | Label for rows. |
trace | inherited | Specify the trace level for debugging output. |
vfName | vf | Symbol name for cells with a numerical value. |
vsName | vs | Symbol name for cells with an explanatory text. |
vuName | vu | Symbol name for cells with a potential GAMS label. |
Detailed description of the options:
cName = string (default=c)
Control the name of the set of columns.
columnLabel = string (default=C)
Control the labels for the set of columns (c).
file = string (required)
Specify a CSV file path.
readAsString = boolean (optional)
Control the automatic pandas type conversion of cells. If this is set to True (default), all cells are returned as strings and the agent tries to interpret the strings itself. If this is set to False, pandas will try to infer the type. In that case the agent can't distinguish between cells containing 1 and 1.00 because pandas has already turned the integer 1 into a float and the agent has no way of recovering the original cell.
readCSVArguments = dictionary (optional)
Dictionary containing keyword arguments for the pandas.read_csv method. By specifying readCSVArguments, it is possible to pass arguments directly to the pandas.read_csv method. For example: readCSVArguments: {delimiter: ';'}.
rName = string (default=r)
Control the name of the set of rows.
rowLabel = string (default=R)
Control the labels for the set of rows (r).
trace = integer (optional)
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the cell values and their processing will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
vfName = string (default=vf)
Control the name of the parameter for cells with a numerical value.
vsName = string (default=vs)
Control the name of the set for cells with an explanatory text.
vuName = string (default=vu)
Control the name of the set for cells with a potential GAMS label.
RawExcelReader
The RawExcelReader allows reading of unstructured data from a specified Excel file into the Connect database. For performance reasons this agent is recommended for small to medium sized unstructured Excel files only. This reader works similarly to the xlsdump tool. It reads the entire spreadsheet and represents its content in a couple of GAMS sets:
s / s1, s2, ... / (workbook sheets)
w / Sheet1, Sheet2, ... / (workbook sheets by name)
ws(s,w) / s1.Sheet1, s2.Sheet2, ... / (workbook map)
r / r1, r2, ... / (rows)
c / c1, c2, ... / (columns)
vs(s,r,c) / s1.r1.c2 "cell text", ... / (cells with explanatory text)
vu(s,r,c,*) / s1.r1.c1."cell text" "cell text", ... / (cells with potential GAMS label)
and a parameter vf(s,r,c) / s1.r2.c2 3.14, ... / (cells with numerical values). Cells with a date will be stored in their string representation in vu and as a Julian date in vf. Cells with a string value will be stored in vs. If the string length exceeds the maximum length allowed for element text, it will be truncated. Excel offers many other cell value types. RawExcelReader will try to represent the cell value as a number and, if this succeeds, stores the number in vf. Strings matching the GAMS special value names INF, EPS, NA, and UNDEF will also be converted to their numerical counterparts. It will also try to represent the cell value as a string and stores this as a label in the fourth position of vu. GAMS labels have a length limitation and hence RawExcelReader automatically shortens the label to fit this limit. RawExcelReader will provide a unique label for each string in the workbook (ending in ~n, where n is an integer, for strings exceeding the label length limit). The full string (if it fits) will be available as the element text of the vu record.
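A minimal usage sketch, analogous to the getdata.gms snippet in the examples below (the file name workbook.xlsx is an assumption):

```gams
* Symbols for RawExcelReader; the file name workbook.xlsx is an assumption
alias (u,*); Set s, w, r, c, ws(s,w), vs(s,r,c), vu(s,r,c,u); Parameter vf(s,r,c);
$onEmbeddedCode Connect:
- RawExcelReader:
    file: workbook.xlsx
- GAMSWriter:
    writeAll: True
$offEmbeddedCode
```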
Option | Default | Description |
---|---|---|
cName | c | Symbol name for columns. |
columnLabel | C | Label for columns. |
file | None | Specify an Excel file path. |
mergedCells | False | Control the handling of empty cells that are part of a merged Excel range. |
rName | r | Symbol name for rows. |
rowLabel | R | Label for rows. |
sheetLabel | S | Label for workbook sheets. |
sName | s | Symbol name for workbook sheets. |
trace | inherited | Specify the trace level for debugging output. |
vfName | vf | Symbol name for cells with a numerical value. |
vsName | vs | Symbol name for cells with an explanatory text. |
vuName | vu | Symbol name for cells with a potential GAMS label. |
wName | w | Symbol name for workbook sheets by name. |
wsName | ws | Symbol name for workbook map. |
Detailed description of the options:
cName = string (default=c)
Control the name of the set of columns.
columnLabel = string (default=C)
Control the labels for the set of columns (c).
file = string (required)
Specify an Excel file path.
mergedCells = boolean (default=False)
Control the handling of empty cells that are part of a merged Excel range. If False, the cells are left empty. If True, the merged value is used in all cells. Note that setting this option to True has an impact on performance since the Excel workbook has to be opened in a non-read-only mode, which results in non-constant memory consumption (no lazy loading).
rName = string (default=r)
Control the name of the set of rows.
rowLabel = string (default=R)
Control the labels for the set of rows (r).
sheetLabel = string (default=S)
Control the labels for the set of workbook sheets (s).
sName = string (default=s)
Control the name of the set of workbook sheets.
trace = integer (optional)
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the cell values and their processing will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
vfName = string (default=vf)
Control the name of the parameter for cells with a numerical value.
vsName = string (default=vs)
Control the name of the set for cells with an explanatory text.
vuName = string (default=vu)
Control the name of the set for cells with a potential GAMS label.
wName = string (default=w)
Control the name of the set of workbook sheets by name.
wsName = string (default=ws)
Control the name of the set of the workbook map.
SQLReader
The SQLReader agent allows to read symbols (sets and parameters) from a specified database management system into the Connect database. It connects to MySQL, Postgres, MS-SQL (SQL-Server), SQLite and PyODBC through their respective Python packages to provide native SQL query support. Further, it also utilizes the pandas.DataFrame class' I/O API method read_sql to connect to any other database provided the relevant drivers are present on the system. See Simple Connect Example for SQL for a simple example that uses the SQLReader.
- Note
- The connectivity to MS-Access databases is available on Windows only and requires a 64-bit MS-Access ODBC driver. See connection for more information.
Option | Scope | Default | Description |
---|---|---|---|
connection | root | None | Connection dictionary to specify credentials for the database. |
connectionArguments | root | None | Dictionary containing keyword arguments for the connect constructor of the respective SQL library or for the sqlalchemy.create_engine constructor. |
connectionType | root | sqlite | Specify the connection type to be used in order to connect to that database. |
dTypeMap | root/symbols | None | Dictionary used to specify the dtype of columns. |
indexSubstitutions | symbols | None | Dictionary used for substitutions in the index columns. |
name | symbols | None | Specify the name of the symbol in the Connect database. |
query | symbols | None | Specify the SQL query. |
readSQLArguments | symbols | None | Dictionary containing keyword arguments for the pandas.read_sql method. |
symbols | root | None | Specify symbol specific options. |
trace | root | inherited | Specify the trace level for debugging output. |
type | root/symbols | par | Control the symbol type. |
valueColumns | symbols | inferred | Specify columns to get the values from. |
valueSubstitutions | symbols | None | Dictionary used for mapping in the value column of the DataFrame. |
Detailed description of the options:
connection = dict (required)
Allows to specify the credentials to access the database. Below are examples of connection dictionaries based on the selected connectionType.
SQLite:
connection: {'database': 'absolute//path//to//datafile.db'}
Postgres/MySQL/SQLServer:
connection: {'user': <username of the database server>, 'password': <password>, 'host': <hostname or ip address>, 'port': <port number of remote machine>, 'database': <database name which you want to connect to>}
MS-Access:
connection: {'DRIVER': 'Microsoft Access Driver (*.mdb, *.accdb)', 'DBQ': 'absolute//path//to//datafile.db'}
- Note
- The connectivity to MS-Access databases is available on Windows only and requires a 64-bit MS-Access ODBC driver. If MS-Access is not installed, or only a 32-bit version is installed, download and install a 64-bit MS-Access ODBC driver as a redistributable package from Microsoft (e.g. MS-Access 2013 Runtime_x64).
SQLAlchemy:
Use connectionType: sqlalchemy to connect to various databases. Note that the argument drivername: <dialect+driver> is required.
For dialect SQLite:
connection: {'drivername': 'sqlite', 'database': 'absolute//path//to//datafile.db'}
For dialect Postgres:
connection: {'drivername': 'postgresql+psycopg2', 'username': <username of the database server>, 'password': <password>, 'host': <hostname or ip address>, 'port': <port number of remote machine>, 'database': <database name which you want to connect to>}
For dialect MySQL:
connection: {'drivername': 'mysql+pymysql', 'username': <username of the database server>, 'password': <password>, 'host': <hostname or ip address>, 'port': <port number of remote machine>, 'database': <database name which you want to connect to>}
For dialect MS-SQL (SQLServer):
connection: {'drivername': 'mssql+pymssql', 'username': <username of the database server>, 'password': <password>, 'host': <hostname or ip address>, 'port': <port number of remote machine>, 'database': <database name which you want to connect to>}
For dialect MS-Access:
connection: {'drivername': 'access+pyodbc', 'query': {'odbc_connect': 'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=absolute//path//to//datafile.db;'}}
- Note
- Connecting to databases other than SQLite, Postgres, MySQL and MS-SQL through SQLAlchemy requires the respective driver of that database to be present on the system. For example, in the above example sqlalchemy-access must be available in order to connect to MS-Access through SQLAlchemy.
PyODBC:
Use connectionType: pyodbc to connect to various databases. Either specify the DSN (Data Source Name):
connection: {'DSN': <Data Source Name>}
or use:
connection: {'user': <username of the database server>, 'password': <password>, 'host': <hostname or ip address>, 'port': <port number of remote machine>, 'database': <database name which you want to connect to>, 'driver': <name of the ODBC driver>}
connectionArguments = dict (optional)
Dictionary containing keyword arguments for the connect constructor of the respective SQL library or for the sqlalchemy.create_engine constructor.
connectionType = string (default=sqlite)
Following is the list of valid options for the connection type:
sqlite (default)
postgres
mysql
sqlserver
access
sqlalchemy
pyodbc
A connection using pyodbc can be established to any database for which a DSN (Data Source Name) is available on the system. The pyodbc connection works readily on the Windows platform where a DSN is already set up. For non-Windows platforms, the user is responsible for installing the pyodbc package and handling any other dependencies for the local Python installation.
dTypeMap = dict (optional)
Allows to specify the dtype of columns in a dictionary as key: value pairs, i.e. <column>: <dtype>.
indexSubstitutions = dict (optional)
Dictionary used for substitutions in the index columns. Each key in indexSubstitutions is replaced by its corresponding value. This option allows arbitrary replacements in the index columns.
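For instance, a hedged sketch that relabels numeric month codes in an index column while reading (the symbol name, query, and substitution values are invented for illustration):

```yaml
- SQLReader:
    connection: {'database': 'data.db'}
    symbols:
      - name: demand
        query: "SELECT month, region, value FROM demand_table"
        indexSubstitutions: {1: jan, 2: feb, 3: mar}
```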
name = string (required)
Name of the symbol in the Connect database. The name must be unique for each symbol.
query = string (required)
Specify the SQL query which will fetch the desired table from the database system.
readSQLArguments = dict (optional)
Dictionary containing keyword arguments for the pandas.read_sql method. Not all arguments of that method are exposed through the YAML interface of the SQLReader agent. By specifying readSQLArguments, it is possible to pass arguments directly to the pandas.read_sql method that is used by the SQLReader agent if connectionType is sqlalchemy. If connectionType is not sqlalchemy, this option can be used to parameterize the SQL query. A small example showing how this option can be used is given below.
query: "SELECT i,j FROM stock_table WHERE value_col > %(value)s"
readSQLArguments: {'value': 10}
- Note
- Different database management systems use different parameter markers and some do not support named parameter markers, e.g., PyODBC.
symbols = list (optional)
A list containing symbol specific options.
trace = integer (optional)
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate arrays and data frames will be written abbreviated to the log. For trace > 3 the intermediate arrays and data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
type = string (default=par)
Control the symbol type. Supported symbol types are par for GAMS parameters and set for GAMS sets.
valueColumns = list or string (optional)
Specify columns to get the values from. The value columns contain numerical values in case of a GAMS parameter and set element text in case of a GAMS set. The columns are given as column names represented as a list of strings, for example: valueColumns: ["i1", "i2"]. If more than one value column is specified, the column names are stacked to the index automatically. As a string one can specify the symbolic constant lastCol (i.e. valueColumns: "lastCol") or an empty string (i.e. valueColumns: ""). When lastCol is passed, the last column will be treated as a value column and all other columns will be treated as index columns. When an empty string is passed, all columns will be treated as index columns. Specifying an empty string is only valid for symbol type set since symbol type par requires at least one value column. The default for symbol type par is lastCol and the default for symbol type set is an empty string.
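A small sketch with two explicitly named value columns (the symbol name, query, and column names are invented): since more than one value column is given, the column names open and close are stacked into an additional index position.

```yaml
symbols:
  - name: stock
    query: "SELECT day, open, close FROM prices"
    valueColumns: ["open", "close"]
```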
valueSubstitutions = dict (optional)
Dictionary used for mapping in the value column of the DataFrame. Each key in valueSubstitutions is replaced by its corresponding value. The replacement is only performed on the value column of the DataFrame, which is the numerical value in case of a GAMS parameter and the set element text in case of a GAMS set. While it is possible to make arbitrary replacements, this is especially useful for controlling sparse/dense reading.
SQLWriter
The SQLWriter agent allows to write symbols (sets and parameters) from the Connect database to a specified database management system. Variables and equations need to be turned into parameters with the Projection agent before they can be written. It connects to MySQL, Postgres, MS-SQL (SQL-Server), SQLite and PyODBC through their respective Python packages to provide faster write operations. Further, it also utilizes the pandas.DataFrame class' I/O API method to_sql to connect to any other database provided the relevant drivers are present on the system. See Simple Connect Example for SQL for a simple example that uses the SQLWriter.
- Note
- The connectivity to MS-Access databases is available on Windows only and requires a 64-bit MS-Access ODBC driver. See connection for more information.
Option | Scope | Default | Description |
---|---|---|---|
connection | root | None | Connection dictionary to specify credentials for the database. |
connectionArguments | root | None | Dictionary containing keyword arguments for the connect constructor of the respective SQL library or for the sqlalchemy.create_engine constructor. |
connectionType | root | sqlite | Specify the connection type to be used in order to connect to that database. |
ifExists | root/symbols | fail | Specify the behavior when a table with the same name exists in the database/schema. |
insertMethod | root/symbols | default | Specify the insertion method to be used to write the table in the database. |
name | symbols | None | Specify the name of the symbol in the Connect database. |
schemaName | root/symbols | None | Specify the schema name. |
symbols | root | None | Specify symbol specific options. |
tableName | symbols | None | Specify the SQL table/relation in the provided database/schema. |
toSQLArguments | root/symbols | None | Dictionary containing keyword arguments for the pandas.to_sql method. |
trace | root | inherited | Specify the trace level for debugging output. |
unstack | root/symbols | False | Indicate if the last index column will be used as a header row. |
valueSubstitutions | root/symbols | None | Dictionary used for mapping in the value column of the DataFrame. |
writeAll | root | auto | Indicate if all set and parameter type symbols in the Connect database will be written to the specified DBMS. |
Detailed description of the options:
connection = dict (required)
Allows to specify the credentials to access the database. See SQLReader connection for further details and some examples.
- Note
- The connectivity to MS-Access databases is available on Windows only and requires a 64-bit MS-Access ODBC driver. If MS-Access is not installed, or only a 32-bit version is installed, download and install a 64-bit MS-Access ODBC driver as a redistributable package from Microsoft (e.g. MS-Access 2013 Runtime_x64).
connectionArguments = dict (optional)
Dictionary containing keyword arguments for the connect constructor of the respective SQL library or for the sqlalchemy.create_engine constructor.
connectionType = string (default=sqlite)
Following is the list of valid options for the connection type:
sqlite (default)
postgres
mysql
sqlserver
access
sqlalchemy
pyodbc
A connection using pyodbc can be established to any database for which a DSN (Data Source Name) is available on the system. The pyodbc connection works readily on the Windows platform where a DSN is already set up. For non-Windows platforms, the user is responsible for installing the pyodbc package and handling any other dependencies for the local Python installation.
- Note
- pyodbc provides a simple and consistent API to connect to many different databases using the Open Database Connectivity (ODBC) interface. This introduces the challenge that database specific properties are not taken into consideration. For example, different databases have different escaping syntax in order to escape special characters in column names of a table. Thus, the implementation for connection type pyodbc is kept as a general purpose connector to different databases while not escaping the special characters.
ifExists = string (default=fail)
Specify the behavior when a table with the same name exists in the database/schema. Valid values are fail, replace and append.
insertMethod = string (default=default)
Specify the insertion method to be used to write the table in the database. Valid values are bcp, bulkInsert, and default.
- Note
- The options bulkInsert and bcp are not available with connection type sqlalchemy. Further, the option bcp is reserved for connection type sqlserver. This is useful when the SQL-Server database is available remotely. The bcp method uses MS-SQL's command line utility of the same name to insert huge amounts of data. In order to use this insertion method, the user must download and install the utility as per the system requirements. The utility is available for Windows as well as for Linux and macOS. On the other hand, if the SQL-Server database is available locally, then the bulkInsert method can be used and the bcp utility is not required.
- The option bulkInsert uses special SQL queries for connection types mysql, postgres, sqlserver and access. It creates a temporary CSV file and then imports it into the database. Note that for MySQL the option LOCAL_INFILE must be enabled on the server side for this method to work successfully.
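As an illustration, a hedged sketch selecting the bulk insertion path for a locally available SQL-Server database (the connection details are placeholders and the symbol/table names are invented):

```yaml
- SQLWriter:
    connectionType: sqlserver
    connection: {'user': <user>, 'password': <password>, 'host': <host>, 'port': <port>, 'database': <database>}
    insertMethod: bulkInsert
    symbols:
      - name: report
        tableName: report
```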
name = string (required)
Specify the name of the symbol in the Connect database.
schemaName = string (optional)
Specify the schema name for writing the table to the correct location. In Postgres, by default, tables are written to the public schema that is present in every database.
symbols = list (optional)
A list containing symbol specific options.
tableName = string (required)
Name of the SQL table/relation in the provided database/schema.
toSQLArguments = dict (optional)
Dictionary containing keyword arguments for the pandas.to_sql method. Not all arguments of that method are exposed through the YAML interface of the SQLWriter agent. By specifying toSQLArguments, it is possible to pass arguments directly to the pandas.to_sql method that is used by the SQLWriter agent if connectionType is sqlalchemy.
trace = integer (optional)
Specify the trace level for debugging output. For trace > 1 some scalar debugging output will be written to the log. For trace > 2 the intermediate arrays and data frames will be written abbreviated to the log. For trace > 3 the intermediate arrays and data frames will be written entirely to the log (potentially large output). If trace has not been set, the trace value set by the Options agent will be used.
unstack = boolean (default=False)
Indicate if the last index column will be used as a header row.
valueSubstitutions = dict (optional)
Dictionary used for mapping in the value column of the DataFrame. Each key in valueSubstitutions is replaced by its corresponding value. The replacement is only performed on the value column of the DataFrame, which is the numerical value in case of a GAMS parameter, variable or equation and the set element text in case of a GAMS set.
writeAll = string (default=auto)
Indicate if all set and parameter type symbols in the Connect database will be written to the specified DBMS. If True, the table name in the database is defined by the name of the symbol that is written to that table.
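A minimal end-to-end sketch in the style of the examples below, writing one parameter from the Connect database to an SQLite file (the file, symbol, and table names are assumptions):

```gams
EmbeddedCode Connect:
- GAMSReader:
    symbols: [ {name: stockprice} ]
- SQLWriter:
    connection: {'database': 'stock.db'}
    symbols:
      - name: stockprice
        tableName: stockprice
        ifExists: replace
endEmbeddedCode
```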
Examples
This section provides a collection of more complex examples. For simple examples see section Getting Started.
Connect Example for Excel (executeTool win32.ExcelMerge)
The following example shows how to read and write Excel files in Connect. On Windows with Excel installed, the output sheets are merged back into the input workbook using tool win32.ExcelMerge. The entire code is listed at the end of the example. This model is part of DataLib as model connect05. First, the original matrix a is read using the GAMSReader and is then written to input.xlsx using the PandasExcelWriter. After clearing symbols i and a in the GAMS database, the PandasExcelReader is used to read file input.xlsx back in and create parameter a in the Connect database. The Projection agent extracts set i from parameter a. With the GAMSWriter, symbols i and a are written to the GAMS database. The tool linalg.invert calculates the inverse inva of a, which is then written to output.xlsx using Connect's GAMSReader and PandasExcelWriter at execution time. The following lines then check that the code is not executed on a UNIX system and that Excel is available. If both are true, output.xlsx is merged into input.xlsx using tool win32.ExcelMerge and both symbols inva and a can be read from input.xlsx with a single instance of the PandasExcelReader. If the code is executed on a UNIX system and/or Excel is not available, output.xlsx cannot be merged into input.xlsx, and both files need to be read to create the symbols inva and a. The last part makes sure that inva is the inverse of a.
set i / i1*i3 /; alias (i,j,k);
table a(i,j) 'original matrix'
i1 i2 i3
i1 1 2 3
i2 1 3 4
i3 1 4 3
;
$onEmbeddedCode Connect:
- GAMSReader:
symbols:
- name: a
- PandasExcelWriter:
file: input.xlsx
symbols:
- name: a
$offEmbeddedCode
$onMultiR
$clear i a
$onEmbeddedCode Connect:
- PandasExcelReader:
file: input.xlsx
symbols:
- name: a
range: a!a1
- Projection:
name: a(i,j)
newName: i(i)
asSet: True
- GAMSWriter:
writeAll: True
$offEmbeddedCode i a
parameter
inva(i,j) 'inverse of a'
chk(i,j) 'check the product a * inva'
;
executeTool.checkErrorLevel 'linalg.invert i a inva';
EmbeddedCode Connect:
- GAMSReader:
symbols:
- name: inva
- PandasExcelWriter:
file: output.xlsx
symbols:
- name: inva
endEmbeddedCode
Scalar mergedRead /0/;
executeTool 'win32.msappavail Excel';
mergedRead$(errorLevel=0) = 1;
if (mergedRead,
executeTool.checkErrorLevel 'win32.excelMerge output.xlsx input.xlsx';
EmbeddedCode Connect:
- PandasExcelReader:
file: input.xlsx
symbols:
- name: a
range: a!a1
- name: inva
range: inva!a1
- GAMSWriter:
writeAll: True
endEmbeddedCode a inva
else
EmbeddedCode Connect:
- PandasExcelReader:
file: input.xlsx
symbols:
- name: a
range: a!a1
- PandasExcelReader:
file: output.xlsx
symbols:
- name: inva
range: inva!a1
- GAMSWriter:
writeAll: True
endEmbeddedCode a inva
);
chk(i,j) = sum{k, a(i,k)*inva(k,j)};
chk(i,j) = round(chk(i,j),15);
display a,inva,chk;
chk(i,i) = chk(i,i) - 1;
abort$[card(chk)] 'a * ainv <> identity';
Connect Example for Excel
The following example shows how to read and write Excel files in Connect. The entire code is listed at the end of the example. This model is part of DataLib as model connect01. The example (inspired by the model herves) reads a 3-dimensional parameter from a spreadsheet that has one row index (code) at the left side of the table and the other row index (labId) at the right of the table. A column index (cut) is at the top of the table. The column index consists of floating-point numbers. The goal is to read the data into GAMS but modify the labels of some sets: Only the first two decimal digits of the elements in cut are significant. Moreover, the labId should be prefixed with an L. A new spreadsheet with the new labels should be written. The layout of the table should remain the same, with the exception of also moving the labId column to the left. Here is a screenshot of the original table:

The following GAMS code uses a separate GAMS program (getdata.gms) to get the raw data from the original spreadsheet. Connect runs inside a compile-time embedded code section and uses the Connect agent RawExcelReader to get the raw Excel data. In some subsequent GAMS code the sets rr and cutId as well as the parameter raw are filled knowing the layout of the table (the code is written in a way that the table can grow). This GAMS program gets executed and instructed to create a GDX file. In a compile-time embedded Connect section the relevant symbols (rr, cutId, and raw) are read from this GDX file. The Projection agent extracts the domain labid from the set rr and some Python code using Connect agent PythonCode makes the label adjustments and sorts the data nicely. The Python code uses the connect.container methods to read from and write to the Connect database. Finally, the GAMSWriter agent sends the data to GAMS. In the main program, at execution time, an embedded Connect code section exports the labdata parameter in the required form (after reading it from GAMS with the GAMSReader agent). Here is a screenshot of the resulting table:

In the remainder of the GAMS code another execution-time embedded Connect code section is used to read the data back from the newly created spreadsheet using Connect agent PandasExcelReader. The set rr is created from parameter labdata using the Projection agent and everything is written back to GAMS with Connect agent GAMSWriter. The original data and the data from the newly created spreadsheet are exported to GDX (using execute_unload) and compared with gdxdiff to verify that the data is identical.
Set code, labId, cut, rr(code<,labId);
parameter labdata(code,labid,cut);
$onEcho > getdata.gms
* Symbols for RawExcelReader
alias (u,*); Set s,w,r,c,ws(s,w),vs(s,r,c),vu(s,r,c,u); Parameter vf(s,r,c);
$onEmbeddedCode Connect:
- RawExcelReader:
file: labdata.xlsx
- GAMSWriter:
writeAll: True
$offEmbeddedCode
* Symbols to be filled
alias (*,code,labId,cut); Parameter raw(code,labId,cut); Set cutId, rr(code,labId);
Set cX(c,cut) 'column index', rX(r,code,labId) 'row index';
Singleton set cLast(c); Scalar lastPos;
loop(ws(s,'ZAg'),
lastPos = smax(vu(s,r,c,u), c.pos); cLast(c) = c.pos=lastPos;
loop(r$(ord(r)>4),
rX(r,code,labId) $= vu(s,r,'C1',code) and vu(s,r,cLast,labId));
loop(c$(ord(c)>1 and not cLast(c)),
cX(c,cut) $= vu(s,'R4',c,cut));
loop((rX(r,code,labId),cX(c,cut)),
raw(code,labId,cut) = vf(s,r,c));
loop(cX(c,cut),
cutId(cut) = yes)
);
option rr<rX;
$offEcho
$call.checkErrorLevel gams getdata.gms lo=%gams.lo% gdx=getdata.gdx
$onEmbeddedCode Connect:
- GDXReader:
file: getdata.gdx
symbols: [ {name: rr}, {name: raw}, {name: cutId, newName: cut} ]
- Projection:
name: rr(code,labid)
newName: labid(labid)
- PythonCode:
code: |
labid_records = sorted([ 'L'+t[0] for t in connect.container.data['labid'].records.values ], key=lambda t: int(t[1:]))
rr_records = sorted([ (t[0],
'L'+t[1]) for t in connect.container.data['rr'].records.values ], key=lambda t: int(t[0]))
# Trim elements of set cut to two decimal places
cut_records = sorted([ '{:.2f}'.format(float(t[0])) for t in connect.container.data['cut'].records.values ], key=float)
labdata_records = [ (t[0],
'L'+t[1],
'{:.2f}'.format(float(t[2])),
t[-1]) for t in connect.container.data['raw'].records.values ]
connect.container.addSet('labid_mod', ['*'], records=labid_records)
connect.container.addSet('rr_mod', ['*']*2, records=rr_records)
connect.container.addSet('cut_mod', ['*'], records=cut_records)
connect.container.addParameter('labdata', ['*']*3, records=labdata_records)
- GAMSWriter:
symbols: [ {name: labid_mod, newName: labid}, {name: rr_mod, newName: rr}, {name: cut_mod, newName: cut}, {name: labdata} ]
$offEmbeddedCode
execute_unload 'labdata.gdx', labdata, cut, rr;
* Reintroduce 0 (zeros)
labdata(rr,cut) = labdata(rr,cut) + eps;
execute 'rm -f labdatanew.xlsx';
* Write new workbook with good table
EmbeddedCode Connect:
- GAMSReader:
symbols: [ {name: labdata} ]
- PandasExcelWriter:
file: labdatanew.xlsx
symbols:
- name: labdata
rowDimension: 2
range: ZAg!A4
endEmbeddedCode
option clear=rr, clear=labdata;
EmbeddedCode Connect:
- PandasExcelReader:
file: labdatanew.xlsx
symbols:
- name: labdata
rowDimension: 2
columnDimension: 1
range: ZAg!A4
- Projection:
name: labdata(code,labid,cut)
newName: rr(code,labid)
asSet: True
- GAMSWriter:
writeAll: True
endEmbeddedCode
execute_unload 'labdatanew.gdx', labdata, cut, rr;
execute.checkErrorLevel 'gdxdiff labdata.gdx labdatanew.gdx > %system.NullFile%';
Connect Example for CSV
The following example shows how to read and write CSV files in Connect. The entire code is listed at the end of the example. This model is part of DataLib as model connect02. It starts out with defining some data (stockprice) in a table statement in GAMS. Compile-time embedded Connect code utilizes the GAMSReader agent to bring this data into Connect and exports it as a CSV file with agent CSVWriter. The GDXWriter agent also creates a GDX file with the data, which is then used in a subsequent call to feed gdxdump that produces the same CSV file as CSVWriter. The text comparison tool diff is used to compare the two CSV files. The CSV file looks as follows:
"date_0","AAPL","GOOG","MMM","MSFT","WMT"
"2012-20-11",12.124061,314.008026,60.966354,21.068886,46.991535
"2112-20-11",12.139372,311.741516,60.731037,20.850344,47.150307
"2212-20-11",12.203673,313.674286,61.467381,20.890808,46.991535
"2312-20-11",12.350039,315.387848,62.401108,21.068886,47.626663
"2712-20-11",12.448025,318.929565,62.461876,21.076981,47.499634
"2812-20-11",12.328911,318.655609,61.604042,20.898905,47.420238
"2912-20-11",12.404848,320.000549,62.332813,21.060795,47.626663
"3012-20-11",12.401172,321.744019,62.044331,21.012224,47.444057
In the remainder of the example this CSV file is read back via the Connect agent CSVReader. The code also utilizes the tool csv2gdx to read the CSV file into a GDX file, and the results of both methods are compared. Csv2gdx also creates sets with the index elements as Dim1, Dim2, ... Therefore, Connect utilizes the Projection agent to extract the index sets date and symbol from the parameter stockprice as sets Dim1 and Dim2. The Connect agent GDXWriter creates a GDX file of the Connect database which can then be compared with the GDX file created by csv2gdx. The GDX comparison tool gdxdiff is used to compare the two GDX files.
Set date,symbol;
Table stockprice(date<,symbol<)
AAPL GOOG MMM MSFT WMT
2012-20-11 12.124061 314.008026 60.966354 21.068886 46.991535
2112-20-11 12.139372 311.741516 60.731037 20.850344 47.150307
2212-20-11 12.203673 313.674286 61.467381 20.890808 46.991535
2312-20-11 12.350039 315.387848 62.401108 21.068886 47.626663
2712-20-11 12.448025 318.929565 62.461876 21.076981 47.499634
2812-20-11 12.328911 318.655609 61.604042 20.898905 47.420238
2912-20-11 12.404848 320.000549 62.332813 21.060795 47.626663
3012-20-11 12.401172 321.744019 62.044331 21.012224 47.444057
;
* Use Connect CSVWriter to write GAMS data in CSV format moving the symbol index into the column (unstack: True)
$onEmbeddedCode Connect:
- GAMSReader:
symbols: [ {name: stockprice} ]
- GDXWriter:
file: sp_connect.gdx
writeAll: True
- CSVWriter:
file: sp_connect.csv
name: stockprice
header: True
unstack: True
quoting: 2
$offEmbeddedCode
* Use gdxdump to create a CSV file and text compare the Connect and gdxdump CSV files
$call.checkErrorLevel gdxdump sp_connect.gdx output=sp_gdxdump.csv symb=stockprice format=csv columnDimension=Y > %system.NullFile%
$call.checkErrorLevel diff -q sp_connect.csv sp_gdxdump.csv > %system.nullFile%
* Use Connect CSVReader to read the newly created CSV file and deposit the result in a csv2gdx compatible format
$onEmbeddedCode Connect:
- CSVReader:
file: sp_connect.csv
name: stockprice
indexColumns: 1
valueColumns: "2:lastCol"
- Projection:
name: stockprice(date,symbol)
newName: Dim1(date)
asSet: True
- Projection:
name: stockprice(date,symbol)
newName: Dim2(symbol)
asSet: True
- GDXWriter:
file: sp_connect.gdx
writeAll: True
$offEmbeddedCode
* Use csv2gdx to create a GDX file and compare the Connect and csv2gdx GDX files
$call.checkErrorLevel csv2gdx sp_connect.csv output=sp_csv2gdx.gdx id=stockprice index=1 value=2..lastCol useHeader=y > %system.NullFile%
$call.checkErrorLevel gdxdiff sp_connect.gdx sp_csv2gdx.gdx > %system.NullFile%
Text Substitutions in YAML File
In many cases one would like to parameterize the text in the Connect instruction file. For example, some of the Connect agents require a file name. Instead of hard-coding the file name into the YAML instructions, text substitutions allow the use of a placeholder that is substituted before the instructions are passed to Connect. The placeholder in the YAML file uses the syntax %SOMETEXT%, similar to GAMS compile-time variables. For example:
- CSVReader:
    file: %MYFILENAME%
    name: distance
    indexColumns: [1, 2]
    valueColumns: [3]
Depending on how Connect runs, the substitution is done in various ways. The section Substitutions in Embedded Connect Code describes the substitution mechanisms for embedded Connect code. When Connect is initiated via the command line parameters connectIn or connectOut, the user-defined parameters specified by double-dash command line parameters and the given GAMS command line parameters, e.g. %gams.input%, are substituted in the YAML file. The list of parameters available for substitution is printed to the GAMS log at the beginning of the job in the section GAMS Parameters defined.
When Connect is initiated via the shell command gamsconnect, all substitutions need to be specified on the command line:
gamsconnect myci.yaml key1=val1 key2=val2 ...
key can be just MYFILENAME or be composed like gams.Input or system.dirSep.
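The command-line substitution can be sketched in Python: the key=value arguments are collected into a dictionary, and every %KEY% placeholder in the YAML text is replaced by its value. This is an illustrative sketch of the text-replacement idea, not the actual gamsconnect implementation:

```python
import re

def substitute(yaml_text, substitutions):
    """Replace %KEY% placeholders (e.g. %MYFILENAME% or %system.dirSep%)
    with their values; unknown placeholders are left untouched."""
    def repl(match):
        key = match.group(1)
        return str(substitutions.get(key, match.group(0)))
    return re.sub(r"%([A-Za-z0-9_.]+)%", repl, yaml_text)

# collect key=value pairs as they would appear on the command line
args = ["MYFILENAME=distance.csv", "system.dirSep=/"]
subs = dict(arg.split("=", 1) for arg in args)

yaml_text = "- CSVReader:\n    file: %MYFILENAME%\n    name: distance"
print(substitute(yaml_text, subs))
```

Composed keys such as gams.Input or system.dirSep work the same way, since the dot is simply part of the placeholder name.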
Use Connect Agents in Custom Python Code
Instead of passing instructions via one of the Connect interfaces, users can execute tasks directly in their Python code by creating an instance of ConnectDatabase and calling the method .exec_task(task). The task argument is expected to be a Python dictionary of the form:
{
  '<agent name>': {
    '<root option1>': <value>,
    '<root option2>': <value>,
    ...,
    '<root option3>': [
      {
        '<option1>': <value>,
        '<option2>': <value>,
        ...
      },
      {
        '<option1>': <value>,
        '<option2>': <value>,
        ...
      },
      ...
    ]
  }
}
Users can either construct the Python dictionary themselves or let YAML create the dictionary from a YAML script. The following example creates an instance of ConnectDatabase and executes two tasks: first, the CSV file stockprice.csv is read into the Connect database, and second, the symbol stockprice is written to the GAMS database. In this example, the tasks are specified directly as Python dictionaries.
Set dates, stocks;
Parameter stockprice(dates<,stocks<);
$onEcho > stockprice.csv
date;symbol;price
2016/01/04;AAPL;105,35
2016/01/04;AXP;67,59
2016/01/04;BA;140,50
$offEcho
$onEmbeddedCode Python:
from gams.connect import ConnectDatabase
cdb = ConnectDatabase(gams._system_directory, gams)
cdb.exec_task({'CSVReader': {'file': 'stockprice.csv', 'name': 'stockprice', 'indexColumns': [1, 2],
'valueColumns': [3], 'fieldSeparator': ';', 'decimalSeparator': ','}})
cdb.exec_task({'GAMSWriter': {'symbols': [{'name': 'stockprice'}]}})
$offEmbeddedCode
display stockprice;
We can also construct the Python dictionaries by using YAML:
Set dates, stocks;
Parameter stockprice(dates<,stocks<);
$onEcho > stockprice.csv
date;symbol;price
2016/01/04;AAPL;105,35
2016/01/04;AXP;67,59
2016/01/04;BA;140,50
$offEcho
$onEmbeddedCode Python:
import yaml
from gams.connect import ConnectDatabase
cdb = ConnectDatabase(gams._system_directory, gams)
inst = yaml.safe_load('''
- CSVReader:
file: stockprice.csv
name: stockprice
indexColumns: [1, 2]
valueColumns: [3]
fieldSeparator: ';'
decimalSeparator: ','
- GAMSWriter:
symbols:
- name: stockprice
''')
for task in inst:
    cdb.exec_task(task)
$offEmbeddedCode
display stockprice;
Here YAML creates a list of dictionaries (i.e. a list of tasks) from the given YAML script.
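To see what safe_load actually produces, the snippet below parses a reduced version of the script above into plain Python objects (assuming PyYAML is installed, as it is in the Python environment used by embedded code):

```python
import yaml  # PyYAML

inst = yaml.safe_load('''
- CSVReader:
    file: stockprice.csv
    indexColumns: [1, 2]
- GAMSWriter:
    symbols:
      - name: stockprice
''')

# safe_load returns a list with one dictionary per task;
# each dictionary has the agent name as its single key
print(type(inst).__name__)                    # list
print(list(inst[0]))                          # ['CSVReader']
print(inst[0]['CSVReader']['indexColumns'])   # [1, 2]
```

Each element of this list is exactly the dictionary shape that exec_task expects, which is why the tasks can be executed in a simple loop.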
Command Line Utility gamsconnect
The GAMS system directory contains the utility gamsconnect to run Connect instructions directly from the command line. On Windows the utility has the callable extension .cmd. This script wraps the Python script connectdriver.py by calling the Python interpreter that ships with GAMS. Like the other Connect drivers, gamsconnect operates on a YAML instruction file. The agents GAMSReader and GAMSWriter are not available from gamsconnect and will trigger an exception. Substitutions can be passed to gamsconnect via command line arguments as key=value, e.g. filename=myfile.csv or even gams.scrdir=/tmp/. gamsconnect is called like this:
gamsconnect <YAMLFile> [key1=value1 [key2=value2 [key3=value3 [...]]]]