Transfer

transfer is a tool to maintain GAMS data outside a GAMS script in a programming language like Python or Matlab. It allows the user to add GAMS symbols (Sets, Aliases, Parameters, Variables and Equations), to manipulate GAMS symbols, as well as read/write symbols to different data endpoints. transfer’s main focus is the highly efficient transfer of data between GAMS and the target programming language, while keeping those operations as simple as possible for the user. In order to achieve this, symbol records – the actual and potentially large-scale data sets – are stored in native data structures of the corresponding programming languages. The benefits of this approach are threefold: (1) The user is usually very familiar with these data structures, (2) these data structures come with a large tool box for various data operations, and (3) optimized methods for reading from and writing to GAMS can transfer the data as a bulk – resulting in the high performance of this package. This documentation describes, in detail, the use of transfer within a Python environment.

Data within transfer will be stored as Pandas DataFrame. The flexible nature of Pandas DataFrames makes them ideal for storing/manipulating sparse data. Pandas includes advanced operations for indexing and slicing, reshaping, merging and even visualization.

Pandas also includes a number of advanced data I/O tools that allow users to generate DataFrames directly from CSV (.csv), JSON (.json), HTML (.html), Microsoft Excel (.xls, .xlsx), SQL , pickle (.pkl), SPSS (.sav, .zsav), SAS (.xpt, .sas7bdat), etc.

Centering transfer around the Pandas DataFrame gives GAMS users (on a variety of platforms – macOS, Windows, Linux) access to tools to move data back and forth between their favorite environments for use in their GAMS models.

The goal of this documentation is to introduce the user to transfer and its functionality. This documentation is not designed to teach the user how to effectively manipulate Pandas DataFrames; users seeking a deeper understanding of Pandas are referred to the extensive documentation.

Experienced GAMS API users seeking detailed documentation and examples are directed to Main Classes and Additional Topics.

Recommended Import

Users can access the transfer sub-module with either of the following (equivalent) import statements once the GAMS Python API has been installed:

import gams.transfer as gt
from gams import transfer as gt

Design

Storing, manipulating, and transforming sparse data requires that it lives within an environment – this data can then be linked together to enable various operations. In transfer we refer to this "environment" as the Container, it is the main repository for storing and linking our sparse data. Symbols can be added to the Container from a variety of GAMS starting points but they can also be generated directly within the Python environment using convenient function calls that are part of the transfer package; a symbol can only belong to one container at a time.

The process of linking symbols together within a container was inspired by typical GAMS workflows but leverages aspects of object oriented programming to make linking data a natural process. Linking data enables data operations like implicit set growth, domain checking, data format transformations (to dense/sparse matrix formats), etc – all of these features are enabled by the use of ordered pandas.CategoricalDtype data types. All of these details will be discussed in the following sections.

Naming Conventions

Methods – functions that operate on a object – are all verbs (i.e., getMaxAbsValue(), getUELs(), etc.) and use camel case for identification purposes. Methods are, by convention, tools that "do things"; that is they involve some, potentially expensive, computations. Some transfer methods accept arguments, while others are simply called using the () notation. Plural arguments (columns) hint that they can accept lists of inputs (i.e., a list of symbol names) while singular arguments (column) will only accept one input at a time.

Properties – inherent attributes of an object – are all nouns (i.e., name, number_records, etc.) and use snake case (lower case words separated by underscores) for identification purposes. Object properties (or "object attributes") are fundamental to the object and therefore they are not called like methods; object properties are simply accessed by other methods or user calls. By convention, properties only require trival amounts of computation to access.

Classes – the basic structure of an object – are all singular nouns and use camel case (starting with a capital first letter) for identification purposes.

Install

The user must download and install the latest version of GAMS in order to install transfer. transfer is installed when the GAMS Python API is built and installed. The user is referred HERE for instructions on how to install the Python API files. transfer and all GAMS Python API files are compatible with environment managers such as Anaconda.

Examples

GDX Read

Reading in all symbols can be accomplished with one line of code (we reference data from the `trnsport.gms` example).

import gams.transfer as gt
m = gt.Container("trnsport.gdx")

All symbol data is organized in the data attribute – m.data[<symbol_name>].records (the Container is also subscriptable, m[<symbol_name>].records is an equivalent statement) – records are stored as Pandas DataFrames.

Write Symbol to CSV

Writing symbol records to a CSV can also be accomplished with one line.

m["x"].records.to_csv("x.csv")

Write a New GDX

There are six symbol classes within transfer: 1) Sets, 2) Parameters, 3) Variables, 4) Equations, 5) Aliases and 6) UniverseAliases. For purposes of this quick start, we show how to recreate the distance data structure from the `trnsport.gms` model (the parameter d). This brief example shows how users can achieve "GAMS-like" functionality, but within a Python environment – transfer leverages the object oriented programming to simplify syntax.

import gams.transfer as gt
import pandas as pd
m = gt.Container()
# create the sets i, j
i = gt.Set(m, "i", records=["seattle", "san-diego"], description="supply")
j = gt.Set(m, "j", records=["new-york", "chicago", "topeka"], description="markets")
# add "d" parameter -- domain linked to set objects i and j
d = gt.Parameter(m, "d", [i, j], description="distance in thousands of miles")
# create some data as a generic DataFrame
dist = pd.DataFrame(
[
("seattle", "new-york", 2.5),
("seattle", "chicago", 1.7),
("seattle", "topeka", 1.8),
("san-diego", "new-york", 2.5),
("san-diego", "chicago", 1.8),
("san-diego", "topeka", 1.4),
],
columns=["from", "to", "thousand_miles"],
)
# setRecords will automatically convert the dist DataFrame into a standard DataFrame format
d.setRecords(dist)
# write the GDX
m.write("out.gdx")

This example shows a few fundamental features of transfer:

  1. An empty Container is analogous to an empty GDX file
  2. Symbols will always be linked to a Container (notice that we always pass the Container reference m to the symbol constructor)
  3. Records can be added to a symbol with the setRecords() method or through the records constructor argument (internally calls setRecords()). transfer will convert many common Python data structures into a standard format.
  4. Domain linking is possible by passing domain set objects to other symbols
  5. Writing a GDX file can be accomplished in one line with the write() method.

Full Example

It is possible to use everything we now know about transfer to recreate the `trnsport.gms` results in GDX form. As part of this example we also introduce the write method (and generate new.gdx). We will discuss it in more detail in the following section: GDX Read/Write.

import gams.transfer as gt
# create an empty Container object
m = gt.Container()
# add sets
i = gt.Set(m, "i", records=["seattle", "san-diego"], description="supply")
j = gt.Set(m, "j", records=["new-york", "chicago", "topeka"], description="markets")
# add parameters
a = gt.Parameter(m, "a", ["*"], description="capacity of plant i in cases")
b = gt.Parameter(m, "b", j, description="demand at market j in cases")
d = gt.Parameter(m, "d", [i, j], description="distance in thousands of miles")
f = gt.Parameter(
m, "f", records=90, description="freight in dollars per case per thousand miles"
)
c = gt.Parameter(
m, "c", [i, j], description="transport cost in thousands of dollars per case"
)
# set parameter records
cap = pd.DataFrame([("seattle", 350), ("san-diego", 600)], columns=["plant", "n_cases"])
a.setRecords(cap)
dem = pd.DataFrame(
[("new-york", 325), ("chicago", 300), ("topeka", 275)],
columns=["market", "n_cases"],
)
b.setRecords(dem)
dist = pd.DataFrame(
[
("seattle", "new-york", 2.5),
("seattle", "chicago", 1.7),
("seattle", "topeka", 1.8),
("san-diego", "new-york", 2.5),
("san-diego", "chicago", 1.8),
("san-diego", "topeka", 1.4),
],
columns=["from", "to", "thousand_miles"],
)
d.setRecords(dist)
# c(i,j) = f * d(i,j) / 1000;
cost = d.records.copy(deep=True)
cost["value"] = f.records.loc[0, "value"] * cost["value"] / 1000
c.setRecords(cost)
# add variables
q = pd.DataFrame(
[
("seattle", "new-york", 50, 0),
("seattle", "chicago", 300, 0),
("seattle", "topeka", 0, 0.036),
("san-diego", "new-york", 275, 0),
("san-diego", "chicago", 0, 0.009),
("san-diego", "topeka", 275, 0),
],
columns=["from", "to", "level", "marginal"],
)
x = gt.Variable(
m, "x", "positive", [i, j], records=q, description="shipment quantities in cases",
)
z = gt.Variable(
m,
"z",
records=pd.DataFrame(data=[153.675], columns=["level"]),
description="total transportation costs in thousands of dollars",
)
# add equations
cost = gt.Equation(m, "cost", "eq", description="define objective function")
supply = gt.Equation(m, "supply", "leq", i, description="observe supply limit at plant i")
demand = gt.Equation(m, "demand", "geq", j, description="satisfy demand at market j")
# set equation records
cost.setRecords(
pd.DataFrame(data=[[0, 1, 0, 0]], columns=["level", "marginal", "lower", "upper"])
)
supplies = pd.DataFrame(
[
("seattle", 350, "eps", float("-inf"), 350),
("san-diego", 550, 0, float("-inf"), 600),
],
columns=["from", "level", "marginal", "lower", "upper"],
)
supply.setRecords(supplies)
demands = pd.DataFrame(
[
("new-york", 325, 0.225, 325),
("chicago", 300, 0.153, 300),
("topeka", 275, 0.126, 275),
],
columns=["from", "level", "marginal", "lower"],
)
demand.setRecords(demands)
m.write("new.gdx")

Extended Examples

Get HTML data

import gams.transfer as gt
import pandas as pd
url = "https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list"
dfs = pd.read_html(url)
# pandas will create a list of dataframes depending on the target URL, we just need the first one
df = dfs[0]
m = gt.Container()
b = gt.Set(m, "b", ["*"], records=df["Bank NameBank"].unique(), description="Bank Name")
s = gt.Set(
m,
"s",
["*"],
records=df["StateSt"].sort_values().unique(),
description="States (alphabetical order)",
)
c = gt.Set(
m,
"c",
["*"],
records=df["CityCity"].sort_values().unique(),
description="Cities (alphabetical order)",
)
c_to_s = gt.Set(
m,
"c_to_s",
[c, s],
records=df[["CityCity", "StateSt"]]
.drop_duplicates()
.sort_values(by=["StateSt", "CityCity"]),
description="City/State pair",
)
bf = gt.Parameter(
m,
"bf",
b,
records=df[["Bank NameBank", "FundFund"]]
.drop_duplicates(subset="Bank NameBank")
.sort_values(by=["Bank NameBank"]),
description="Bank Namd & Fund #",
)
In [1]: m.isValid()
Out[1]: True
Note
Users can chain Pandas operations together and pass those operations through to the records argument or the setRecords method.

Get PostgreSQL data (w/ sqlalchemy)

import gams.transfer as gt
from sqlalchemy import create_engine
import pandas as pd
# connect to postgres (assuming a localhost)
engine = create_engine("postgresql://localhost:5432/" + <database_name>)
df = pd.read_sql(<sql_table_name>, con=engine, index_col=0)
# create the Container and add symbol
m = Container()
p = Parameter(m, <sql_table_name>)
# we need to figure out the symbol dimensionality (potentially from the shape of the dataframe)
r, c = df.shape
p.dimension = c - 1
# set the records
p.setRecords(df)
# write out the GDX file
m.write("out.gdx")