GDX2SQLITE

A Tool to dump GDX contents into SQLite database file

Author
Erwin Kalvelagen
Date
November 30, 2015

Introduction

GDX2SQLITE.EXE is a tool to dump the complete contents of a GAMS GDX file (see GAMS Data eXchange (GDX)) into a SQLite database file (the website http://www.sqlite.org/ contains a wealth of information on SQLite).

A SQLite database is stored in a single file so it can be easily e-mailed or otherwise transmitted. The main advantages of using SQLite over other single file database systems such as MS Access is that SQLite is free and in the public domain and that it does not impose a 2 GB file size limit. For some large data sets this size limit present in MS Access and DBF database files causes problems. Another useful format is CSV files, but typically several CSV files are needed to store a data set stored in a GDX file. Many programs support reading SQLite database files, either through a native database access driver or via a standard ODBC interface. In summary: SQLite is a useful export format for GAMS solution data sets.

Usage

'GDX2SQLITE' is a command line tool that is best called from within a GAMS program using the $call or Execute statement, e.g.:

execute_unload "results.gdx", yield, price;
execute "gdx2sqlite -i results.gdx -o results.db";

The following options are available:

Options Description
-i gdxinputfile Specifies the input GDX file. Typically this is a file with a .gdx extension
-o sqloutputfile Specifies the output SQLite database. Typically this file has a .db extension.
-debug This is an optional flag that will cause gdx2sqlite to print additional debugging information.
-expltext This optional flag will export explanatory text for set elements.
-append Don't delete the database file before processing. This will allow adding new symbols in new tables. We will not allow adding data to existing tables.
-small Write data strings in a separate table. A user-friendly SQL VIEW is created to hide the complexities of the joins.
-fast Try to speed up writing the data using some non-standard pragmas. Using both -small -fast will write the data most efficiently.
-varchar String columns will have the type VARCHAR(255) instead of TEXT.

An example of explanatory text is:

set cty /
  AFG  Afghanistan
  AGO  Angola
  ALB  Albania
/;

In GAMS set elements have a maximum length of 63 characters. Explanatory text has a maximum length of 255 characters.

How data is stored

Gams issues

GAMS does not store zero values (or default records for variables and equations). Such non-existing records will not be exported to the GDX file and to the database either. To force a zero to be exported, set it to EPS in GAMS. E.g.:

p(i)$(p(i)=0) = EPS;

In case of doubt you are encouraged to inspect the GDX file.

Sets

n-dimensional sets are stored as tables with n text columns. In case the option -expltext is used, another column may be added with explanatory text.

GAMS SQLite
set
  month /jan,feb,mar/
  year /2013,2014/
  date(year,month) /(2013,2014).(jan,feb,mar) /
;
execute_unload "sets.gdx";
execute "gdx2sqlite -i sets.gdx -o sets.db";      
sqlite> select * from month;
month
----------
jan
feb
mar
sqlite> select * from year;
year
----------
2013
2014
sqlite> select * from date;
year        month
----------  ----------
2013        jan
2013        feb
2013        mar
2014        jan
2014        feb
2014        mar
set
  month /jan january
         feb fabruary
         mar march /
;
execute_unload "sets.gdx";
execute "gdx2sqlite -i sets.gdx -o sets.db -expltext";
sqlite> select * from month;
month       expltext
----------  ----------
jan         january
feb         fabruary
mar         march

Parameters

n-dimensional parameters will have n index columns plus a value column. Scalars are collected in a separate table.

GAMS SQLite
set i /i1*i4/;
parameter p(i); p(i) = uniform(0,1);
scalars
  s1 /10/
  s2 /20/
;
execute_unload "data.gdx";
execute "gdx2sqlite -i data.gdx -o data.db";
sqlite> select * from p;
i           value
----------  -----------
i1          0.171747132
i2          0.843266708
i3          0.550375356
i4          0.301137904
sqlite> select * from scalars;
name        value
----------  ----------
s1          10.0
s2          20.0

Variables and Equations

n-dimensional variables and equations have besides n index columns also columns for the level, the lower and upper-bound and the marginal. Scalars are collected in the tables scalarvariables and scalarequations. Note that INF and -INF are mapped to 1.0e100 and -1.0e100. The special value EPS is exported as zero. To be complete: UNDEF, NA and acronyms are exported as NULLs.

GAMS SQLite
set i /i1*i4/;
positive variable x(i);
x.l(i) = uniform(0,1);
variable z;
z.m=1;
execute_unload "data.gdx";
execute "gdx2sqlite -i data.gdx -o data.db";
sqlite> select * from x;
i           level        lo          up          marginal
----------  -----------  ----------  ----------  ----------
i1          0.171747132  0.0         1.0e+100    0.0
i2          0.843266708  0.0         1.0e+100    0.0
i3          0.550375356  0.0         1.0e+100    0.0
i4          0.301137904  0.0         1.0e+100    0.0
sqlite> select * from scalarvariables;
name        level       lo          up          marginal
----------  ----------  ----------  ----------  ----------
z           0.0         -1.0e+100   1.0e+100    1.0

Fixing up names

A database table is not allowed to have columns with the same name. If a name clash is detected new names may be invented.

GAMS SQLite
set i /i1*i4/;
parameter p(i,i);
p(i,i) = 1;
execute_unload "data.gdx";
execute "gdx2sqlite -i data.gdx -o data.db";
sqlite> select * from p;
i           i2          value
----------  ----------  ----------
i1          i1          1.0
i2          i2          1.0
i3          i3          1.0
i4          i4          1.0

Speeding up writing data

With the -small option we write data in a slightly different format. Instead of using strings for the GAMS indices we write integers. The integers can be looked up in a separate table UEL$ were the GAMS UELS (Unique Elements) are stored. We export an SQL view for each symbol to hide the complexities of the joins needed to replace the integers by strings.

For more information see: http://yetanothermathprogrammingconsultant.blogspot.com/2014/06/big-data-cubes.html.

The -fast option will set some SQLite pragmas that can speed up the inserts. Basically they will give up some consistency in case the program crashes, in which case the created database may be invalid.

For more information see: http://yetanothermathprogrammingconsultant.blogspot.com/2014/07/a-little-bit-extra-fine-tuning.html.

SQLite Browsers and compatible software

SQLite3.exe

From the distribution on http://www.sqlite.org/download.html a command line tool is available that functions as shell for SQLite. An example session can look like:

C:\projects\impact3\sqlite>sqlite3.exe data.db
SQLite version 3.8.0.2 2013-09-03 17:11:13
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .schema
CREATE TABLE [i]([i] TEXT);
CREATE TABLE [p]([i] TEXT,[i2] TEXT,[value] REAL);
sqlite> select * from p;
i1|i1|1.0
i2|i2|1.0
i3|i3|1.0
i4|i4|1.0
sqlite> .quit

C:\projects\impact3\sqlite>

SQLite Studio

A visual front-end can be downloaded from http://sqlitestudio.pl/.

image001.png

SQLite Database Browser

Another visual browser is available from http://sqlitebrowser.sourceforge.net/.

image002.png

SQLite and R

The statistical software R can conveniently use SQLite data, and can be called from a GAMS environment as follows:

$ontext

    Get data from GAMS into R via SQLite

$offtext

$set IMPACTPATH     c:\projects\impact3\impact_3\IMPACTv3.0
$set SCRIPT         script.R
$set RPATH          "C:\Program Files\R\R-3.0.2\bin\R.exe"
$set DB             mapdata.db

set cty;
$gdxin %IMPACTPATH%\GDXs\Sets.gdx
$load cty

display cty;

set maps /
  data1  "uniform random data between 0 and 1"
  data2  "uniform random data between 0 and 2"
/;

parameter mapdata(cty,maps);

mapdata(cty,"data1") = uniform(0,1);
mapdata(cty,"data2") = uniform(0,2);

execute_unload "mapdata.gdx";
execute "gdx2sqlite -i mapdata.gdx -o %DB% -expltext";
execute '"%RPATH%" --vanilla < %SCRIPT%';

$onecho > %SCRIPT%

if (!require(RSQLite)) {
   install.packages("RSQLite", repos="http://cran.r-project.org")
   library(RSQLite)
}

sqlite<-dbDriver("SQLite")
db <- dbConnect(sqlite,"%DB%")
dbListTables(db)

maps<-dbGetQuery(db,"select * from maps")
maps

mapdata<-dbGetQuery(db,"select * from mapdata")
mapdata

$offecho

SQLite and Python

Python has built-in support for SQLite:

import sqlite3

db = sqlite3.connect("turkey.db")

c = db.execute("select * from yieldl")
for row in c:
     print row

db.close()

SQLite ODBC Driver

ODBC is a database access layer for Windows. It allows many Windows programs that need to talk to databases to do this in a database independent manner. The SQLite ODBC driver can be downloaded from: http://www.ch-werner.de/sqliteodbc/.

SQLite and Excel

Excel can read SQLite database files through ODBC.

Import as Table

To import a table from a SQLite database perform the following steps:

  1. Select the Data tab and choose Get External Data From Other Sources
    excel1.png
  2. Choose the Data Connection Wizard
    excel2.png
  3. Choose ODBC DSN
    excel3.png
  4. Select SQLite3 Datasource
    excel4.png
  5. Enter the name and path of the database file and press OK.
    excel5.png
  6. Choose a table from the database.
    excel6.png
  7. Give this selection a name.
    excel7.png
  8. Choose Import as Table
    excel8.png

The result is a table:

excel9.png

Import as Pivot Table

The same steps can be used to import as Pivot Table. This way we can easily create summary reports, such as:

excel10.png