CSV2GDX

Table of Contents

CSV2GDX is a program that can read a CSV file (comma separated values) and writes to a GDX file.

Many CSV files can be read by a GAMS program directly using a table statement as illustrated in Example 21, but a number of features available in CSV2GDX make it possible to read a CVS file where the table statement cannot be used.

Sample call:

CSV2GDX data.csv ID=ID1 UseHeader=y Index=(1,2,6) Values=(4,5)

Read the file data.csv and write the file data.gdx. The csv file has a header row, columns one, two and six are to be used for the index. The values can be found in the columns four and five with the labels on the first row of columns four and five used for the last index.

Usage

CSV2GDX filename parameters

filename

The input file; the .csv file extension is assumed when no extension has been specified. Reading of a compressed input file with an optional password is supported. The gzip program in the gbin sub-directory can be used to compress a file.

Parameters can also be read from a text file; the use of a file for parameters is indicated by preceding the file name with a @ (At sign.). When reading parameters from a text file, lines starting with an asterisk (*) will be ignored and act as a comment.

Parameters

ID=<id>

Identifier for the symbol in the gdx file. Additional symbols, dim1, dim2, ... will be added to the gdx file, representing the domain sets for the symbol id.

Output=<filename>

Optional output filename. If no output file is specified, the program will use the input file name and change the file extension to .gdx. If a path is not specified, the output file will be created in the current directory

Trace=<integer> (Default=1)

Sets the amount of information written to the log. Higher values will generate more output. Valid range is 0..3.

Index=<list of columns>

Identify columns to get UELS from. The columns are represented as a list of integers separated by commas. For example Index=(1,2,3,4); in this case the notation (1..4) is allowed. The order in which the columns are specified is used to specify the sequence for the unique elements in the key to store the data.

Value=<integer>

Specify the column to get the value from. See also DecimalSep below.

Values=<list of columns>

Specify header columns to use for UEL/Value. When using a list of columns for the values, the field in the first line of the column is used as the unique element to store the value in that column. See also UseHeader and AutoCol below. If the number of fields varies, the symbolic constant LastCol can be useful: Values=(2..LastCol).

UseHeader=<boolean>

Indicate if the first row is a header row. The fields in the header row are used as unique elements when the Values option is used. A header row is not needed or can be ignored when using the AutoCol parameter.

StoreZero=<boolean>

Indicate if zero values are ignored or written as EPS; an empty field is always ignored.

ColCount=<integer>

Number of columns in the input file. This parameter is required if there is no header line.

CheckDate=<boolean>

Write gdx file only if the csv file is more recent than the gdx file

AutoRow=<string>

Generate automatic UELs for each row. The AutoRow string is used as the prefix for the row label numbers. The generated unique elements will be used in the first index position shifting other elements to the right. Using AutoRow can be helpful when there are no labels that can be used as unique elements but also to store entries that would be a duplicate entry without a unique row label. See Example2.

AutoCol=<string>

Generate automatic UELs for each column. The AutoCol value is used as the prefix for the column label numbers. This option overrides the use of a header line.

FieldSep=[Comma, SemiColon, Tab]

Specify a field separator. Fields are normally separated by a comma, but this parameter allows for some additional choices.

DecimalSep=[Period, Comma]

Specify a decimal separator. The decimal is normally a period, but this parameter allows a comma as the decimal. Special values recognized are Eps, NA, Inf, and Undef. A string that is not recognized as a valid number will be stored as Undef.

Password=<string>

Password for an encrypted input file.

Examples

Example 1

Reading the file data.csv shown below:

one,two,three,four,five,six
red,red,1.1,2.2,3.3,red
red,red,4.4,5.5,Eps,green
red,green,7.7,8.8,9.9,blue
blue,blue,10,0,NA,purple
csv2gdx data.csv id=A Index=(1,2,6) Values=(3..5) UseHeader=Y StoreZero=Y

Resulting GDX file shown in the IDE:

clip0013.gif

Note the sets Dim1 to Dim4 that correspond to the domains of index positions one to four.

Assuming we want to declare the parameter A as: A(color, color, color, number) we can proceed as follows:

$call csv2gdx data.csv id=A Index=(1,2,6) Values=(3..5) UseHeader=Y StoreZero=Y
set color(*), number(*);
$gdxin data.gdx
$load  color=dim1
$loadm color=dim2
$loadm color=dim3
$load number=dim4
parameter A(color, color, color, number);
$load A
$gdxin
display color, number, A;

Example 2

Consider the following input file; note the duplicate key in the first two lines:

red,red,1
red,red,2
red,green,3
blue,blue,4

The following call uses the AutoRow parameter to add a unique row label to each row. This way a GAMS program can deal with duplicate entries and prepare for better error messages.

csv2gdx example22 id=A Index=(1,2) Value=3 ColCount=3 Autorow=Row
clip0014.gif