seders.gms : ERS Data Manipulations with SED

Description

Data files are often not in the form that can be read by another
program and/or transformations have to be made. The example below
shows how to manipulate an input output matrix in comma delimited
form. A sample file is shown below:

BAS1,1 Dairyfarmprd,2 PoultryEggs,3 Meatanimals,4 livestckmisc, ... 512 Wat2,513 Air2,Total,
1 Dairyfarmprd,0,0,265.7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, ... 0,0,0,0,0,0,0,0,0,26452.6,
2 PoultryEggs,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,523.8,0,0, ... 0,0,0,0,0,0,,17623.5,
...
503 Air2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.2,0.7,1.7,0.3, ... 8,18859.6,470.4,0,0.5,5.4,33699.3,
Total,18728.2,14814.9,50839.5,2670.9,2693.7,3421.0,1831, ... ,3593.4,12052.2,6520145.5,

This file is the result of a routine process to take a raw file from a
web site and attach custom labels to rows and columns
( https://www.bea.gov/industry/io_benchmark.htm ).  The rows and
columns have the form '<number> space <text>'. Some of the rows have
been dropped and the <number> on the row labels does not match the
corresponding column label.  Note that the row named Air2 has number
503, whereas the column Air2 has number 513.  Since the aggregations
are done by column numbers, the row numbers need to be reassigned to
use the column order. Furthermore, there are three row labels that
have no matching column labels.

Finally, the matrix needs to be aggregated from a size of 513 to a
much smaller size.

We use the unix tool SED to manipulate the text files into a format
suitable to be read by GAMS. For more details on SED please an SED
book.


References

  • Somwaru, A, Problem contribution, ERS, US Department of Agriculture, Washington, DC.
  • Doufherty, D, and Robbins, A, SED and AWK. O'Reilly & Associates, Sebastobol, 1997.

Small Model of Type : GAMS


Category : GAMS Model library


Main file : seders.gms   includes :  f001.inc

$title ERS Data Manipulations with SED (SEDERS,SEQ=297)
$ontext

Data files are often not in the form that can be read by another
program and/or transformations have to be made. The example below
shows how to manipulate an input output matrix in comma delimited
form. A sample file is shown below:

BAS1,1 Dairyfarmprd,2 PoultryEggs,3 Meatanimals,4 livestckmisc, ... 512 Wat2,513 Air2,Total,
1 Dairyfarmprd,0,0,265.7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, ... 0,0,0,0,0,0,0,0,0,26452.6,
2 PoultryEggs,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,523.8,0,0, ... 0,0,0,0,0,0,,17623.5,
...
503 Air2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.2,0.7,1.7,0.3, ... 8,18859.6,470.4,0,0.5,5.4,33699.3,
Total,18728.2,14814.9,50839.5,2670.9,2693.7,3421.0,1831, ... ,3593.4,12052.2,6520145.5,

This file is the result of a routine process to take a raw file from a
web site and attach custom labels to rows and columns
( https://www.bea.gov/industry/io_benchmark.htm ).  The rows and
columns have the form '<number> space <text>'. Some of the rows have
been dropped and the <number> on the row labels does not match the
corresponding column label.  Note that the row named Air2 has number
503, whereas the column Air2 has number 513.  Since the aggregations
are done by column numbers, the row numbers need to be reassigned to
use the column order. Furthermore, there are three row labels that
have no matching column labels.

Finally, the matrix needs to be aggregated from a size of 513 to a
much smaller size.

We use the unix tool SED to manipulate the text files into a format
suitable to be read by GAMS. For more details on SED please an SED
book.


Doufherty, D, and Robbins, A, SED and AWK. O'Reilly & Associates,
Sebastobol, 1997.

This problem is due to: Agapi Somwaru, ERS, US Department of
Agriculture, Washington, DC

$offtext

$set raw f001.inc
$set out product

* We need the Posix compatible sed
$set sed sed
$ifi %system.platform% == "sol" $set sed /usr/xpg4/bin/sed

* 1. Extract column headers and write to colnumber

$onecho > sedscript
# insert before line 1
1i\
$offlisting
# drop first token
s/^[^,]*,//
# drop last item followed by comma
s/,[^,]*,$//
# replace comma with newline
s/,/\
/g
# quit after first line
1q
$offecho

$call %sed% -f sedscript %raw% > colnumber

* 2. Extract row names write to rowname

$onecho > sedscript
# insert before line 1
1i\
$offlisting
# drop first line
1d
# drop last line
$d
# remove row number
s/^[0-9]* //
# remove everything following the comma
s/,.*$//
$offecho

$call %sed% -f sedscript %raw% > rowname


* 3. Prepare matrix and write to mat1 using row names and column numbers

$onecho > sedscript
# insert before line 1
1i\
$offlisting\
$ondelim
# drop trailing comma
s/,$//
# remove the column names following the numbers on first line
1s/ [^,]*,/,/g
# remove numbers on other rows
1!s/^[^ ]* //
#insert after last line
$a\
;
$offecho

$call %sed% -f sedscript %raw% > mat1

* 4. Read transformed data and perform some basic checks

sets   j column numbers /
$      include colnumber
              /
       row row names /
$      include rowname
            /
      col column names  /
$         call %sed% -e "s/^[0-9]* //"  colnumber > temp
$         include temp
                       /
     jmap(j,col) number name map /
$                ondelim
$                include colnumber
$                offdelim
                          / ;

table mat1(*,*) data including row and column totals
$ include mat1

parameter rowtot(row),rowdiff(row),coltot(j),coldiff(j);

rowtot(row) = sum(j, mat1(row,j));
coltot(j) = sum(row, mat1(row,j));
rowdiff(row) = rowtot(row) - mat1(row,'total');
coldiff(j) = coltot(j) - mat1('total',j);

*display rowdiff,coldiff;

* check to make sure the totals are OK, note the rounding!

parameter rowerr(row) relative row errors
          colerr(j)   relative column errors ;

rowerr(row) = round(rowdiff(row)/(1+rowtot(row)),1);
colerr(j)   = round(coldiff(j)/(1+coltot(j)),1);

display rowerr,colerr;

abort$(card(rowerr)+card(colerr)) 'likely data errors';

* update row and column totals

mat1(row,'total') = rowtot(row);
mat1('total',j)   = coltot(j);

sets zerorows(row) rows with zero total;

zerorows(row) = rowtot(row) = 0;

display zerorows;

* 5. Find additional row names and add to row OthSLGentpr

set rowx(*) additional row names;

rowx(row) = yes - col(row); display rowx;

mat1('OthSLGentpr',j) = mat1('OthSLGentpr',j) + sum(rowx,
mat1(rowx,j));
mat1(rowx,j) = 0;

* 6. Map the rownames into the proper column numbers and store in mat2

alias (i,j);

Parameter mat2(i,j) matrix using column numbers;

mat2(i,j) = sum(jmap(i,col), mat1(col,j));

* 7. Final aggregation mapping and tests

    SET Iagg  AGGREGATION OF IC SECTORS /

ADRY  DAIRYFARPRD
AOMT  POULTRY
ACTL  MEATANIM
ACOT  COTTON
AFOD  FOODGRAIN

AFED  FEEDGRAIN
ATOB  TOBACCO
AFRT  FRUITS AND TREENUTS
AVEG  VEG AND GREENHOUSE
ASUG  SUGARCROPS

AOCR  OTHERCROPS
AOIS  OILBEARCROPS
ACMP  CATTLE PROD PROC
AOMP  OTHER MEAT PROC
ADRP  DAIRY PROD PROC

AFOP FOOD GRAIN PROC
AFEP FEED GRAIN PROC
ASUP SUGAR PROD PROC
AFSH FISH PRE PROC
AOIP OIL PROD PROC
AFRP FRUIT PROD PROC

ABEV BEVERAGES
ATOP TOBACCO PROC
AFDP OTHER FOOD PROC
ATEX TEXTILE
AAPR APPAREL

ALTP LEATHER PROD PROC

AACH  AGRCHEMICALS
AFFS  FOREST FISHING AGR SERV
AMNF  MANUFACTURING
ASRV  SERVICES
      /

    IaggMap(Iagg,J) aggregation Mapping  /

ADRY. (1)
AOMT. (2)
ACTL. (3,4)
ACOT. (5)
AFOD. (6)
AFED. (7)
ATOB. (9)
AFRT. (10,11)
AVEG. (12,17)
ASUG. (13)
AOCR. (8,14)

AOIS. (15)

AFFS. (16,18*21)
AACH. (30,179,180,181)
ACMP. (53,54)
AOMP. (55)
ADRP. (56*60)
AFSH. (61,66)
AFRP. (62,64*65,67)
AFOP. (69*71,76*78,96,97)
AFEP. (72*75)
ASUP. (79,82,80)
ABEV. (83*88,93)
AOIP. (89*92,94)
AFDP. (63,68,81,95,98)
ATOP. (99*102)
ATEX. (103*114)
AAPR. (115*127)
ALTP. (209*217)

AMNF.(22*29,31*52,128*178,182*208,218*407)
ASRV.(408*513)/ ;

set iaggTest(Iagg) unmapped aggregated sectors
    JZero(j)       sectors without mapping
    JTest(j)       sectors with multiple mappings;

iaggtest(iagg) =  sum(j$IaggMap(Iagg,j), 1) = 0;

JZero(j) =  sum(iagg$IaggMap(Iagg,j), 1) = 0;
JTest(j) =  sum(iagg$IaggMap(Iagg,j), 1) > 1;

display iaggtest,jzero,jtest;

abort$(card(iaggtest)+card(jzero)+card(jtest)) 'the aggregation mapping is incorrect';

* 8. Aggregate into amat and store all data in a gdx file

alias(iagg,jagg),(j,jj);

parameter amat(iagg,jagg) aggregate table;

amat(iagg,jagg) = sum((j,jj)$(iaggmap(iagg,j)*iaggmap(jagg,jj)), mat2(j,jj));
display amat;

execute_unload '%out%' i,amat;
$if %system.filesys% == UNIX $exit
$call msappavail -Excel
$if errorlevel 1 $exit

execute "gdxxrw %out%.gdx trace=0 par=amat rng=prod!b4"