herves.gms : Herves (Transposable Element) Activity Calculations
This program takes raw data from an Excel sheet, transforms the data
and finds the optimal values for alpha and beta following a procedure
outlined by Stephen Wright. An integral is computed numerically
using a trapezoidal approximations. The optimization results are then
written back into the Excel file.
This procedure was originally coded by David O'Brochta in Mathematica
and recast in a GAMS optimization framework by Wilhelmine Meeraus.
The Excel file contains the data from the field and labwork described
in W Meeraus, 2004.
References
Stephen I Wright, Quang Hien Le, Daniel J Schoen and Thomas E Bureau,
Population Dynamics of an Ac-like Transposable Element in Self- and
Cross-Pollinating Arabidopsis, Genetics 158: 127-1288 (July 2001).
David O'Brochta, Private Communications, University of Maryland Biotechnology
Institute, 2004.
Wilhelmine H Meeraus, A Combined Fieldwork and Laboratory Investigation into
the Behaviour of the Herves Transposable Element, London School of Hygiene
& Tropical Medicine, Theses, 2004.
only run on Windows platforms
References:
- Wright, S I, Le, Q H, Schoen, D J, and Bureau T E, Population Dynamics of an Ac-like Transposable Element in Self- and Cross-Pollinating Arabidopsis. Genetics 158 (2001), 1279-1288.
- O'Brochta, D, Private Communications, 2004.
- Meeraus, W H, A Combined Fieldwork and Laboratory Investigation into the Behaviour of the Herves Transposable Element. Masters thesis, London School of Hygiene and Tropical Medicine, 2004.
Small Model of Type: DNLP Includes: hervesio.xls
$title Herves (Transposable Element) Activity Calculations (HERVES,SEQ=305)
$ontext
This program takes raw data from an Excel sheet, transforms the data
and finds the optimal values for alpha and beta following a procedure
outlined by Stephen Wright. An integral is computed numerically
using a trapezoidal approximations. The optimization results are then
written back into the Excel file.
This procedure was originally coded by David O'Brochta in Mathematica
and recast in a GAMS optimization framework by Wilhelmine Meeraus.
The Excel file contains the data from the field and labwork described
in W Meeraus, 2004.
References
Stephen I Wright, Quang Hien Le, Daniel J Schoen and Thomas E Bureau,
Population Dynamics of an Ac-like Transposable Element in Self- and
Cross-Pollinating Arabidopsis, Genetics 158: 127-1288 (July 2001).
David O'Brochta, Private Communications, University of Maryland Biotechnology
Institute, 2004.
Wilhelmine H Meeraus, A Combined Fieldwork and Laboratory Investigation into
the Behaviour of the Herves Transposable Element, London School of Hygiene
& Tropical Medicine, Theses, 2004.
$offtext
$eolcom //
* only run on Windows platforms
$if %system.filesys% == UNIX $exit
$call msappavail -Excel
$if errorlevel 1 $goto noExcel
$goto Excel
$label noExcel
$log Microsoft Excel is not available!
$exit
$label Excel
* you can use command line --xls=xxx and --sheet=xxx which
* will allow further automation
* the sheet names are aa,ag,am,Zu,Zag
$if NOT set xls $set xls 'hervesio.xls' // work book name
$if NOT set sheet $set sheet 'Ag' // sheet name
$set rng '%sheet%!A4' // starting range to read data
$set rows '%sheet%!A5' // starting range to read row ids
$set gdx 'temp.gdx' // name of temp GDX file
$set rngout '%sheet%_Opt!A1' // starting range to store results
* 1. Extract raw data from an xls sheet into a GDX file
* and load into GAMS. The result of this step are
* the set of samples i and the counts cnt. Empty or all
* zero rows and columns will be dropped automatically from
* the xls data.
sets imax max sample size / 1*50 /
i(imax) samples;
parameter cnt(imax) counts;
alias (*,code,LabId,Cut);
parameter raw(code,LabId,Cut) Raw Observations;
set rr(code,LabID) row labels including empty rows;
* empty rows will be reported
$call =gdxxrw "%xls%" output="%gdx%" trace=0 par=raw rng="%rng%" rdim=2 cdim=1 set=rr rng="%rows%" rdim=2 values=NoData
$if errorlevel 1 $abort Something wrong with reading raw data from %xls%
$gdxin "%gdx%" // open GDX file
$load raw rr // load data into gams
$gdxin // close gdx file
sets rid(code,labID) row identifier
cid(Cut) column identifier;
parameter ct column totals;
option rid < raw, cid < raw, rid:0:0:1; // map domains and set print option
abort$(card(rid) > card(imax)) 'size of imax is to small', imax, rid;
ct(cid) = sum(rid, raw(rid,cid));
display raw,ct;
set rrdiff all zero rows;
rrdiff(rr) = yes - rid(rr); display rrdiff;
i(imax) = ord(imax) <= card(rid);
cnt(imax) = sum(cid, ct(cid) = ord(imax));
display i,cnt;
* 2. Prepare model constants and set up numerical integration
* data sets.
parameter diploid sample size
nhat
bin(imax) binomial for diploid ;
diploid = card(i);
bin(i(imax)) = fact(card(i))/fact(card(i)-ord(imax))/fact(ord(imax));
* numerical integration data
set s integral discritization steps / s0*s1000 /
ss(s) excluding first element ;
parameter xx(s) values for steps s (0-1)
ws(s) weight for trapezoid approximation
deltaxx;
ss(s) = ord(s) > 1; // drop values for x = 0
deltaxx = 1/(card(s)-1);
xx(s) = (ord(s)-1)*deltaxx;
ws(s) = 1; ws(s)$(ord(s) = 1) = 0.5; ws(s)$(ord(s) = card(s)) = 0.5;
parameter t0(s) intermediate value one
t1(imax,s) intermediate value two;
t0(s) = sqr(xx(s))+ 2*xx(s)*(1 - xx(s));
t1(i(imax),s) = ws(s)*power(t0(s),ord(imax))*power(1 - t0(s),card(i)-ord(imax));
* 3. Define optimization model and set relevant bounds.
* Note that not all parameters are known at this point.
variables chi sum of CHI squares
a alpha values
b beta values
est(imax) estimates
equations defchi definition of CHI
defest(imax) definition of estimates;
defchi.. chi =e= sum(i, sqr(cnt(i)-est(i))/est(i));
defest(i).. est(i) =e= nhat*(a+b)/a/beta(a,b)*bin(i)*deltaxx*sum(s, t1(i,s)*xx(s)**(a-1)*(1-xx(s))**(b-1));
model m / all /;
* set bounds and initial values
a.lo = 0.01; a.up = 1.0;
b.lo = 1; b.up = 10;
a.l = .1; b.l = 5;
est.l(i) = 1;
option domlim=1000,limcol=0,limrow=0,solprint=on;
parameter rep(*,*) summary report;
* 4. First Optimization
nhat = 4/3*sum(i(imax), ord(imax)*cnt(i))/card(i);
solve m min chi us dnlp;
display diploid, nhat, a.l,b.l,chi.l;
rep(i ,'w-obs') = cnt(i);
rep(i ,'w-est') = est.l(i);
rep(i ,'w-chi') = sqr(cnt(i)-est.l(i))/est.l(i);
rep('total','w-chi') = sum(i, rep(i,'w-chi'));
rep('total','w-obs') = sum(i, cnt(i));
rep('alpha','w-est') = a.l;
rep('beta' ,'w-est') = b.l;
rep('nhat' ,'w-est') = nhat;
rep('SStat' ,'w-est') = m.solvestat;
rep('MStat' ,'w-est') = m.modelstat;
* 5. Second solve with modified counts
// use a cutoff of 70% and solve again
cnt(i(imax))$(ord(imax) > 0.7*card(i)) = 0;
nhat = 4/3*sum(i(imax), ord(imax)*cnt(i))/card(i);
display nhat;
solve m min chi us dnlp;
display diploid, nhat, a.l,b.l,chi.l;
rep(i ,'wo-obs') = cnt(i);
rep(i ,'wo-est') = est.l(i);
rep(i ,'wo-chi') = sqr(cnt(i)-est.l(i))/est.l(i);
rep('total','wo-obs') = sum(i, cnt(i));
rep('total','wo-chi') = sum(i, rep(i,'wo-chi'));
rep('alpha','wo-est') = a.l;
rep('beta' ,'wo-est') = b.l;
rep('nhat' ,'wo-est') = nhat;
rep('SStat' ,'wo-est') = m.solvestat;
rep('MStat' ,'wo-est') = m.modelstat;
display rep;
* 6. Update xls sheet with results in a new sheet
execute_unload "%gdx%", rep; abort$errorlevel 'unload to %gdx% failed';
execute '=gdxxrw "%gdx%" output="%xls%" trace=0 par=rep rng="%rngout%" rdim=1 cdim=1';
abort$errorlevel 'write to %xls% failed';