herves.gms : Herves (Transposable Element) Activity Calculations

Description

This program takes raw data from an Excel sheet, transforms the data
and finds the optimal values for alpha and beta following a procedure
outlined by Stephen Wright. An integral is computed numerically
using a trapezoidal approximations. The optimization results are then
written back into the Excel file.

This procedure was originally coded by David O'Brochta in Mathematica
and recast in a GAMS optimization framework by Wilhelmine Meeraus.

The Excel file contains the data from the field and labwork described
in W Meeraus, 2004.

References

   Stephen I Wright, Quang Hien Le, Daniel J Schoen and Thomas E Bureau,
   Population Dynamics of an Ac-like Transposable Element in Self- and
   Cross-Pollinating Arabidopsis, Genetics 158: 127-1288 (July 2001).

   David O'Brochta, Private Communications, University of Maryland Biotechnology
   Institute, 2004.

   Wilhelmine H Meeraus, A Combined Fieldwork and Laboratory Investigation into
   the Behaviour of the Herves Transposable Element, London School of Hygiene
   & Tropical Medicine, Theses, 2004.

  only run on Windows platforms


References

  • Wright, S I, Le, Q H, Schoen, D J, and Bureau T E, Population Dynamics of an Ac-like Transposable Element in Self- and Cross-Pollinating Arabidopsis. Genetics 158 (2001), 1279-1288.
  • O'Brochta, D, Private Communications, 2004.
  • Meeraus, W H, A Combined Fieldwork and Laboratory Investigation into the Behaviour of the Herves Transposable Element. Masters thesis, London School of Hygiene and Tropical Medicine, 2004.

Small Model of Type : DNLP


Category : GAMS Model library


Main file : herves.gms   includes :  hervesio.xls

$title Herves (Transposable Element) Activity Calculations  (HERVES,SEQ=305)
$ontext
  This program takes raw data from an Excel sheet, transforms the data
  and finds the optimal values for alpha and beta following a procedure
  outlined by Stephen Wright. An integral is computed numerically
  using a trapezoidal approximations. The optimization results are then
  written back into the Excel file.

  This procedure was originally coded by David O'Brochta in Mathematica
  and recast in a GAMS optimization framework by Wilhelmine Meeraus.

  The Excel file contains the data from the field and labwork described
  in W Meeraus, 2004.

References

   Stephen I Wright, Quang Hien Le, Daniel J Schoen and Thomas E Bureau,
   Population Dynamics of an Ac-like Transposable Element in Self- and
   Cross-Pollinating Arabidopsis, Genetics 158: 127-1288 (July 2001).

   David O'Brochta, Private Communications, University of Maryland Biotechnology
   Institute, 2004.

   Wilhelmine H Meeraus, A Combined Fieldwork and Laboratory Investigation into
   the Behaviour of the Herves Transposable Element, London School of Hygiene
   & Tropical Medicine, Theses, 2004.
$offtext
$eolcom //

* only run on Windows platforms

$if %system.filesys% == UNIX $exit
$call msappavail -Excel
$if errorlevel 1 $goto noExcel
$goto Excel
$label noExcel
$log Microsoft Excel is not available!
$exit
$label Excel


* you can use command line --xls=xxx and --sheet=xxx which
* will allow further automation

* the sheet names are aa,ag,am,Zu,Zag

$if NOT set xls   $set xls    'hervesio.xls' // work book name
$if NOT set sheet $set sheet  'Ag'           // sheet name

$set rng      '%sheet%!A4'  // starting range to read data
$set rows     '%sheet%!A5'  // starting range to read row ids
$set gdx      'temp.gdx'    // name of temp GDX file
$set rngout   '%sheet%_Opt!A1' // starting range to store results

*   1. Extract raw data from an xls sheet into a GDX file
*      and load into GAMS. The result of this step are
*      the set of samples i and the counts cnt. Empty or all
*      zero rows and columns will be dropped automatically from
*      the xls data.

sets imax max sample size  / 1*50 /
     i(imax) samples;
parameter cnt(imax) counts;


alias     (*,code,LabId,Cut);
parameter raw(code,LabId,Cut) Raw Observations;
set       rr(code,LabID) row labels including empty rows;

* empty rows will be reported

$call =gdxxrw "%xls%" output="%gdx%" trace=0 par=raw rng="%rng%" rdim=2 cdim=1 set=rr rng="%rows%" rdim=2 values=NoData
$if errorlevel 1 $abort Something wrong with reading raw data from %xls%
$gdxin "%gdx%"    // open GDX file
$load raw rr      // load data into gams
$gdxin            // close gdx file

sets rid(code,labID) row identifier
     cid(Cut)        column identifier;
parameter ct column totals;

option rid < raw, cid < raw, rid:0:0:1; // map domains and set print option
abort$(card(rid) > card(imax)) 'size of imax is to small', imax, rid;
ct(cid) = sum(rid, raw(rid,cid));
display raw,ct;

set rrdiff all zero rows;
rrdiff(rr) = yes - rid(rr); display rrdiff;

i(imax)   = ord(imax) <= card(rid);

cnt(imax) = sum(cid, ct(cid) = ord(imax));
display i,cnt;

*   2. Prepare model constants and set up numerical integration
*      data sets.

parameter diploid sample size
          nhat
          bin(imax)  binomial for diploid ;

diploid = card(i);

bin(i(imax))  = fact(card(i))/fact(card(i)-ord(imax))/fact(ord(imax));

* numerical integration data
set s     integral discritization steps / s0*s1000 /
    ss(s) excluding first element ;
parameter xx(s) values for steps s (0-1)
          ws(s) weight for trapezoid approximation
          deltaxx;

ss(s)   = ord(s) > 1;   // drop values for x = 0
deltaxx = 1/(card(s)-1);
xx(s)   = (ord(s)-1)*deltaxx;
ws(s)   = 1; ws(s)$(ord(s) = 1) = 0.5; ws(s)$(ord(s) = card(s)) = 0.5;

parameter t0(s)   intermediate value one
          t1(imax,s) intermediate value two;

t0(s)      = sqr(xx(s))+ 2*xx(s)*(1 - xx(s));
t1(i(imax),s) = ws(s)*power(t0(s),ord(imax))*power(1 - t0(s),card(i)-ord(imax));

*   3. Define optimization model and set relevant bounds.
*      Note that not all parameters are known at this point.


variables chi  sum of CHI squares
          a    alpha values
          b    beta values
          est(imax)    estimates
equations defchi       definition of CHI
          defest(imax) definition of estimates;

defchi..  chi =e= sum(i, sqr(cnt(i)-est(i))/est(i));

defest(i)..  est(i) =e= nhat*(a+b)/a/beta(a,b)*bin(i)*deltaxx*sum(s, t1(i,s)*xx(s)**(a-1)*(1-xx(s))**(b-1));

model m / all /;

* set bounds and initial values

a.lo = 0.01; a.up = 1.0;
b.lo = 1;     b.up = 10;
a.l = .1;     b.l  = 5;
est.l(i) = 1;
option  domlim=1000,limcol=0,limrow=0,solprint=on;

parameter rep(*,*) summary report;

*   4. First Optimization

nhat    = 4/3*sum(i(imax), ord(imax)*cnt(i))/card(i);

solve m min chi us dnlp;

display  diploid, nhat, a.l,b.l,chi.l;

rep(i ,'w-obs') = cnt(i);
rep(i ,'w-est') = est.l(i);
rep(i ,'w-chi') = sqr(cnt(i)-est.l(i))/est.l(i);
rep('total','w-chi') = sum(i, rep(i,'w-chi'));
rep('total','w-obs') = sum(i, cnt(i));
rep('alpha','w-est') = a.l;
rep('beta' ,'w-est') = b.l;
rep('nhat' ,'w-est') = nhat;
rep('SStat' ,'w-est') = m.solvestat;
rep('MStat' ,'w-est') = m.modelstat;

*   5. Second solve with modified counts

// use a cutoff of 70% and solve again

cnt(i(imax))$(ord(imax) > 0.7*card(i)) = 0;
nhat    = 4/3*sum(i(imax), ord(imax)*cnt(i))/card(i);
display nhat;

solve m min chi us dnlp;

display  diploid, nhat, a.l,b.l,chi.l;

rep(i ,'wo-obs') = cnt(i);
rep(i ,'wo-est') = est.l(i);
rep(i ,'wo-chi') = sqr(cnt(i)-est.l(i))/est.l(i);
rep('total','wo-obs') = sum(i, cnt(i));
rep('total','wo-chi') = sum(i, rep(i,'wo-chi'));
rep('alpha','wo-est') = a.l;
rep('beta' ,'wo-est') = b.l;
rep('nhat' ,'wo-est') = nhat;
rep('SStat' ,'wo-est') = m.solvestat;
rep('MStat' ,'wo-est') = m.modelstat;

display rep;

*   6. Update xls sheet with results in a new sheet

execute_unload "%gdx%", rep; abort$errorlevel 'unload to %gdx% failed';
execute '=gdxxrw "%gdx%" output="%xls%" trace=0 par=rep rng="%rngout%" rdim=1 cdim=1';
abort$errorlevel 'write to %xls% failed';