csp.gms : Closest String Problem

**Description**

The closest-string problem (CSP) finds a center string that minimizes the maximum Hamming distance between the center string and each of the given strings. The Hamming distance counts the positions at which two strings differ. For example, 'CATCC' and 'CTTGC' have a Hamming distance of 2.

Three formulations are presented. Formulations 1 and 2 can only be used for small problems. Formulation 3 is the most intuitive formulation and works well with general-purpose MIP codes.

The root node heuristic of CPLEX performs very well. With an absolute gap of one or two, CPLEX finds a solution within a few seconds for all suggested sizes. For example, the following setting yields solutions that are either within 1.0 of the global optimum or within 1 percent of it: `option optcr=0.01, optca=1.99;`
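To make the distance measure concrete, here is a minimal Python sketch. It is not part of the GAMS model, and the function names `hamming` and `max_distance` are illustrative. It counts mismatched positions and evaluates the quantity the model calls `d`: the largest distance from a candidate center string to any input string.

```python
def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(a != b for a, b in zip(s, t))


def max_distance(center, strings):
    """Largest Hamming distance from `center` to any string in `strings`."""
    return max(hamming(center, s) for s in strings)


print(hamming("CATCC", "CTTGC"))  # 2, the example from the description
```

The closest-string problem then asks for the `center` that makes `max_distance(center, strings)` as small as possible.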

**Reference**

- Meneses, C N, Lu, Z, Oliveira, C A S, and Pardalos, P M, Optimal Solutions for the Closest-String Problem via Integer Programming. INFORMS Journal on Computing 16, 4 (2004), 419-429.

**Small Model of Type :** MIP

**Category :** GAMS Model library

**Main file :** csp.gms

```
$title Closest String Problem (CSP,SEQ=306)
$ontext
The closest-string problem (CSP) finds a center string that minimizes
the Hamming distance between the center string and all other strings.
The Hamming distance counts the nonmatching characters. For example,
'CATCC' and 'CTTGC' have a Hamming distance of 2.
Three formulations are presented. Formulations 1 and 2 can only be used
for small problems. Formulation 3 is the most intuitive formulation
and works well with general purpose MIP codes.
It should be noted that the root node heuristic of CPLEX performs
very well. With an absolute gap of one or two, CPLEX finds a solution within
a few seconds for all suggested sizes. For example, the setting below
will result in solutions that are either within 1.0 of the global optimum
or within 1 percent of it:
option optcr=0.01, optca=1.99;
Meneses, C N, Lu, Z, Oliveira, C A S, and Pardalos, P M,
Optimal Solutions for the Closest-String Problem via Integer Programming.
INFORMS Journal on Computing 16, 4 (2004), 419-429.
$offtext
$eolcom //
set n strings
m character sequence
a alphabet;
parameter x(n,m) string values;
$if NOT set letters $set letters 26
$if NOT set strings $set strings 4
$if NOT set length $set length 6
set n / s1*s%strings% /, m /c1*c%length% /, a / a1*a%letters% /;
* sample data from paper
table x(n,m)
       c1  c2  c3  c4  c5  c6
   s1   4   9   6   6   5  18   // differ
   s2  13   5   4   9   1  14   // median
   s3  12   5  14   7  20   8   // length
   s4  13   5   4   9  21  13 ; // medium
* recognize sample data
$if NOT %strings%+%length%+%letters% == 4+6+26
x(n,m) = UniformInt(1,card(a));
if(card(n)*card(m) > 50,
option limcol=0,limrow=0,solprint=off, reslim=10
optcr=0.01, optca=1.999
else
option reslim=5, optcr=0.0, optca=0.999 );
*** Formulation P1
Variables d maximum difference between t and x
t(m) reference string
z(n,m) string is different;
binary variable z;
equations e1(n) lower bound for d
e2(n,m) lower bound on difference
e3(n,m) upper bound on difference;
e1(n).. sum(m, z(n,m)) =l= d;
* x <> t
e2(n,m).. t(m) =l= t.up(m)*z(n,m) + x(n,m)*(1-z(n,m));
e3(n,m).. t(m) =g= t.lo(m)*z(n,m) + x(n,m)*(1-z(n,m));
t.lo(m) = smin(n, x(n,m));
t.up(m) = smax(n, x(n,m));
model p1A / e1,e2,e3 /;
parameter report Summary report;
report(m,'t.lo') = t.lo(m);
report(m,'t.up') = t.up(m);
solve p1A min d us mip;
report(m,'p1A') = t.l(m);
report('objective','p1A') = p1a.objval;
report('Est Global','p1A') = ceil(p1a.objest-1e-6);
*** Formulation P2
set ma(m,a) possible characters by position;
ma(m,a) = sum(n, ord(a)=x(n,m));
*display ma;
binary variable v(m,a) selection of characters;
equations e4(m) select only one
e5(m) assign character value to t;
e4(m).. sum(ma(m,a), v(ma)) =e= 1;
e5(m).. t(m) =e= sum(ma(m,a), ord(a)*v(ma));
model p2 / e1,e2,e3,e4,e5 /;
solve p2 min d us mip;
report(m,'p2') = t.l(m);
report('objective','p2') = p2.objval;
report('Est Global','p2') = ceil(p2.objest-1e-6);
*** Formulation P3
equation e6(n) count matching characters ;
e6(n).. card(m) - sum(ma(m,a),(x(n,m)=ord(a))*v(ma)) =l= d;
model p3 / e4,e6 /;
solve p3 min d us mip;
t.l(m) = sum(ma(m,a), ord(a)*v.l(ma));
report(m,'p3') = t.l(m);
report('objective','p3') = p3.objval;
report('Est Global','p3') = ceil(p3.objest-1e-6);
display report;
```
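As a cross-check on the sample data, the following Python sketch enumerates candidate center strings by brute force. It is not one of the three formulations in the listing. It assumes that the values 1..26 in the table encode the letters a..z, which is consistent with the `// differ`, `// median`, `// length`, and `// medium` comments. It also restricts each position to characters that occur there in some input string, mirroring the set `ma`; a character that appears in no string at a position cannot reduce any distance. The function names are illustrative.

```python
from itertools import product

# Sample table from the listing, one row per string (s1..s4).
rows = [
    [4, 9, 6, 6, 5, 18],
    [13, 5, 4, 9, 1, 14],
    [12, 5, 14, 7, 20, 8],
    [13, 5, 4, 9, 21, 13],
]

# Assumed encoding: 1 -> 'a', 2 -> 'b', ..., 26 -> 'z'.
strings = ["".join(chr(ord("a") + v - 1) for v in row) for row in rows]


def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t))


def closest_string(strings):
    """Brute-force search over characters that occur at each position."""
    columns = [sorted(set(col)) for col in zip(*strings)]
    best, best_d = None, None
    for cand in product(*columns):
        d = max(hamming(cand, s) for s in strings)
        if best_d is None or d < best_d:
            best, best_d = "".join(cand), d
    return best, best_d


center, d = closest_string(strings)
print(strings)    # ['differ', 'median', 'length', 'medium']
print(center, d)  # d is 4 for this data; the center string need not be unique
```

The enumeration grows exponentially with the string length, so it is only useful for tiny instances like this one. Larger sizes need the MIP formulations.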