csp.gms : Closest String Problem

**Description**

The closest-string problem (CSP) finds a center string that minimizes the Hamming distance between the center string and all other strings. The Hamming distance counts the nonmatching characters. For example, 'CATCC' and 'CTTGC' have a Hamming distance of 2. Three formulations are presented. Formulations 1 and 2 can only be used for small problems. Formulation 3 is the most intuitive formulation and works well with general purpose MIP codes. It should be noted that the root node heuristic of CPLEX performs very well. With an absolute gap of one or two, CPLEX finds solution within a few seconds for all suggested sizes. For example, the setting below will result in solutions that are either within 1.0 from he global optimum or less than 1 percent: option optcr=0.01, optca=1.99;

**Reference**

- Meneses, C N, Lu, Z, Oliveira, C A S, and Pardalos, P M, Optimal Solutions for the Closest-String Problem via Integer Programming. INFORMS Journal on Computing 16, 4 (2004), 419-429.

**Small Model of Type :** MIP

**Category :** GAMS Model library

**Main file :** csp.gms

$title Closest String Problem (CSP,SEQ=306) $ontext The closest-string problem (CSP) finds a center string that minimizes the Hamming distance between the center string and all other strings. The Hamming distance counts the nonmatching characters. For example, 'CATCC' and 'CTTGC' have a Hamming distance of 2. Three formulations are presented. Formulations 1 and 2 can only be used for small problems. Formulation 3 is the most intuitive formulation and works well with general purpose MIP codes. It should be noted that the root node heuristic of CPLEX performs very well. With an absolute gap of one or two, CPLEX finds solution within a few seconds for all suggested sizes. For example, the setting below will result in solutions that are either within 1.0 from he global optimum or less than 1 percent: option optcr=0.01, optca=1.99; Meneses, C N, Lu, Z, Oliveira, C A S, and Pardalos, P M, Optimal Solutions for the Closest-String Problem via Integer Programming. INFORMS Journal on Computing 16, 4 (2004), 419-429. $offtext $eolcom // set n strings m character sequence a alphabet; parameter x(n,m) string values; $if NOT set letters $set letters 26 $if NOT set strings $set strings 4 $if NOT set length $set length 6 set n / s1*s%strings% /, m /c1*c%length% /, a / a1*a%letters% /; * sample data from paper table x(n,m) c1 c2 c3 c4 c5 c6 s1 4 9 6 6 5 18 // differ s2 13 5 4 9 1 14 // median s3 12 5 14 7 20 8 // length s4 13 5 4 9 21 13 ; // medium * recognize sample data $if NOT %strings%+%length%+%letters% == 4+6+26 x(n,m) = UniformInt(1,card(a)); if(card(n)*card(m) > 50, option limcol=0,limrow=0,solprint=off, reslim=10 optcr=0.01, optca=1.999 else option reslim=5, optcr=0.0, optca=0.999 ); *** Formulation P1 Variables d maximum difference between t and x t(m) reference string z(n,m) string is different; binary variable z; equations e1(n) lower bound for d e2(n,m) lower bound on difference e3(n,m) upper bound on difference; e1(n).. sum(m, z(n,m)) =l= d; * x <> t e2(n,m).. t(m) =l= t.up(m)*z(n,m) + x(n,m)*(1-z(n,m)); e3(n,m).. t(m) =g= t.lo(m)*z(n,m) + x(n,m)*(1-z(n,m)); t.lo(m) = smin(n, x(n,m)); t.up(m) = smax(n, x(n,m)); model p1A / e1,e2,e3 /; parameter report Summary report; report(m,'t.lo') = t.lo(m); report(m,'t.up') = t.up(m); solve p1A min d us mip; report(m,'p1A') = t.l(m); report('objective','p1A') = p1a.objval; report('Est Global','p1A') = ceil(p1a.objest-1e-6); *** Formulation P2 set ma(m,a) possible characters by position; ma(m,a) = sum(n, ord(a)=x(n,m)); *display ma; binary variable v(m,a) selection of characters; equations e4(m) select only one e5(m) assign character value to t; e4(m).. sum(ma(m,a), v(ma)) =e= 1; e5(m).. t(m) =e= sum(ma(m,a), ord(a)*v(ma)); model p2 / e1,e2,e3,e4,e5 /; solve p2 min d us mip; report(m,'p2') = t.l(m); report('objective','p2') = p2.objval; report('Est Global','p2') = ceil(p2.objest-1e-6); *** Formulation P3 equation e6(n) count matching characters ; e6(n).. card(m) - sum(ma(m,a),(x(n,m)=ord(a))*v(ma)) =l= d; model p3 / e4,e6 /; solve p3 min d us mip; t.l(m) = sum(ma(m,a), ord(a)*v.l(ma)); report(m,'p3') = t.l(m); report('objective','p3') = p3.objval; report('Est Global','p3') = ceil(p3.objest-1e-6); display report;