README (added at UCSC)

"make" --> gcc readseq.c cpg_lh.c -o cpglh.exe

We've been running this on hard-masked sequence (RepeatMasker and TRF 
with period <= 12 results are masked to 'N') in order to avoid CpG 
islands in Alu repeats in human.

The original cpg.c was written by Gos Miklem from the Sanger Center.  
LaDeana Hillier added some modifications --> cpg_lh.c, and UCSC has 
added some further modifications to cpg_lh.c, so that its expected 
number of CpGs in an island is calculated as described in 
  Gardiner-Garden, M. and M. Frommer, 1987 
  CpG islands in vertebrate genomes. J. Mol. Biol. 196:261-282.

    Expected = (Number of C's * Number of G's) / Length

Instead of a sliding-window search for CpG islands, this cpg program
uses a running-sum score where a 'C' followed by a 'G' increases the
score by 17 and anything else decreases the score by 1.  When the
score transitions from positive to 0 (and at the end of the sequence),
the sequence in the current span is evaluated to see if it qualifies
as a CpG island (>200 bp length, >50% GC, >0.6 ratio of observed CpG
to expected).  Then the search recurses on the span from the position
with the max running score up to the current position.

