Library Header Image
LSE Research Online LSE Library Services

Euler circuits and DNA sequencing by hybridization

Arratia, Richard and Bollobás, Béla and Coppersmith, Don and Sorkin, Gregory B. (2000) Euler circuits and DNA sequencing by hybridization. Discrete Applied Mathematics, 104 (1-3). pp. 63-96. ISSN 0166-218X

Full text not available from this repository.
Identification Number: 10.1016/S0166-218X(00)00190-6


Sequencing by hybridization is a method of reconstructing a long DNA string — that is, figuring out its nucleotide sequence — from knowledge of its short substrings. Unique reconstruction is not always possible, and the goal of this paper is to study the number of reconstructions of a random string. For a given string, the number of reconstructions is determined by the pattern of repeated substrings; in an appropriate limit substrings will occur at most twice, so the pattern of repeats is given by a pairing: a string of length 2n in which each symbol occurs twice. A pairing induces a 2-in, 2-out graph, whose directed edges are defined by successive symbols of the pairing — for example the pairing ABBCAC induces the graph with edges AB, BB, BC, and so forth — and the number of reconstructions is simply the number of Euler circuits in this 2-in, 2-out graph. The original problem is thus transformed into one about pairings: to find the number fk(n) of n-symbol pairings having k Euler circuits. We show how to compute this function, in closed form, for any fixed k, and we present the functions explicitly for k=1,…,9. The key is a decomposition theorem: the Euler “circuit number” of a pairing is the product of the circuit numbers of “component” sub-pairings. These components come from connected components of the “interlace graph”, which has the pairing's symbols as vertices, and edges when symbols are “interlaced”. (A and B are interlaced if the pairing has the form ABAB or BABA.) We carry these results back to the original question about DNA strings, and provide a total variation distance upper bound for the approximation error. We perform an asymptotic enumeration of 2-in, 2-out digraphs to show that, for a typical random n-pairing, the number of Euler circuits is of order no smaller than 2n/n, and the expected number is asymptotically at least e−1/22n−1/n. Since any n-pairing has at most 2n−1 Euler circuits, this pinpoints the exponential growth rate.

Item Type: Article
Official URL:
Additional Information: © 2000 Elsevier Science B.V.
Subjects: Q Science > QA Mathematics
Sets: Research centres and groups > Management Science Group
Departments > Management
Date Deposited: 13 Apr 2011 14:29
Last Modified: 06 Jun 2012 17:16

Actions (login required)

View Item View Item