
Euler circuits and DNA sequencing by hybridization

Arratia, Richard, Bollobás, Béla, Coppersmith, Don and Sorkin, Gregory B. ORCID: 0000-0003-4935-7820 (2000) Euler circuits and DNA sequencing by hybridization. Discrete Applied Mathematics, 104 (1-3). pp. 63-96. ISSN 0166-218X

Full text not available from this repository.
Identification Number: 10.1016/S0166-218X(00)00190-6

Abstract

Sequencing by hybridization is a method of reconstructing a long DNA string (that is, figuring out its nucleotide sequence) from knowledge of its short substrings. Unique reconstruction is not always possible, and the goal of this paper is to study the number of reconstructions of a random string. For a given string, the number of reconstructions is determined by the pattern of repeated substrings; in an appropriate limit substrings will occur at most twice, so the pattern of repeats is given by a pairing: a string of length 2n in which each symbol occurs twice. A pairing induces a 2-in, 2-out graph, whose directed edges are defined by successive symbols of the pairing (for example, the pairing ABBCAC induces the graph with edges AB, BB, BC, and so forth), and the number of reconstructions is simply the number of Euler circuits in this 2-in, 2-out graph. The original problem is thus transformed into one about pairings: to find the number f_k(n) of n-symbol pairings having k Euler circuits. We show how to compute this function, in closed form, for any fixed k, and we present the functions explicitly for k=1,…,9. The key is a decomposition theorem: the Euler “circuit number” of a pairing is the product of the circuit numbers of “component” sub-pairings. These components come from connected components of the “interlace graph”, which has the pairing's symbols as vertices, and edges when symbols are “interlaced”. (A and B are interlaced if the pairing has the form ABAB or BABA.) We carry these results back to the original question about DNA strings, and provide a total variation distance upper bound for the approximation error. We perform an asymptotic enumeration of 2-in, 2-out digraphs to show that, for a typical random n-pairing, the number of Euler circuits is of order no smaller than 2^n/n, and the expected number is asymptotically at least e^{-1/2} 2^{n-1}/n. Since any n-pairing has at most 2^{n-1} Euler circuits, this pinpoints the exponential growth rate.
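
The correspondence between pairings and Euler circuits can be checked on small cases with a short Python sketch. This is not the paper's method: it counts circuits with the BEST theorem, which for a 2-in, 2-out graph reduces to counting spanning arborescences via a Laplacian minor. The function names are hypothetical, and two readings of the abstract are assumed: the pairing is treated cyclically (the last symbol has an edge back to the first, as the 2-in, 2-out property requires), and a "component" sub-pairing is taken to be the pairing restricted to the symbols of one interlace-graph component.

```python
from fractions import Fraction
from itertools import combinations


def determinant(matrix):
    """Exact determinant by Gaussian elimination (empty matrix gives 1)."""
    m = [row[:] for row in matrix]
    size, det = len(m), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def euler_circuit_count(pairing):
    """Euler circuits of the 2-in, 2-out graph induced by a pairing.

    BEST theorem: count = (arborescences) * prod (outdeg - 1)!, and every
    out-degree is 2, so the count equals the number of arborescences.
    """
    symbols = sorted(set(pairing))
    index = {s: i for i, s in enumerate(symbols)}
    lap = [[Fraction(0)] * len(symbols) for _ in symbols]
    for pos, u in enumerate(pairing):
        v = pairing[(pos + 1) % len(pairing)]  # assumption: cyclic wrap-around
        if u != v:  # loops never appear in a spanning arborescence
            lap[index[u]][index[u]] += 1
            lap[index[u]][index[v]] -= 1
    minor = [row[1:] for row in lap[1:]]  # delete the root's row and column
    return int(determinant(minor))


def interlace_components(pairing):
    """Connected components of the interlace graph, as sets of symbols."""
    first, second = {}, {}
    for pos, s in enumerate(pairing):
        (second if s in first else first)[s] = pos
    adj = {s: set() for s in first}
    for a, b in combinations(first, 2):
        # Interlaced: exactly one occurrence of b lies between the two a's.
        inside = (first[a] < first[b] < second[a]) + (first[a] < second[b] < second[a])
        if inside == 1:
            adj[a].add(b)
            adj[b].add(a)
    seen, components = set(), []
    for s in first:
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        components.append(comp)
    return components


def product_over_components(pairing):
    """Product of circuit numbers of the assumed component sub-pairings."""
    result = 1
    for comp in interlace_components(pairing):
        result *= euler_circuit_count("".join(c for c in pairing if c in comp))
    return result


print(euler_circuit_count("ABBCAC"))      # 2
print(product_over_components("ABBCAC"))  # 2
```

For the abstract's example ABBCAC, the interlace graph has components {A, C} and {B}, with circuit numbers 2 and 1, so the product agrees with the direct count. The pairing ABCABC gives 4 circuits, matching the stated maximum of 2^{n-1} for n = 3.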

Item Type: Article
Official URL: http://www.elsevier.com/wps/find/journaldescriptio...
Additional Information: © 2000 Elsevier Science B.V.
Divisions: Management
Subjects: Q Science > QA Mathematics
Date Deposited: 13 Apr 2011 14:29
Last Modified: 21 Feb 2024 20:45
URI: http://eprints.lse.ac.uk/id/eprint/35505
