DNA sequencing by hybridization with additional information available
Institute of Computing Science, Poznań University of Technology, Piotrowo 3A, 60-965 Poznań, Poland
Institute of Bioorganic Chemistry, Polish Academy of Sciences
Noskowskiego 12/14, 61-704 Poznań, Poland
Received:
Rec. 24 January 2005
DOI: 10.12921/cmst.2005.11.01.21-29
OAI: oai:lib.psnc.pl:578
Abstract:
In classical DNA sequencing by hybridization it is assumed that the information obtained in the biochemical stage of the method is a set of the l-tuples composing the target sequence. It means that the information concerning the number of the repeated l-tuples is not available. Such an assumption was justified by the DNA chip technology constraints. However, nowadays some approximate information about l-tuple multiplicities can be obtained in the experiments, where DNA chips are used. It was a motivation for formulating combinatorial problems which arise when such additional information is taken into account. The goal of this paper is to formulate and classify these problems, what should establish a good starting point for further research concerning algorithmic methods solving DNA sequencing problems with multiplicity information. Moreover, the computational complexity of the new problems is determined, which in most cases is analogous to the complexity of their classical counterparts.
Key words:
combinatorial problems, computational complexity, DNA sequencing, l-tuple multiplicity
References:
[1] J. Błażewicz, P. Formanowicz, M. Kasprzak, W. T. Markiewicz and J. Węglarz, DNA sequencing with positive and negative errors, Journal of Computational Biology 6, 113-123 (1999).
[2] J. Błażewicz, P. Formanowicz, M. Kasprzak, P. Schuurman and G. J. Woeginger, DNA sequencing, Eulerian graphs, and the exact perfect matching problem, Lecture Notes in
Computer Science 2573, 13-24 (2002).
[3] J. Błażewicz and M. Kasprzak, Complexity of DNA sequencing by hybridization, Theoretical Computer Science 290, 1459-1473 (2003).
[4] V. Bryant, Aspects of Combinatorics. A Wide-Range Introduction, Cambridge University Press 1993.
[5] R. Drmanac, I. Labat, I. Brukner and R. Crkvenjakov, Sequencing of megabase plus DNA by hybridization: theory and the method, Genomics 4, 114-128 (1989).
[6] S. P. A. Fodor, J. L. Read, M. C. Pirrung, L. Stryer, A. Lu and D. Solas, Light-directed, spatially addressable parallel chemical synthesis, Science 251, 767-773 (1991).
[7] J. Gallant, D. Maier and J. A. Storer, On finding minimal length superstrings, Journal of Computer and System Sciences 20, 50-58 (1980).
[8] K. R. Khrapko, Y. P. Lysov, A. A. Khorlin, V. V. Shik, V. L. Florentiev and A. D. Mirzabekov, An oligonucleotide approach to DNA sequencing, FEBS Letters 256, 118-122 (1989).
[9] A. C. Pease, D. Solas, E. Sullivan, M. Cronin, C. Holmes and S. Fodor, Light-generated oligonucleotide arrays for rapid DNA sequence analysis, Proceedings of the National
Academy of Sciences of the USA 91, 5022-5026 (1994).
[10] P. A. Pevzner, l-tuple DNA sequencing: computer analysis, Journal of Biomolecular Structure and Dynamics 7, 63-73 (1989).
[11] P. A. Pevzner, Computational molecular biology. An algorithmic approach, The MIT Press, Cambridge, Massachusetts 2000.
[12] M. Schena, Microarray Analysis, Wiley-Liss, Hoboken, New Jersey 2003.
[13] E. M. Southern, U. Maskos and J. K. Elder, Analyzing and comparing nucleic acid sequences by hybridization to arrays of oligonucleotides: evaluation using experimental models, Genomics 13, 1008-1017 (1992).
[14] J. D. Watson and F. H. C. Crick, Genetic implications of the structure of deoxyribonucleic acid, Nature 171, 964-967 (1953).
In classical DNA sequencing by hybridization it is assumed that the information obtained in the biochemical stage of the method is a set of the l-tuples composing the target sequence. It means that the information concerning the number of the repeated l-tuples is not available. Such an assumption was justified by the DNA chip technology constraints. However, nowadays some approximate information about l-tuple multiplicities can be obtained in the experiments, where DNA chips are used. It was a motivation for formulating combinatorial problems which arise when such additional information is taken into account. The goal of this paper is to formulate and classify these problems, what should establish a good starting point for further research concerning algorithmic methods solving DNA sequencing problems with multiplicity information. Moreover, the computational complexity of the new problems is determined, which in most cases is analogous to the complexity of their classical counterparts.
Key words:
combinatorial problems, computational complexity, DNA sequencing, l-tuple multiplicity
References:
[1] J. Błażewicz, P. Formanowicz, M. Kasprzak, W. T. Markiewicz and J. Węglarz, DNA sequencing with positive and negative errors, Journal of Computational Biology 6, 113-123 (1999).
[2] J. Błażewicz, P. Formanowicz, M. Kasprzak, P. Schuurman and G. J. Woeginger, DNA sequencing, Eulerian graphs, and the exact perfect matching problem, Lecture Notes in
Computer Science 2573, 13-24 (2002).
[3] J. Błażewicz and M. Kasprzak, Complexity of DNA sequencing by hybridization, Theoretical Computer Science 290, 1459-1473 (2003).
[4] V. Bryant, Aspects of Combinatorics. A Wide-Range Introduction, Cambridge University Press 1993.
[5] R. Drmanac, I. Labat, I. Brukner and R. Crkvenjakov, Sequencing of megabase plus DNA by hybridization: theory and the method, Genomics 4, 114-128 (1989).
[6] S. P. A. Fodor, J. L. Read, M. C. Pirrung, L. Stryer, A. Lu and D. Solas, Light-directed, spatially addressable parallel chemical synthesis, Science 251, 767-773 (1991).
[7] J. Gallant, D. Maier and J. A. Storer, On finding minimal length superstrings, Journal of Computer and System Sciences 20, 50-58 (1980).
[8] K. R. Khrapko, Y. P. Lysov, A. A. Khorlin, V. V. Shik, V. L. Florentiev and A. D. Mirzabekov, An oligonucleotide approach to DNA sequencing, FEBS Letters 256, 118-122 (1989).
[9] A. C. Pease, D. Solas, E. Sullivan, M. Cronin, C. Holmes and S. Fodor, Light-generated oligonucleotide arrays for rapid DNA sequence analysis, Proceedings of the National
Academy of Sciences of the USA 91, 5022-5026 (1994).
[10] P. A. Pevzner, l-tuple DNA sequencing: computer analysis, Journal of Biomolecular Structure and Dynamics 7, 63-73 (1989).
[11] P. A. Pevzner, Computational molecular biology. An algorithmic approach, The MIT Press, Cambridge, Massachusetts 2000.
[12] M. Schena, Microarray Analysis, Wiley-Liss, Hoboken, New Jersey 2003.
[13] E. M. Southern, U. Maskos and J. K. Elder, Analyzing and comparing nucleic acid sequences by hybridization to arrays of oligonucleotides: evaluation using experimental models, Genomics 13, 1008-1017 (1992).
[14] J. D. Watson and F. H. C. Crick, Genetic implications of the structure of deoxyribonucleic acid, Nature 171, 964-967 (1953).