A Method for Nucleotide Sequence Analysis

Kozarzewski Bohdan

doi:10.12921/cmst.2012.18.01.5-10

University of Information Technology and Management
ul. H. Sucharskiego 2, 35-225 Rzeszów, Poland
e-mail: bkozarzewski@wsiz.rzeszow.pl

Received:

(Received: 22 February 2012; revised: 29 May 2012; accepted: 7 June 2012; published online: 21 June 2012)

OAI: oai:lib.psnc.pl:420

Abstract:

A decomposition of a symbolic sequence into a set of consecutive, distinct subsequences (mers) is presented. Several statistical distributions of nucleotide subsequences are defined and analysed. Sequence entropy and a similarity measure between sequences are defined in terms of the mer-length distribution. An alignment-free method of phylogenetic tree construction is proposed.
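
The abstract does not state the parsing rule, so the following is only a minimal sketch under an assumption: an incremental parse in which each new mer is the shortest prefix of the remaining sequence that has not already appeared as a mer. The function names (`parse_distinct_mers`, `mer_length_distribution`, `shannon_entropy`) are hypothetical and are not taken from the paper; the entropy here is the plain Shannon entropy of the mer-length distribution, which may differ from the paper's definition.

```python
from collections import Counter
from math import log2


def parse_distinct_mers(seq):
    """Split seq into consecutive mers (assumed rule: shortest prefix not yet used as a mer)."""
    seen = set()
    mers = []
    start = 0
    for end in range(1, len(seq) + 1):
        word = seq[start:end]
        if word not in seen:
            seen.add(word)
            mers.append(word)
            start = end
    # A trailing fragment that repeats an earlier mer is kept so no symbols are lost.
    if start < len(seq):
        mers.append(seq[start:])
    return mers


def mer_length_distribution(mers):
    """Return the relative frequency of each mer length."""
    counts = Counter(len(m) for m in mers)
    total = len(mers)
    return {length: n / total for length, n in sorted(counts.items())}


def shannon_entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


mers = parse_distinct_mers("AACGACGT")
dist = mer_length_distribution(mers)
print(mers, dist, shannon_entropy(dist))
```

Two sequences could then be compared by any distance between their mer-length distributions; which distance the paper uses is not given in the abstract.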

Key words:

phylogenetic tree, sequence parsing, similarity measure

Volume 18 (1) 2012, 5-10
