Volume 19 (2) 2013, 99-105

A Representative Set Method for Symbolic Sequence Clustering

Kozarzewski Bohdan

University of Information Technology and Management
ul. H. Sucharskiego 2, 35-225 Rzeszów, Poland
E-mail: bkozarzewski@wsiz.rzeszow.pl

Received: 25 October 2012; revised: 24 February 2013; accepted: 1 March 2013; published online: 22 May 2013

DOI:   10.12921/cmst.2013.19.02.99-105

OAI:   oai:lib.psnc.pl:461

Abstract:

Decomposing a sequence into a set of consecutive, distinct subsequences is crucial for symbolic sequence analysis. It significantly reduces the reference base of the recorded sequence for later retrieval and makes possible original similarity and membership measures for sequences. These measures are the starting point for a new algorithm that clusters sequences into groups of similar individuals. Algorithms that use the concept of a representative set achieved relatively good clustering results. In contrast to the representative sets used in other applications, the one we introduce is precisely and uniquely defined.
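
As a rough illustration of the decomposition idea, the sketch below splits a symbolic sequence into consecutive phrases that have not occurred before and compares two sequences by the overlap of their phrase sets. It rests on assumptions not stated in the abstract: an LZ78-style parsing rule and a Dice coefficient as the similarity measure. The function names `decompose` and `dice_similarity` are hypothetical, and the definitions used in the paper may differ.

```python
def decompose(sequence):
    """Split a sequence into consecutive, mutually distinct phrases.

    Assumption: LZ78-style parsing, where each phrase is the shortest
    prefix of the remaining input that has not appeared as a phrase yet.
    """
    seen = set()
    phrases = []
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    # A trailing fragment that repeats an earlier phrase is dropped,
    # so every phrase in the result is distinct.
    return phrases


def dice_similarity(a, b):
    """Dice coefficient between the phrase sets of two sequences.

    Assumption: this stands in for the paper's similarity measure.
    """
    set_a, set_b = set(decompose(a)), set(decompose(b))
    if not set_a and not set_b:
        return 1.0  # two empty sequences are treated as identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


if __name__ == "__main__":
    print(decompose("ACGTACGT"))
    print(dice_similarity("ACGTACGT", "ACGT"))
```

Under this parsing rule, `decompose("ACGTACGT")` yields `['A', 'C', 'G', 'T', 'AC', 'GT']`: six phrases stand in for the eight-symbol sequence, and a pairwise similarity of this kind could then feed a clustering step.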

Supplementary material:

  • data1.pdf
  • data2.pdf
  • data3.pdf
  • data4.pdf

Key words:

clustering, representative set, similarity and membership measures
