Volume 19 (2) 2013, 99-105

A Representative Set Method for Symbolic Sequence Clustering

Bohdan Kozarzewski

University of Information Technology and Management
ul. H. Sucharskiego 2, 35-225 Rzeszów, Poland
E-mail: bkozarzewski@wsiz.rzeszow.pl

Received: 25 October 2012; revised: 24 February 2013; accepted: 1 March 2013; published online: 22 May 2013

DOI: 10.12921/cmst.2013.19.02.99-105

OAI: oai:lib.psnc.pl:461

Abstract:

Decomposing a sequence into a set of consecutive, distinct subsequences is crucial for symbolic sequence analysis. It significantly reduces the reference base of the recorded sequence for further retrieval and makes possible original similarity and membership measures for sequences. The introduced measures are the starting point for a new algorithm that clusters sequences into groups of similar individuals. Algorithms that use the concept of a representative set achieve relatively good clustering results. In contrast to the representative sets used in other applications, the one introduced here is precisely and uniquely defined.
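
The general idea of decomposing a sequence into consecutive, distinct subsequences and comparing sequences through the resulting sets can be sketched as follows. This is only an illustration under stated assumptions, not the paper's definitions: the abstract does not specify the parsing rule or the similarity measure, so the sketch assumes an LZ78-style incremental parsing and the Dice coefficient on the sets of parsed subsequences. The function names `decompose` and `dice_similarity` are hypothetical.

```python
def decompose(seq):
    """Split seq into consecutive subsequences, each new when it is emitted.

    Assumption: LZ78-style parsing. The current phrase grows symbol by
    symbol and is emitted as soon as it differs from every phrase seen so far.
    """
    seen = set()
    parts = []
    current = ""
    for symbol in seq:
        current += symbol
        if current not in seen:
            seen.add(current)
            parts.append(current)
            current = ""
    # A trailing remainder may repeat an earlier phrase; it is kept so that
    # the parts still concatenate back to the original sequence.
    if current:
        parts.append(current)
    return parts


def dice_similarity(seq_a, seq_b):
    """Dice coefficient between the sets of subsequences of two sequences.

    Assumption: this stands in for the paper's similarity measure, which
    the abstract does not define.
    """
    a, b = set(decompose(seq_a)), set(decompose(seq_b))
    if not a and not b:
        return 1.0  # two empty sequences are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    print(decompose("AACGACGT"))
    print(dice_similarity("AACGACGT", "AACGTTGA"))
```

Under these assumptions, a pairwise similarity matrix built with `dice_similarity` could then be passed to any standard clustering procedure. How the paper itself derives the representative set and membership measure is described in the full text, not here.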

Supplementary material:

  • data1.pdf
  • data2.pdf
  • data3.pdf
  • data4.pdf

Key words:

clustering, representative set, similarity and membership measures

© 2025 CMST