The subjects described below refer to
my publications in PDF format (page numbering may not coincide with
that of the published version). You will need Adobe Acrobat Reader in order to view them.
Classification as a scientific concept
I consider that the principal aim of classification is to reveal and maintain
knowledge of interrelations between different aspects of a phenomenon. From
this perspective, classification is a concept for the 21st century. I study
what successful classifications in real life, sciences and industries have
in common. Clustering is just a stage in the classification process. My
contributions are reflected in:
- Monograph
Mathematical Classification and Clustering (1996).
- Inaugural lecture
Classifying by Computer: from the Periodic Table
to Scotch Malt Whisky (2000).
- Latest book
Clustering for Data Mining: A Data Recovery Approach (2005); most of the
contributions below are also described in this book;
its table of contents and introduction can be found
here. A presentation of the book can be found on the publisher's web site,
Chapman & Hall/CRC; see also a review in
Biomedical Engineering Online.
A list of reviews
and quotations from them is
here.
Models and methods for revealing cluster structures in data
I am trying to develop clustering models that are both meaningful and
computationally efficient. Two approaches I am working on are: approximation
clustering and structural clustering. In approximation clustering, I develop
an approach that extracts clusters one by one; this generates a number of
different methods depending on the type of data and cluster structure
sought. Overall, this amounts to intelligent clustering providing both
automatic determination of principal parameters of algorithms (distances,
number of clusters, etc.) and rich interpretation aids. My contributions:
- sequential fitting approach to data analysis; see
B. Mirkin (1997) Approximation Clustering: a
Mine of Semidefinite Programming Problems,
in P. Pardalos and H. Wolkowicz (Eds.) Topics in Semidefinite and
Interior-Point Methods, Fields Institute Communications Series, AMS: Providence, 167-180, and
B. Mirkin (1998) Least-Squares
Structuring, Clustering, and Data Processing Issues,
The Computer Journal, 41, no. 8, 519-536.
- anomalous pattern clustering; see
B. Mirkin (1999) Concept Learning and Feature
Selection Based on Square-Error Clustering, Machine Learning, 35, 25-40, and
Mark MingTso Chiang and B. Mirkin (2010)
Intelligent choice of the number of clusters in K-Means clustering: an experimental study with different cluster spreads, Journal of Classification, 27, 1-38.
- linear embedding of hierarchies;
see B. Mirkin (1997)
Linear embedding of binary hierarchies and its
applications, in B. Mirkin, F. McMorris, F. Roberts, and A. Rzhetsky (Eds.)
Mathematical Hierarchies and Biology, DIMACS Series in Discrete Mathematics
and Theoretical Computer Science, V. 37, AMS: Providence, 331-356.
- fuzzy clustering with proportional membership; see
S. Nascimento, B. Mirkin, and F. Moura-Pires (2003)
Modeling proportional
membership in fuzzy clustering, IEEE Transactions on Fuzzy Systems, 11, no. 2, 173-186.
- biclustering and dual clustering in rectangular and contingency tables; see B. Mirkin,
P. Arabie, and L. Hubert (1995)
Additive two-mode clustering: the error-variance approach revisited,
Journal of Classification, 12, 243-263,
and B. Mirkin (2007)
Deviant box and dual clusters for the analysis of conceptual contexts,
Invited talk at the Fifth International Conference on Concept Lattices and Their Applications (24-26 October 2007, Montpellier, France), 12 p.
- aggregation of contingency tables; see
B. Mirkin (1999) Three Approaches to
Aggregation of Interaction Tables,
in H. Bacelar-Nicolau, F. Costa Nicolau and J. Janssen (Eds.)
Applied Stochastic Models and Data Analysis, Lisbon: National
Institute of Statistics, 30-35.
- structural clustering and layered clusters; see
Y. Kempner, B. Mirkin and I. Muchnik (1997)
Monotone linkage clustering and quasi-convex
set functions, Applied Mathematics Letters, 10, no. 4, 19-24;
B. Mirkin and I. Muchnik (2002) Layered Clusters of Tightness Set Functions, Applied Mathematics
Letters, 15, 147-151,
and
Induced Layered Clusters, Hereditary
Mappings, and Convex Geometries, Applied Mathematics Letters, 15,
293-298.
- intelligent clustering; see Section 3.3 in
Clustering for Data Mining: A Data Recovery Approach (2005), and
Mark MingTso Chiang and B. Mirkin (2010)
Intelligent choice of the number of clusters in K-Means clustering: an experimental study with different cluster spreads, Journal of Classification, 27, 1-38.
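As an illustration only, the idea of extracting clusters one by one can be sketched in Python. This is a simplified reading of anomalous pattern clustering, not the exact procedure from the publications listed above. It assumes squared Euclidean distance and takes the grand mean of the data as the reference point; the function names anomalous_pattern and extract_clusters and the max_iter parameter are hypothetical.

```python
import numpy as np

def anomalous_pattern(X, reference, max_iter=100):
    """Extract one cluster: the points at least as close to the cluster
    centre as to the fixed reference point (assumed: squared Euclidean)."""
    d_ref = ((X - reference) ** 2).sum(axis=1)
    # Start from the point farthest from the reference.
    centre = X[np.argmax(d_ref)]
    members = np.zeros(len(X), dtype=bool)
    for _ in range(max_iter):
        d_centre = ((X - centre) ** 2).sum(axis=1)
        new_members = d_centre <= d_ref
        if np.array_equal(new_members, members):
            break  # membership is stable
        members = new_members
        centre = X[members].mean(axis=0)
    return members, centre

def extract_clusters(X):
    """Repeat the extraction on the points not yet clustered.
    Returns a list of (index array, centre) pairs."""
    X = np.asarray(X, dtype=float)
    reference = X.mean(axis=0)  # assumption: grand mean as reference
    remaining = np.arange(len(X))
    clusters = []
    while len(remaining) > 0:
        members, centre = anomalous_pattern(X[remaining], reference)
        clusters.append((remaining[members], centre))
        remaining = remaining[~members]
    return clusters
```

In this sketch the number of clusters is not set in advance; it is simply the number of extractions needed to exhaust the data. Small clusters could then be discarded and the remaining centres used to initialise K-Means, which is one way to read the "intelligent choice of the number of clusters" mentioned above.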
Models and methods for interpreting cluster structures
This subject borders on machine learning and knowledge discovery, but it
also properly belongs to clustering viewed from the perspective of
classification. In particular, I proposed a non-traditional decision rule: a
rectangular (comprehensive) description of a cluster. My contributions:
- association in contingency tables as the average Quetelet index
and contribution to the data scatter; see
B. Mirkin (2001) Eleven Ways to Look at
the Chi-Squared Coefficient for Contingency Tables, The American
Statistician, 55, no. 2, 111-120, and B. Mirkin (2001)
Reinterpreting the Category
Utility Function, Machine Learning, 45, 219-228.
- interpretation of single classes with the rectangular rule;
see
B. Mirkin (1999) Concept Learning and Feature
Selection Based on Square-Error Clustering, Machine Learning, 35, 25-40,
and
B. Mirkin and O. Ritter (2000) A Feature Based
Approach to Discrimination and Prediction of Protein Folding Groups,
in S. Suhai (Ed.), Genomics and Proteomics: Functional and Computational
Aspects, New York: Kluwer Academic/Plenum Publishers, 157-177.
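The first item above, association as the average Quetelet index, can be illustrated with a short Python sketch. It assumes the relative Quetelet index of a cell is q = p_ij / (p_i+ p_+j) - 1, that is, the relative change in the probability of column j once row i is known. With this definition, the average of q weighted by the cell proportions p_ij equals Pearson's chi-squared divided by the number of observations. The function name quetelet_summary is hypothetical, and the table is assumed to have no empty rows or columns.

```python
import numpy as np

def quetelet_summary(table):
    """Return (q, phi2, chi2) for a contingency table of counts.

    q    : matrix of relative Quetelet indices (assumed definition)
    phi2 : average Quetelet index, weighted by cell proportions
    chi2 : Pearson's chi-squared computed in the usual way
    """
    counts = np.asarray(table, dtype=float)
    n = counts.sum()
    p = counts / n                        # cell proportions
    row = p.sum(axis=1, keepdims=True)    # row marginals p_i+
    col = p.sum(axis=0, keepdims=True)    # column marginals p_+j
    expected = row * col                  # proportions under independence
    q = p / expected - 1.0
    phi2 = (p * q).sum()
    chi2 = n * ((p - expected) ** 2 / expected).sum()
    return q, phi2, chi2

q, phi2, chi2 = quetelet_summary([[10, 20], [30, 40]])
```

Multiplying phi2 by the table total gives chi2 up to rounding error, so the two computations are different views of the same coefficient.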
Applications
I have participated in several large-scale applications of cluster analysis,
including the analysis of questionnaire surveys, the analysis of organisation
structures, and the structuring of genetic complementarity test data. My
current projects involve:
- description of protein fold classes and enzymes; see
B. Mirkin and O. Ritter (2000) A Feature Based
Approach to Discrimination and Prediction of Protein Folding Groups,
in S. Suhai (Ed.), Genomics and Proteomics: Functional and Computational
Aspects, New York: Kluwer Academic/Plenum Publishers, 157-177.
- combinatorial modelling in analysis of evolution; see
O. Eulenstein, B. Mirkin, and M. Vingron (1997)
Comparison
of annotating duplication, tree mapping,
and copying as methods to compare gene trees with species trees,
in B. Mirkin, F. McMorris, F. Roberts, and A. Rzhetsky (Eds.)
Mathematical Hierarchies and Biology,
DIMACS Series, V. 37, Providence: AMS, 71-94 and
O. Eulenstein, B. Mirkin, and M. Vingron (1998)
Duplication-Based Measures of Difference Between Gene and
Species Trees, Journal of Computational Biology, 5, 135-148. More
recent is B. Mirkin, T. Fenner, M. Galperin and E. Koonin (2003)
Algorithms for computing parsimonious evolutionary scenarios
for genome evolution, the last universal common ancestor and dominance
of horizontal gene transfer in the evolution of prokaryotes,
BMC Evolutionary Biology, 3:2; see also B. Mirkin (2004)
Mapping gene family data onto evolutionary trees, in M. Chavent, O. Dordan,
C. Lacomblez, M. Langlais, and
B. Patouille (Eds.), Comptes rendus des 11es
Rencontres de la Société Francophone de Classification, University of Bordeaux,
61-68, and
B. Mirkin, R. Camargo, T. Fenner, G. Loizou and P. Kellam (2010)
Similarity clustering of proteins using substantive knowledge and
reconstruction of evolutionary gene histories in herpesvirus,
Theoretical Chemistry Accounts, 125, nos. 3-6, 569-581.
- comprehensive clustering in text analysis; see
B. Mirkin (2002)
Towards comprehensive clustering
of mixed-scale data with K-Means, Proceedings of the 2002 UK Workshop on
Computational Intelligence (Birmingham, September, 2002), 126-130,
and R. Pampapathi, B. Mirkin, M. Levene (2006)
A suffix tree approach to anti-spam email filtering, Machine Learning,
65, 309-338.
Other
- general views on data mining, statistics, clustering, etc.; see
B. Mirkin (2003-2007)
Data Analysis: A Bird's Eye View, Notes to a lecture for research students
at SCSIS (London, Birkbeck, 6 November 2007) and
Clustering for Data Mining: A Data Recovery Approach (2005).