My academic interests

The subjects described below are linked to my publications in PDF format (page numbering may not coincide with that of the published version). You will need Adobe Acrobat Reader to view them.

Classification as a scientific concept

I consider that the principal aim of classification is to reveal and maintain knowledge of interrelations between different aspects of a phenomenon. From this perspective, classification is a concept for the 21st century. I study what successful classifications in real life, the sciences and industry have in common. Clustering is just a stage in the classification process. My contributions are reflected in:

Monograph Mathematical Classification and Clustering (1996).
Inaugural lecture Classifying by Computer: from the Periodic Table to Scotch Malt Whisky (2000).
Latest book Clustering for Data Mining: A Data Recovery Approach (2005); most of the contributions below are also described in this book; its table of contents and introduction can be found here. A presentation of the book can be found on the publisher's web site, Chapman & Hall/CRC; see also a review in Biomedical Engineering Online. A list of reviews and quotations from them is here.

Models and methods for revealing cluster structures in data

I am trying to develop clustering models that are both meaningful and computationally efficient. I am working on two approaches: approximation clustering and structural clustering. In approximation clustering, I develop an approach that extracts clusters one by one; this generates a number of different methods depending on the type of data and the cluster structure sought. Overall, this amounts to intelligent clustering, which provides both automatic determination of the principal parameters of algorithms (distances, number of clusters, etc.) and rich interpretation aids. My contributions:
sequential fitting approach to data analysis; see B. Mirkin (1997) Approximation Clustering: a Mine of Semidefinite Programming Problems, in P. Pardalos and H. Wolkowicz (Eds.) Topics in Semidefinite and Interior-Point Methods, Fields Institute Communications Series, AMS: Providence, 167-180, and B. Mirkin (1998) Least-Squares Structuring, Clustering, and Data Processing Issues, The Computer Journal, 41, no. 8, 519-536.
anomalous pattern clustering; see B. Mirkin (1999) Concept Learning and Feature Selection Based on Square-Error Clustering, Machine Learning, 35, 25-40, and Mark MingTso Chiang and B. Mirkin (2010) Intelligent choice of the number of clusters in K-Means clustering: an experimental study with different cluster spreads, Journal of Classification, 27, 1-38.
linear embedding of hierarchies; see B. Mirkin (1997) Linear embedding of binary hierarchies and its applications, in B. Mirkin, F. McMorris, F. Roberts, and A. Rzhetsky (Eds.) Mathematical Hierarchies and Biology, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, V. 37, AMS: Providence, 331-356.
fuzzy clustering with proportional membership; see S. Nascimento, B. Mirkin, and F. Moura-Pires (2003) Modeling proportional membership in fuzzy clustering, IEEE Transactions on Fuzzy Systems, 11, no. 2, 173-186.
biclustering and dual clustering in rectangular and contingency tables; see B. Mirkin, P. Arabie, and L. Hubert (1995) Additive two-mode clustering: the error-variance approach revisited, Journal of Classification, 12, 243-263, and B. Mirkin (2007) Deviant box and dual clusters for the analysis of conceptual contexts, invited talk at the Fifth International Conference on Concept Lattices and Their Applications (24-26 October 2007, Montpellier, France), 12 p.
aggregation of contingency tables; see B. Mirkin (1999) Three Approaches to Aggregation of Interaction Tables, in H. Bacelar-Nicolau, F. Costa Nicolau and J. Janssen (Eds.) Applied Stochastic Models and Data Analysis, Lisbon: National Institute of Statistics, 30-35.
structural clustering and layered clusters; see Y. Kempner, B. Mirkin and I. Muchnik (1997) Monotone linkage clustering and quasi-convex set functions, Applied Mathematics Letters, 10, no. 4, 19-24; B. Mirkin and I. Muchnik (2002) Layered Clusters of Tightness Set Functions, Applied Mathematics Letters, 15, 147-151, and Induced Layered Clusters, Hereditary Mappings, and Convex Geometries, Applied Mathematics Letters, 15, 293-298.
intelligent clustering; see Section 3.3 in Clustering for Data Mining: A Data Recovery Approach (2005), and Mark MingTso Chiang and B. Mirkin (2010) Intelligent choice of the number of clusters in K-Means clustering: an experimental study with different cluster spreads, Journal of Classification, 27, 1-38.
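The idea of extracting clusters one by one can be illustrated with a minimal sketch of anomalous pattern extraction. It is a simplified illustration only, not the code from the publications above, and it makes several assumptions: squared Euclidean distance, the grand mean of the data as a fixed reference point, and function names (`anomalous_pattern`, `extract_clusters`) and the `min_size` parameter chosen here purely for illustration.

```python
from math import fsum


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(fsum(column) / n for column in zip(*points))


def anomalous_pattern(points, reference, max_iter=100):
    """Extract one cluster lying far away from the reference point.

    The seed is the point farthest from the reference. A point joins the
    cluster if it is closer to the cluster centroid than to the reference;
    the centroid is then recomputed until membership stops changing.
    """
    seed = max(range(len(points)), key=lambda i: sq_dist(points[i], reference))
    centroid = points[seed]
    members = [seed]
    for _ in range(max_iter):
        new_members = [i for i, p in enumerate(points)
                       if sq_dist(p, centroid) < sq_dist(p, reference)]
        if not new_members:          # degenerate case: all points at the reference
            new_members = [seed]
        if new_members == members:
            break
        members = new_members
        centroid = mean([points[i] for i in members])
    return members, centroid


def extract_clusters(points, min_size=1):
    """Extract anomalous patterns one by one until no points remain.

    Returns (member_indices, centroid) pairs in order of extraction, keeping
    only clusters with at least min_size members (an assumed filter).
    """
    reference = mean(points)         # assumption: grand mean, kept fixed
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        subset = [points[i] for i in remaining]
        local, centroid = anomalous_pattern(subset, reference)
        cluster = [remaining[i] for i in local]
        clusters.append((cluster, centroid))
        taken = set(cluster)
        remaining = [i for i in remaining if i not in taken]
    return [c for c in clusters if len(c[0]) >= min_size]
```

Under these assumptions, the number of clusters is not set in advance: it falls out of the extraction, and the surviving centroids could then serve as initial seeds for K-Means.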

Models and methods for interpreting cluster structures

This subject borders on machine learning and knowledge discovery, but it also properly belongs to clustering viewed from the perspective of classification. In particular, I proposed a non-traditional decision rule: a rectangular (comprehensive) description of a cluster. My contributions:
association in contingency tables as the average Quetelet index and contribution to the data scatter; see B. Mirkin (2001) Eleven Ways to Look at the Chi-Squared Coefficient for Contingency Tables, The American Statistician, 55, no. 2, 111-120, and B. Mirkin (2001) Reinterpreting the Category Utility Function, Machine Learning, 45, 219-228.
interpretation of single classes with the rectangular rule; see B. Mirkin (1999) Concept Learning and Feature Selection Based on Square-Error Clustering, Machine Learning, 35, 25-40, and B. Mirkin and O. Ritter (2000) A Feature Based Approach to Discrimination and Prediction of Protein Folding Groups, in S. Suhai (Ed.), Genomics and Proteomics: Functional and Computational Aspects, New York: Kluwer Academic/Plenum Publishers, 157-177.
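The first item above, association in a contingency table as the average Quetelet index, can be checked numerically. The sketch below assumes the relative Quetelet index q_ij = p_ij / (p_i+ p_+j) - 1, that is, the relative change in the probability of column j once row i is known, and a table with no empty rows or columns; the function names are illustrative only.

```python
def margins(table):
    """Return (grand total, row totals, column totals) of a count table."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return sum(rows), rows, cols


def pearson_chi2(table):
    """Pearson chi-squared computed in the usual observed-vs-expected form."""
    n, rows, cols = margins(table)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            total += (observed - expected) ** 2 / expected
    return total


def average_quetelet(table):
    """Average of the relative Quetelet indices q_ij, weighted by p_ij."""
    n, rows, cols = margins(table)
    total = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            p_ij = count / n
            q_ij = count * n / (rows[i] * cols[j]) - 1
            total += p_ij * q_ij
    return total


table = [[10, 20], [30, 40]]
n = sum(map(sum, table))
# The two quantities agree up to rounding: chi-squared = N * average Quetelet index.
print(pearson_chi2(table), n * average_quetelet(table))
```

The agreement follows from the identity chi-squared / N = sum of p_ij^2 / (p_i+ p_+j) - 1, which is the p_ij-weighted sum of q_ij.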

Applications

I have participated in several large-scale applications of cluster analysis, especially in analysis of questionnaire surveys, analysis of organisation structures, structuring of genetic complementarity test data, etc. My current projects involve:
description of protein fold classes and enzymes; see B. Mirkin and O. Ritter (2000) A Feature Based Approach to Discrimination and Prediction of Protein Folding Groups, in S. Suhai (Ed.), Genomics and Proteomics: Functional and Computational Aspects, New York: Kluwer Academic/Plenum Publishers, 157-177.
combinatorial modelling in analysis of evolution; see O. Eulenstein, B. Mirkin, and M. Vingron (1997) Comparison of annotating duplication, tree mapping, and copying as methods to compare gene trees with species trees, in B. Mirkin, F. McMorris, F. Roberts, and A. Rzhetsky (Eds.) Mathematical Hierarchies and Biology, DIMACS Series, V. 37, Providence: AMS, 71-94, and O. Eulenstein, B. Mirkin, and M. Vingron (1998) Duplication-Based Measures of Difference Between Gene and Species Trees, Journal of Computational Biology, 5, 135-148. More recent are B. Mirkin, T. Fenner, M. Galperin and E. Koonin (2003) Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes, BMC Evolutionary Biology, 3:2; B. Mirkin (2004) Mapping gene family data onto evolutionary trees, in M. Chavent, O. Dordan, C. Lacomblez, M. Langlais, and B. Patouille (Eds.), Comptes rendus des 11es Rencontres de la Societe Francophone de Classification, University of Bordeaux, 61-68; and B. Mirkin, R. Camargo, T. Fenner, G. Loizou and P. Kellam (2010) Similarity clustering of proteins using substantive knowledge and reconstruction of evolutionary gene histories in herpesvirus, Theoretical Chemistry Accounts, 125, nos. 3-6, 569-581.
comprehensive clustering in text analysis; see B. Mirkin (2002) Towards comprehensive clustering of mixed-scale data with K-Means, Proceedings of the 2002 UK Workshop on Computational Intelligence (Birmingham, September, 2002), 126-130, and R. Pampapathi, B. Mirkin, M. Levene (2006) A suffix tree approach to anti-spam email filtering, Machine Learning, 65, 309-338.

Other

general views on data mining, statistics, clustering, etc.; see B. Mirkin (2003-7) Data Analysis: A Bird's Eye View, notes to a lecture for research students at SCSIS (London, Birkbeck, 6 November 2007), and Clustering for Data Mining: A Data Recovery Approach (2005).