Modern data analysis is the product of a union of several disciplines: statistics, computer science, pattern recognition, machine learning, and others. Perhaps the oldest parent is statistics, whose development has been driven by the demands of the different areas to which it has been applied. More recently, however, the possibilities arising from powerful and widely available computers have stimulated a revolution. Data of new kinds, and in previously unimaginable quantities, now occur. They bring with them entirely new classes of problems to which the classical statistical solutions are not always well matched, and these problems in turn require novel solutions. In this paper I look at some of these new kinds of data and at the associated problems and solutions. The data include data sets which are large in dimensionality or in number of records, data which are dependent on each other, and that special kind of qualitative data known as metadata. New problems arising from these data include straightforward mechanical issues of how to handle them, how to estimate descriptors and parameters (adaptive and sequential methods are more important here than in classical statistics), the (ir)relevance of significance tests, and automatic data analysis (as in anomaly detection in large data sets or, quite differently, in automatic model fitting). Some of the new types of model which are becoming so important nicely illustrate the interdisciplinary nature of modern data analysis: rule-based systems, hidden Markov models, neural networks, genetic algorithms, and so on. These are briefly discussed. All of this leads us to consider more carefully the link between data and information, and to recognise the complementary data analytic abilities of humans and computers. But we can go too far. If there is 'intelligent data analysis' there is also 'unintelligent data analysis'. Two different manifestations of the latter are examined, and a cautionary note is sounded.
David J. Hand is Professor of Statistics at the Open University in the UK. He has published over 150 papers and fourteen books, including Artificial Intelligence Frontiers in Statistics, Practical Longitudinal Data Analysis, and, most recently, Construction and Assessment of Classification Rules. He is founding editor and editor-in-chief of the journal Statistics and Computing. His research interests include developments at the interface between statistics and computing, multivariate statistics, the foundations of statistics, and applications in medicine, psychology, and finance.