Classification using Discernibility

Tuesday, 03 July 2007

Student

Zacharias Voulgaris

Supervisors

George Magoulas

Boris Mirkin

Project Details

PhD research, started 01/2005, expected to finish 04/2008

Keywords

Classification, Ensemble, Neural Networks, kth Nearest Neighbour

Download

Combining ANN (doc)

Combining ANN (pdf)

Combining ANNs and Nearest Neighbour Methods for Classification using the Discernibility Concept

Key themes

The key idea of this project is Discernibility, a concept that describes how well separated the classes of a dataset are, i.e. how easy it is to distinguish the elements of one class from those of another. For example, apples and oranges are two highly discernible classes of fruit, while apples from region A and apples from region B may be far less discernible. For a given dataset, we measure Discernibility with an Index of Discernibility, which is the average of the individual discernibilities of all its elements, i.e. of how discernible each element is. Despite its simplicity, this concept has a variety of uses, ranging from the development of new, more efficient classifiers to feature selection, i.e. identifying the most important attributes of a dataset. The classifiers developed are based on kth Nearest Neighbour, a simple method that classifies an element by taking into account the elements surrounding it.
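To make the two ideas in this section concrete, here is a minimal Python sketch of nearest-neighbour classification and of an index computed as the average of per-element scores. The text does not say how the discernibility of a single element is calculated, so the per-element score used here (the share of an element's k nearest neighbours that belong to its own class) is only an assumed stand-in, not the project's definition. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def k_nearest_labels(points, labels, query, k, skip=None):
    """Return the labels of the k points closest to `query` (Euclidean distance)."""
    candidates = [i for i in range(len(points)) if i != skip]
    candidates.sort(key=lambda i: math.dist(points[i], query))
    return [labels[i] for i in candidates[:k]]


def knn_classify(points, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours."""
    votes = Counter(k_nearest_labels(points, labels, query, k))
    return votes.most_common(1)[0][0]


def element_discernibility(points, labels, i, k=2):
    """ASSUMPTION (not from the source): share of the k nearest neighbours
    of element i that carry the same class label as element i."""
    neighbours = k_nearest_labels(points, labels, points[i], k, skip=i)
    return sum(1 for lab in neighbours if lab == labels[i]) / len(neighbours)


def index_of_discernibility(points, labels, k=2):
    """Average of the per-element discernibilities, as described in the text."""
    scores = [element_discernibility(points, labels, i, k)
              for i in range(len(points))]
    return sum(scores) / len(scores)


# Toy data: two tight clusters of three points each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
separated = ["apple", "apple", "apple", "orange", "orange", "orange"]
mixed = ["A", "B", "A", "B", "A", "B"]  # labels ignore the cluster structure

print(knn_classify(points, separated, (0.3, 0.3)))
print(index_of_discernibility(points, separated))
print(index_of_discernibility(points, mixed))
```

With labels that follow the clusters, every neighbour shares its element's class and the index reaches its maximum of 1; with labels that ignore the clusters, the index drops, which matches the apples-and-oranges intuition above.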

Applications of the research

This research aims to be useful for real-world datasets in application domains such as biology and finance. We therefore plan to apply the methods developed to real case studies, for example to high-dimensional biology datasets. Since classification arises in many different problem areas, the proposed research aims to be practically applicable in a broad range of domains.

Project Aims

The aims of the project are the following:
· Creation of alternatives to kth Nearest Neighbour using the concept of Discernibility
· Development of an ensemble of classifiers using alternative measures for information fusion and dataset analysis
· Creation of a new feature selection method based on the Index of Discernibility

These aims concern applications of the concept of Discernibility, which is introduced in this project, as well as two other measures: the Degree of Confidence (measuring how certain each classification is) and Net Reliability (showing how reliable a classifier is based on one or more classifications).
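As an illustration of how feature selection could build on an index of this kind, the sketch below scores each feature by the index computed on that feature alone and ranks the features from most to least discernible. This is only one possible reading of the third aim: the ranking scheme, the neighbour-based proxy used for the index, and all names are assumptions made for the example, not the method developed in the project.

```python
import math


def index_of_discernibility(points, labels, k=2):
    """ASSUMED proxy for the index: mean share of same-class labels among
    each element's k nearest neighbours (Euclidean distance)."""
    total = 0.0
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(points[j], p))
        nearest = others[:k]
        total += sum(1 for j in nearest if labels[j] == labels[i]) / len(nearest)
    return total / len(points)


def rank_features(points, labels, k=2):
    """Rank feature indices, best first, by the index computed on each
    feature in isolation (hypothetical selection scheme)."""
    n_features = len(points[0])
    scored = []
    for f in range(n_features):
        column = [(p[f],) for p in points]  # dataset restricted to feature f
        scored.append((index_of_discernibility(column, labels, k), f))
    # Higher score first; ties broken by feature position.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [f for _, f in scored]


# Toy data: feature 0 separates the classes, feature 1 does not.
data = [(0.0, 3.0), (0.1, 1.0), (0.2, 2.0),
        (5.0, 1.1), (5.1, 2.9), (5.2, 2.1)]
labels = ["A", "A", "A", "B", "B", "B"]

print(rank_features(data, labels))
```

Scoring features one at a time keeps the example short but ignores interactions between features; a practical method would likely need to evaluate subsets of features as well.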

Results to date

The experiments carried out so far have yielded promising results on the datasets used. These datasets are taken from the UCI Machine Learning Repository and have been used extensively for classification. The results of one series of experiments show that the proposed methods achieve very good accuracy and high reliability, and that some of them are also fast. Other experimental results show good classification performance on datasets whose features were reduced by one of the methods developed in this project.

23-29 Emerald Street, London WC1N 3QS phone: +44 (0)20 7763 2174 www.lkl.ac.uk