
Zacharias Voulgaris

PhD Student, Teaching Assistant

Supervisors: Dr. George Magoulas and Prof. Boris Mirkin
Tel: +44 20 7763 2110
Fax: +44 20 7242 2754

London Knowledge Lab
23-29 Emerald Street
London WC1N 3QS

Research

My work is based on the concept of Discernibility, which I have implemented as a measure of how distinguishable the classes of a dataset are. I have investigated Discernibility-based approaches to feature selection and data reduction, and I have developed reliability heuristics that use the concept in a reject-option method built on a Discernibility-based reliability evaluation technique. I have also developed a method that employs Discernibility to build diversity-based classifier ensembles. I have recently completed my viva and resubmitted the revised version of my thesis.

Education

Jan 2005 - Jun 2009: PhD at Birkbeck College, University of London
Sept 2003 - Sept 2004: MSc Information Systems and Technology, City University of London
Sept 1996 - Feb 2003: BEng/MEng Production Engineering and Management, Technical University of Crete, Greece

Publications

1. Z. Voulgaris, G. D. Magoulas, "Extensions of the k nearest neighbour methods for classification problems", Proceedings of the 26th IASTED Conference on Artificial Intelligence and Applications, Innsbruck, Austria, Feb. 2008, pp. 23-28.
2. Z. Voulgaris, G. D. Magoulas, "A discernibility-based approach to feature selection for microarray data", CD proceedings of the IEEE International Conference of Intelligent Systems, Varna, Bulgaria, Sept. 2008. [This paper received the Best Student Paper award at the conference.]
3. Z. Voulgaris, G. D. Magoulas, "Dimensionality reduction for feature and pattern selection in classification problems", Proceedings of the Third International Multi-Conference on Computing in the Global Information Technology, Athens, Greece, July 2008, pp. 160-165.
4. Z. Voulgaris, B. Mirkin, "Choosing a Discernibility Measure for Reject-Option at a Set of Classifiers" (journal article), under revision.
5. Z. Voulgaris, G. D. Magoulas, "Discernibility-Based Extensions of the K Nearest Neighbour Rule for Pattern Classification" (journal article), under review.
6. Z. Voulgaris, B. Mirkin, "Optimising a reliability measure for classification", Proceedings of the UKCI Conference, Leicester, UK, Sept. 2008, pp. 43-46.
7. Z. Voulgaris, G. D. Magoulas, "Discernibility-based approach for creating ensembles in pattern classification applications", Proceedings of the UKCI Conference, Leicester, UK, Sept. 2008, pp. 195-199.

Teaching

I worked as a sessional lecturer at Birkbeck College in the following courses:
  • Introduction to Computing - SPSS part (2007)
  • Analysis Planning and Control (2007)

I also worked as a teaching assistant at Birkbeck College in the following courses:
  • Web developing (HTML/XHTML/CSS) (2006)
  • Analysis Planning and Control (2006, 2007)
  • Introduction to Computing (2008)
  • Databases 1 (2006, 2008)

In addition, I helped create the lab notes for the SPSS part of the "Introduction to Computing" module (2007).

Software

Over the past few years I have developed the following programs in Matlab (Linux version). Feel free to use them in your research, but please reference them in your work by citing the relevant paper from the list above. To download a file, right-click the relevant link and select "Save link as".
  • Spherical Index of Discernibility - This is one of the developments of the Discernibility concept. It uses a hypersphere around each pattern of a dataset to assess how discernible that pattern is, and it also yields the overall Discernibility of the dataset. First appeared in article 1.
  • Harmonic Index of Discernibility - This is another development of the Discernibility concept. It serves the same purpose, but instead of hyperspheres it uses, for each pattern, the harmonic mean of the distances from that pattern to the patterns of its class. First appeared in article 6.
  • Discernibility-based k Nearest Neighbour - This is a variation of the kNN classifier that takes into account the Discernibility of each neighbour, as well as its distance, when classifying each test pattern. First appeared in article 1.
  • Weight-based k Nearest Neighbour - This is another variation of the kNN classifier, using the discernibilities of the various features as weights for the distance calculation. First appeared in article 1.
  • Variable k Nearest Neighbour - This variation of the kNN classifier is one of the self-determining k algorithms developed in this research. It finds the best k for each test pattern and uses that k to classify the pattern. First appeared in article 1.
  • Class-Based k Nearest Neighbour - This is another self-determining k variation of the kNN classifier. It considers k neighbours for each class and uses the one that maximises its certainty. First appeared in article 1.
  • Feature Selector 3 - This is the simplest feature selection program developed in this project. It employs the Discernibility concept in order to assess the quality of each feature and then selects the features that have a discernibility over a given threshold. First appeared in article 2.
  • Feature Selector 10 - This is a more sophisticated feature selection program. It employs the Discernibility concept in order to assess the quality of groups of features and then selects the group whose Discernibility is at least as high as that of the original feature set. First appeared in article 2.
  • Data Reducer - This program was developed to demonstrate that Discernibility can be useful for reducing the number of patterns in a dataset while maintaining its structural properties. It reduces a given dataset using a user-defined ratio. First appeared in article 3.
  • Feature Subsets - This program is for comparison purposes only (no originality whatsoever). It randomly produces k subsets of a given feature set. It was used as a basis for developing the other Feature Subsets programs.
  • Balanced Feature Subsets - This program produces k subsets of a given feature set, so that they have more or less the same classification quality, in terms of Discernibility. First appeared in article 7.
  • Diverse and Balanced Feature Subsets - This program produces k subsets of a given feature set, so that they are both diverse in terms of errors and balanced in terms of Discernibility. The diversity of errors is measured in terms of correlation of these errors. First appeared in article 7.
  • Reliability of Classification - This program takes a classifier and its predictions on a given dataset and calculates how reliable these predictions are, based on how compatible they are with the predictions made on the training set; the original prediction is used as part of the input data. It also returns an elite set of predictions, namely those with the highest Reliability scores.
  • Distance Matrix - This is an auxiliary program used to calculate the distances among the different points of a given dataset.
  • K Fold Cross Validation partitioner - This is a utility program used in all the experiments conducted in this research. It takes a given dataset and partitions it into K equal parts, K-1 of which are used as the training set and the remaining one as the test set.
  • Net Reliability - This is an auxiliary program that calculates the Net Reliability of a classification based on the prediction of the classifier, the corresponding Degree of Certainty vector and the correct labels array.
  • Light version of kNN classifier - This is a very light version of the kNN classifier, used in many of the experiments. It is customised so as to be able to perform classification using a given subset of features (NFS).
  • All of these programs in a ZIP file
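The Matlab sources themselves are not reproduced on this page. As a rough illustration of what three of the auxiliary programs above do (Distance Matrix, the K Fold Cross Validation partitioner and the light kNN classifier restricted to a feature subset), here is a minimal Python sketch. It is not a port of the original code: the Euclidean metric, the shuffled fold assignment, the majority vote and all function names (distance_matrix, kfold_partitions, knn_light) are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def distance_matrix(X):
    """Pairwise distances between the rows (patterns) of X.

    Assumption: Euclidean distance; the original program may use another metric.
    """
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(X[i], X[j])
            D[i][j] = D[j][i] = d  # the matrix is symmetric
    return D


def kfold_partitions(n_patterns, K, seed=0):
    """Split pattern indices into K near-equal folds.

    Yields (train_indices, test_indices): each fold serves once as the
    test set while the other K-1 folds form the training set.
    Assumption: indices are shuffled before splitting.
    """
    idx = list(range(n_patterns))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::K] for f in range(K)]  # fold sizes differ by at most one
    for f in range(K):
        train = [i for g in range(K) if g != f for i in folds[g]]
        yield train, folds[f]


def knn_light(train_X, train_y, test_X, k=3, features=None):
    """Plain kNN using only the feature indices in `features`.

    Hypothetical signature; features=None means "use all features".
    """
    if features is None:
        features = range(len(train_X[0]))
    features = list(features)
    preds = []
    for x in test_X:
        xs = [x[f] for f in features]
        # Order training patterns by distance in the selected feature subspace.
        order = sorted(
            range(len(train_X)),
            key=lambda i: math.dist(xs, [train_X[i][f] for f in features]),
        )
        votes = Counter(train_y[i] for i in order[:k])
        # Ties go to the label seen first, i.e. the one held by the nearer neighbour.
        preds.append(votes.most_common(1)[0][0])
    return preds


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [5, 5], [5, 6]]
    y = ["a", "a", "b", "b"]
    # Classify using only the first feature (index 0).
    print(knn_light(X, y, [[0.2, 0.4], [5, 5.5]], k=3, features=[0]))  # ['a', 'b']
```

Passing a list of feature indices to knn_light is one simple way to mimic classification on a chosen feature subset, which is how a feature selector's output would typically be evaluated inside each cross-validation fold.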