RBFMINE: Distributed Data Mining with Radial Basis

Functions on the Computational Grid

 

Investigators: George Roussos(SCSIS, BBK), Brad Baxter (Economics, BBK)

Sponsor: The Nuffield Foundation

Classification: distributed computing, distributed data mining, radial basis functions.

Recently there has been considerable interest in employing radial basis functions (RBFs) for data mining. The main reason for this development is the fact that radial basis functions do not suffer from the "dimensionality curse", that is the number of sample points required to achieve sufficiently accurate approximation of a target function increases only linearly with the number of dimensions (or independent variables). This property represents a significant advantage over more traditional regression methods.

On the other hand, until recently the Global RBF method has been associated with very high computational cost both for the computation of the RBF coefficient and for the estimation of its value at a particular point. However, during the past five years significant advances have been made towards rapid computation and evaluation schemes. Thus, today the computational complexity of RBF approximation has been reduced to linear dependence on the number of samples as well as on the number of evaluation points (compare this against the cubic order of computation and the square order of evaluation of the naïve approach).

Currently, whenever data mining software employs RBFs for their superior dimensionality properties it does so either at the cost of high computational cost (which effectively restricts the applicability of the method to relatively small data sets) or by using Locally Supported RBFs which arguably offer significantly lower accuracy. Indeed, compared against their Global counterparts, Locally Supported RBFs compute approximations taking into account sample points only in the region of the locality of their centres and thus missing out on global features of the data set.

In this project we propose investigation of the following issues: