Report

6.1.3. Recurrent MLPs

The main objective of investigating recurrent MLPs was to discover whether the introduction of temporal information would improve the recognition of the drum sounds. It was expected that adding one extra recognition segment (two time segments in total) would immediately improve results. This was not the case, although the results were still reasonable, and it was therefore decided not to pursue the investigation of the recurrent networks beyond four time segments.
As can be seen in figure 17, the classification errors of the recurrent networks are higher than those of the feed forward networks and are also quite variable. Further investigation into changes of the overall structure can be seen in figure 18. It was thought that adjusting the number of hidden nodes for a 30 input network would yield a good result; however, the best classification was found with the simpler network structure of 20:20:5. This again can be attributed to a balance between the ability to form complex decision surfaces (limited in smaller hidden structures) and overfitting (more likely in larger hidden structures).
The overall increase in error compared with the feed forward MLPs could be due to the following: drums of similar type are subject to different durations of amplitude decay, owing to the amount of damping applied to the drum skin and the amount of reverberation applied to the recording. This means that recognition segments analysed after the initial segment could vary greatly in volume (and also in noise content, depending on recording compression). This effect could be reduced by increasing the number of patterns in the training data in order to increase the diversity of amplitude decay lengths for drums of similar type. Another method would be to change the start position of the second and subsequent recognition segments depending on the speed of amplitude decay of that particular drum.
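The second method, moving the later recognition segments according to how quickly the drum decays, could be sketched as follows. This is only an illustration and was not part of the investigation: the function names, the 64-sample envelope frame, and the 10% decay threshold are all assumptions, and the recording is assumed to begin at the drum onset.

```python
import math


def frame_rms(samples, frame_len):
    """RMS amplitude of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def decay_length(samples, frame_len=64, floor_ratio=0.1):
    """Offset (in samples) at which the envelope first drops below
    floor_ratio * peak after the peak. Both defaults are assumed values."""
    env = frame_rms(samples, frame_len)
    if not env:
        raise ValueError("recording is shorter than one envelope frame")
    peak_idx = max(range(len(env)), key=env.__getitem__)
    threshold = env[peak_idx] * floor_ratio
    for i in range(peak_idx, len(env)):
        if env[i] < threshold:
            return i * frame_len
    # The sound never decayed below the threshold within the recording.
    return len(env) * frame_len


def segment_starts(samples, n_segments, frame_len=64, floor_ratio=0.1):
    """Spread the segment start positions evenly over the measured decay,
    so that a heavily damped drum has its later segments placed earlier."""
    step = decay_length(samples, frame_len, floor_ratio) // n_segments
    return [k * step for k in range(n_segments)]


# Synthetic envelopes standing in for a damped and an undamped drum.
damped = [math.exp(-t / 200.0) for t in range(4096)]
ringing = [math.exp(-t / 800.0) for t in range(4096)]
damped_starts = segment_starts(damped, 4)
ringing_starts = segment_starts(ringing, 4)
```

With this scheme the first segment always starts at the onset, while the later segments fall at comparable points of the decay curve rather than at fixed times, which should reduce the variation in volume between drums of similar type.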

6.2. Kohonen SOM

6.2.1. Clustering

Figure 22 shows how clustering of the SOM takes place over the course of a training cycle. This is for a nominal 9 by 9 grid, with 30 inputs and 1000 epochs of training. It should be noted that although a node is 'labelled' depending on the predominant winner type, that node may contain winners of other types.
The clustering of nodes can be seen taking shape at around 500 epochs, with the clusters becoming more evenly distributed by 1000 epochs. It appears that there is one cluster for each of the types conga and hat, and two clusters for each of the types tom, kick and snare.
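The training and labelling procedure described above can be sketched in a few lines. This is a minimal illustration of a generic Kohonen SOM, not the implementation used for figure 22: the linear decay of learning rate and neighbourhood radius, the Gaussian neighbourhood function, the initial learning rate of 0.5, and all function names are assumptions.

```python
import math
import random
from collections import Counter


def winner(weights, x):
    """Grid position (row, col) of the node whose weights are closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(patterns, rows=9, cols=9, epochs=1000, lr0=0.5, seed=0):
    """Train a rows x cols map; returns weights[row][col] as a list of floats."""
    rng = random.Random(seed)
    dim = len(patterns[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)  # assumed linear shrink
        for x in patterns:
            br, bc = winner(weights, x)
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * radius ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
    return weights


def label_nodes(weights, patterns, labels):
    """Label each winning node with its predominant type, keeping the full
    tally so that minority winners on the same node remain visible."""
    hits = {}
    for x, lab in zip(patterns, labels):
        hits.setdefault(winner(weights, x), Counter())[lab] += 1
    return {node: (counts.most_common(1)[0][0], dict(counts))
            for node, counts in hits.items()}
```

Returning the full tally alongside the label reflects the point made above: a node labelled with one drum type may still be the winner for patterns of other types.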