Report

8. Further Developments

8.1. Recognition developments

As discussed in the last chapter, classification results could conceivably be improved by introducing new elements to the recognition structure. The proposed application and the methods studied here recognise drum sounds within the context of the rhythm that they are part of (this context is necessary for noise reduction and for identifying soft hits and pitch groups). However, this context could be used more fully.
When manually identifying drums within a break, the listener recognises each hit not only from its individual frequency make-up but also from the sonic contrast between that drum and the other drums in the same break. The listener also takes into account where the drum appears rhythmically within that break.
Thus three classification support methods are proposed.
1. The first looks at all the drum hits within one break to see how closely the frequency pattern of each drum matches that of every other drum. This would be a loose association, and could be achieved using a reduced-input Kohonen map. Drums of a similar nature are grouped and identification is performed on the group (using the MLP). If the MLP recognises, say, four out of the five hits in a group as one drum type, then the whole group is classified as that type.
2. A frequency measurement indicating where the bulk of spectral energy for a particular drum lies will indicate whether the drum is a low, medium or high pitch sound. (This is similar to looking at the frequency of peak magnitude but takes the whole spectrum into account.) If two drums have both been classified as a kick, but one is of high pitch and the other low, the network has most likely misclassified one of them. Reclassification can then be performed with filtering applied.
3. The last method takes into account the rhythmic position of a drum within a break. Many rhythms could be learned by a neural network and the results used to determine how likely it is that a certain drum will appear at a certain point in time.
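
The first two support methods can be illustrated with a short sketch. It is not the implementation used in this report: the function names are hypothetical, the 0.8 agreement threshold simply mirrors the four-out-of-five example above, the spectral centroid is assumed as one possible measure of where the bulk of spectral energy lies, and the band edges (200 Hz and 2000 Hz) are arbitrary placeholders.

```python
from collections import Counter

import numpy as np


def group_vote(labels, min_agreement=0.8):
    """Return the majority label of a group of hits if enough of them agree.

    labels: per-hit classifications (e.g. from the MLP) for one group of
    similar drums. Returns None when no label reaches the assumed threshold.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


def spectral_centroid(magnitudes, freqs):
    """Magnitude-weighted mean frequency of a spectrum, in the units of freqs."""
    magnitudes = np.asarray(magnitudes, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    total = magnitudes.sum()
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    return float((magnitudes * freqs).sum() / total)


def pitch_band(centroid_hz, low_max=200.0, high_min=2000.0):
    """Map a centroid to 'low', 'medium' or 'high' (band edges are assumed)."""
    if centroid_hz < low_max:
        return "low"
    if centroid_hz >= high_min:
        return "high"
    return "medium"


def conflicting_hits(labels, bands):
    """Return drum labels whose hits fall into more than one pitch band."""
    seen = {}
    for label, band in zip(labels, bands):
        seen.setdefault(label, set()).add(band)
    return sorted(label for label, found in seen.items() if len(found) > 1)
```

For example, `group_vote(["kick", "kick", "kick", "kick", "snare"])` returns `"kick"`, and `conflicting_hits(["kick", "kick"], ["low", "high"])` flags `"kick"` as a candidate for reclassification.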

8.2. Improvements on the existing structure

Many of the drums to be identified may well be very similar to drums the network has seen before, differing only in tuning. It is also possible, for various reasons, that the break is heard at a different speed from that at which it was recorded. One way of adding further generalisation during identification might be to shift the neural network input vector up and down the input nodes (see figure 25 below).


	If a pattern that the network had already seen were presented for recognition at a different pitch, then shifting the input vector by one or two nodes would be equivalent to hearing the pattern at the original pitch.
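
The shifting idea can be sketched as follows. This is only an illustration under stated assumptions: `shift_vector` and `classify_with_shifts` are hypothetical names, `classify` stands in for the trained network and is assumed to return a (label, confidence) pair, and vacated nodes are filled with zeros rather than wrapped around.

```python
import numpy as np


def shift_vector(vec, offset):
    """Shift vec by offset nodes, zero-filling the vacated positions.

    A positive offset moves the pattern towards higher node indices,
    a negative offset towards lower ones. Nothing wraps around.
    """
    vec = np.asarray(vec, dtype=float)
    out = np.zeros_like(vec)
    if offset > 0:
        out[offset:] = vec[:-offset]
    elif offset < 0:
        out[:offset] = vec[-offset:]
    else:
        out[:] = vec
    return out


def classify_with_shifts(vec, classify, max_shift=2):
    """Present vec at every shift in [-max_shift, max_shift].

    classify: assumed callable taking an input vector and returning
    (label, confidence). Returns (label, confidence, offset) for the
    most confident presentation.
    """
    best = None
    for offset in range(-max_shift, max_shift + 1):
        label, confidence = classify(shift_vector(vec, offset))
        if best is None or confidence > best[1]:
            best = (label, confidence, offset)
    return best
```

Note that a change of tuning or playback speed scales every frequency by the same factor, so a shift of whole nodes corresponds exactly to a pitch change only if the input nodes are spaced logarithmically in frequency; with linearly spaced nodes the shift is an approximation.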