7. Discussion and Conclusions

7.1. Overall performance of project

This project was an initial investigation into the recognition of drum sounds, so the results could almost certainly be improved upon with further study.
In terms of meeting the recognition requirements of the application described in section 1.5, I believe the techniques discussed could be used successfully and, with further refinement, could be highly functional.
As was shown in the previous chapter, the best recognition rate using the described methods was 74.4% (a 25.6% error rate). This error rate is quite reasonable when the following points are considered:

  • The vast majority of break beats used in a recording studio feature rhythms made up only of kick, snare and hat drums. As shown in section 6.3, the error rate for these three types (in the best case) averages 14.1%.

  • The drum combination classes were not strictly necessary in this project, but were included to show how the system performs on them. It is envisaged that the tom/hat and conga/hat classes could be left out of the training and test data. This would reduce the number of classes to be recognised and should therefore improve overall performance. It would not greatly hinder the user, as tom/hat and conga/hat combinations are not that common (see further developments).

7.2. Spectral analysis

It was suggested in section 2.1.1 that drum sounds could be successfully recognised using one static spectral measurement taken from the start of each drum sound. The results of this project show that this is possible.
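To make the idea of a single static measurement concrete, the sketch below computes one magnitude spectrum from a short frame beginning at the onset of a hit. The frame length, the Hann window and the direct DFT are assumptions chosen for readability (the report does not specify them here, and a practical implementation would use an FFT). The onset position is assumed to be already known, and the function name `onset_spectrum` is hypothetical.

```python
import cmath
import math


def onset_spectrum(samples, onset, frame_len=256):
    """Magnitude spectrum of one frame starting at a drum onset.

    Assumptions (not taken from the report): `samples` is a list of floats,
    a Hann window is applied, and a plain O(N^2) DFT is used for clarity.
    """
    frame = samples[onset:onset + frame_len]
    # Zero-pad if the hit lies near the end of the recording.
    frame = frame + [0.0] * (frame_len - len(frame))
    n = frame_len
    # Hann window reduces spectral leakage from the abrupt frame edges.
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # Real input has a symmetric spectrum, so keep bins 0..n/2 only.
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]
```

For example, a 500 Hz sine sampled at 8 kHz with a 256-sample frame peaks in bin 16, since 16 × 8000 / 256 = 500.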
Frequency quantisation has been shown to be a useful way of limiting the amount of processing required of the neural net: recognition and training of a network with of the order of 1000 inputs takes approximately ten times as long as with a 60-input network. Quantisation also produces a kind of spectral generalisation, which helps in recognising new patterns.
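Uniform frequency quantisation can be sketched as collapsing the spectrum into a fixed number of equal-width bands, for instance turning roughly 1000 spectral values into 60 network inputs. Whether the project averaged or summed the bins within each band is not stated in this section, so averaging is an assumption here, as is the name `quantise_uniform`.

```python
def quantise_uniform(spectrum, n_bands):
    """Reduce a magnitude spectrum to `n_bands` equal-width frequency bands.

    Assumption: each band's value is the mean magnitude of the bins it
    covers; the report does not say whether bands were averaged or summed.
    """
    n = len(spectrum)
    bands = []
    for b in range(n_bands):
        # Integer band edges spread the bins as evenly as possible.
        lo = b * n // n_bands
        hi = (b + 1) * n // n_bands
        chunk = spectrum[lo:hi]
        bands.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return bands
```

Calling `quantise_uniform(spectrum, 60)` on a 1024-value spectrum yields 60 inputs, each averaging about 17 adjacent bins; this averaging is one way the "spectral generalisation" mentioned above can arise, since small shifts of energy between neighbouring bins barely change the band values.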
The quantisation ranges were divided equally across the frequency spectrum, but there may be better ways of making these divisions. It is quite plausible that some regions of the frequency spectrum require closer observation than others for the specific task of recognising drums, so a predetermined, non-uniform set of frequency boundaries could improve recognition. The advantage of the uniform frequency split used in this project, however, is its flexibility.
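A predetermined, non-uniform split can be expressed by passing explicit band edges instead of a band count. The edge list below is purely hypothetical: it simply makes low-frequency bands narrower than high-frequency ones to illustrate the idea, and does not reflect any finding about which regions matter most for drums. The names `EXAMPLE_EDGES` and `quantise_with_edges` are illustrative.

```python
# Hypothetical predetermined edges for a 513-bin spectrum (the size produced
# by a 1024-point real FFT): narrow bands at low frequencies, wider ones higher.
EXAMPLE_EDGES = [0, 4, 8, 16, 32, 64, 128, 256, 513]


def quantise_with_edges(spectrum, edges):
    """Average a magnitude spectrum over predetermined, possibly unequal bands.

    Band i covers spectrum[edges[i]:edges[i + 1]]; `edges` must be strictly
    increasing and must lie within the spectrum.
    """
    if edges[0] < 0 or edges[-1] > len(spectrum):
        raise ValueError("edges must lie within the spectrum")
    if any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing")
    return [sum(spectrum[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]
```

The trade-off mirrors the text: a fixed edge list can focus resolution where it is most useful, whereas the uniform split only needs a band count and adapts to any spectrum length.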

7.3. Noise reduction

The noise reduction algorithm appears to have worked well across all network types. This is probably because it estimates the noise present in a sound by looking at the signal immediately 'before' the sound. There are, however, problems with this method. Consider two kick drum hits played in succession. If they have been heavily compressed, or if the time between them is short, the volume of the first kick will not have fallen greatly by the time the second kick occurs. If the second kick is quieter, the noise reduction algorithm may remove much of the spectral body from the recognition segment of the second. Such cases would, however, require heavy compression and are probably not frequent enough to warrant changing the algorithm.
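The failure case can be illustrated with a simple stand-in for the algorithm. The sketch below assumes the noise reduction behaves like magnitude spectral subtraction, removing a spectrum measured just before the onset and clamping at zero; this is an approximation chosen for illustration, not necessarily the exact method used in the project. The function name and all magnitude values are made up.

```python
def subtract_preceding_noise(hit_spectrum, pre_spectrum):
    """Remove a noise estimate taken from the frame just before the hit.

    Assumption: magnitude spectral subtraction clamped at zero stands in
    for the project's noise reduction; both spectra share the same bins.
    """
    if len(hit_spectrum) != len(pre_spectrum):
        raise ValueError("spectra must have the same number of bins")
    return [max(h - p, 0.0) for h, p in zip(hit_spectrum, pre_spectrum)]


# Isolated kick over quiet hiss: most of the spectral body survives.
print(subtract_preceding_noise([5.0, 3.0, 0.5, 0.25], [0.25] * 4))
# -> [4.75, 2.75, 0.25, 0.0]

# Quieter second kick over the undecayed tail of the first: the "noise"
# estimate is mostly kick, so little of the second hit's body remains.
print(subtract_preceding_noise([4.5, 2.5, 0.5, 0.25], [4.0, 2.0, 0.5, 0.25]))
# -> [0.5, 0.5, 0.0, 0.0]
```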
The time taken to perform noise reduction adds negligibly to the overall recognition process. When performing recognition on a single break, which is the most likely unit the finished application will process, the limiting factor for speed will be the time taken to read the data from disk.