Report

The time segment that we analyse in order to identify a drum is indicated at the start of each hit (we will call this segment the recognition segment). The first hit, a kick drum, is immediately followed by a high hat. As can be seen, the recognition segment of the hat is almost completely obscured by the sound continuing from the kick drum, so the FFT of the recognition segment obtained from the hat will contain a significant amount of kick drum sound. It is therefore fair to assume that the spectrum of frequencies in a segment immediately preceding the high hat hit will also be present in the sound of the hat (we will call this segment the noise segment). Thus, if the FFT of the noise segment immediately preceding the hat is subtracted from the FFT of the recognition segment obtained for the hat, most of the kick drum's frequency content should be removed from that of the high hat.
If the preceding drum is not as resonant as a kick drum, then the preceding noise segment should merely contain background noise, and it is also desirable that this be removed from the following drum.
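The subtraction described above can be sketched as follows. This is only an illustration under stated assumptions, not the project's implementation: the function names, the naive DFT, the subtraction of magnitude spectra (rather than complex coefficients), and the flooring of negative results at zero are all our own choices, and the two tones are toy stand-ins for a ringing kick drum and a high hat.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of the DFT of `samples` for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def subtract_noise(recognition_mag, noise_mag, amount=1.0):
    """Subtract the scaled noise spectrum bin by bin.

    Negative results are floored at zero; this flooring is an assumption,
    as the text does not say how negative values should be handled.
    """
    return [max(r - amount * v, 0.0)
            for r, v in zip(recognition_mag, noise_mag)]


# Toy data: a low "kick" tone that rings through both segments, and a
# higher "hat" tone that is present only in the recognition segment.
N = 64
kick = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
hat = [0.5 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]

noise_segment = kick                                     # segment before the hat
recognition_segment = [a + b for a, b in zip(kick, hat)]  # kick tail plus hat

cleaned = subtract_noise(magnitude_spectrum(recognition_segment),
                         magnitude_spectrum(noise_segment))
```

In this idealised case the kick component (bin 2) is removed almost entirely while the hat component (bin 20) is left intact. With real recordings the kick decays between the two segments, so the removal will only be partial.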
Recordings of nearly all types of musical instrument, and of drum kits in particular, are put through a degree of audio compression. It can therefore be assumed that the louder hits in a break beat will have largely suppressed the noise they contain. This will not be true for the quieter hits. The amount of noise reduction applied to a particular drum should therefore be inversely proportional to the amplitude of that drum relative to the break to which it belongs. This reduces the chance of performing noise reduction on drums that are likely to have obscured most of the background noise.
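One possible reading of this inverse proportionality is sketched below. The function name, the use of peak amplitude as the measure of loudness, the constant `k` and the upper limit `max_amount` are all hypothetical; they stand in for the parameter values that would need to be investigated.

```python
def reduction_amount(hit_peak, break_peak, k=0.25, max_amount=1.0):
    """Scale factor for noise reduction, inversely proportional to the
    amplitude of the hit relative to the loudest hit in the break.

    `k` and `max_amount` are illustrative tuning parameters (assumptions).
    """
    if hit_peak <= 0 or break_peak <= 0:
        raise ValueError("peak amplitudes must be positive")
    relative = hit_peak / break_peak   # 1.0 for the loudest hit in the break
    return min(max_amount, k / relative)


# The loudest hit receives the least reduction; quiet hits receive more,
# up to the limit set by max_amount.
loudest = reduction_amount(1.0, 1.0)
medium = reduction_amount(0.5, 1.0)
quiet = reduction_amount(0.1, 1.0)
```

The returned value could then be used to scale the noise spectrum before it is subtracted from the recognition segment.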

This sort of noise reduction should thus be appropriate for the project, and different parameter values should be investigated.

3.2.4. Clean and noisy data

All of the sampled break beats constituting the data used in this project have come from similar sources to those a modern recording studio might use. In about 70% of these breaks there is therefore a certain degree of noise similar to that described in section 2.1.3. During the training of the neural networks we will refer to data that has had noise reduction performed as clean data, and to untreated data as noisy data. In order to remove further noise from the clean data during training, the drum sounds can be put through high-pass filtering to remove any traces of low frequency noise (which in these cases carries a greater spectral energy than higher frequency noise). This can be performed by ignoring those coefficients generated by the FFT that lie below a certain frequency. As the type of each drum is known during training, filtering can be performed at a different cutoff frequency for each type. It is suggested that these cutoff frequencies nominally be: 40Hz for kicks, 150Hz for snares, 80Hz for toms, 400Hz for cymbals, and 250Hz for congas.
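The per-type high-pass filtering can be sketched as follows. The cutoff frequencies are the nominal values given above; everything else is an assumption made for illustration, namely the function name, setting the ignored coefficients to zero, the 44.1 kHz sample rate and the 1024-point FFT.

```python
# Nominal cutoff frequencies in Hz, as suggested in the text.
CUTOFF_HZ = {
    "kick": 40.0,
    "snare": 150.0,
    "tom": 80.0,
    "cymbal": 400.0,
    "conga": 250.0,
}


def highpass_bins(spectrum, sample_rate, drum_type):
    """Zero every FFT coefficient whose frequency lies below the cutoff.

    `spectrum` holds bins 0..N/2 of an N-point FFT of a real signal, so
    bin k corresponds to a frequency of k * sample_rate / N hertz.
    """
    n_fft = 2 * (len(spectrum) - 1)
    bin_hz = sample_rate / n_fft
    cutoff = CUTOFF_HZ[drum_type]
    return [0.0 if k * bin_hz < cutoff else c
            for k, c in enumerate(spectrum)]


# Example with an assumed 44.1 kHz sample rate and 1024-point FFT,
# giving 513 bins spaced roughly 43 Hz apart.
flat = [1.0] * 513
snare = highpass_bins(flat, 44100, "snare")
kick = highpass_bins(flat, 44100, "kick")
```

At this resolution the kick cutoff removes only the DC bin, so a longer FFT would be needed if the 40Hz and 80Hz cutoffs are to behave differently from one another in any meaningful way.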
The highest frequency that this project will be looking at is 14kHz. This value was nominally chosen as it gives an adequate degree of sonic clarity, but keeps the amount of data to be processed to a minimum.
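The saving in data can be estimated with a short calculation. The sketch below counts how many FFT bins lie at or below a given upper frequency; the function name, the 44.1 kHz sample rate and the 1024-point FFT are assumptions, as the text does not state them here.

```python
import math


def bins_up_to(max_hz, sample_rate, n_fft):
    """Number of FFT bins (starting from bin 0) at or below `max_hz`."""
    bin_hz = sample_rate / n_fft
    count = math.floor(max_hz / bin_hz) + 1
    # Never report more bins than exist below the Nyquist frequency.
    return min(count, n_fft // 2 + 1)


kept = bins_up_to(14000, 44100, 1024)
total = 1024 // 2 + 1
```

Under these assumptions `kept` is 326 of the 513 available bins, so roughly a third of the coefficients need not be passed on for processing.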