Hitting a brick wall in a Kaggle Competition

Here I am reviewing my experience in the VSB Power Line Fault Detection Kaggle Competition.

The aim of the competition was to “detect partial discharge patterns in signals acquired from these power lines with a new meter designed at the ENET Centre at VSB — Technical University of Ostrava”.

A partial discharge (PD) is “a localized and quick electric discharge that only partially bridges the insulation material between the conductor’s materials or electrodes”. PDs show up in the signal as localised, high-frequency spikes of higher-than-normal amplitude. Several types of PD occur: internal, surface, corona and electrical treeing (see here for more details).

The data consisted of several thousand signal samples. According to the competition description, “each signal contains 800,000 measurements of a power line’s voltage, taken over 20 milliseconds. As the underlying electric grid operates at 50 Hz, this means each signal covers a single complete grid cycle. The grid itself operates on a 3-phase power scheme, and all three phases are measured simultaneously.”

The first thing to do in any competition is to explore the data. See below for an example of one signal.

Example of one sample signal, raw in blue and with different frequency filters applied.

Here is an example where a 100 Hz cutoff is applied to divide the signal into the 50 Hz base wave and a higher-frequency component.

Signal separated into low and high frequency components
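A minimal sketch of this kind of low/high split is shown here. It is not the code used in the competition: the FFT brick-wall mask, the function name `split_low_high` and its parameters are assumptions chosen for illustration. The sampling rate follows from the competition description (800,000 samples over 20 ms).

```python
import numpy as np

N_SAMPLES = 800_000           # measurements per signal (competition description)
DURATION_S = 0.02             # 20 ms, one 50 Hz grid cycle
FS = N_SAMPLES / DURATION_S   # implied sampling rate, 40 MHz


def split_low_high(signal, cutoff_hz=100.0, fs=FS):
    """Split a signal into low- and high-frequency parts.

    Assumption: a simple FFT mask is used as the filter. Everything at or
    below cutoff_hz goes to the low part, the remainder to the high part.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0           # zero out the high-frequency bins
    low = np.fft.irfft(spectrum, n=signal.size)
    high = signal - low                         # residual carries the spikes
    return low, high
```

Calling `low, high = split_low_high(signal)` on one 800,000-sample phase would return the slow base wave and the residual containing any spikes. By construction `low + high` reproduces the input.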

Initial analysis of signal data can be found here.

I looked at the relationship between high-amplitude spikes and their position relative to the maximum of the 50 Hz carrier signal, and the spikes do appear to concentrate at particular locations.

y axis = amplitude; x axis = distance to the low-frequency maximum, where 0 = HF spike >4V located on the maximum of the LF signal
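The measurement behind a plot like this can be sketched as follows. The function name `spike_offsets_from_lf_max`, the use of the absolute value for thresholding, and the wrap-around of distances are assumptions for illustration. The wrap-around relies on each signal covering exactly one grid cycle.

```python
import numpy as np


def spike_offsets_from_lf_max(low, high, threshold=4.0):
    """Return (offsets, amplitudes) for HF samples exceeding the threshold.

    offsets are signed distances, in samples, from the maximum of the
    low-frequency component, wrapped into [-n/2, n/2) on the assumption
    that the signal spans exactly one cycle.
    """
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    n = low.size
    peak_idx = int(np.argmax(low))                        # LF maximum
    spike_idx = np.flatnonzero(np.abs(high) > threshold)  # HF spikes
    offsets = (spike_idx - peak_idx + n // 2) % n - n // 2
    return offsets, high[spike_idx]
```

An offset of 0 means the spike sits exactly on the low-frequency maximum. Plotting the amplitudes against the offsets across many signals gives a scatter of the kind described above.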

Early on in the competition I decided to convert the 1D signals into spectrograms and then use a CNN to classify the images.

A 1D single-phase example converted to a spectrogram, with colour representing the amplitude of the signal.
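As a rough illustration of the conversion, the sketch below uses `scipy.signal.spectrogram`. The window length, the overlap and the decibel scaling are assumptions, not the settings used for the images in this post, and `to_log_spectrogram` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import spectrogram


def to_log_spectrogram(signal, fs, nperseg=1024, noverlap=512):
    """Convert a 1D signal into a 2D (frequency x time) image in decibels."""
    freqs, times, power = spectrogram(
        np.asarray(signal, dtype=float), fs=fs, nperseg=nperseg, noverlap=noverlap
    )
    image = 10.0 * np.log10(power + 1e-12)   # small offset avoids log(0)
    return freqs, times, image
```

The returned `image` array can be saved with any plotting or imaging library and fed to an image classifier. Shorter windows give finer time resolution at the cost of coarser frequency resolution.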

I then used standard fastai CNN classification code with a ResNet34 backbone and simple augmentation consisting only of flipping about a vertical axis. See code here.

I was in eighth position on the Kaggle leaderboard around week two of the competition, and I was thinking: OK, I’ve done it, I’m in the running to place, all I need to do is a bit more augmentation and training with a deeper network.

Well, I was really wrong about that. I stubbornly continued with more and more data augmentation and deeper networks and my results got worse and worse. I never did improve on the basic ResNet34+image flip methodology.

One of the issues with this dataset was that there were far more examples of good signals than bad ones (PDs). Generating more realistic PD examples was a key part of my experimentation. I spent a significant amount of time separating out the HF and LF components, shifting the HF component by n samples, then recombining it with an LF component (experimenting with different phase positions). Scripts for signal shifting, recombining and converting to spectrograms can be found here.
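The shift-and-recombine idea can be sketched in a few lines. This is not the script linked above: the helper name `synth_pd_sample` and the choice of a circular shift (samples pushed off one end wrap around to the other) are assumptions for illustration.

```python
import numpy as np


def synth_pd_sample(high_pd, low_base, shift):
    """Build a synthetic PD signal from existing components.

    high_pd  : HF component taken from a signal labelled as PD
    low_base : LF component (from the same or another signal)
    shift    : number of samples to move the HF component by
    Assumption: the shift is circular, which keeps the signal length fixed.
    """
    high_pd = np.asarray(high_pd, dtype=float)
    low_base = np.asarray(low_base, dtype=float)
    if high_pd.shape != low_base.shape:
        raise ValueError("components must have the same length")
    return low_base + np.roll(high_pd, shift)
```

Calling this with several different `shift` values per PD signal yields additional minority-class examples. Whether the shifted spikes still fall at physically plausible phase positions is exactly the domain question that matters here.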

The deepest network I tried was DenseNet121. I had to reconfigure my machines (two weeks setting up watercooling) to fit 3x GTX 1080 Ti cards and have sufficient GPU memory to run this network. I was using augmented signal data with additional PD samples (HF time-shifted and recombined with an LF component) and different percentages of augmented data, up to a 1:1 ratio of good samples to PD samples. The loss curves from the fastai lr finder were really spiky for deeper ResNet networks; DenseNet networks were smoother but still somewhat spiky (see the example below).

I found results to be really sensitive to learning rate and, after much experimentation, only got fair results. As can be seen in the confusion matrix below, I was wrongly predicting a significant number of good samples (actual=0) as bad (PD).
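For readers unfamiliar with this kind of matrix, here is a minimal sketch of how one is tallied for a two-class problem. The helper name `confusion_matrix_2x2` is hypothetical and the labels in the usage note are made up for illustration, not competition results.

```python
import numpy as np


def confusion_matrix_2x2(actual, predicted):
    """Count outcomes for binary labels (0 = good signal, 1 = PD).

    Rows are actual classes, columns are predicted classes, so cm[0, 1]
    is the number of good samples wrongly predicted as PD.
    """
    actual = np.asarray(actual, dtype=int)
    predicted = np.asarray(predicted, dtype=int)
    cm = np.zeros((2, 2), dtype=int)
    np.add.at(cm, (actual, predicted), 1)   # one increment per sample
    return cm
```

For example, `confusion_matrix_2x2([0, 0, 0, 1, 1], [0, 1, 1, 1, 0])` has two entries in `cm[0, 1]`, the false-positive cell that caused trouble here.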

My takeaways are: 1) Don’t assume that deeper networks and more GPU VRAM are the answer; 2) Really understanding the domain is important: if I had my time again I would do more research into the physics of PDs and their recognition, and then I might have done a better job of generating synthetic PD examples for training; 3) Keep investing in learning: for several months before and during the competition I hadn’t been keeping up to date with deep learning best practices.

Geophysicist and Deep Learning Practitioner