2022-12-29
I finished the first pass over the data from my last trip on the 17th of December.
Prior to the trip I had trained my own binary classifier on 160 GB of mostly Pomona data. About 10% was from Secretary Island, and in total 4% of the audio was actual kiwi calls. It had very good validation statistics on the 20% of data held out from training to check results after each epoch. OpenSoundscape/PyTorch made very good use of compute once I put the data on the SSD instead of the hard drive: it completed an epoch, including validation, every hour, using 12 cores and big chunks of 100% GPU. It trained for 95 epochs, but I lost the best model due to a mistake I made and ended up using the best model from around 75 epochs; there was very little difference in their statistics, only in the decimal places.
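As a rough sketch, a training run like that looks something like this in the OpenSoundscape I was using (0.7.x era); the label file, the "kiwi" column name and the batch size are placeholders, not my actual setup:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from opensoundscape.torch.models.cnn import CNN

# Labels indexed by (file, start_time, end_time) with a 0/1 column per class
labels = pd.read_csv("labels.csv", index_col=[0, 1, 2])

# 20% held out; validation runs after every epoch
train_df, valid_df = train_test_split(labels, test_size=0.2, random_state=0)

# 5 s samples, matching the prediction chunks described below
model = CNN(architecture="resnet18", classes=["kiwi"], sample_duration=5.0)
model.train(
    train_df,
    valid_df,
    epochs=95,
    batch_size=128,    # placeholder; tune to the GPU
    num_workers=12,    # the 12 cores mentioned above
    save_path="model_runs/binary_kiwi",
)
```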
Predicting with the model was extremely impressive: it finished processing 800 GB of raw audio in hours, where AviaNZ would have taken a week running 24/7. I ran it over all my data, close to 6 TB, in a few days, less time than it would have taken me to process one trip's worth of audio using AviaNZ. The results are unprocessed except for the new data fetched in December. Time is short.
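For the record, batch prediction over a trip looks roughly like the sketch below; the directory layout and model path are placeholders, and the exact return value of predict() differs between OpenSoundscape versions:

```python
from glob import glob
from opensoundscape.torch.models.cnn import load_model

model = load_model("model_runs/binary_kiwi/best.model")  # placeholder path
files = sorted(glob("recordings/**/*.WAV", recursive=True))

output = model.predict(files, batch_size=128, num_workers=12)
# Older OpenSoundscape releases return a (scores, preds, unsafe_samples) tuple,
# newer ones return a single scores dataframe
scores = output[0] if isinstance(output, tuple) else output
scores.to_csv("predictions.csv")
```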
I overran my usage limits on Airtable, which I was using for the first pass over detections, so I found a very efficient workflow on my MacBook instead. I merge the spectrogram image and audio file into a video, all the details I need are in the file name, and I use tags in Finder to label files. The tag is written into the file metadata and I can later retrieve a long list of labels for each file. Working with Finder is extremely efficient: you can tag files with keyboard shortcuts, even tag multiple files at the same time, sort by tags, and so on. Finder is brilliant. The labels stick to the file because they are part of it.
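A minimal sketch of that workflow, assuming ffmpeg is on the PATH and macOS's mdls is available; the file names are placeholders and the tag parsing is deliberately rough:

```python
import subprocess

def make_review_video(spectrogram_png, audio_wav, out_mp4):
    """Burn a spectrogram image and its audio clip into one small video file."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-loop", "1", "-i", spectrogram_png,  # still image as the video track
            "-i", audio_wav,                       # audio track
            "-c:v", "libx264", "-tune", "stillimage",
            "-c:a", "aac", "-shortest",
            out_mp4,
        ],
        check=True,
    )

def finder_tags(path):
    """Read back the Finder tags (Spotlight kMDItemUserTags) on a file."""
    result = subprocess.run(
        ["mdls", "-raw", "-name", "kMDItemUserTags", path],
        capture_output=True, text=True, check=True,
    )
    raw = result.stdout.strip()
    if raw in ("", "(null)"):
        return []
    # mdls prints a parenthesised, comma-separated list; strip it down to names
    return [t.strip().strip('"') for t in raw.strip("()").split(",") if t.strip()]
```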
When predicting, OpenSoundscape looks at 5 s chunks of audio with a 2.5 s overlap. I get a long list of segments for each file, each assigned a 0 or 1. There is some machinery in there to get a binary result, and I may need to do some refining, but overall it is very good. I then take that list and exclude any detection that falls in the day, defined by civil twilight. I made something that chunks the rest up into actual calls: essentially I discard any detection that has no other detections nearby, but anything within 10 s of another detection gets merged into the same call. It works extremely well, no more half calls, except when they overflow a file during recording.
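A sketch of that post-processing, assuming the astral package for the civil twilight times and rough placeholder coordinates for the site:

```python
from astral import LocationInfo
from astral.sun import sun

# Approximate location; astral's default depression of 6 degrees means the
# "dawn"/"dusk" times below are civil twilight
SITE = LocationInfo(name="Pomona", region="NZ", timezone="Pacific/Auckland",
                    latitude=-45.5, longitude=167.7)

def is_night(timestamp):
    """True if a timezone-aware datetime falls between civil dusk and civil dawn."""
    s = sun(SITE.observer, date=timestamp.date(), tzinfo=timestamp.tzinfo)
    return timestamp <= s["dawn"] or timestamp >= s["dusk"]

def merge_into_calls(detection_starts, chunk=5.0, gap=10.0):
    """Group detection start times (seconds into the file) into calls.

    Detections within `gap` seconds of another detection are merged into one
    call; detections with no neighbour inside `gap` are discarded.
    """
    calls, current = [], []
    for t in sorted(detection_starts):
        if current and t - current[-1] <= gap:
            current.append(t)
        else:
            if len(current) > 1:                      # drop isolated one-off chunks
                calls.append((current[0], current[-1] + chunk))
            current = [t]
    if len(current) > 1:
        calls.append((current[0], current[-1] + chunk))
    return calls
```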
I have simplified my labels: for kiwi, all I now label are Male/Female, and I mark Close calls so I can find them. I plan to find duets algorithmically in future.
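I have not written that yet, but the idea is something like the speculative sketch below: treat a duet as a male call and a female call that overlap or fall within a short window of each other (the 20 s window and the (label, start, end) call tuples are assumptions):

```python
def find_duets(calls, window=20.0):
    """Pair up male and female calls that overlap or sit within `window` seconds."""
    males = [c for c in calls if c[0] == "Male"]
    females = [c for c in calls if c[0] == "Female"]
    duets = []
    for _, m_start, m_end in males:
        for _, f_start, f_end in females:
            # overlapping, or separated by no more than `window` seconds
            if m_start <= f_end + window and f_start <= m_end + window:
                duets.append((m_start, f_start))
    return duets
```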
I had a mind-boggling 10,716 detections to wade through:
11 files got a ?
7 Geese
22 Kaka, yes 22! C05, D03, F05, F09, H04, M04, S13T, T10.
5 Kea, D03 & D09
237 LTC (long-tailed cuckoo; I need to add them to the model, but it's doing well regardless, mostly distant LTC detected.)
Many creaking trees from J11; these also need to be added to the model.
Loud close frogs are no longer a problem, but distant frogs now are! Also need to add them to the model. The model pulls a lot of kiwi, even distant ones, out of the din of frogs at N20.
Only 22 morepork, my new model is right onto it.
A lot of dawn chorus, but also a lot of kiwi in the middle of it. Dawn chorus also needs to be added to the model. Kiwi are active in the early mornings on Pomona, all the time.
This model detects a lot more calls than I am used to. Previously I had detected a total of around 5,400 calls on Pomona. I must have been missing many, many calls. I will find them.
First I need to reduce my false positives.
The Plan:
Finish labelling this data
Construct a new dataset
Train a new model; this one will likely classify Male/Female/NotKiwi (see the sketch after this list)
Go through all my data with hopefully fewer false positives.
Train another new model to use on the next tranche of fresh data.
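The Male/Female/NotKiwi model would only need a small change to the training setup, something like this hedged sketch (the label file is a placeholder):

```python
import pandas as pd
from opensoundscape.torch.models.cnn import CNN

# Labels indexed by (file, start_time, end_time) with Male/Female/NotKiwi columns
labels = pd.read_csv("labels_mf.csv", index_col=[0, 1, 2])

model = CNN(
    architecture="resnet18",
    classes=["Male", "Female", "NotKiwi"],
    sample_duration=5.0,
    single_target=True,   # each 5 s chunk gets exactly one of the three labels
)
```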
I am aiming to label calls automatically in future and just weed out a few exceptions by hand. Or that is what my goal is, anyway.
In future my skraak notebook will need to be rewritten to accommodate a simpler, more automated labelling scheme.
Video
Audio