A few words in advance
Anyone who has read my recent blog posts will surely see parallels to this one. I have been working on a security/surveillance camera concept for some time and keep trying out different approaches. You can read up on it here: AI Powered Security Cam:
Now I tried to change the basic solution architecture. I would like an Object Detection that is tailored to my environment, which I hope will give better detection performance (speed and hit rate). There are models that recognize general objects (the best known are MobileNet SSD, YOLO and various other R-CNNs, …), but I would like to recognize more specific labels, such as "mother-in-law", "post", "garbage bin", etc.
1) I need a lot of data/pictures of my farm entrance across different seasons and weather conditions and with different OOIs (Objects of Interest).
This is not an issue for me now, as I have been running my system for a number of years. My data portfolio now contains over 55,000 images and about the same number of short video sequences, which I would like to evaluate for other purposes at a later date.
2) Next, I have to choose a way to train a model. There are countless options. I ask myself: "Should I train everything myself?" "Locally on my PC?" "Remotely, in the cloud – Azure 😉 ?" I chose Microsoft's Custom Vision API. On the one hand I want to get results quickly for testing purposes, on the other hand I am simply too lazy to set everything up by hand *LoL*. In addition, Custom Vision now has a great feature that I wanted to try out (the Smart Labeler) – more on that later. And last but not least, I would like to get to know the ONNX format better, which I can conveniently obtain via export.
3) Finally, I have to think about how to prepare the data for training. For my custom Object Detection I need pictures of the above categories (labels). …Should not be a problem with 55,000 images. However, preparing suitable data is very laborious. If I don't want to screen every picture by hand, I have to be creative. I decided to automate the pre-sorting. Here's how I proceeded:
Preparing data
I want to divide my pictures into categories such as "person", "car in general", "post", "garbage can" and various others. Since these are general categories, I can use various pre-trained models. I chose Tiny YOLOv3 because it is one of the best-known models: it predicts very fast, is quite small, and still produces fairly reliable results. According to my tests, it is quite robust at dusk, as opposed to, for example, MobileNet SSD (though I make no claim to academic rigor here).
With this network I can now sort my pictures into general categories. I would like to briefly explain how I went about it.
If you are curious and would like to see what I am about, please check out this video.
I read the picture folder with a C# console app and analyze the images via a Docker container running YOLO.
The app takes a source folder and reads in every single image. To speed things up, I implemented the pipeline pattern. Each image is transferred to the Docker container's REST API and run through object detection there. The console app then gets the labels back and adds them to the respective image as EXIF metadata.
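For illustration, here is a minimal sketch of how such a pipeline could look in C#: a producer feeds file paths into a bounded channel, and a few consumers post each image to the container's REST API. The endpoint URL (`http://localhost:8080/detect`), the response handling, and the channel capacity are my assumptions, not the exact code from the project.

```csharp
// Minimal pipeline sketch, assuming a YOLO container with a hypothetical
// POST /detect endpoint that returns the detected labels as JSON.
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Channels;
using System.Threading.Tasks;

class DetectionPipeline
{
    static readonly HttpClient Http = new HttpClient();

    static async Task Main(string[] args)
    {
        string sourceFolder = args[0];

        // Stage 1: a producer feeds file paths into a bounded channel.
        var channel = Channel.CreateBounded<string>(capacity: 16);
        var producer = Task.Run(async () =>
        {
            foreach (var file in Directory.EnumerateFiles(sourceFolder, "*.jpg"))
                await channel.Writer.WriteAsync(file);
            channel.Writer.Complete();
        });

        // Stage 2: several consumers call the container's REST API in parallel.
        var consumers = new Task[4];
        for (int i = 0; i < consumers.Length; i++)
        {
            consumers[i] = Task.Run(async () =>
            {
                await foreach (var file in channel.Reader.ReadAllAsync())
                {
                    string labelsJson = await DetectAsync(file);
                    Console.WriteLine($"{Path.GetFileName(file)}: {labelsJson}");
                    // ...here the labels would be written into the image's EXIF data.
                }
            });
        }

        await Task.WhenAll(producer, Task.WhenAll(consumers));
    }

    // Posts the raw image bytes to the container and returns its JSON answer.
    static async Task<string> DetectAsync(string imagePath)
    {
        using var content = new ByteArrayContent(await File.ReadAllBytesAsync(imagePath));
        var response = await Http.PostAsync("http://localhost:8080/detect", content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Bounding the channel keeps memory in check with tens of thousands of images, because the producer has to wait whenever the consumers fall behind.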
The console application writes the found labels into the image's EXIF metadata, for example into the title field. This way I can conveniently sort and filter in File Explorer. After this step I slowly get some order into the pictures and can now gather my training data with much less effort. (In the picture above I marked the found objects by hand.)
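One possible way to write such a title in C# is via `System.Drawing`'s `PropertyItem`. The tag choice (`0x9C9E`, the Windows `XPTitle` field that File Explorer shows as "Title") and the uninitialized-object trick are assumptions about how this could be implemented, not necessarily the project's exact code.

```csharp
// Sketch: write labels into the XPTitle EXIF field so File Explorer
// can sort/filter on them.
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.Serialization;
using System.Text;

static class ExifTagger
{
    const int XPTitleId = 0x9C9E; // shown by File Explorer as "Title"

    public static void WriteTitle(string imagePath, string outputPath, string labels)
    {
        using var image = Image.FromFile(imagePath);

        // PropertyItem has no public constructor, so create an empty instance.
        var item = (PropertyItem)FormatterServices.GetUninitializedObject(typeof(PropertyItem));
        item.Id = XPTitleId;
        item.Type = 1; // byte array
        item.Value = Encoding.Unicode.GetBytes(labels + "\0"); // XP* tags are UTF-16
        item.Len = item.Value.Length;

        image.SetPropertyItem(item);
        image.Save(outputPath, ImageFormat.Jpeg);
    }
}

// Usage: ExifTagger.WriteTitle("in.jpg", "out.jpg", "person;car;garbage can");
```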
Training the model
As a next step, I want to train my specialized Object Detection model. To do this, I use Microsoft's Custom Vision API, as teased above. I've written about Custom Vision in the article "Azure Custom Vision-Technology Deep Dives" and refer you to it if you need details on how to work with it. The following image shows the labels I have created (currently a mixture of general and specialized objects).
As a specialization, I trained an object called "Tonne" (garbage can). Later in the project I would like to get more detailed information about my surroundings – for example, whether the garbage can has been emptied or even simply carried off by the neighbor 🙂 – which actually happens.
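For reference, creating such a tag and uploading a training image with a bounding-box region can also be done programmatically via the official training SDK (`Microsoft.Azure.CognitiveServices.Vision.CustomVision.Training`). The endpoint, key, project GUID, and region coordinates below are placeholders; in my case, the tagging happened in the CustomVision.ai portal.

```csharp
// Sketch: add the "Tonne" tag and one training image with a region,
// then kick off a training run (an "iteration" in Custom Vision terms).
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.Azure.CognitiveServices.Vision.CustomVision.Training;
using Microsoft.Azure.CognitiveServices.Vision.CustomVision.Training.Models;

class TrainTonne
{
    static void Main()
    {
        var client = new CustomVisionTrainingClient(
            new ApiKeyServiceClientCredentials("<training-key>"))
        {
            Endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
        };

        var projectId = new Guid("<project-id>"); // existing object detection project
        Tag tonne = client.CreateTag(projectId, "Tonne");

        // Regions are normalized to the image size: left, top, width, height in [0..1].
        var entry = new ImageFileCreateEntry(
            "garbage_can.jpg",
            File.ReadAllBytes("garbage_can.jpg"),
            null,
            new List<Region> { new Region(tonne.Id, 0.25, 0.40, 0.10, 0.20) });

        client.CreateImagesFromFiles(projectId,
            new ImageFileCreateBatch(new List<ImageFileCreateEntry> { entry }));

        Iteration iteration = client.TrainProject(projectId);
        Console.WriteLine($"Training started: {iteration.Status}");
    }
}
```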
After a few training runs, each enriched with more pictures, I got the following results:
Well, I can't be fully satisfied with this result, but it will do for a first attempt. (Otherwise I would still be stuck manually sorting the pictures 🙂 )
By the way, this is where the above-mentioned Smart Labeler feature comes into play. It makes sure that when I work through my untagged pictures, suggestions for already known tags are marked directly in the image. For this to work, I must have completed at least one training run (iteration).
In the picture you can see that, of the provided (and already trained) labels, the label "V50" was found in three pictures. The slider at the bottom helps with fine-tuning – sometimes I had to set the confidence down to 13% to be able to tag more comfortably.
With these steps, I have taken the first step toward my own Object Detection model. Now it's time to consume/apply the result.
Apply the created model
I downloaded the finished model via the Custom Vision API's export function. Since I would like to work more with the ONNX format, I chose it as the export format.
For the application I modified my previous solution (the video above was recorded for this change). Instead of the Docker container, I have now written another module – also in C# – which reads in the images from the source directory and analyzes them with the downloaded ONNX model. For quick processing, I added parallel processing with four simultaneous workers (one per core), each analyzing one image at a time. The results are not bad at all for this setup; I would have expected lower quality. Here is the output of a first test pass over 1698 files.
The run took 75.23 s, which in my opinion is not bad at all – that's about 0.04 s per image (without GPU support).
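A minimal sketch of such a local scoring loop with `Microsoft.ML.OnnxRuntime` and four parallel workers might look as follows. The input name "data", the 416×416 input size, and the `Preprocess` helper are assumptions; the real names and shapes can be read from `session.InputMetadata` of the exported model.

```csharp
// Sketch: score a folder of images against the exported ONNX model
// with a fixed degree of parallelism.
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

class OnnxScorer
{
    static void Main(string[] args)
    {
        // A single session can be shared: InferenceSession.Run is thread-safe.
        using var session = new InferenceSession("model.onnx");

        var files = Directory.GetFiles(args[0], "*.jpg");
        Parallel.ForEach(files,
            new ParallelOptions { MaxDegreeOfParallelism = 4 }, // one per core
            file =>
            {
                DenseTensor<float> input = Preprocess(file); // hypothetical helper
                using var results = session.Run(new[]
                {
                    NamedOnnxValue.CreateFromTensor("data", input)
                });
                // A detection model typically returns boxes/scores/classes;
                // here we only print how many outputs came back.
                Console.WriteLine($"{Path.GetFileName(file)}: {results.Count} outputs");
            });
    }

    // Placeholder: resize the image to the model's input size and copy the
    // pixels into an NCHW float tensor. Details depend on the exported model.
    static DenseTensor<float> Preprocess(string path)
        => new DenseTensor<float>(new[] { 1, 3, 416, 416 });
}
```

Sharing one `InferenceSession` across the workers is deliberate: the model is loaded only once, and concurrent `Run` calls are supported.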
Here is an example of a tagged image that has been sorted into the Post folder:
The following image shows a recognized "van". What is probably hard to see: at the top right there is a picture-in-picture showing the van from a different perspective; there it was recognized as a "car". Unfortunately, the people between the van and the cottage were not recognized, probably because of their small size.
The other picture shows a post bus and a garbage can.
I'm happy, so I let the application run over all the pictures… The overall result looks like this:
From here on, it is only a matter of diligence until the model is trained well enough to be built into a proper application.
Alternative to Model Download
Of course, you can also improve the model without this detour. The Custom Vision API has its name for good reason: the service can also be addressed via REST calls or with an SDK, so the variant from above can be implemented that way as well. The difference is that everything then has to be queried online. In addition, I can no longer use the CustomVision.ai portal to mark the objects, but have to work with other labeling tools – certainly worthwhile, but not the focus of this project.
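For completeness, here is a hedged sketch of this online variant with the prediction SDK (`Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction`); the endpoint, key, project id, and published iteration name are placeholders.

```csharp
// Sketch: send an image to the published Custom Vision iteration
// and print the detected labels with their confidences.
using System;
using System.IO;
using Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction;
using Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction.Models;

class OnlinePrediction
{
    static void Main()
    {
        var client = new CustomVisionPredictionClient(
            new ApiKeyServiceClientCredentials("<prediction-key>"))
        {
            Endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
        };

        using var stream = File.OpenRead("test.jpg");
        ImagePrediction result = client.DetectImage(
            new Guid("<project-id>"), "<published-iteration-name>", stream);

        foreach (var p in result.Predictions)
            Console.WriteLine($"{p.TagName}: {p.Probability:P1} " +
                              $"[{p.BoundingBox.Left:F2}, {p.BoundingBox.Top:F2}]");
    }
}
```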
If you would like to look into the code or expand it, please check out the GitHub repo here.