Erwan CADO & Alban PERRIER - Bordeaux INP 2021
Video analysis research project
Computer vision assistance for grasping, aimed at upper-limb amputees who wear neuroprostheses.
Hypothesis: the direction of the gaze expresses intention, which makes it possible to know which object the user wants to grasp.
Dataset: GITW_light, 5 objects (Bowl, Can of Coca Cola, Milk Bottle, Rice, Sugar), with manual annotation of the objects.
We study lifelong learning: given a model (CNN) trained to recognize objects, we try to adapt this model to new images arriving on the fly with the “Move-to-Data” method.
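As an illustration, here is a minimal NumPy sketch of a Move-to-Data style update, as we understand it from the published description of the method: only the last-layer weight vector of the new image's class is moved toward the feature vector of that image, rescaled to the norm of the current weights. The function name `move_to_data_update`, the step size `epsilon` and the array shapes are assumptions made for this example; this is not code taken from the notebook.

```python
import numpy as np


def move_to_data_update(weights, features, class_index, epsilon=0.01):
    """Return a copy of `weights` with one class vector nudged toward `features`.

    weights:     (num_classes, feature_dim) last-layer weight matrix (assumed layout)
    features:    (feature_dim,) penultimate-layer output for the new image
    class_index: class assigned to the new image
    epsilon:     step size in [0, 1]; small values preserve the old weights
    """
    weights = np.array(weights, dtype=float)  # work on a copy
    features = np.asarray(features, dtype=float)
    current = weights[class_index]
    feature_norm = np.linalg.norm(features)
    if feature_norm == 0.0:
        return weights  # no direction to move toward
    # Rescale the feature vector to the norm of the current weight vector.
    target = np.linalg.norm(current) * features / feature_norm
    # Move the class weight vector a fraction epsilon of the way to the target.
    weights[class_index] = current + epsilon * (target - current)
    return weights
```

With `epsilon` close to 0 the classifier barely changes, while `epsilon = 1` would replace the class vector with the rescaled features; the other classes are never touched.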
-
Clone the repository:
git clone https://github.com/aperrier004/video-analysis.git
-
Update every path used in the notebook and make sure the required data is available (models, DB, VIDEOS, GT, ...)
Train the model on GITW_light (https://dept-info.labri.fr/~mansenca/GITW_light/)
We trained and evaluated a MobileNet model on the dataset, which gave us an accuracy of 0.96.
We implemented a tracker in order to obtain bounding boxes on videos.
This method gave us poor classification results, since the shapes and colors of the objects vary from one video to another.
We chose the Channel and Spatial Reliability Tracker because it is the most accurate according to the benchmarks we found. The results of the OpenCV tracker are much better than those of the previous method because of its ability to better generalize its detections.
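A minimal sketch of how such a tracker can be driven with OpenCV follows. It assumes an OpenCV 4.x build that exposes `cv2.TrackerCSRT_create` (in many installations this means the `opencv-contrib-python` package). The helper names `clamp_box` and `track_object`, and the way the initial box is supplied, are hypothetical and not taken from the notebook.

```python
def clamp_box(box, frame_width, frame_height):
    """Clip an (x, y, w, h) box to the frame and return integer coordinates."""
    x, y, w, h = box
    x1 = max(0, min(int(round(x)), frame_width))
    y1 = max(0, min(int(round(y)), frame_height))
    x2 = max(0, min(int(round(x + w)), frame_width))
    y2 = max(0, min(int(round(y + h)), frame_height))
    return x1, y1, x2 - x1, y2 - y1


def track_object(video_path, initial_box):
    """Yield (frame, box) for every frame where CSRT tracking succeeds.

    `initial_box` is the (x, y, w, h) box of the object on the first frame.
    """
    import cv2  # imported lazily; assumed to provide TrackerCSRT_create

    capture = cv2.VideoCapture(video_path)
    ok, frame = capture.read()
    if not ok:
        capture.release()
        return
    tracker = cv2.TrackerCSRT_create()
    tracker.init(frame, tuple(int(v) for v in initial_box))
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # end of video
            found, box = tracker.update(frame)
            if found:
                height, width = frame.shape[:2]
                yield frame, clamp_box(box, width, height)
    finally:
        capture.release()
```

Each yielded box can then be used to crop the frame before passing the crop to the classifier.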
We tried to update the model at the end of each video by training it on the objects that were detected during the video. In order not to erase the old weights, we chose a very small learning rate. This update is meant to strengthen the model with new data.
It gave us average results.
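The role of the small learning rate in the end-of-video update described above can be illustrated with a framework-free sketch: one gradient step on a linear softmax classification head. This is a simplified stand-in for illustration only; the names `fine_tune_step` and `learning_rate`, the array shapes, and the choice to update only a linear head are assumptions, not the notebook's actual training code.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fine_tune_step(weights, features, labels, learning_rate=1e-4):
    """One gradient-descent step of a linear softmax head on a batch.

    weights:  (feature_dim, num_classes) current head weights
    features: (batch, feature_dim) features of the crops detected in the video
    labels:   (batch,) integer class ids predicted or annotated for those crops
    """
    weights = np.asarray(weights, dtype=float)
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=int)
    probabilities = softmax(features @ weights)
    one_hot = np.eye(weights.shape[1])[labels]
    # Gradient of the mean cross-entropy loss with respect to the weights.
    gradient = features.T @ (probabilities - one_hot) / len(labels)
    # A small learning rate keeps the new weights close to the old ones.
    return weights - learning_rate * gradient
```

The smaller `learning_rate` is, the less each video can move the weights away from what was learned on the original dataset, at the cost of slower adaptation.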
We tried a contour detection method, which gave us overfitted results: very good on CanOfCocaCola and Bowl, but not usable on the other objects. It gave us 20% accuracy.
Here is an example of some working outputs:
- Move-to-Data: manage to get this method working
- Use polynomial interpolation to stabilize the tracking in our selective search method
- Try to use forward and backward predictions in order to obtain more precise predictions
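As a possible starting point for the polynomial stabilization idea in the list above, here is a minimal NumPy sketch. Note that it uses a least-squares polynomial fit (`np.polyfit`) rather than exact interpolation, so noisy detections are smoothed instead of being reproduced exactly. The function name `smooth_track`, the default `degree` and the input layout are assumptions for illustration.

```python
import numpy as np


def smooth_track(frame_indices, centers, query_indices=None, degree=2):
    """Fit one polynomial per coordinate and evaluate it at the query frames.

    frame_indices: (n,) frame numbers where the object was detected
    centers:       (n, 2) detected box centers as (x, y)
    query_indices: frames to evaluate; defaults to `frame_indices`
    degree:        polynomial degree (must be smaller than n)
    """
    frame_indices = np.asarray(frame_indices, dtype=float)
    centers = np.asarray(centers, dtype=float)
    if query_indices is None:
        query_indices = frame_indices
    query_indices = np.asarray(query_indices, dtype=float)
    smoothed = np.empty((len(query_indices), centers.shape[1]))
    for axis in range(centers.shape[1]):
        # Least-squares fit of this coordinate as a function of the frame number.
        coefficients = np.polyfit(frame_indices, centers[:, axis], degree)
        smoothed[:, axis] = np.polyval(coefficients, query_indices)
    return smoothed
```

Passing frame numbers with missing detections as `query_indices` fills the gaps with positions predicted by the fitted curve.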