
This question may already have been answered, but I couldn't find a simple answer to it. I created a convnet using Keras to classify The Simpsons characters (dataset here).
I have 20 classes, and given an image as input, the model returns the character's name. It's pretty simple. My dataset contains pictures with the main character in the frame, and each picture is labeled only with that character's name.
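
For context, here is a minimal sketch of the kind of model I mean (the layer sizes, the 128x128 input and the compile settings are simplified placeholders, not my actual code):

```python
# Minimal sketch of a 20-class character classifier in Keras.
# Layer sizes and the 128x128 input are placeholders, not the real model.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(20, activation="softmax"),  # one output per character
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```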

Now I would like to add an object detection task, i.e. draw a bounding box around the characters in a picture and predict which character it is. I don't want to use a sliding window because it's really slow, so I thought about using Faster R-CNN (github repo) or YOLO (github repo). Do I have to add bounding-box coordinates for every picture in my training set? Is there a way to do object detection (and get bounding boxes at test time) without providing coordinates for the training set?

In short, I would like to create a simple object detection model; I don't know whether it's possible to build a simpler version of YOLO or Faster R-CNN.

Thank you very much for any help.

A. Attia

2 Answers


The goal of YOLO or Faster R-CNN is to produce the bounding boxes, so in short: yes, you will need to label your data with bounding boxes in order to train them.
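
If you go the darknet-style YOLO route, for example, each training image foo.jpg gets a companion label file foo.txt with one line per box in the form `<class_id> <x_center> <y_center> <width> <height>`, all normalized to the image size. A made-up example with two boxes:

```
0 0.48 0.55 0.30 0.62
5 0.81 0.40 0.22 0.55
```

Faster R-CNN implementations typically expect Pascal VOC-style XML or a similar format instead, but the information you have to provide is the same: one class and one box per object.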

You can take a shortcut (a rough sketch of this loop follows the list):

  1) Label a handful of bounding boxes (let's say 5 per character).
  2) Train Faster R-CNN or YOLO on this very small dataset.
  3) Run the resulting model against the full dataset.
  4) It will get some detections right and a lot of them wrong.
  5) Retrain the detector on the examples that are correctly bounded; your training set should be much bigger now.
  6) Repeat until you get the result you want.
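
A rough sketch of that loop in Python, where `load_initial_labels`, `train` and `detect` are placeholders for whatever training/inference routines your Faster R-CNN or YOLO implementation exposes (they are not real library calls), and the paths and the 0.9 confidence threshold are made up:

```python
import glob

# Placeholders for your detector's own API -- not real library functions:
#   load_initial_labels: reads the ~5 hand-drawn boxes per character (step 1)
#   train:  trains the detector on a list of (image_path, box, class) labels
#   detect: returns (box, class, confidence) tuples for one image

hand_labeled = load_initial_labels("hand_labeled/")
unlabeled_images = glob.glob("simpsons_dataset/**/*.jpg", recursive=True)

labels = list(hand_labeled)
for _ in range(5):                            # step 6: repeat until results look good
    model = train(labels)                     # steps 2 and 5: (re)train the detector
    pseudo_labels = []
    for path in unlabeled_images:             # step 3: run on the full dataset
        for box, cls, conf in detect(model, path):
            if conf > 0.9:                    # steps 4-5: keep only detections you trust
                pseudo_labels.append((path, box, cls))
    labels = list(hand_labeled) + pseudo_labels
```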
Andrew Tu

You may already have a suitable architecture in mind: "Now I would like to add an object detection task, i.e. draw a bounding box around the characters in a picture and predict which character it is."

So you just split the task into two parts:
1. Add an object detector for person detection to return bounding boxes
2. Classify bounding boxes using the convnet you already trained

For part 1 you should be good to go by using a feature extractor (for example a convnet pretrained on COCO or ImageNet) with an object detector (again YOLO or Faster R-CNN) on top to detect people. However, you may find that people in "cartoons" (let's say the Simpsons are people) are not properly recognized, because the feature extractor was trained on real images rather than cartoon-style ones. In that case, you could retrain a few layers of the feature extractor on cartoon pictures so that it learns cartoon features, following the transfer learning methodology.
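
A rough sketch of that two-stage pipeline, using torchvision's COCO-pretrained Faster R-CNN purely because it ships with weights (any COCO-trained person detector would do); the saved-model path, the 224x224 classifier input size and the one-folder-per-character layout are assumptions:

```python
import os
import numpy as np
import torch
import torchvision
from PIL import Image
from tensorflow import keras

# Part 1: a COCO-pretrained detector proposes "person" boxes (COCO class 1 = person).
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")  # torchvision >= 0.13
detector.eval()

# Part 2: the 20-class character classifier you already trained (path/layout assumed).
classifier = keras.models.load_model("simpsons_convnet.h5")
class_names = sorted(os.listdir("simpsons_dataset"))  # assumes one sub-folder per character

def detect_and_classify(image_path, score_thresh=0.7):
    image = Image.open(image_path).convert("RGB")
    tensor = torchvision.transforms.functional.to_tensor(image)
    with torch.no_grad():
        det = detector([tensor])[0]               # dict with "boxes", "labels", "scores"
    results = []
    for box, label, score in zip(det["boxes"], det["labels"], det["scores"]):
        if label.item() != 1 or score.item() < score_thresh:
            continue                              # keep only confident "person" boxes
        x1, y1, x2, y2 = map(int, box.tolist())
        crop = image.crop((x1, y1, x2, y2)).resize((224, 224))  # assumed classifier input
        probs = classifier.predict(np.asarray(crop)[None] / 255.0, verbose=0)[0]
        results.append(((x1, y1, x2, y2), class_names[int(np.argmax(probs))]))
    return results

# Example: print (box, character_name) pairs for one frame.
print(detect_and_classify("some_frame.jpg"))
```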

Michelagio