This question has maybe been answered but I didn't find a simple answer to this. I created a convnet using Keras to classify The Simpsons characters (dataset here).
I have 20 classes and giving an image as input, I return the character name. It's pretty simple. My dataset contains pictures with the main character in the picture and only have the name of the character as a label.
Now I would like to add an object detection ask i.e draw a bounding box around characters in the picture and predict which character it is. I don't want to use a sliding window because it's really slow. So I thought about using faster RCNN (github repo) or YOLO (github repo). Should I have to add the coordinates of the bounding box for each picture of my training set? Is there a way to do object detection (and get bounding boxes in my test) without giving the coordinates for the training set?
In sum, I would like to create a simple object detection model, I don't know if it's possible to create a simpler YOLO or Faster RCNN.
Thank you very much for any help.