
I am trying to use the TensorFlow Object Detection API to train a model, and I am using the sample config for faster_rcnn_resnet101 (https://github.com/tensorflow/models/blob/master/object_detection/samples/configs/faster_rcnn_resnet101_voc07.config).
The following snippet is the part of the config file I don't quite understand:

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}

My questions are:

  1. What exactly do min_dimension and max_dimension mean? Do they mean the input image will be resized to 600x1024 (or 1024x600)?
  2. If I have images of various sizes, some of them considerably larger than 600x1024 (or 1024x600), can/should I increase the values of min_dimension and max_dimension?

The reason I ask comes from this post: TensorFlow Object Detection API Weird Behaviour

In that post, the author answers their own question:

Then I decided to crop the input image and provide that as an input. Just to see if the results improve and it did!
It turns out that the dimensions of the input image were much larger than the 600 x 1024 that is accepted by the model. So, it was scaling down these images to 600 x 1024 which meant that the cigarette boxes were losing their details :)

That post used the same config as I do, and I am not sure whether I can change these parameters, or whether they are the default/recommended settings for this particular model, faster_rcnn_resnet101.


1 Answer


After some tests, I think I have found the answer. Please correct me if anything is wrong.

In the .config file:

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}

According to the image resizer builder in object_detection/builders/image_resizer_builder.py:

if image_resizer_config.WhichOneof(
    'image_resizer_oneof') == 'keep_aspect_ratio_resizer':
  keep_aspect_ratio_config = image_resizer_config.keep_aspect_ratio_resizer
  if not (keep_aspect_ratio_config.min_dimension
          <= keep_aspect_ratio_config.max_dimension):
    raise ValueError('min_dimension > max_dimension')
  return functools.partial(
      preprocessor.resize_to_range,
      min_dimension=keep_aspect_ratio_config.min_dimension,
      max_dimension=keep_aspect_ratio_config.max_dimension)

It then calls the resize_to_range function in object_detection/core/preprocessor.py:

  with tf.name_scope('ResizeToRange', values=[image, min_dimension]):
    image_shape = tf.shape(image)
    orig_height = tf.to_float(image_shape[0])
    orig_width = tf.to_float(image_shape[1])
    orig_min_dim = tf.minimum(orig_height, orig_width)

    # Calculates the larger of the possible sizes
    min_dimension = tf.constant(min_dimension, dtype=tf.float32)
    large_scale_factor = min_dimension / orig_min_dim
    # Scaling orig_(height|width) by large_scale_factor will make the smaller
    # dimension equal to min_dimension, save for floating point rounding errors.
    # For reasonably-sized images, taking the nearest integer will reliably
    # eliminate this error.
    large_height = tf.to_int32(tf.round(orig_height * large_scale_factor))
    large_width = tf.to_int32(tf.round(orig_width * large_scale_factor))
    large_size = tf.stack([large_height, large_width])

    if max_dimension:
      # Calculates the smaller of the possible sizes, use that if the larger
      # is too big.
      orig_max_dim = tf.maximum(orig_height, orig_width)
      max_dimension = tf.constant(max_dimension, dtype=tf.float32)
      small_scale_factor = max_dimension / orig_max_dim
      # Scaling orig_(height|width) by small_scale_factor will make the larger
      # dimension equal to max_dimension, save for floating point rounding
      # errors. For reasonably-sized images, taking the nearest integer will
      # reliably eliminate this error.
      small_height = tf.to_int32(tf.round(orig_height * small_scale_factor))
      small_width = tf.to_int32(tf.round(orig_width * small_scale_factor))
      small_size = tf.stack([small_height, small_width])

      new_size = tf.cond(
          tf.to_float(tf.reduce_max(large_size)) > max_dimension,
          lambda: small_size, lambda: large_size)
    else:
      new_size = large_size

    new_image = tf.image.resize_images(image, new_size,
                                       align_corners=align_corners)

From the above code, we can see that an image whose size is 800*1000 will be resized to 600*750: the shorter side (800) is scaled down to min_dimension (600), and since the longer side then becomes 750, which is still below max_dimension (1024), that scale factor is kept.

That is, this image resizer always resizes your input image according to the settings of min_dimension and max_dimension while preserving the aspect ratio: the shorter side is scaled to min_dimension, unless that would push the longer side past max_dimension, in which case the longer side is scaled to max_dimension instead.
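
To make the rule concrete, here is a minimal standalone sketch of that sizing logic in plain Python (compute_new_size is a made-up name for illustration, not part of the Object Detection API; resize_to_range performs the same computation with TensorFlow ops on tensors):

def compute_new_size(height, width, min_dimension=600, max_dimension=1024):
    # Scale so that the shorter side becomes min_dimension.
    large_scale = min_dimension / min(height, width)
    large = (round(height * large_scale), round(width * large_scale))
    # If the longer side would then exceed max_dimension, scale so that
    # the longer side becomes max_dimension instead.
    if max(large) > max_dimension:
        small_scale = max_dimension / max(height, width)
        return (round(height * small_scale), round(width * small_scale))
    return large

print(compute_new_size(800, 1000))   # (600, 750)  -> min_dimension governs
print(compute_new_size(600, 2000))   # (307, 1024) -> max_dimension governs

Either way the aspect ratio is preserved (up to rounding); only the overall scale changes.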

  • I want to keep keep_aspect_ratio_resizer { min_dimension: 2976 max_dimension: 4464 }. What else do I need to change to do that? – Ajinkya Feb 16 '18 at 21:50
  • Do you mean the size of your input image is (2976 x 4464)? – yuhow5566 Feb 17 '18 at 23:19
  • Yes, there are 1000+ input images, each with a resolution of (2976 x 4464). And I have marked and labelled multiple boxes per image around the object to train. – Ajinkya Feb 18 '18 at 02:35
  • Well... this is really a hard question. As I mentioned before, your image will always be resized according to the min and max dimensions. If you really want your object detector to learn the correct size of the object in your input images, you could try training your model with several combinations of min and max dimensions, so that it can recognize the object at different scales. BUT! If all of your input images have the same size, you can simply set min=2976 and max=4464 (see the snippet after these comments). It should work. – yuhow5566 Feb 19 '18 at 16:26
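
For reference, a config following that last suggestion would look like the snippet below. It simply mirrors the values discussed in the comments and assumes every input really is 2976 x 4464; with these settings, the sizing rule above returns such an image unchanged, since the scale factor works out to 1:

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 2976
    max_dimension: 4464
  }
}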