
I would like to compare a query image against the pictures in a database (about 2000 of them).

Before posting on this website I read a lot of papers about methods for matching a picture against a big database, and a lot of posts on Stack Overflow.

The papers contain some interesting material, but they are quite technical and it is hard to understand the algorithms in depth. (I have only just started specializing in this field.)

The most interesting posts:

Simple and fast method to compare images for similarity ;

Nearest neighbors in high-dimensional data? ;

How to understand Locality Sensitive Hashing? ;

Image fingerprint to compare similarity of many images ;

C++/SIFT/SQL - If there a way to compare efficiently a SIFT descriptor of an image with a SIFT descriptor in a SQL database?

Papers :

Object retrieval with large vocabularies and fast spatial matching,

Image Similarity Search with Compact Data Structures,

LSH,

Near Duplicate Image Detection: min-Hash and tf-idf Weighting

Vocabulary tree

Aggregating local descriptors

But I'm still confused.

The first thing I did was implement BoW. I trained a Bag of Words model (with ORB as detector and descriptor, and VLAD features) on 5 classes in order to test its efficiency. After a long training, I launched it. It worked well, with an accuracy of 94%. That's pretty good.
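
For reference, the vocabulary training step can look like this (a minimal sketch using the OpenCV 3 API; the 200-word vocabulary size and the image list are just placeholder examples):

#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

// Build a visual vocabulary from ORB descriptors with k-means (minimal sketch).
cv::Mat trainVocabulary(const std::vector<std::string>& trainingImages)
{
    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    cv::BOWKMeansTrainer trainer(200); // 200 visual words -- an arbitrary example
    for (const std::string& path : trainingImages) {
        cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;
        orb->detectAndCompute(img, cv::noArray(), keypoints, descriptors);
        if (descriptors.empty()) continue;
        cv::Mat descriptorsF;
        descriptors.convertTo(descriptorsF, CV_32F); // k-means needs float data
        trainer.add(descriptorsF);
    }
    return trainer.cluster(); // one row per visual word (cluster center)
}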

But there is a problem for me:

  • I don't want to do classification. My database will contain about 2000 different pictures, and I just want to find the best matches between my query and the database. If I follow the logic, 2000 different pictures means 2000 different classes, and that is obviously impossible...

Regarding this first approach, do you agree with me? It is obviously not the best method for what I want to do, is it? Maybe there is another way to use BoW to find similarities in the database?

The second thing I did is "simpler". I compute the descriptors of my query. Then I loop over my whole database, compute the descriptors of each picture, and append them to a vector:

std::vector<cv::Mat> all_descriptors_database;
cv::Ptr<cv::ORB> orb = cv::ORB::create();
for (int i = 0; i < 2000; ++i) {
    cv::Mat picture = cv::imread(database_paths[i]); // database_paths: the 2000 image files
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;
    orb->detectAndCompute(picture, cv::noArray(), keypoints, descriptors);
    all_descriptors_database.push_back(descriptors); // one descriptor matrix per picture
}

At the end I have a big vector containing all the descriptors of the whole database (and the same for all the keypoints).

Then, this is where I got confused. At the beginning, I wanted to compute the matching inside the loop, that is to say, for each image in the database, compute its descriptors and match them against the query. But it took a lot of time.
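
That per-image matching looked roughly like this (a sketch; BFMatcher with Hamming distance since ORB descriptors are binary, and the 0.8 ratio threshold is just a common example):

cv::BFMatcher matcher(cv::NORM_HAMMING); // Hamming distance for binary ORB descriptors
std::vector<std::vector<cv::DMatch>> knn_matches;
matcher.knnMatch(query_descriptors, all_descriptors_database[i], knn_matches, 2);
int good_matches = 0;
for (const auto& m : knn_matches)
    if (m.size() == 2 && m[0].distance < 0.8f * m[1].distance) // Lowe's ratio test
        ++good_matches; // the count of good matches is the score of picture i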

So after reading a lot of papers about how to find similarities in big databases, I found the LSH algorithm, which seems appropriate for that kind of search.

Therefore I wanted to use this method. So inside my loop I did something like this:

// Create a FLANN LSH index over the descriptors of picture i
cv::flann::Index flannIndex(all_descriptors_database.at(i),
                            cv::flann::LshIndexParams(12, 20, 2),
                            cvflann::FLANN_DIST_HAMMING);
cv::Mat results, dists;
int k = 2; // find the 2 nearest neighbors
flannIndex.knnSearch(query_descriptors, results, dists, k, cv::flann::SearchParams());
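
Maybe I should instead build a single index over the whole database once, outside the loop? Something like this (I'm not sure this is the correct usage; cv::vconcat stacks all the per-picture descriptor matrices into one):

cv::Mat all_descriptors;
cv::vconcat(all_descriptors_database, all_descriptors); // stack the 2000 descriptor matrices
cv::flann::Index flannIndex(all_descriptors,
                            cv::flann::LshIndexParams(12, 20, 2),
                            cvflann::FLANN_DIST_HAMMING);
cv::Mat results, dists;
flannIndex.knnSearch(query_descriptors, results, dists, 2, cv::flann::SearchParams());
// results holds row indices into all_descriptors; I would still have to map each
// row back to the picture it came from.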

However, I have some questions:

  • It took more than 5 seconds to loop over my whole database (2000 pictures), whereas I thought it would take less than 1 s (in the papers they have huge databases, unlike me, and LSH is supposed to be more efficient). Did I do something wrong?

  • I found some libraries on the internet that implement LSH, like http://lshkit.sourceforge.net/ or http://www.mit.edu/~andoni/LSH/. So what is the difference between those libraries and the four lines of code I wrote using OpenCV? I looked at the libraries, and for a beginner like me they were very difficult to use. I got a bit confused.

The third thing:

I wanted to compute a kind of fingerprint from the descriptors of each picture (in order to compute Hamming distances against the database), but it seems to be impossible: OpenCV / SURF How to generate a image hash / fingerprint / signature out of the descriptors?

So I have been stuck on this task for 3 days. I don't know whether I'm on the wrong track or not. Maybe I missed something.

I hope this is clear enough for you. Thanks for reading.

Joker

1 Answer


Your question is kind of big. I'll give you some hints, though.

  1. Bag of Words can work, but classification is unnecessary. A BoW pipeline typically consists of:

    • keypoint detection - ORB
    • keypoint description (feature extraction) - ORB
    • quantization - VLAD (Fisher encoding might be better, but plain old k-means might be enough in your case)
    • classification - you probably can skip this stage

    You can treat the quantization result (e.g. the VLAD encoding) of each image as its fingerprint. Computing the distance between fingerprints yields a similarity measure; see the sketch after this list. You still have to do a 1-vs-all matching, though, which is going to be tremendously expensive when your database gets big enough.

  2. I didn't get your point.

  3. I'd suggest reading G. Hinton's papers (e.g. this one) on dimensionality reduction with deep autoencoders and convolutional neural networks. He boasts of beating LSH. As for the tools, I'd recommend taking a look at BVLC's Caffe, a great neural network library.
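
To illustrate point 1: a minimal sketch of the 1-vs-all fingerprint matching, assuming each image has already been reduced to a fixed-length CV_32F row vector (the function name and the choice of L2 distance are just examples):

#include <opencv2/opencv.hpp>
#include <limits>
#include <vector>

// Return the index of the database image whose fingerprint is closest to the query's.
int bestMatch(const cv::Mat& queryFingerprint, const std::vector<cv::Mat>& dbFingerprints)
{
    int best = -1;
    double bestDist = std::numeric_limits<double>::max();
    for (int i = 0; i < (int)dbFingerprints.size(); ++i) {
        double d = cv::norm(queryFingerprint, dbFingerprints[i], cv::NORM_L2); // 1 vs all
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
}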

Adam Kosiorek