0

I am interested in building a domain-specific image search application capable of searching for images similar to a given image. With a little google-fu I managed to find this question on this site. If I understand the top rated answer correctly then what I am looking to do is achievable by storing the luminosity data for each image in my library.

This is all well and good, but I need a way to quickly search through and compare against 25,000+ records. I have used PostgreSQL and so I immediately thought of it. The problem I find myself facing is that to store luminance data for 256 discrete possible values across 3 colors, I would need a table with 768 columns (r0,g0,b0,...,r255,g255,b255) and in order to effectively search across all records for similarities I would need 768 indices. I have never really worked with large scale data at this level before but that number seems a little unwieldy to me (although I don't know, my experience doesn't extend into this realm).

My other idea is to store the luminance data in one large text column (formatted like this: r0:rrr g0:ggg b0:bbb ... r255:rrr g255:ggg b255:bbb) and construct a full text search index on that column in order to allow searches across the data for similar images.

Another possibility is using the hamming distance between a query histogram and a stored histogram, but I do not believe that is possible to do quickly against all records in the database.

Am I even approaching this the right way? I am also open to any alternatives to relational databases that could provide fast, real-time search across my dataset.

Community
  • 1
  • 1

1 Answers1

0

It looks like you are putting each image into a 3 dimensional space -- have you tried looking at any geospatial / multidimensional query engines. Similar images should be near each other in 3-space with your approach.

J Chris A
  • 2,158
  • 15
  • 14