
The website is a kind of gallery. To prevent duplicate entries, I want to match images by their hashes. It won't be a 100% bulletproof image match, but for my needs it's a perfectly good solution.

The only problem is that I don't know the correct way to get a SHA-1 hash from an Imagick $image object.

This is roughly what I have right now, and it does produce a hash. But it doesn't match the ones I have on the server. On the server, it's the same sequence of optimizing the image down to the smallest thumbnail, except that there is a file_put_contents($root, $image); at the end of each image manipulation block. I don't think the problem is there, though. I think I might be missing something from the $image object inside the sha1() call, something like sha1($image->rendercurrentimage()).

<?php
$img_url = 'someimgfile.jpg';

# Step 1 = Original file hash - This is all ok
$source_hash = sha1_file($img_url);

$image = new Imagick($img_url);
# file_put_contents($source_root, $image);

$image->gaussianBlurImage(0, 0.05);
$image->setCompression(Imagick::COMPRESSION_JPEG);
$image->setCompressionQuality(90);
$image->setImageFormat('jpeg');
$image->scaleImage(215, 0);
# file_put_contents($thumbnail_root, $image);

# Step 2 = Get the thumbnail hash - results in a non matching hash vs. DB hash
$thumbnail_hash = sha1($image);

$image->setCompressionQuality(75); 
$image->cropThumbnailImage(102, 102);
# file_put_contents($smallthumbnail_root, $image);

# Step 3 = Get the even smaller thumbnail hash - results in a non matching hash vs. DB hash
$smallthumbnail_hash = sha1($image);

# Now query the DB to check against all 3 hashes: $source_hash | $thumbnail_hash | $smallthumbnail_hash
# The DB has, let's say, 1000 images, each with its source hash, thumbnail hash and small thumbnail hash saved

# NOTE: The process of scaling images as they enter the DB is exactly the same, except there are file_put_contents($root, $image); calls in between the steps. I put them in, commented out, to show you the locations.

As I said above, I have the match-against hashes stored on the server in three forms: original, thumbnail, and even smaller thumbnail. Those were created with the sha1_file() function. I would like to mimic the whole process, but without saving the file in $root, in case it's a duplicate and will therefore be denied and redirected to the matched entry.

If you are wondering why I want to match the thumbnails: my tests show that even when the original files differ in size etc., the thumbnails created from them matched fairly well. Or am I wrong? If I have the same image in 3 different sizes and I scale them all down to, let's say, 100px width, will their hashes be the same?

Conclusion: I had to rewrite the original image handler a bit. Basically, I think there was still a piece missing in my code, like $image->stripImage(); or something similar. While that started giving better results, it seems the most reliable way to keep hashes on the server is:
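One way that flow could look, as a rough sketch only: compute the candidate hashes in memory first, look them up, and write files only when nothing matches. The function `findDuplicate()` and the `$knownHashes` array are hypothetical stand-ins for the real DB query, and plain strings stand in for the image bytes so the snippet runs without Imagick.

```php
<?php
// Hypothetical lookup table (hash => entry id); the real site would query the DB instead.
$knownHashes = [
    sha1('stored original bytes')  => 17,
    sha1('stored thumbnail bytes') => 17,
];

/**
 * Return the id of the first stored entry that matches any candidate hash,
 * or null when none of the candidates is known.
 */
function findDuplicate(array $candidateHashes, array $knownHashes): ?int
{
    foreach ($candidateHashes as $hash) {
        if (isset($knownHashes[$hash])) {
            return $knownHashes[$hash];
        }
    }
    return null;
}

// Candidate hashes computed in memory, before anything is written to disk.
$candidates = [
    sha1('new upload bytes'),        // stand-in for the source file hash
    sha1('stored thumbnail bytes'),  // stand-in for a thumbnail blob hash
];

$match = findDuplicate($candidates, $knownHashes);
var_dump($match); // non-null means: redirect to that entry and skip saving
```

Only when `$match` is null would the handler go on to the file_put_contents() calls.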

$hash = sha1(base64_encode($image->getImageBlob()));

My tests also confirmed that writing the file with file_put_contents($thumbnail_root, $image); and then getting the hash via sha1_file($image_root); does not change the hash values.
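A minimal sketch of why that holds, assuming the file is written from exactly the bytes that were hashed: sha1_file() on a file gives the same digest as sha1() on its contents. Random bytes stand in for `$image->getImageBlob()` so the snippet runs without Imagick, and `$blob` and `$path` are names made up for the illustration. It also shows that wrapping the bytes in base64_encode() yields a different digest, so the stored hashes and the lookup hashes have to be produced with the same recipe.

```php
<?php
// Stand-in for the image bytes; in the real handler this would be $image->getImageBlob().
$blob = random_bytes(2048);

// Hash the bytes in memory, without touching the disk.
$memoryHash = sha1($blob);

// Write the same bytes to a temporary file and hash the file instead.
$path = tempnam(sys_get_temp_dir(), 'thumb');
file_put_contents($path, $blob);
$fileHash = sha1_file($path);
unlink($path);

// base64-encoding changes the input, so this digest is stable but different.
$encodedHash = sha1(base64_encode($blob));

var_dump($memoryHash === $fileHash);    // same bytes, same digest
var_dump($memoryHash === $encodedHash); // different input, different digest
```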

I also got more matching results from bigger images scaled down to thumb sizes.

Kalle H. Väravas
  • Why the effort? If the image is really the *same*, then just overwrite it. – hek2mgl Sep 03 '14 at 10:52
  • Well, if the image is the same, then I don't need it in the system again. While matching the original source image will already help me, I need to match the thumbnails as well, as this triple check will give better results. – Kalle H. Väravas Sep 03 '14 at 11:01
  • possible duplicate of [Image fingerprint to compare similarity of many images](http://stackoverflow.com/questions/596262/image-fingerprint-to-compare-similarity-of-many-images) – Danack Sep 03 '14 at 11:50
  • @Danack, if it were a duplicate, I would have my answer already. My problem is different and deals with the Imagick object, which I strongly believe will help searchers in the future. I searched for a solution for hours before posting, just as I always do and just as the SO guidelines recommend. – Kalle H. Väravas Sep 03 '14 at 12:00
  • The library you're using is an implementation detail - the theory is the same. – Danack Sep 03 '14 at 12:23
  • @Danack, yes, but will it help the next person who creates the same topic? That suggested topic didn't help. Give me some time, I'm currently doing some tests. We can reach a conclusion here and then close or delete it. But if you leave it open, I promise it will be the first search result for the next person who searches "how to sha1 imagemagic object". – Kalle H. Väravas Sep 03 '14 at 12:28

2 Answers

1

As your problem is that you don't want to create a file on the file system for each step you go through, I would suggest that you grab the blob content at each step and create a hash of that. For example:

<?php
//quick and dirty image creation to demonstrate my point
$image = new Imagick();
$image->newImage(100, 100, new ImagickPixel('red'));
$image->setImageFormat('png');

//base64 encode our blob and then generate a sha1 hash
$thumbnail = base64_encode( $image->getImageBlob() );
echo sha1($thumbnail);

If you are trying to match two images of different original sizes against each other, you may run into resampling problems. For example, say I have a picture of a monkey that is 200px square and another, seemingly identical one that is 400px square. If I resample the larger one down to 200px, the images will not always match.

David Long
  • Interesting idea. Well, `sha1($image->getImageBlob())` didn't give the exact hash anyway. But I'm wondering whether, instead of hashing the root file itself once it's done, I could get the blob while creating the DB entry. That way I could set up perfect conditions for matching in the future. – Kalle H. Väravas Sep 03 '14 at 11:37
  • In response to your edit: I'm aware of that fact. My problem is that the images are basically identical; I have instances of 20 identical images in the DB. But I think if I could get the thumbnails to the right hash, it would prove either that it's not necessary or that it brings up more results. All in all, I need to figure out the correct way to sha1 an Imagick object. I feel that your blob function might be the answer. – Kalle H. Väravas Sep 03 '14 at 11:40
  • It might be an idea to store each hash against your database entries and grab the blob content for the image itself in the process. If storage space isn't limited this might be an option. – David Long Sep 03 '14 at 11:44
  • Looking over your code again, it might be the lossy format and the blurring you are using that cause the change in hash. Have you considered testing a separate thumbnail generation for the hash process (e.g. lossless)? – David Long Sep 03 '14 at 11:50
  • I'm currently testing against the exact same source file from the DB, not any of its duplicates. And the hashes, no matter what I do, do not match. I'm going to test a little further, but I guess it might not be possible the way I'm doing this right now. The finished file created at the end will be different. Storage limits are not an issue by far; if anything, I should find something for MySQL to do quickly, as it's using 5% of the total resources we could use :S – Kalle H. Väravas Sep 03 '14 at 11:51
  • Well, it's done exactly the same way for the database entry. If I change the compression, then it won't match what I have in the database. I'm going to try all sorts of things and run some more tests; this is an interesting topic anyway. Let's see what the conclusion will be. – Kalle H. Väravas Sep 03 '14 at 11:53
  • Anyway, thank you. I figured out that one line was missing, and then some weird anomalies happened that first started matching the smaller thumb, and finally, after base64 and blob, I got the best and most stable results. – Kalle H. Väravas Sep 03 '14 at 13:09
0

Just use this:

$sha1 = sha1_file($img_url);

But be careful to get the sha1 before processing the image! All your hashes should be generated based on the images as the users uploaded them so you can compare them with the hashes of future images without the need of processing them first.

Note! The hash will change even if you rescale the image, keeping proportions. Even if you open the file in a text editor and add a blank space, the hash changes.
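A small sketch of that sensitivity, using a plain string as a stand-in for a file's contents (the variable names are only for illustration). Appending a single space produces a completely different digest:

```php
<?php
// Stand-in for file contents; a real check would read the uploaded file.
$original = 'example image bytes';
$modified = $original . ' '; // one extra blank space

$originalHash = sha1($original);
$modifiedHash = sha1($modified);

echo $originalHash, PHP_EOL;
echo $modifiedHash, PHP_EOL;
var_dump($originalHash === $modifiedHash); // the two digests differ
```

This is why an exact hash can only detect byte-for-byte identical files, not visually identical ones.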

Your idea with scaling the image to the same width might work, but only if they were scaled using the same function or parameters. It's not 100% trustable.

Traian Tatic
  • Yes, this is what I'm doing in the first step: `$source_hash = sha1_file($img_url);`. But I also need to match the scaled ones. – Kalle H. Väravas Sep 03 '14 at 11:24
  • Edit to your edit: yes. I'm basically hoping to get an answer that confirms that `sha1($image)` is the correct way and that `file_put_contents($root, $image);` adds something to the file, which is why they don't match. Or that I'm doing the $image object hashing wrong somehow. – Kalle H. Väravas Sep 03 '14 at 11:26
  • file_put_contents, as you have it, rewrites the file at the address $root with the content of the variable $image ($image can be either a string, an array or a stream resource) – Traian Tatic Sep 03 '14 at 11:40