The website is a kind of gallery, and to prevent duplicate entries I want to match uploaded images against the existing ones. It won't be a 100% bulletproof image match, but for my needs it's a perfectly good solution.
The only problem is that I don't know the correct way to get a SHA-1 hash from an Imagick $image object.
This is what I have right now, and it does produce a hash, but it doesn't match the ones I have on the server. On the server, the image goes through the same steps of optimizing it down to the smallest thumbnail, except that there is a file_put_contents($root, $image); at the end of each image manipulation block. I don't think the problem is there, though. I think I'm missing something from the $image object inside the sha1() function, something like sha1($image->rendercurrentimage()).
<?php
$img_url = 'someimgfile.jpg';
# Step 1 = Original file hash - This is all ok
$source_hash = sha1_file($img_url);
$image = new Imagick($img_url);
# file_put_contents($source_root, $image);
$image->gaussianBlurImage(0, 0.05);
$image->setCompression(Imagick::COMPRESSION_JPEG);
$image->setCompressionQuality(90);
$image->setImageFormat('jpeg');
$image->scaleImage(215, 0);
# file_put_contents($thumbnail_root, $image);
# Step 2 = Get the thumbnail hash - results in a non matching hash vs. DB hash
$thumbnail_hash = sha1($image);
$image->setCompressionQuality(75);
$image->cropThumbnailImage(102, 102);
# file_put_contents($smallthumbnail_root, $image);
# Step 3 = Get the even smaller thumbnail hash - results in a non matching hash vs. DB hash
$smallthumbnail_hash = sha1($image);
# Now query the DB to check against all 3 hashes: $source_hash | $thumbnail_hash | $smallthumbnail_hash
# The DB has, let's say, 1000 images, each with a source hash, thumbnail hash and small thumbnail hash saved
# NOTE: The process of scaling images as they enter the DB is exactly the same, except there are file_put_contents($root, $image); calls in between the steps. I put them in, commented out, to show you the locations
As I said above, the hashes to match against are stored on the server in 3 variants: original, thumbnail and even smaller thumbnail. Those were created with the sha1_file() function. I would like to mimic the whole process, but without saving the file to $root, in case it's a duplicate and will therefore be denied and redirected to the matched entry.
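One way to mimic the server-side process without saving anything is to hash the encoded image bytes directly. This is only a sketch, and it assumes that Imagick's getImageBlob() yields the same bytes that file_put_contents($root, $image) would write. The helper name hashImageInMemory() is made up for illustration.

```php
<?php
// Sketch only: hash the encoded image bytes without writing them to disk.
// hashImageInMemory() is a hypothetical helper name, not part of Imagick.
function hashImageInMemory(Imagick $image): string
{
    // getImageBlob() returns the image, encoded in its current format, as a byte string.
    return sha1($image->getImageBlob());
}
```

Calling hashImageInMemory($image) right after the scaleImage() and cropThumbnailImage() steps would take the place of the commented-out file_put_contents() lines.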
If you are wondering why I want to match the thumbnails: my tests show that even if the original files differ in size and so on, the thumbnails created from them match fairly well. Or am I wrong? If I have the same image in 3 different sizes and scale each of them down to, let's say, 100px width, will their hashes be the same?
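For what it's worth, SHA-1 only matches when the input bytes are identical, so two thumbnails will share a hash only if resizing and re-encoding happen to produce exactly the same output. The sketch below uses two made-up byte strings (stand-ins, not real image data) that differ in a single byte to show how sensitive the digest is:

```php
<?php
// Two byte strings standing in for two encoded thumbnails (hypothetical data).
$thumb_a = str_repeat("\x10", 1024);
$thumb_b = $thumb_a;
$thumb_b[512] = "\x11"; // change exactly one byte

$hash_a = sha1($thumb_a);
$hash_b = sha1($thumb_b);

// The digests differ even though only one byte out of 1024 changed.
var_dump($hash_a === $hash_b); // bool(false)
```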
Conclusion
I had to rewrite the original image handler a bit. Basically, I think there was still a piece missing in my code, like $image->stripImage(); or something similar. While that started giving better results, it seems the most reliable way to keep hashes on the server is:
$hash = sha1(base64_encode($image->getImageBlob()));
My tests also confirmed that file_put_contents($thumbnail_root, $image); followed by getting the hash via sha1_file($image_root); will not change the hash values.
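A small self-contained check of that behaviour, using random bytes as a stand-in for the result of $image->getImageBlob() (an assumption made so the sketch runs without Imagick). It also shows that a hash taken over the base64-encoded bytes is a different value, so stored hashes and lookup hashes have to be computed the same way:

```php
<?php
// Random bytes stand in for $image->getImageBlob() (assumption for this sketch).
$blob = random_bytes(2048);

// Write the bytes to a temporary file, as file_put_contents($root, $image) would.
$path = tempnam(sys_get_temp_dir(), 'thumb');
file_put_contents($path, $blob);

$memory_hash = sha1($blob);                // hash of the bytes in memory
$file_hash   = sha1_file($path);           // hash of the same bytes on disk
$base64_hash = sha1(base64_encode($blob)); // hash of the base64 text instead

unlink($path);

var_dump($memory_hash === $file_hash); // bool(true)
var_dump($base64_hash === $file_hash); // bool(false)
```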
I also got more matching results from bigger images scaled down to thumbnail sizes.