I am trying to write a script that compares images, and tells me whether or not the images are the same. Here is some minimal code:

import requests

url1 = 'https://scontent-lga3-1.cdninstagram.com/vp/b4577921aa35369af8980a3d563e4373/5DAE3C31/t51.2885-15/fr/e15/s1080x1080/66126877_342437073345261_1373504971257332049_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com'
url2 = 'https://scontent-lga3-1.cdninstagram.com/vp/fab3372181d5ad596280d2c095a3496e/5DE99775/t51.2885-15/e35/67547020_369706770411768_8601267197685673619_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com'
print(requests.Session().get(url1).content == requests.Session().get(url2).content)

However, if you manually navigate to each URL you will see that the photos are the same. My question: can I compare these images without saving them to a directory? I was thinking of reading both images as binary and then doing the comparison, but I have no idea how to do that on the fly. Thanks in advance to everyone who replies.

Kyle

2 Answers

If you want to check whether two images are pixel-for-pixel identical without saving them to disk, you can decode them in memory with BytesIO and PIL:

import requests
from io import BytesIO
from PIL import Image

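def get_image_data(img_url):
    # Decode the image, then re-encode it as PNG so the comparison depends
    # only on pixel values, not on how the original file was compressed
    response = requests.get(img_url)
    response.raise_for_status()
    img = Image.open(BytesIO(response.content)).convert('RGB')
    byteio = BytesIO()
    img.save(byteio, format='PNG')
    return byteio.getvalue()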

url1 = 'https://scontent-lga3-1.cdninstagram.com/vp/b4577921aa35369af8980a3d563e4373/5DAE3C31/t51.2885-15/fr/e15/s1080x1080/66126877_342437073345261_1373504971257332049_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com'
url2 = 'https://scontent-lga3-1.cdninstagram.com/vp/fab3372181d5ad596280d2c095a3496e/5DE99775/t51.2885-15/e35/67547020_369706770411768_8601267197685673619_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com'

print(get_image_data(url1)==get_image_data(url2))

However, these two images appear to differ slightly, so this code prints False.
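
To see how large such a difference is, rather than only getting True or False, something like the following might help. It is a minimal sketch, assuming both images have the same dimensions; the helper name pixel_difference and the tiny synthetic images are made up for illustration and stand in for the two downloads.

```python
from PIL import Image, ImageChops, ImageStat


def pixel_difference(img_a, img_b):
    """Return (bounding box of differing pixels, mean absolute difference per channel)."""
    a = img_a.convert('RGB')
    b = img_b.convert('RGB')
    if a.size != b.size:
        raise ValueError('images must have the same dimensions')
    # Per-pixel, per-channel absolute difference between the two images
    diff = ImageChops.difference(a, b)
    # getbbox() returns None when every pixel of the difference image is zero
    return diff.getbbox(), ImageStat.Stat(diff).mean


# Two synthetic 4x4 images that differ in one pixel (hypothetical test data)
base = Image.new('RGB', (4, 4), (10, 20, 30))
tweaked = base.copy()
tweaked.putpixel((1, 2), (10, 20, 46))

same_bbox, same_mean = pixel_difference(base, base.copy())
diff_bbox, diff_mean = pixel_difference(base, tweaked)
print(same_bbox, same_mean)
print(diff_bbox, diff_mean)
```

A bounding box of None means the pixels match exactly. A small mean difference suggests the same picture with different compression, although where to draw that line is a judgment call.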

Conch

It would be better to take an approach which does not depend on the size of the image. Your two images could be a thumbnail and a full-sized image.

I would combine @Tanner Clark's approach from "Image comparison - fast algorithm" with the approach above:

import requests
from io import BytesIO
from PIL import Image, ImageFilter
import imagehash

def get_image(img_url):
    # Decode the downloaded bytes in memory; nothing is written to disk
    response = requests.get(img_url)
    response.raise_for_status()
    return Image.open(BytesIO(response.content)).convert('RGB')

url1 = 'http://scontent-lga3-1.cdninstagram.com/vp/b4577921aa35369af8980a3d563e4373/5DAE3C31/t51.2885-15/fr/e15/s1080x1080/66126877_342437073345261_1373504971257332049_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com'
url2 = 'http://scontent-lga3-1.cdninstagram.com/vp/fab3372181d5ad596280d2c095a3496e/5DE99775/t51.2885-15/e35/67547020_369706770411768_8601267197685673619_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com'


def compare_images(url1, url2):
    img1 = get_image(url1)
    img2 = get_image(url2)

    # Scale the wider image down to the other one's size so both get the same blur
    if img1.width < img2.width:
        img2 = img2.resize(img1.size)
    else:
        img1 = img1.resize(img2.size)
    img1 = img1.filter(ImageFilter.BoxBlur(radius=3))
    img2 = img2.filter(ImageFilter.BoxBlur(radius=3))

    # Subtracting two image hashes gives the number of differing bits (0 = same hash)
    phash_distance = imagehash.phash(img1) - imagehash.phash(img2)
    ahash_distance = imagehash.average_hash(img1) - imagehash.average_hash(img2)

    threshold = 1  # tune experimentally for your images

    total_distance = phash_distance + ahash_distance
    print(total_distance)
    return total_distance <= threshold

print(compare_images(url1, url2))
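
To give a feel for what the hash subtraction above measures, here is a rough pure-Python sketch of the average-hash idea as I understand it: shrink the image to a small grayscale grid, mark each pixel as above or below the mean, and count the bits that differ between two images. The function names and the 4x4 grids are hypothetical stand-ins. The imagehash library does the shrinking for you and uses a larger grid by default.

```python
def average_hash_bits(gray):
    """Turn a grid of grayscale values into bits: True where a pixel is above the mean."""
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    return [value > mean for value in pixels]


def hamming_distance(bits_a, bits_b):
    """Count the positions at which two equal-length bit lists differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError('hashes must have the same length')
    return sum(a != b for a, b in zip(bits_a, bits_b))


# Hypothetical 4x4 grayscale grids standing in for downscaled images
full = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
# Same picture after lossy re-encoding: values shift a little, the pattern stays
recompressed = [[value + 3 for value in row] for row in full]
# A different picture: dark top half, bright bottom half
other = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]

near = hamming_distance(average_hash_bits(full), average_hash_bits(recompressed))
far = hamming_distance(average_hash_bits(full), average_hash_bits(other))
print(near, far)
```

A small distance means the two pictures share the same coarse light and dark pattern, which is why a low threshold can still match a thumbnail against its full-sized original.
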
Nathan P