Find the right URL for an image

I have a list of URLs that point to images (around 170,000 URLs),
and I have a folder of images named after those URLs. The names are mixed up:
the image named "url3" is actually the image from "url10000", and so on.
For example:

I have 3 URLs in a list and 3 images in my folder:

list_of_urls = ['url1.com', 'url2.com', 'url3.com']

Names of the images in my folder:

url1.jpg, url2.jpg, url3.jpg

Here is the problem: if you open url1.com and compare it with the image named url1.jpg, you will notice that they are not the same. By brute force, opening every URL, you find that the actual URL of that image is url3.com. And instead of 3 URLs I have around 170,000.

I can't just re-download the images with the correct names, because the images have already been marked with ML.

What would be the best way to handle this?

I tried computing a hash of each image with imagehash, storing it as a key-value pair, and then comparing it with the images fetched from the URLs, like this:

        from PIL import Image
        import imagehash

        img = Image.open(image_path)
        img_hash = str(imagehash.average_hash(img))
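To make the idea concrete, here is a minimal sketch of how that hash lookup might be extended to the whole list. It is not a tested solution for 170,000 images. The names `build_index`, `match_urls`, `fetch_over_http`, `hash_image_bytes`, `hamming` and the `max_distance` threshold are all hypothetical, and it assumes the files in the folder are still visually close to what the URLs serve (if the ML marking changed the pixels a lot, average hashes may no longer line up).

```python
import io
from pathlib import Path
from urllib.request import urlopen


def hash_image_bytes(data):
    """Average hash of an encoded image, returned as a hex string."""
    # Imported lazily so the matching logic can be exercised without these packages.
    from PIL import Image
    import imagehash
    with Image.open(io.BytesIO(data)) as img:
        return str(imagehash.average_hash(img))


def hamming(hex_a, hex_b):
    """Number of differing bits between two hex hash strings."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def build_index(folder, hasher=hash_image_bytes):
    """Map hash -> list of file names (different images can share a hash)."""
    index = {}
    for path in sorted(Path(folder).glob("*.jpg")):
        index.setdefault(hasher(path.read_bytes()), []).append(path.name)
    return index


def fetch_over_http(url):
    """Download the raw bytes behind a URL (assumes a full http(s) URL)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def match_urls(urls, index, fetch=fetch_over_http,
               hasher=hash_image_bytes, max_distance=4):
    """Return ({url: file name}, [urls that need manual review])."""
    matches, unresolved = {}, []
    for url in urls:
        h = hasher(fetch(url))
        candidates = index.get(h)
        if candidates is None:
            # No exact hit: fall back to the nearest hash within max_distance bits.
            # This is a linear scan, so it is slow if many URLs miss.
            nearest = min(index, key=lambda k: hamming(h, k), default=None)
            if nearest is not None and hamming(h, nearest) <= max_distance:
                candidates = index[nearest]
        if candidates and len(candidates) == 1:
            matches[url] = candidates[0]
        else:
            # Either nothing was close enough or several files share the hash.
            unresolved.append(url)
    return matches, unresolved
```

Usage would be something like `index = build_index("images")` followed by `matches, unresolved = match_urls(list_of_urls, index)`. URLs whose hash collides with several files, or matches none, end up in `unresolved` rather than being guessed.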


    How do you determine which image gets which name? I’m not sure I understand what you mean by “correct names” for the images.


  • The correct name is the image's actual URL. As I mentioned, I have a folder with 170,000 images and a Python list with 170,000 URLs. There is definitely a correct URL for each image, but for now they are mixed up. @Ada


  • How do you associate the URLs from the list with the correct image? Can you add an example to your question with (part of) your list of URLs and the messed up image names and how you know which image should get which URL as its name?


  • For example, I have URL1.com, URL2.com, URL3.com in the list, and in my folder I have three images named URL1, URL2, URL3. Here is the problem: if you open URL1.com and compare it with the image in the folder named URL1, you will see that the image from the folder is actually the one from URL3, not URL1. @Ada


  • Please add all necessary information to your question directly, that makes it much more readable and easier for people to answer without having to read all comments.

