Sorting a list based on the similarity of the list elements with a given word

Question 1

I am getting a list from the database. My task is to sort this list according to the highest similarity to a given word.
Example:
I get a list:

list_name = ['Alex Trucking', 'ALEX TRUCKING', 'Alex K Trucking', 'ALEX F TRUCKING', 'Alex FC Trucking', 'ALE TRUCKING', 'ALE trucking', 'ALEX TRUCKING LLC', 'Alex Trucking LLC', 'AL trucking']

Word to compare:

word_to_compare = "ALEX TROCKING" (Yes, the typo is intentional. Sometimes, though, word_to_compare may be exactly the same as one of the elements in the list.)

My task is to sort this list like this output:

['ALEX TRUCKING', 'ALEX TRUCKING LLC', 'Alex Trucking', 'Alex Trucking LLC', ... and the other elements]

That is, the main criteria for sorting are case and maximum similarity to word_to_compare.

Question 2

It all depends on how you define “similarity”.

The Levenshtein module has numerous functions that you could try. There may be one that suits your exact use-case.
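To make one definition of "similarity" concrete: the classic Levenshtein edit distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another, and the module exposes it as Levenshtein.distance(). The pure-Python function below is only an illustrative sketch of that metric (levenshtein_distance is a hypothetical name, not part of the module). It also shows that the comparison is case-sensitive unless you normalise case first.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b (illustrative sketch)."""
    # previous[j] holds the distance between the first i-1 chars of a and the first j chars of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if the chars match)
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("ALEX TROCKING", "ALEX TRUCKING"))  # 1
print(levenshtein_distance("ALEX TROCKING", "Alex Trucking"))  # 10 (case differences count)
print(levenshtein_distance("ALEX TROCKING".lower(), "Alex Trucking".lower()))  # 1
```

A smaller distance means "more similar", so you would sort ascending on it, whereas ratio() returns a score where higher is better.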

The example below does not produce your required output, but it will (hopefully) help you understand the kind of approach that could work for you. For this example I’ll use ratio().

First of all (if you don’t have it already):

pip install levenshtein

Then…

from Levenshtein import ratio

MASTER = 'ALEX TROCKING'

def key(s):
    # Similarity score between MASTER and s; higher means more similar
    return ratio(MASTER, s)

list_name = ['Alex Trucking', 'ALEX TRUCKING', 'Alex K Trucking', 'ALEX F TRUCKING', 'Alex FC Trucking', 'ALE TRUCKING', 'ALE trucking', 'ALEX TRUCKING LLC', 'Alex Trucking LLC', 'AL trucking']

print(sorted(list_name, key=key, reverse=True))

Output:

['ALEX TRUCKING', 'ALE TRUCKING', 'ALEX F TRUCKING', 'ALEX TRUCKING LLC', 'ALE trucking', 'Alex Trucking LLC', 'AL trucking', 'Alex Trucking', 'Alex K Trucking', 'Alex FC Trucking']
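The ratio() ordering above does not take the case preference from the question into account. One way to encode it is a tuple sort key. The sketch below is only one possible interpretation: it swaps in the standard library's difflib.SequenceMatcher.ratio() instead of the Levenshtein module (so it needs no extra install), and the ranking rule in the hypothetical sort_key function is an assumption. It ranks by similarity of the leading characters (case-insensitive), then prefers candidates whose "all upper-case or not" status matches the target, then prefers shorter strings.

```python
from difflib import SequenceMatcher

# Sample data from the question
list_name = ['Alex Trucking', 'ALEX TRUCKING', 'Alex K Trucking', 'ALEX F TRUCKING',
             'Alex FC Trucking', 'ALE TRUCKING', 'ALE trucking', 'ALEX TRUCKING LLC',
             'Alex Trucking LLC', 'AL trucking']
word_to_compare = 'ALEX TROCKING'


def sort_key(candidate: str, target: str = word_to_compare):
    """Hypothetical ranking rule: similarity, then matching case, then shorter strings."""
    # Assumption: compare only the first len(target) characters, case-insensitively,
    # so a trailing suffix such as ' LLC' does not lower the score.
    prefix = candidate[:len(target)].lower()
    similarity = SequenceMatcher(None, target.lower(), prefix).ratio()
    # Assumption: "same case" means both strings are all upper-case, or neither is.
    same_case = candidate.isupper() == target.isupper()
    # Length is negated so that, with reverse=True, shorter candidates win ties.
    return (similarity, same_case, -len(candidate))


ranked = sorted(list_name, key=sort_key, reverse=True)
print(ranked)
```

On the sample list this puts 'ALEX TRUCKING', 'ALEX TRUCKING LLC', 'Alex Trucking', 'Alex Trucking LLC' first, as in the question. The prefix trick is a heuristic: it will not help if extra words appear before the name rather than after it, so treat the key as a starting point to adjust to your data.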
