I am getting a list from the database. My task is to sort this list according to the highest similarity to a given word.
Example:
I get a list:
list_name = ['Alex Trucking', 'ALEX TRUCKING', 'Alex K Trucking', 'ALEX F TRUCKING', 'Alex FC Trucking', 'ALE TRUCKING', 'ALE trucking', 'ALEX TRUCKING LLC', 'Alex Trucking LLC', 'AL trucking']
Word to compare:
word_to_compare = "ALEX TROCKING" (Yes, the misspelling is intentional, although sometimes word_to_compare may be exactly the same as one of the entries in the list.)
My task is to sort this list like this output:
['ALEX TRUCKING', 'ALEX TRUCKING LLC', 'Alex Trucking', 'Alex Trucking LLC', ….. and the other elements]
That is, the sort should take into account both letter case and maximum similarity to word_to_compare.
It all depends on how you define “similarity”.
The Levenshtein module has numerous functions that you could try. There may be one that suits your exact use-case.
This example does not produce your required output but (hopefully) will help you to understand the kind of approach that could work for you. For this example I'll use ratio(), which returns a similarity score between 0 and 1 (higher means more similar).
First of all (if you don’t have it already):
pip install levenshtein
Then…
from Levenshtein import ratio
MASTER = 'ALEX TROCKING'
def key(s):
    return ratio(MASTER, s)
list_name = ['Alex Trucking', 'ALEX TRUCKING', 'Alex K Trucking', 'ALEX F TRUCKING', 'Alex FC Trucking', 'ALE TRUCKING', 'ALE trucking', 'ALEX TRUCKING LLC', 'Alex Trucking LLC', 'AL trucking']
print(sorted(list_name, key=key, reverse=True))
Output:
['ALEX TRUCKING', 'ALE TRUCKING', 'ALEX F TRUCKING', 'ALEX TRUCKING LLC', 'ALE trucking', 'Alex Trucking LLC', 'AL trucking', 'Alex Trucking', 'Alex K Trucking', 'Alex FC Trucking']
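If case should matter only as a tie-breaker, one option is a tuple key: compare case-insensitively first, then case-sensitively. The sketch below is only an illustration of that idea, not a reproduction of the ordering asked for in the question. It swaps in difflib.SequenceMatcher from the standard library so that it runs without extra installs (its scores differ somewhat from Levenshtein's ratio()), and the helper names similarity and key are made up for this example.

```python
from difflib import SequenceMatcher

MASTER = 'ALEX TROCKING'

list_name = ['Alex Trucking', 'ALEX TRUCKING', 'Alex K Trucking', 'ALEX F TRUCKING',
             'Alex FC Trucking', 'ALE TRUCKING', 'ALE trucking', 'ALEX TRUCKING LLC',
             'Alex Trucking LLC', 'AL trucking']

def similarity(a, b):
    # Score between 0 and 1; higher means more similar.
    return SequenceMatcher(None, a, b).ratio()

def key(s):
    # Primary criterion: similarity ignoring case.
    # Secondary criterion (tie-breaker): similarity with case respected.
    return (similarity(MASTER.casefold(), s.casefold()), similarity(MASTER, s))

ranked = sorted(list_name, key=key, reverse=True)
print(ranked)
```

With this key, 'ALEX TRUCKING' and 'Alex Trucking' tie on the case-insensitive score, and the case-sensitive score puts the upper-case spelling first. Swapping the two tuple elements would make case the dominant criterion instead.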
There are several Python libraries that implement various string distances. Find one you like, then sort the list with
list_name.sort(key=lambda x: my_cool_distance(x, word_to_compare))
See for instance pypi.org/search/?q=string+distance
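To make the placeholder concrete, here is a minimal pure-Python sketch that fills in my_cool_distance with the classic Levenshtein edit distance. This is just one possible choice and an assumption on my part, since a library implementation will usually be faster. Because it is a distance, smaller values mean more similar, so a plain ascending sort puts the best matches first.

```python
def my_cool_distance(a, b):
    """Levenshtein edit distance: insertions, deletions and substitutions cost 1 each."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]

word_to_compare = 'ALEX TROCKING'
list_name = ['Alex Trucking', 'ALEX TRUCKING', 'Alex K Trucking', 'ALEX F TRUCKING',
             'Alex FC Trucking', 'ALE TRUCKING', 'ALE trucking', 'ALEX TRUCKING LLC',
             'Alex Trucking LLC', 'AL trucking']

# Ascending sort: the smallest distance (closest match) comes first.
list_name.sort(key=lambda x: my_cool_distance(x, word_to_compare))
print(list_name)
```

This distance is case-sensitive, so 'Alex Trucking' counts as far from 'ALEX TROCKING'. Pass x.upper() (or casefold both strings) inside the lambda if case should be ignored.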