I’m working with two datasets that contain country names with slight variations, and neither dataset covers all possible countries. For example, one dataset might have “Russia” while the other has “Russian Federation (Soviet Union)”. I need to create a common key for these countries to merge them, but I’m facing the following challenges:
Variations in country names: How can I handle these variations effectively to ensure accurate matching?
Incomplete country coverage: Not all countries are present in both datasets. How can I manage this discrepancy during key creation?
I’d like to establish a key mechanism that can handle this and later join these datasets on the key and year. I’m not even sure where to get started looking for answers to this, google has so far been unhelpful.
on which datqabase system should it run
It is running in SQL as of now, I’m using SSMS to interact and house the database. Both datasets were imported as CSVs.
you can rtake a look at stackoverflow.com/questions/39251168/sql-fuzzy-join-mssql
Build yourself a complete country table and for each country store multiple variations which you manually add over time as you pick up mismatches.
This gets me more or less where I need to be, thanks. I appreciate it.