Merge dataframes by the nearest lower or equal date

I have two data frames:

df1 = pd.DataFrame({'ID':['A', 'A', 'B', 'B'], 
    'Date':['31.08.2023', '12.09.2023', '13.09.2023', '20.08.2023']})

df2 = pd.DataFrame({'ID':['A', 'A', 'A', 'B', 'B'], 
        'Date':['30.08.2023', '14.09.2023', '10.09.2023', '28.09.2023', '19.08.2023']})

I want to add two new columns to df1 with the nearest higher and lower date in every ID, so the desire output in this example will look like this:

ID DATE DATE_UP DATE_DOWN
A 31.08.2023 10.09.2023 30.08.2023
A 12.09.2023 14.09.2023 10.09.2023
B 13.09.2023 28.09.2023 19.08.2023
B 20.08.2023 28.09.2023 19.08.2023

I know there is a pandas function merge_asof(), but it work only for the nearest merge.
I will be greatfull for any ideas how to do it efficiently.

Code

you can solve it by merge_asof. merge_asof doesnt work only for the nearest merge. Even nearest is not the default, the default is backward.


at first make datetime column named ‘Date1’ and sort

df1['Date1'] = pd.to_datetime(df1['Date'], dayfirst=True)
df1 = df1.sort_values('Date1')
df2['Date1'] = pd.to_datetime(df2['Date'], dayfirst=True)
df2 = df2.sort_values('Date1')

df1:

    ID  Date        Date1
3   B   20.08.2023  2023-08-20
0   A   31.08.2023  2023-08-31
1   A   12.09.2023  2023-09-12
2   B   13.09.2023  2023-09-13

df2:

    ID  Date        Date1
4   B   19.08.2023  2023-08-19
0   A   30.08.2023  2023-08-30
2   A   10.09.2023  2023-09-10
1   A   14.09.2023  2023-09-14
3   B   28.09.2023  2023-09-28

next use merge_asof forward & backward(default)

tmp = pd.merge_asof(df1.reset_index(), df2, on='Date1', by='ID', 
                    direction='forward', suffixes=['', '_up'])
out = pd.merge_asof(tmp, df2, on='Date1', by='ID', suffixes=['', '_down'])\
        .drop('Date1', axis=1).set_index('index').sort_index().rename_axis('')

out:

    ID  Date        Date_up     Date_down
0   A   31.08.2023  10.09.2023  30.08.2023
1   A   12.09.2023  14.09.2023  10.09.2023
2   B   13.09.2023  28.09.2023  19.08.2023
3   B   20.08.2023  28.09.2023  19.08.2023

Create the dataframes

import pandas as pd

df1 = pd.DataFrame({
    'ID': ['A', 'A', 'B', 'B'],
    'Date': ['31.08.2023', '12.09.2023', '13.09.2023', '20.08.2023']
})

df2 = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'B', 'B'],
    'Date': ['30.08.2023', '14.09.2023', '10.09.2023', '28.09.2023', '19.08.2023']
})

Convert the ‘Date’ columns to datetime

df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)
df2['Date'] = pd.to_datetime(df2['Date'], dayfirst=True)

Sort the dataframes by ‘ID’ and ‘Date’

df1 = df1.sort_values(by=['ID', 'Date'])
df2 = df2.sort_values(by=['ID', 'Date'])

Find the nearest higher date (DATE_UP)

df1['DATE_UP'] = df1.apply(lambda row: df2[(df2['ID'] == row['ID']) & (df2['Date'] > row['Date'])]['Date'].min(), axis=1)

Find the nearest lower date (DATE_DOWN)

df1['DATE_DOWN'] = df1.apply(lambda row: df2[(df2['ID'] == row['ID']) & (df2['Date'] < row['Date'])]['Date'].max(), axis=1)

Convert ‘Date’, ‘DATE_UP’, ‘DATE_DOWN’ back to the original string format

df1['Date'] = df1['Date'].dt.strftime('%d.%m.%Y')
df1['DATE_UP'] = df1['DATE_UP'].dt.strftime('%d.%m.%Y')
df1['DATE_DOWN'] = df1['DATE_DOWN'].dt.strftime('%d.%m.%Y')

Leave a Comment