I have a table with records identified by a combination of email and phone. The goal is to handle changes to these identifiers, where initially incomplete records are later completed, and sometimes records need to be merged due to identifying the same person. I have dependent tables that reference these changing IDs, and I want to minimize direct updates to the dependent tables to maintain data integrity.
Person:
id | phone | |
---|---|---|
1 | NULL | 123 |
2 | [email protected] | NULL |
A new record comes in:
phone | |
---|---|
[email protected] | 123 |
Records with IDs 1 and 2 are the same due to matching email and phone. So I want to update the table, by creating a new id or merging it. I need to update the dependent tables that reference these IDs.
Order
order_id | person_id | product |
---|---|---|
100 | 1 | book |
101 | 2 | pen |
How do I handle this with minimal direct updates to the dependent tables, maintaining data consistency, performance, and maintainability?
I’ve considered surrogate keys, mapping tables, and historical data.
Where & how are you 1st stuck/unsure, following what information modeling & DB design method, from what reference? How are you stuck deciding, applying what criteria? Why is asking a question on “best practice” a bad thing? Strategy for “Which is better” questions How to Ask Help center