How can I perform an anti-join (or where clause filtering) in SNOSQL to find rows where a field doesn’t contain values from another table?

I have two tables with the following schemas and data

(
sale_id VARCHAR(18),
list_of_email_addresses VARCHAR(8000)
);

INSERT INTO sales (sale_id, list_of_email_addresses) VALUES ('12345', '[email protected], [email protected], [email protected]');
INSERT INTO sales (sale_id, list_of_email_addresses) VALUES ('12346', '[email protected], [email protected], [email protected]');
INSERT INTO sales (sale_id, list_of_email_addresses) VALUES ('12347', '[email protected], [email protected], [email protected]');

CREATE TABLE accounts (
account_id VARCHAR(18),
domain_name VARCHAR(255));

INSERT INTO accounts ('A-000001', 'canada.ca');
INSERT INTO accounts ('A-000002', 'thisfuntime.com');
INSERT INTO accounts ('A-000003', 'bang.org');
INSERT INTO accounts ('A-000004', 'snug.net');

I want to identify records out of the sales table that do not have an account where the business_domain_name partially matches, for example I would like to find all of the sales records that don’t have a domain_name field that matches, i.e. If I can do the join successfully, it would return

'12346', '[email protected], [email protected], [email protected]'

Note: We wouldn’t return the sales row for sales_id = 12345, b/c snug.net matches account A-000004, because there isn’t any account with either ‘muse.com’ or ‘muse.net’ – so it’s an ANY match, not an ALL.

And we could ask the inverse question, which accounts don’t have any sales associated with them, which would return

'A-000001', 'canada.ca'
'A-000002', 'thisfuntime.com'

I can’t do this with an LIKE ANY because I that will only match on a single record at a time, and it will only work w/ single records.

It is not an option to take the sales table and explode it so we get one row per email address, from the above example I want to exclude any of the sales that would match sale_id = ‘12345’, because I’m trying to exclude any sale to snugs, and all of those email addresses are associated with the same company- but the underlying data model that I have to work with is broken- each of those ‘snug’ accounts are associated with different companies, even though they are in fact the same company.

Does that make sense?

I would use a not exists to filter out records using some regex and array functions.

select *
from sales t1
where not exists(select * 
                 from accounts t2 
                 where array_contains(t2.domain_name::variant, regexp_extract_all(t1.emails,'@([^,]+)',1,1,'e')))

Suggest you do flatten/explode to one row per list item. Then accumulate the subset of sales ids with an account match. Finally, select from sales where sales id not in the subset list. All can be done in a CTE query. A similar query will work for accounts with no sales.

Leave a Comment