Select a sample of records distributed over several distinct values

I have a table like:

Name Product Code Payment Type Member
Bob Code A Type 2 Yes
Kat Code B Type 2 Yes
Greg Code C Type 1 No
David Code A Type 1 Yes
Simon Code A Type 2 Yes

etc. Let’s say there’s 10k rows in this table.

Across this table, Name is the only unique column. I would like to select, for example, 120 rows, but in such a way that the 120 is split as evenly across the possible distinct combinations of Product Code, Payment Type, and Member as possible.

I.e. assuming there are 3 different Product Codes, 2 different Payment Types, and 2 different Member values (resulting in 12 distinct combinations), the result would be:

  • 10 with Code A, Type 1, Yes
  • 10 with Code A, Type 1, No
  • 10 with Code A, Type 2, Yes
  • 10 with Code A, Type 2, No
  • 10 with Code B, Type 1, Yes
  • 10 with Code B, Type 1, No
  • 10 with Code B, Type 2, Yes
  • 10 with Code B, Type 2, No
  • 10 with Code C, Type 1, Yes
  • 10 with Code C, Type 1, No
  • 10 with Code C, Type 2, Yes
  • 10 with Code C, Type 2, No

I need to do this without using WITH or any @variables or stored procedures etc.

  • What is your DBMS? SQL Server? I suggest you add an appropriate tag for greater visibility. What version? If SQL Server, run the command SELECT @@version and add the version from the first row to your question.

    – 

I am going to assume the latest version of SQL Server for now.

You can get what you want by using the ROW_NUMBER() window function to assign sequence numbers withing each group of rows having the same [Product Code], [Payment Type], and [Member], sorting by that sequence number, and then selecting the TOP 120 rows.

This will first select all of the “1” rows across all distinct [Product Code], [Payment Type], and [Member] combinations, then all of the “2” rows, etc. This should yield a roughly equal distribution. If one set runs out, the others will fill the difference.

Your query would look something like:

SELECT TOP 120 *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER(
            PARTITION BY [Product Code], [Payment Type], Member
            ORDER BY (SELECT NULL) -- Arbitrary order withing the partition
            ) AS RowNum 
    FROM Data
) D
ORDER BY D.RowNum

See this db<>fiddle.

Leave a Comment