I have this data:
ARRIVAL,ITEM_TYPE,ITEM
1,0,Cat
2,0,Dog
3,1,Horse
4,1,Cow
5,0,Fish
6,0,Barn
7,0,Potato
I would like to query for this result:
0,2
1,2
0,3
… showing that in order by ARRIVAL there was a group of 2 items with ITEM_TYPE = 0, then 2 with ITEM_TYPE = 1, then another group of 3 with ITEM_TYPE = 0. The goal is to provide information about the arrival in a system of similar groups of records; Cat and Dog are in the same ITEM_TYPE as Fish, Barn, and Potato, but they didn’t arrive in the same time-series group. I can’t figure out how to use COUNT, DISTINCT, and GROUP BY in a way that doesn’t collapse the two groups of zeroes into a single line like this:
0,5
I will appreciate any help.
— Bob
This is a gaps-and-islands problem. You can solve it by taking the difference between two ROW_NUMBER() values:
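SELECT ITEM_TYPE, COUNT(*) AS CNT
FROM (
    SELECT *,
           -- position in the whole sequence minus position within the same ITEM_TYPE;
           -- the difference stays constant across a consecutive run of one ITEM_TYPE
           ROW_NUMBER() OVER (ORDER BY ARRIVAL)
           - ROW_NUMBER() OVER (PARTITION BY ITEM_TYPE ORDER BY ARRIVAL) AS grp
    FROM mytable
) t
GROUP BY grp, ITEM_TYPE
-- ARRIVAL is not in the GROUP BY, so order by each group's first arrival
ORDER BY MIN(ARRIVAL);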
Results:
ITEM_TYPE CNT
0 2
1 2
0 3
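To see why the difference of the two row numbers identifies each run, it can help to look at the intermediate values. The query below is only an illustrative sketch: sample_data is a hypothetical inline stand-in for mytable, rn_all and rn_type are names introduced here, and it assumes a database that supports window functions and a SELECT without a FROM clause (for example PostgreSQL, SQLite 3.25+, MySQL 8+, or SQL Server).

```sql
-- sample_data is a hypothetical stand-in for mytable, built inline so the query runs on its own
WITH sample_data AS (
    SELECT 1 AS ARRIVAL, 0 AS ITEM_TYPE, 'Cat' AS ITEM
    UNION ALL SELECT 2, 0, 'Dog'
    UNION ALL SELECT 3, 1, 'Horse'
    UNION ALL SELECT 4, 1, 'Cow'
    UNION ALL SELECT 5, 0, 'Fish'
    UNION ALL SELECT 6, 0, 'Barn'
    UNION ALL SELECT 7, 0, 'Potato'
)
SELECT ARRIVAL, ITEM_TYPE, ITEM,
       -- position in the overall arrival order
       ROW_NUMBER() OVER (ORDER BY ARRIVAL) AS rn_all,
       -- position among rows of the same ITEM_TYPE
       ROW_NUMBER() OVER (PARTITION BY ITEM_TYPE ORDER BY ARRIVAL) AS rn_type,
       -- the difference is constant within one consecutive run
       ROW_NUMBER() OVER (ORDER BY ARRIVAL)
         - ROW_NUMBER() OVER (PARTITION BY ITEM_TYPE ORDER BY ARRIVAL) AS grp
FROM sample_data
ORDER BY ARRIVAL;

-- ARRIVAL ITEM_TYPE ITEM   rn_all rn_type grp
-- 1       0         Cat    1      1       0
-- 2       0         Dog    2      2       0
-- 3       1         Horse  3      1       2
-- 4       1         Cow    4      2       2
-- 5       0         Fish   5      3       2
-- 6       0         Barn   6      4       2
-- 7       0         Potato 7      5       2
```

Note that Horse and Cow get the same grp value (2) as Fish, Barn, and Potato. The difference alone is only unique within one ITEM_TYPE, which is why the outer query groups by both grp and ITEM_TYPE.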
We can also use window functions, as below:
CREATE TABLE mytable (ARRIVAL INT, ITEM_TYPE INT,ITEM VARCHAR);
INSERT INTO mytable(ARRIVAL, ITEM_TYPE,ITEM)
VALUES
(1,0,'Cat'),
(2,0,'Dog'),
(3,1,'Horse'),
(4,1,'Cow' ),
(5,0,'Fish'),
(6,0,'Barn'),
(7,0,'Potato');
WITH cte AS (
    SELECT ARRIVAL, ITEM_TYPE, ITEM,
           -- 1 when the row has the same ITEM_TYPE as the previous row (or is the first row), else 0
           CASE WHEN ITEM_TYPE = LAG(ITEM_TYPE) OVER (ORDER BY ARRIVAL)
                  OR LAG(ITEM_TYPE) OVER (ORDER BY ARRIVAL) IS NULL
                THEN 1 ELSE 0 END AS check_lag
    FROM mytable
)
SELECT MIN(ITEM_TYPE) AS ITEM_TYPE, COUNT(1)
FROM (
    SELECT ARRIVAL, ITEM_TYPE, ITEM,
           -- row number minus the running total of check_lag: constant within a run,
           -- and it increases by 1 each time ITEM_TYPE changes
           ROW_NUMBER() OVER (ORDER BY ARRIVAL)
           - SUM(check_lag) OVER (ORDER BY ARRIVAL ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
    FROM cte
) sub
GROUP BY grp
ORDER BY grp;
OUTPUT:
0 2
1 2
0 3
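The intermediate columns show how grp is built in this second approach. As a rough sketch under the same assumptions as the query itself (window functions and LAG are available), the following exposes check_lag, the row number, and the running total side by side. The names sample_data, flagged, rn, and running_same are hypothetical and introduced only for this illustration; sample_data stands in for mytable.

```sql
-- sample_data is a hypothetical inline stand-in for mytable
WITH sample_data AS (
    SELECT 1 AS ARRIVAL, 0 AS ITEM_TYPE, 'Cat' AS ITEM
    UNION ALL SELECT 2, 0, 'Dog'
    UNION ALL SELECT 3, 1, 'Horse'
    UNION ALL SELECT 4, 1, 'Cow'
    UNION ALL SELECT 5, 0, 'Fish'
    UNION ALL SELECT 6, 0, 'Barn'
    UNION ALL SELECT 7, 0, 'Potato'
),
flagged AS (
    SELECT ARRIVAL, ITEM_TYPE, ITEM,
           -- 1 when the row continues the previous row's ITEM_TYPE (or is the first row)
           CASE WHEN ITEM_TYPE = LAG(ITEM_TYPE) OVER (ORDER BY ARRIVAL)
                  OR LAG(ITEM_TYPE) OVER (ORDER BY ARRIVAL) IS NULL
                THEN 1 ELSE 0 END AS check_lag
    FROM sample_data
)
SELECT ARRIVAL, ITEM_TYPE, ITEM, check_lag,
       ROW_NUMBER() OVER (ORDER BY ARRIVAL) AS rn,
       -- running count of rows that did not start a new run
       SUM(check_lag) OVER (ORDER BY ARRIVAL
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_same,
       ROW_NUMBER() OVER (ORDER BY ARRIVAL)
         - SUM(check_lag) OVER (ORDER BY ARRIVAL
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
FROM flagged
ORDER BY ARRIVAL;

-- ARRIVAL ITEM_TYPE ITEM   check_lag rn running_same grp
-- 1       0         Cat    1         1  1            0
-- 2       0         Dog    1         2  2            0
-- 3       1         Horse  0         3  2            1
-- 4       1         Cow    1         4  3            1
-- 5       0         Fish   0         5  3            2
-- 6       0         Barn   1         6  4            2
-- 7       0         Potato 1         7  5            2
```

Here grp goes up by one at every change of ITEM_TYPE, so each run gets its own value and grouping by grp alone is enough.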