I am trying to use Azure Databricks to build a dashboard, using SQl query. I know that Azure Databricks uses pure ANSI SQL dialect, so some specific functions are not available.
My table has 2 columns: entry_dt of type timestamp (like “2022-02-15T05:11:22Z”, etc), and site_id (string values like “SITE1”, “SITE2″,…,”SITE22”).
In my Databricks dashboard I need to have a parameter for selecting a time range over entry_dt, a counter of unique values of site_id (Output should be a single number, like 55), a list of unique values of site_id (Output should be a single list, like “SITE1, SITE22, SITE45”), and a plot showing date (day actually, so I use function date(entry_dt) to eliminate time part) vs count of rows in this day.
Each thing by itself is easy, but how to combine it into one query, which shares the same selected time interval, defined by parameters (entry_dt_range.start and entry_dt_range.end in my case)? I don’t want to select time interval several times in the dashboard, for each query separately, as it doesn’t look good on the dashboard.
Query for counting unique site_id and making a list of them:
SELECT
count(DISTINCT site_id) AS cnt_distinct
,CONCAT_WS(', ', ARRAY_AGG(DISTINCT site_id )) AS distinct_site_ids
FROM
test_data
WHERE
entry_dt BETWEEN TIMESTAMP '{{ entry_dt_range.start }}'
AND TIMESTAMP '{{ entry_dt_range.end }}';
and query for making a table for the plot:
SELECT date(entry_dt)
,count(site_id) as cnt
FROM
test_data
WHERE
entry_dt BETWEEN TIMESTAMP '{{ entry_dt_range.start }}'
AND TIMESTAMP '{{ entry_dt_range.end }}'
GROUP BY date(entry_dt)
SORT BY entry_dt;