How to make same pattern multiple joins in Trino SQL shorter?

Background

I have a Delta table motor which has too many columns.

Due to AWS Glue Catalog schema size 400KB limit, I have to split the table.

To simplify the question, for example, before I have columns timestamp, col0, col1, col2, col3, … , col79.

After splitting to 8 tables, I have tables

  • motor_0: timestamp, col0, col1, …, col9
  • motor_1: timestamp, col10, col11, …, col19
  • motor_7: timestamp, col70, col71, …, col79

Note the same timestamp column is reserved across all tables.

Question

Now, I am using Trino SQL to join these tables. Here is my current SQL:

with t0 as (
    select * from delta.my_delta_db.motor_0
    order by timestamp desc limit 1000
),
t1 as (
    select * from delta.my_delta_db.motor_1
    order by timestamp desc limit 1000
),
t2 as (
    select * from delta.my_delta_db.motor_2
    order by timestamp desc limit 1000
),
t3 as (
    select * from delta.my_delta_db.motor_3
    order by timestamp desc limit 1000
),
t4 as (
    select * from delta.my_delta_db.motor_4
    order by timestamp desc limit 1000
),
t5 as (
    select * from delta.my_delta_db.motor_5
    order by timestamp desc limit 1000
),
t6 as (
    select * from delta.my_delta_db.motor_6
    order by timestamp desc limit 1000
),
t7 as (
    select * from delta.my_delta_db.motor_7
    order by timestamp desc limit 1000
),
select * from t0
     join t1 on t0.timestamp = t1.timestamp
     join t2 on t0.timestamp = t2.timestamp
     join t3 on t0.timestamp = t3.timestamp
     join t4 on t0.timestamp = t4.timestamp
     join t5 on t0.timestamp = t5.timestamp
     join t6 on t0.timestamp = t6.timestamp
     join t7 on t0.timestamp = t7.timestamp
order by t0.timestamp desc limit 1000

The reason I am adding order by timestamp desc limit 1000 in each with block is to make each Trino sub-task output data size small. This helps avoid joining the total number of rows for each join.

I am wondering if there is a way to make this query shorter but also not lose performance?

I found Trino has a WITH RECURSIVE, but seems not applying to my case (?)

Any guide would be appreciate. Thanks!

  • It sounds like you’re trying to deal with a poor or possibly denormalised design, have you created them with a clustered index on the timestamp column? What do you mean by “make the query shorter” ? Less text? The written size of a query has no correlation with its performance once compiled.

    – 

  • Thanks @Stu for reminding! There would be z-index on timestamp column in this Delta table. Yeah, just hopefully less text so that I don’t have to repeat same text. But if can further improve performance in another SQL, that will be even better! 😃

    – 




  • I’m not familiar with trino so can’t suggest and product specific tuning, however if you can use temporary tables perhaps insert your 1000 rows from t0 and then join with the temp table? Normally with a clustered index seek you won’t need to use CTEs like this just join the tables.

    – 




Leave a Comment