I am using Databricks DLT; my cluster runs DBR 13.3 LTS ML.
I have tried declaring the table with different schemas:
Option 1:
from pyspark.sql.types import (
    ArrayType, IntegerType, StringType, StructField, StructType, TimestampType,
)

json_schema = StructType(
    [
        StructField("someId", IntegerType(), False),
        StructField("parameterSetId", StringType(), False),
        StructField("data", ArrayType(StructType([
            StructField("key", IntegerType()),
            StructField("key2", StringType()),
            StructField("beginDate", TimestampType()),
            StructField("endDate", TimestampType()),
            StructField("data", ArrayType(StructType([
                StructField("parameter", IntegerType()),
                StructField("value", IntegerType()),
                StructField("value2", IntegerType()),
            ]))),
        ])), False),
    ]
)
Option 2:
json_schema = StructType(
    [
        StructField("someId", IntegerType(), False),
        StructField("parameterSetId", StringType(), False),
        StructField("data", StringType(), False),
    ]
)
No matter which schema I declare, I get the same error; only the Declared schema in the message changes to match:

org.apache.spark.sql.AnalysisException: Table 'raw_data' has a user-specified schema that is incompatible with the schema
inferred from its query.
Streaming tables are stateful and remember data that has already been
processed. If you want to recompute the table from scratch, please full refresh
the table.
Declared schema:
root
 |-- someId: integer (nullable = false)
 |-- parameterSetId: string (nullable = false)
 |-- data: array (nullable = false)
 |    |-- element: struct (containsNull = true)
 |    |    |-- key: integer (nullable = true)
 |    |    |-- key2: string (nullable = true)
 |    |    |-- beginDate: timestamp (nullable = true)
 |    |    |-- endDate: timestamp (nullable = true)
 |    |    |-- data: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- parameter: integer (nullable = true)
 |    |    |    |    |-- value: integer (nullable = true)
 |    |    |    |    |-- value2: integer (nullable = true)
Inferred schema:
root
 |-- value: string (nullable = true)
 |-- parameter: string (nullable = true)
 |-- value2: string (nullable = true)
 |-- _rescued_data: string (nullable = true)