I am using Databricks DLT; my cluster runs DBR 13.3 LTS ML.
I have tried declaring the table with different schemas:
Option 1:
from pyspark.sql.types import (
    ArrayType, IntegerType, StringType, StructField, StructType, TimestampType,
)

json_schema = StructType(
    [
        StructField("someId", IntegerType(), False),
        StructField("parameterSetId", StringType(), False),
        StructField("data", ArrayType(StructType([
            StructField("key", IntegerType()),
            StructField("key2", StringType()),
            StructField("beginDate", TimestampType()),
            StructField("endDate", TimestampType()),
            StructField("data", ArrayType(StructType([
                StructField("parameter", IntegerType()),
                StructField("value", IntegerType()),
                StructField("value2", IntegerType()),
            ]))),
        ])), False),
    ]
)
Option 2:
json_schema = StructType(
    [
        StructField("someId", IntegerType(), False),
        StructField("parameterSetId", StringType(), False),
        StructField("data", StringType(), False),
    ]
)
No matter which schema I declare, I get the same error; only the Declared schema in the message changes to match:

org.apache.spark.sql.AnalysisException: Table 'raw_data' has a user-specified schema that is incompatible with the schema
inferred from its query.
Streaming tables are stateful and remember data that has already been
processed. If you want to recompute the table from scratch, please full refresh
the table.
Declared schema:
root
 |-- someId: integer (nullable = false)
 |-- parameterSetId: string (nullable = false)
 |-- data: array (nullable = false)
 |    |-- element: struct (containsNull = true)
 |    |    |-- key: integer (nullable = true)
 |    |    |-- key2: string (nullable = true)
 |    |    |-- beginDate: timestamp (nullable = true)
 |    |    |-- endDate: timestamp (nullable = true)
 |    |    |-- data: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- parameter: integer (nullable = true)
 |    |    |    |    |-- value: integer (nullable = true)
 |    |    |    |    |-- value2: integer (nullable = true)
Inferred schema:
root
 |-- value: string (nullable = true)
 |-- parameter: string (nullable = true)
 |-- value2: string (nullable = true)
 |-- _rescued_data: string (nullable = true)