A producer is set up and sending a Kafka feed to my consumer; I can see the data coming through. I am using Databricks, so the processing happens in PySpark: I build a DataFrame in which each cell holds a JSON string.
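For context, this is roughly how I read the feed into the DataFrame (the broker address and topic name here are placeholders, not my real config):

```python
# Read the Kafka stream and keep the message payload as a JSON string.
# "broker:9092" and "my-topic" are placeholders for the real settings.
df = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "my-topic")
         .load()
         .selectExpr("CAST(value AS STRING) AS value")
)
```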
This is part of an example message; the whole thing is too big to show here.
```json
{
  "resourceType": "Bundle",
  "id": "",
  "meta": {
    "lastUpdated": "",
    "profile": [
      ""
    ],
    "tag": [
      {
        "code": "REASON",
        "display": "UPDATED"
      },
      {
        "code": "",
        "display": ""
      }
    ]
  },
  "entry": [
    {
      "resource": {
        "resourceType": "",
        "identifier": [
          {
            "extension": [
              {
                "url": "",
                "valueMetadata": {
                  "modifiedDateTime": "",
                  "sourceSystemCd": ""
                }
              }
            ],
            "use": "official",
            "type": {
              "coding": [
                {
                  "code": "",
                  "display": ""
                }
              ],
              "text": ""
            },
            "value": "",
            "period": {
              "start": ""
            },
            "assigner": {
              "display": ""
            },
            "characteristic": [
              {
                "name": "Last Update Date"
              }
              ...
```
I only want fields like address, gender, and name. How do I extract just those?
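From what I understand, `get_json_object` can pull individual fields by JSON path. This is a minimal sketch of what I mean, assuming gender and name sit under `entry[].resource` the same way the identifier block above does (those paths are my guess, not confirmed against the full payload):

```python
from pyspark.sql import functions as F

# "value" is the string column holding the raw JSON for each record;
# the paths assume the fields I want live in the first bundle entry.
fields = df.select(
    F.get_json_object("value", "$.entry[0].resource.gender").alias("gender"),
    F.get_json_object("value", "$.entry[0].resource.name").alias("name"),
)
```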
Some records have multiple addresses as well. How do I create a schema that handles that, and cast the data to it so I end up with a nice, organized table?
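The direction I have been trying is to hand-write a schema where address (and name) are arrays, then use `from_json` and `explode`. The field names inside address and name below are my guess at the FHIR Patient shape, not taken from my real payload:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Schema covers only the fields I care about; from_json ignores the rest.
address_schema = StructType([
    StructField("line", ArrayType(StringType())),
    StructField("city", StringType()),
    StructField("state", StringType()),
    StructField("postalCode", StringType()),
])

name_schema = StructType([
    StructField("family", StringType()),
    StructField("given", ArrayType(StringType())),
])

bundle_schema = StructType([
    StructField("entry", ArrayType(StructType([
        StructField("resource", StructType([
            StructField("gender", StringType()),
            StructField("name", ArrayType(name_schema)),
            # array, because some records carry more than one address
            StructField("address", ArrayType(address_schema)),
        ])),
    ]))),
])

parsed = (
    df.withColumn("bundle", F.from_json(F.col("value"), bundle_schema))
      .withColumn("entry", F.explode("bundle.entry"))       # one row per entry
      .select(
          F.col("entry.resource.gender").alias("gender"),
          F.col("entry.resource.name").alias("name"),        # array of names
          F.col("entry.resource.address").alias("address"),  # array of addresses
      )
)
```

Is this the right approach, or is there a better way to flatten it into a table?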