Take certain fields from a Kafka message

There is a producer sending a Kafka feed to my consumer. It's all set up and I can see the data. I am using Databricks, so the processing happens in PySpark; I create a DataFrame in which each cell holds a JSON string.

This is part of an example message; the whole thing is too big to show here.

{
  "resourceType": "Bundle",
  "id": "",
  "meta": {
    "lastUpdated": "",
    "profile": [
      ""
    ],
    "tag": [
      {
        "code": "REASON",
        "display": "UPDATED"
      },
      {
        "code": "",
        "display": ""
      }
    ]
  },
  "entry": [
    {
      "resource": {
        "resourceType": "",
        "identifier": [
          {
            "extension": [
              {
                "url": "",
                "valueMetadata": {
                  "modifiedDateTime": "",
                  "sourceSystemCd": ""
                }
              }
            ],
            "use": "official",
            "type": {
              "coding": [
                {
                  "code": "",
                  "display": ""
                }
              ],
              "text": ""
            },
            "value": "",
            "period": {
              "start": ""
            },
            "assigner": {
              "display": ""
            },
            "characteristic": [
              {
                "name": "Last Update Date"
              }
I only want fields like address, gender, name, etc. How do I do that?

Some records have multiple addresses as well. How do I create a schema for that, and cast the data to that schema so I can build a nicely organized table?

  • Does this answer your question? Pyspark: Parse a column of json strings
