I want to extract all JSON objects from the text below and build a dictionary from them. As you can see, the text contains nested objects as key values.

text = Autotune exists! Hoorah! You can use microbolus-related features. {"iob":0.121,
"activity":0.0079,
"basaliob":-1.447,
"bolusiob":1.568,
"netbasalinsulin":-1.9,
"bolusinsulin":6.5,
"time":"2022-12-25T21:17:45.000Z",

"iobWithZeroTemp":
{"iob":0.121,
"activity":0.0079,
"basaliob":-1.447,
"bolusiob":1.568,
"netbasalinsulin":-1.9,
"bolusinsulin":6.5,
"time":"2022-12-25T21:17:45.000Z"},
"lastBolusTime":1671999216000,

"lastTemp":
{"rate":0,
"timestamp":"2022-12-25T23:56:14+03:00",
"started_at":"2022-12-25T20:56:14.000Z",
"date":1672001774000,
"duration":22.52}}

import json
import re

# Regular expression pattern to match nested JSON objects
pattern = r'(?<=\{)\s*[^{]*?(?=[\},])'


matches = re.findall(pattern, text)


parsed_objects = [json.loads(match) for match in matches]


for obj in parsed_objects:
    print(obj)

Running this raises:

JSONDecodeError: Extra data: line 1 column 6 (char 5)

  • This is hopeless. You can easily enough strip out any text before an initial {, but scanning random text looking for JSON is not trivial.

  • Your JSON has nested objects. The regexp only matches objects with no nesting.


  • Using lookarounds to match the { and } means you’ll just get the middle of the object. But that’s not valid JSON by itself.


  • @Barmar can you help with the pattern?


  • No, this is not an appropriate use of regexp, for the reason @TimRoberts explained.

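To make the comments above concrete, here is a minimal sketch of what the pattern from the question actually captures. The `sample` string is a made-up, shortened stand-in for the original text, and the `errors` list is only there for illustration. Because the lookarounds exclude the braces and the match stops at the first comma, each match is a bare key/value fragment, which json.loads() rejects once it has read the key string.

```python
import json
import re

# Hypothetical, shortened stand-in for the text in the question.
sample = 'noise {"a": 1, "b": {"c": 2}} more noise'

# The pattern from the question, unchanged.
pattern = r'(?<=\{)\s*[^{]*?(?=[\},])'

# Each match is the fragment right after a "{" up to the next "," or "}".
matches = re.findall(pattern, sample)
print(matches)

# Try to parse each fragment and record the decoder's message.
errors = []
for match in matches:
    try:
        json.loads(match)
    except json.JSONDecodeError as exc:
        errors.append(exc.msg)
print(errors)
```

Neither fragment is a complete JSON object, so no adjustment to the list comprehension can make this approach work; the braces and the nesting have to be handled by a real JSON parser.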

Here is an attempt to get all valid JSON dicts from text using JSONDecoder.raw_decode():

text = """\
Autotune exists! Hoorah! You can use microbolus-related features. {"iob":0.121,
"activity":0.0079,
"basaliob":-1.447,
"bolusiob":1.568,
"netbasalinsulin":-1.9,
"bolusinsulin":6.5,
"time":"2022-12-25T21:17:45.000Z",

"iobWithZeroTemp":
{"iob":0.121,
"activity":0.0079,
"basaliob":-1.447,
"bolusiob":1.568,
"netbasalinsulin":-1.9,
"bolusinsulin":6.5,
"time":"2022-12-25T21:17:45.000Z"},
"lastBolusTime":1671999216000,

"lastTemp":
{"rate":0,
"timestamp":"2022-12-25T23:56:14+03:00",
"started_at":"2022-12-25T20:56:14.000Z",
"date":1672001774000,
"duration":22.52}}

This is some other text with { not valid JSON }

{"another valid JSON object": [1, 2, 3]}
"""

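import json

decoder = json.JSONDecoder()

decoded_objs, idx = [], 0
while True:
    # Find the next "{", which may be the start of a JSON object.
    try:
        idx = text.index("{", idx)
    except ValueError:
        break  # no more candidates

    try:
        # raw_decode() parses one JSON value starting at idx and returns
        # the value together with the index just past its end.
        obj, idx = decoder.raw_decode(text, idx)
        decoded_objs.append(obj)
    except json.JSONDecodeError:
        # Not valid JSON at this brace; resume the search after it.
        idx += 1


print(decoded_objs)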

Prints (reformatted here for readability):

[
    {
        "iob": 0.121,
        "activity": 0.0079,
        "basaliob": -1.447,
        "bolusiob": 1.568,
        "netbasalinsulin": -1.9,
        "bolusinsulin": 6.5,
        "time": "2022-12-25T21:17:45.000Z",
        "iobWithZeroTemp": {
            "iob": 0.121,
            "activity": 0.0079,
            "basaliob": -1.447,
            "bolusiob": 1.568,
            "netbasalinsulin": -1.9,
            "bolusinsulin": 6.5,
            "time": "2022-12-25T21:17:45.000Z",
        },
        "lastBolusTime": 1671999216000,
        "lastTemp": {
            "rate": 0,
            "timestamp": "2022-12-25T23:56:14+03:00",
            "started_at": "2022-12-25T20:56:14.000Z",
            "date": 1672001774000,
            "duration": 22.52,
        },
    },
    {"another valid JSON object": [1, 2, 3]},
]
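
The nested objects stay inside their parent dict in the result above. If you also want each nested object on its own, one option is to walk the decoded value recursively. This is only a sketch: `iter_nested_dicts` is a hypothetical helper name, and `decoded` is a trimmed-down stand-in for the first decoded object.

```python
from typing import Any, Iterator


def iter_nested_dicts(value: Any) -> Iterator[dict]:
    """Yield value (if it is a dict) and every dict nested inside it."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nested_dicts(child)
    elif isinstance(value, list):
        # JSON arrays may contain objects too, so descend into them.
        for item in value:
            yield from iter_nested_dicts(item)


# Trimmed-down stand-in for the first object decoded from the text.
decoded = {
    "iob": 0.121,
    "iobWithZeroTemp": {"iob": 0.121, "activity": 0.0079},
    "lastTemp": {"rate": 0, "duration": 22.52},
}

all_dicts = list(iter_nested_dicts(decoded))
print(len(all_dicts))
```

The outer object comes first, followed by the nested ones in key order, so for the data in the question you would get the top-level object, then "iobWithZeroTemp", then "lastTemp".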
