Python read AVRO embedded into PCAP

I have a PCAP file that contains AVRO-encoded data as the payload of a TCP packet. For testing purposes I converted that payload into a binary file using xxd -r -p test.hex test.bin. (Later on I will use scapy to work with the PCAP.) The code below raises the error “AssertionError: -29”.

I have a valid schema and viewing the binary file I see the expected first field “1.0.0”:

00000000    00 00 00 01 00 06 02 04 96 9e 77 35 00 00 08 00     ..........w5....
00000010    45 00 01 ea 19 f6 40 00 3c 06 2a 59 83 a0 aa 7e     E....@.<.*Y...~
00000020    0a 2d c0 73 92 16 37 ca c9 28 f5 36 42 80 5f 1c     .-.s.7..(.6B._
00000030    80 18 00 e5 ab 64 00 00 01 01 08 0a 03 67 d9 fd     ....d.......g..
00000040    65 e1 ee 11 01 b6 03 00 00 00 00 00 00 00 02 0a     e...............
00000050    31 2e 30 2e 30 0a 32 30 2e 51 32 26 43 58 50 39     1.0.0

Environment: “image”: “mcr.microsoft.com/devcontainers/python:1-3.11-buster”

import avro
from avro.io import DatumReader, BinaryDecoder
from avro.datafile import DataFileReader
import io

schema = avro.schema.parse(open("test.avsc").read())
reader = DatumReader(schema)

f = open("test.bin", mode="rb")
f.seek(79)
raw_bytes = f.read()
print(raw_bytes)

buff = io.BytesIO(raw_bytes)
elements = DataFileReader(buff, reader)

I expected to be able to read the AVRO-encoded data, since I have a valid schema.

If I change the offset to f.seek(74) I get a different error: “avro.schema.AvroException: Not an Avro data file: b'\x00\x00\x00\x00' doesn't match b'Obj\x01'.” I suppose that is expected, as my binary file does not have the AVRO schema embedded in a header. I have already spent 2 days trying and googling.
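As a sanity check on the offset, the bytes from position 79 onward can be decoded by hand. Raw Avro binary has no header at all: a string is just a zigzag varint length followed by UTF-8 bytes. The sketch below uses only the standard library and assumes the first two fields are plain Avro strings, which the question does not confirm because the schema is not shown. read_long and read_string are illustrative helper names, not part of any Avro library.

```python
import io


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag-encoded integer)."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended inside a varint")
        value = byte[0]
        accum |= (value & 0x7F) << shift  # low 7 bits carry data
        if not value & 0x80:              # high bit clear: last byte
            break
        shift += 7
    # Undo zigzag: even numbers are non-negative, odd numbers negative.
    return (accum >> 1) ^ -(accum & 1)


def read_string(stream):
    """Decode one Avro string: a long length, then that many UTF-8 bytes."""
    length = read_long(stream)
    if length < 0:
        raise ValueError(f"negative string length: {length}")
    return stream.read(length).decode("utf-8")


# Bytes 79..90 of test.bin, copied from the hex dump in the question.
SAMPLE = bytes.fromhex("0a 31 2e 30 2e 30 0a 32 30 2e 51 32")

stream = io.BytesIO(SAMPLE)
first = read_string(stream)
second = read_string(stream)
print(first, second)  # 1.0.0 20.Q2
```

The byte 0x0a at offset 79 decodes to a length of 5, which matches “1.0.0”, so 79 looks like a plausible start for a header-less record if the first field really is a string.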

I know there’s a way to do this in the standard avro library, but I maintain the Python library fastavro, so I’m less familiar with the standard avro library. To do this in fastavro you should be able to do the following:

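import fastavro
import json

# Load the writer schema; the data itself carries no schema.
with open("test.avsc") as schema_file:
    schema = json.load(schema_file)

with open("test.bin", mode="rb") as fp:
    # Skip the bytes in front of the Avro data. 79 is the offset used
    # in the question; adjust it if your record starts elsewhere.
    fp.seek(79)
    # Decode a single header-less record from the current position.
    record = fastavro.schemaless_reader(fp, schema)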

Note: There might be some typos or something might be slightly wrong. I’m typing this directly in the answer without checking it because I don’t have the header to verify it works.
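One thing to watch: schemaless_reader starts decoding at the current file position. If test.bin holds the whole captured frame, as the hex dump in the question suggests, the reader has to be positioned past the link-layer, IP and TCP headers first. Here is a rough sketch of working out that offset with plain Python. It assumes the first 16 bytes are a Linux cooked-capture (SLL) header, which is only my reading of the dump, and tcp_payload_offset is a made-up helper name.

```python
# First 96 bytes of test.bin, copied from the hex dump in the question.
FRAME = bytes.fromhex(
    "00 00 00 01 00 06 02 04 96 9e 77 35 00 00 08 00"
    "45 00 01 ea 19 f6 40 00 3c 06 2a 59 83 a0 aa 7e"
    "0a 2d c0 73 92 16 37 ca c9 28 f5 36 42 80 5f 1c"
    "80 18 00 e5 ab 64 00 00 01 01 08 0a 03 67 d9 fd"
    "65 e1 ee 11 01 b6 03 00 00 00 00 00 00 00 02 0a"
    "31 2e 30 2e 30 0a 32 30 2e 51 32 26 43 58 50 39"
)

# Assumption: the capture uses a 16-byte Linux cooked-capture header.
# A plain Ethernet capture would use 14 here instead.
SLL_HEADER_LEN = 16


def tcp_payload_offset(frame, link_len=SLL_HEADER_LEN):
    """Return the offset of the first TCP payload byte in an IPv4 frame."""
    ip_start = link_len
    version_ihl = frame[ip_start]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    # IHL is the IP header length in 32-bit words.
    ip_header_len = (version_ihl & 0x0F) * 4
    if frame[ip_start + 9] != 6:
        raise ValueError("IP protocol is not TCP")
    tcp_start = ip_start + ip_header_len
    # The upper nibble of byte 12 is the TCP header length in 32-bit words.
    tcp_header_len = (frame[tcp_start + 12] >> 4) * 4
    return tcp_start + tcp_header_len


offset = tcp_payload_offset(FRAME)
print(offset)  # 68
```

For this dump the TCP payload starts at byte 68, so the bytes between 68 and the 79 used in the question presumably belong to some application-level framing in front of the Avro record. That last part is a guess; I can't tell what those bytes mean from the dump alone.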
