How to write integers, not integers-as-strings, in Python

I need to create a file of 10,000 random integers for testing. I will be using the file in Python and C, so I can’t have the data represented as strings because I don’t want the extra overhead of integer conversion in C.

In Python I can use struct.unpack to convert the file to integer, but I can’t use the write() method to write that to a file for use in C.

Is there any way in Python to write just integers, not integers-as-strings, to a file? I have used print(val, file=f) and f.write(str(val)), but in both cases it writes a string.

Here is where I am now:

file_root = "[ file root ]"

file_name = file_root + "Random_int64"

if os.path.exists(file_name):
    f = open(file_name, "wb")
    f.seek(0)

for _ in range(10000):
    val = random.randint(0, 10000)
    f.write(bytes(val))

f.close()
f = open(file_name, "rb")

wholefile = f.read()
struct.unpack(wholefile, I)

My unpack format string is wrong, so I am working on that now. I’m not that familiar with struct.unpack.

  • 2

    You can use modules struct, array or ctypes to create a bytes object from int data. You can then write this object to a file.

  • 3

    This answer explains how to write bytes to a file in Python. You need to open your files in binary mode.

  • 2

    Plain text is, by far, the most portable format when using a file in different environments. You might want to rethink if a slight reduction in overhead is worth all the headaches you will certainly have by wrestling with binary formats.

  • 2

    I don’t know. Experiment and find out. (This is exactly the kind of headache I meant. If the file were plain text, then I know how both environments would see it. But binary? I haven’t a clue.)

  • 2

    If you use unpack to read the file, why do you not use pack to write the file, and use the same (whatever) format in both cases?

bytes(val), when val is an int, creates a bytes object of that length, filled with zeros. If your random number is 12345, you are writing 12345 zero bytes, not the number. The trick is to pack each integer and then write the packed bytes.
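
A quick check in the interpreter shows the difference. This is only a small sketch; the value 5 is chosen arbitrarily to keep the output short.

```python
import struct

val = 5

as_zeros = bytes(val)               # five zero bytes, not the number 5
as_packed = struct.pack("<Q", val)  # 8 bytes, little endian, holding the value 5

print(as_zeros)   # b'\x00\x00\x00\x00\x00'
print(as_packed)  # b'\x05\x00\x00\x00\x00\x00\x00\x00'
```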

According to the “Byte Order, Size, and Alignment” section of the struct module documentation, “<” writes bytes “little endian” (the byte order used by Intel/AMD) with standard sizes and no padding. The next character could be “L” to write 4-byte unsigned long integers or “Q” to write 8-byte ones. 4 bytes is plenty for your range of values and produces a smaller file, but 8 is more “future proof” if you want larger values later.
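
The sizes and byte order can be checked directly with struct.calcsize and a packed sample value. This is a small illustration, with 10000 (0x2710) picked as the sample.

```python
import struct

# With "<", sizes are the standard ones regardless of platform
size_l = struct.calcsize("<L")
size_q = struct.calcsize("<Q")
print(size_l, size_q)  # 4 8

# 10000 is 0x2710; little endian stores the low byte first
little_4 = struct.pack("<L", 10000).hex()
little_8 = struct.pack("<Q", 10000).hex()
big_4 = struct.pack(">L", 10000).hex()

print(little_4)  # 10270000
print(little_8)  # 1027000000000000
print(big_4)     # 00002710
```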

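If you want no repeats in the random numbers, you can create a list of integers, shuffle it, then write the values to the file one by one; the code below keeps random.randint from the question, which can produce repeats. Either way, make sure to open the file in binary mode so that no text encoding is applied.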

With a bit more cleanup you get

import random
import struct

file_root = "testfile"
file_name = file_root + "Random_int64"

with open(file_name, "wb") as f:
    for _ in range(10000):
        f.write(struct.pack("<Q", random.randint(0, 10000)))
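
The loop above uses random.randint, so values can repeat. If repeats are not wanted, one possible sketch uses random.sample, which draws without replacement. The file name here is hypothetical and placed in the temp directory only so the example is safe to run anywhere.

```python
import os
import random
import struct
import tempfile

# Hypothetical output path for this sketch
file_name = os.path.join(tempfile.gettempdir(), "unique_int64.bin")

# random.sample draws without replacement, so all 10000 values are distinct
values = random.sample(range(10001), 10000)

with open(file_name, "wb") as f:
    for val in values:
        f.write(struct.pack("<Q", val))  # 8 bytes per value, little endian
```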

You could also use a bytearray and struct.pack_into to build the buffer first and write it once.

import random
import struct

file_root = "testfile"
file_name = file_root + "Random_int64"

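buf = bytearray(10000*8)
# step through the buffer 8 bytes at a time, starting at offset 0
for offset in range(0, 10000*8, 8):
    # pack_into takes the format first, then the buffer and the offset
    struct.pack_into("<Q", buf, offset, random.randint(0, 10000))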

with open(file_name, "wb") as f:
    f.write(buf)
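
Since the question also asks about the struct.unpack format string, here is a sketch of a round trip. struct.unpack takes the format first and the bytes second, and a repeat count in front of “Q” covers the whole file in one call. The file name is hypothetical; the same repeat count is used on the writing side to pack everything at once.

```python
import os
import random
import struct
import tempfile

# Hypothetical output path for this sketch
file_name = os.path.join(tempfile.gettempdir(), "roundtrip_int64.bin")

written = [random.randint(0, 10000) for _ in range(10000)]

# A repeat count packs all values in a single call
with open(file_name, "wb") as f:
    f.write(struct.pack(f"<{len(written)}Q", *written))

with open(file_name, "rb") as f:
    data = f.read()

# Derive the count from the file size rather than hard-coding it
count = len(data) // struct.calcsize("<Q")
read_back = list(struct.unpack(f"<{count}Q", data))

print(read_back == written)  # True
```

On the C side, a file written this way should correspond to an array of uint64_t that can be read with fread on a little-endian machine, though that is worth verifying on the target platform.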

And if you don’t mind using packages outside of the standard library, numpy has the classic

import numpy as np
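# dtype fixes each element at 8 bytes; the upper bound is exclusive, so 10001 includes 10000
# tofile writes in the machine's native byte order
np.random.randint(0, 10001, size=10000, dtype=np.uint64).tofile("test.bin")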

If we are placing bets on performance, that’s where I’d go.
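
For a standard-library middle ground, the array module mentioned in the comments can also write the whole block in one call. This is a sketch under the assumption that the “Q” type code is 8 bytes on the platform (it is at least 8 by definition); array stores values in native byte order, so a swap is added for big-endian machines. The file name is hypothetical.

```python
import array
import os
import random
import sys
import tempfile

# Hypothetical output path for this sketch
file_name = os.path.join(tempfile.gettempdir(), "array_int64.bin")

numbers = [random.randint(0, 10000) for _ in range(10000)]
values = array.array("Q", numbers)  # unsigned 64-bit items on common platforms

# array uses native byte order; swap so the file is little endian everywhere
if sys.byteorder == "big":
    values.byteswap()

with open(file_name, "wb") as f:
    values.tofile(f)
```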
