
The difference between bytearray and bytes in Python

Python 3 has two built-in types for raw binary data: bytes and bytearray. At a cursory glance they seem very similar. However, there is a difference between them that becomes crucial in certain applications.

bytes, like str, is immutable: once created, its contents cannot change. bytearray is its mutable counterpart.
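To see the difference concretely, here is a small sketch (the variable names are just for illustration): item assignment is rejected on bytes but works on bytearray, whose items are integers in range(256).

```python
# bytes is immutable: item assignment raises TypeError.
data = bytes(b'hello')
try:
    data[0] = ord('j')
except TypeError as exc:
    print('bytes:', exc)

# bytearray is mutable: the same object can be changed in place.
buf = bytearray(b'hello')
buf[0] = ord('j')        # items are ints in range(256)
buf.extend(b', world')   # grows the existing object
print(buf)               # bytearray(b'jello, world')
```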

One case where this matters is I/O, which usually involves buffering. For example, we may be receiving data over the network and have to wait for message headers and terminators to appear in the stream before we can parse the message. So we keep appending incoming bytes to a buffer.
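What counts as a complete message depends on the protocol. As a minimal sketch, here are completeness checks for two common framings: a terminator (CRLF is assumed here) and a 4-byte big-endian length header followed by the payload. Both function names and both framings are illustrative assumptions, not part of any particular protocol; they accept either bytes or bytearray.

```python
import struct

# Hypothetical framing 1: a message ends with a terminator (CRLF assumed).
def terminator_complete(buffer, terminator=b'\r\n'):
    return terminator in buffer

# Hypothetical framing 2: a 4-byte big-endian length header, then the payload.
HEADER = struct.Struct('>I')

def length_prefixed_complete(buffer):
    if len(buffer) < HEADER.size:
        return False  # the header itself has not fully arrived yet
    (length,) = HEADER.unpack_from(buffer, 0)
    return len(buffer) >= HEADER.size + length
```

Either check keeps returning False until enough data has accumulated, which is why the incoming bytes have to be buffered in the meantime.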

Using bytes we can achieve this with the following (pseudo) code:

buffer = b''
while message_not_complete(buffer):  # placeholder for the protocol's framing check
    buffer += read_from_socket()     # placeholder for a read, e.g. sock.recv(4096)

However, we pay a significant cost for each addition to the buffer. Since bytes is immutable, buffer += read_from_socket() cannot extend the existing object: Python allocates a new bytes object, copies both the old buffer and the newly read data into it, and rebinds buffer to the result. Every append therefore copies everything received so far, so the total amount of copying grows roughly quadratically with the size of the message, and it shows when you’re processing high volumes of data.
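A short sketch of what this means in practice. The first part shows that += on bytes produces a new object while any other reference still sees the old value. The second part is a back-of-the-envelope model (the function name is hypothetical, and it assumes each concatenation copies the whole existing buffer plus the new chunk) of how many bytes get copied when a message is built from equally sized chunks.

```python
# Because bytes is immutable, += builds a new object and rebinds the name.
buffer = b'abc'
alias = buffer          # a second reference to the original object
buffer += b'def'        # a new object containing b'abcdef'
print(alias)            # b'abc': the original object is unchanged
print(buffer is alias)  # False

def bytes_copied_by_concat(chunk_size, n_chunks):
    """Rough model of bytes copied when building a message with bytes +=.

    Assumes the i-th concatenation copies the i - 1 chunks already in the
    buffer plus the new chunk, i.e. i * chunk_size bytes.
    """
    return sum(i * chunk_size for i in range(1, n_chunks + 1))

print(bytes_copied_by_concat(4096, 1000))  # 2050048000
```

Under this model, assembling a 4,096,000-byte message from 1000 reads of 4096 bytes copies about 2 GB, roughly 500 times the size of the message itself.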

The bytearray implementation of buffering looks very similar:

buffer = bytearray()
while message_not_complete(buffer):
    buffer += read_from_socket()  # extends the bytearray in place

Yet this slight modification can be dramatically faster. Because bytearray is mutable, += extends the existing object in place instead of copying the whole buffer, much as it would for a list. bytearray even has list-like methods such as append and extend, and all of these perform much better than repeated concatenation of bytes. Here’s a quick test:

In [1]: %%timeit x = b''
x += b'x'
100000 loops, best of 3: 3.02 µs per loop

In [2]: %%timeit x = bytearray()
x += b'x'
10000000 loops, best of 3: 152 ns per loop

Notice that bytearray was about 20× faster at appending bytes in this test. I recently experienced this first-hand when writing custom network I/O code and mistakenly using bytes for buffering.
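For reference, here is one way a bytearray-based buffer might look in such code. It is a sketch, not the author's implementation: it assumes CRLF-terminated messages and takes an iterable of chunks (a hypothetical stand-in for successive socket reads) so that it runs without a network. Complete messages are cut off the front of the buffer with del, which also mutates the bytearray in place.

```python
def read_messages(chunks, terminator=b'\r\n'):
    """Yield complete terminator-delimited messages from an iterable of chunks.

    `chunks` stands in for successive socket reads; the CRLF framing is an
    assumption made for illustration.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer += chunk  # extends the bytearray in place
        while True:
            end = buffer.find(terminator)
            if end == -1:
                break  # no complete message yet, wait for more data
            message = bytes(buffer[:end])        # immutable copy of one message
            del buffer[:end + len(terminator)]   # drop consumed bytes in place
            yield message
    # Trailing bytes without a terminator form an incomplete message
    # and are simply discarded here.


chunks = [b'HEL', b'LO\r\nWOR', b'LD\r\n', b'PART']
print(list(read_messages(chunks)))  # [b'HELLO', b'WORLD']
```

With a real socket `sock`, one way to feed it is `read_messages(iter(lambda: sock.recv(4096), b''))`, which stops once `recv` returns `b''` at the end of the stream.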

For further reading, check out “A few useful bytearray tricks” and this Stack Overflow question.

September 29, MMXIV — Python.