Letting them compete three times (a million pops/dels each time):
from timeit import timeit
for _ in range(3):
    t1 = timeit('b.pop(0)', 'b = bytearray(1000000)')
    t2 = timeit('del b[0]', 'b = bytearray(1000000)')
    print(t1 / t2)
Time ratios (Try it online!):
274.6037053753368
219.38099365582403
252.08691226683823
Why is pop that much slower at doing the same thing?
2 Answers
When you run b.pop(0), Python moves all the elements back by one, as you might expect. This takes O(n) time.

When you run del b[0], Python simply increases the start pointer of the object by 1.

In both cases, PyByteArray_Resize is called to adjust the size. When the new size is smaller than half the allocated size, the allocated memory will be shrunk. In the del b[0] case, this is the only point where the data will be copied. As a result, this case will take O(1) amortized time.
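To see the asymptotic difference directly, here is a sketch of my own (timings are machine-dependent): emptying the array from the front with pop(0) shifts the remaining bytes on every call, so the whole run is quadratic, while del b[0] stays linear overall:

```python
# Sketch (not from the answer): empty a bytearray from the front with each
# method and compare total time. pop(0) memmoves the tail on every call
# (O(n) per deletion, O(n^2) total); del b[0] just advances the start
# pointer (O(1) amortized per deletion, O(n) total).
from timeit import timeit

n = 100_000
t_pop = timeit('while b: b.pop(0)', f'b = bytearray({n})', number=1)
t_del = timeit('while b: del b[0]', f'b = bytearray({n})', number=1)
print(f'pop(0): {t_pop:.4f}s   del b[0]: {t_del:.4f}s   '
      f'ratio: {t_pop / t_del:.0f}x')
```

The ratio also grows with n, which is the signature of O(n^2) vs O(n) rather than constant per-call overhead.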
Relevant code:
The bytearray_pop_impl function always calls:

    memmove(buf + index, buf + index + 1, n - index);

The bytearray_setslice_linear function is called for del b[0] with lo == 0, hi == 1, bytes_len == 0. It reaches this code (with growth == -1):
    if (lo == 0) {
        /* Shrink the buffer by advancing its logical start */
        self->ob_start -= growth;
        /*
          0   lo               hi             old_size
          |   |<----avail----->|<-----tail------>|
          |   |<-bytes_len->|<-----tail------>|
          0   new_lo       new_hi           new_size
        */
    }
    else {
        /*
          0   lo               hi               old_size
          |   |<----avail----->|<-----tomove------>|
          |   |<-bytes_len->|<-----tomove------>|
          0   lo           new_hi             new_size
        */
        memmove(buf + lo + bytes_len, buf + hi,
                Py_SIZE(self) - hi);
    }
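The lo == 0 fast path only applies when the deleted range starts at the very beginning. A quick way to see that (my own check; absolute timings vary by machine) is to compare deleting at index 0 against deleting at index 1, which falls into the else branch and memmoves the tail:

```python
# Sketch: del b[0] hits the lo == 0 branch (pointer bump), while del b[1]
# hits the else branch and memmoves the tail on every call.
from timeit import timeit

t_front = timeit('del b[0]', 'b = bytearray(100000)', number=50000)
t_second = timeit('del b[1]', 'b = bytearray(100000)', number=50000)
print(t_second / t_front)  # prints a ratio well above 1
```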
Good answer. Do you know why developers did not choose to increment the pointer also for b.pop(0)? – Jérôme Richard yesterday
@JérômeRichard I've seen it many times. They don't want to optimize uncommon cases. Both to avoid slowing down common cases even slightly (might be a net loss overall!), and to avoid having extra code that needs to be written, tested, reviewed, maintained, read, may introduce errors, and may complicate things. See the original discussion of the slicing optimization for a perfect example of all that. – Kelly Bundy yesterday
Someone had use cases for the slicing optimization (FIFO buffers), and it was somewhat of a fight. Amusingly, at some point they spoke of "popping", but weren't talking about pop() but about removing a slice. They never spoke about popping single bytes, and I do believe that is really uncommon for bytearrays. Even from the end. Especially when the array is as huge as mine. And my speed ratio is also much smaller when the array is much smaller. – Kelly Bundy yesterday
So I suspect nobody ever even desired bytearray's pop(0) to be optimized. I didn't, either; I have no real use for it, I only stumbled upon this when looking for something else. And if I did have a use for it, I could most likely use del instead and would just do that. – Kelly Bundy yesterday
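The del-based workaround mentioned above can be sketched as a tiny helper (my own code, not from the thread): read the byte first, then delete it through the optimized del path:

```python
# Sketch of a pop(0) replacement that uses the fast del path.
def pop_front(b: bytearray) -> int:
    """Return and remove b[0] via del, which advances the start pointer."""
    value = b[0]   # raises IndexError on an empty bytearray, like pop
    del b[0]       # O(1) amortized instead of O(n)
    return value

b = bytearray(b'hello')
print(pop_front(b), b)  # 104 bytearray(b'ello')
```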
@PM2Ring Hmm, I don't think so. We know why slice deletion got implemented, that's in the linked original discussion. I suspect (didn't check) that del with a single index got optimized as a side effect. And optimizing pop was never even mentioned, at least in that discussion. – Kelly Bundy yesterday
I have to admit, I was very surprised by the timings myself. After convincing myself that they were in fact correct, I took a dive into the CPython source code, and I think I found the answer: CPython optimizes del bytearr[0:x] by just incrementing the pointer to the start of the array:
    if (growth < 0) {
        if (!_canresize(self))
            return -1;

        if (lo == 0) {
            /* Shrink the buffer by advancing its logical start */
            self->ob_start -= growth;
You can find the del bytearray[...] logic here (implemented via bytearray_setslice, with values being NULL), which in turn calls bytearray_setslice_linear, which contains the above optimization.
For comparison, bytearray.pop does NOT implement this optimization; see here in the source code.
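The FIFO-buffer use case that motivated the slicing optimization (mentioned in the comments above) looks roughly like this, as a sketch of my own:

```python
# Sketch: using a bytearray as a FIFO byte buffer. Appends go on the end;
# consuming from the front with del buf[:n] hits the pointer-advance
# optimization, so it's O(1) amortized rather than a full copy.
buf = bytearray()
buf += b'header'          # producer side: append at the end
buf += b'payload'
chunk = bytes(buf[:6])    # consumer side: read the first 6 bytes
del buf[:6]               # cheap: advances the logical start
print(chunk, bytes(buf))  # b'header' b'payload'
```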
There's at least an assignment involved in .pop(), but the difference is quite extraordinary. – yesterday
Probably has to do with the fact that pop() has to return the value, hence it has to do reference counting and also convert the return value. Python also takes a lot of time in attribute access (b.pop → find the attribute, and then calling it is expensive, but that can be improved a bit by doing pop = b.pop), and it doesn't have to worry about any of that with del being a single bytecode instruction. – Ashwini Chaudhary yesterday
@AshwiniChaudhary No, those things don't make that much of a difference. Even pop(0) vs b[0]; del b[0]; b.pop was still over 100. – yesterday
This probably goes without saying, but if performance is a concern you should either reverse the array order or get a deq – Tornado547 9 hours ago
@Tornado547 No. Reversing makes it unnatural, confusing, inconvenient, and takes extra time. And collections.deque (if that's what you mean) is far less performant at jobs for bytearrays and is lacking features like substring search. – 9 hours ago