Why is b.pop(0) over 200 times slower than del b[0] for bytearray?

Letting them compete three times (a million pops/dels each time):

from timeit import timeit

for _ in range(3):
    t1 = timeit('b.pop(0)', 'b = bytearray(1000000)')
    t2 = timeit('del b[0]', 'b = bytearray(1000000)')
    print(t1 / t2)

Time ratios (Try it online!):

274.6037053753368
219.38099365582403
252.08691226683823

Why is pop that much slower at doing the same thing?
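One way to probe this further is to vary the array size. A minimal sketch (the sizes and the number= value are arbitrary choices of mine, not part of the benchmark above):

```python
from timeit import timeit

# If the cost of pop(0) scales with the array length while del b[0] does not,
# the ratio between the two timings should grow as n grows.
for n in (10_000, 100_000, 1_000_000):
    t_pop = timeit('b.pop(0)', f'b = bytearray({n})', number=5_000)
    t_del = timeit('del b[0]', f'b = bytearray({n})', number=5_000)
    print(n, t_pop / t_del)
```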

  •

    There's at least an assignment involved in .pop() but the difference is quite extraordinary

    – roganjosh

    yesterday

  •

    Probably has to do with the fact that pop() has to return the value, hence it has to do reference counting and also convert the return value. Python also takes a lot of time in attribute access (b.pop –> find the attribute, and then calling it is expensive, though that can be improved a bit by doing pop = b.pop), and it doesn't have to worry about any of that with del, which is a single bytecode instruction.

    – Ashwini Chaudhary

    yesterday


  •

    @AshwiniChaudhary No, those things don't make that much of a difference. Even pop(0) vs b[0]; del b[0]; b.pop was still over 100.

    – Kelly Bundy

    yesterday

  • This probably goes without saying, but if performance is a concern, you should either reverse the array order or use a deque

    – Tornado547

    9 hours ago

  • @Tornado547 No. Reversing makes it unnatural, confusing, inconvenient, and takes extra time. And collections.deque (if that's what you mean) is far less performant at the jobs bytearrays are used for, and it lacks features like substring search.

    – Kelly Bundy

    9 hours ago


2 Answers

When you run b.pop(0), Python moves all the remaining elements back by one, as you might expect. This takes O(n) time.

When you run del b[0], Python simply advances the start pointer of the object by 1.

In both cases, PyByteArray_Resize is called to adjust the size. When the new size is smaller than half the allocated size, the allocated memory will be shrunk. In the del b[0] case, this is the only point where the data will be copied. As a result, this case will take O(1) amortized time.
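If you need pop's return value but del's speed, a minimal workaround sketch (fast_pop_left is a hypothetical helper name of mine, not a CPython API): read the first byte, then delete it, so the cheap pointer-advance path is taken.

```python
def fast_pop_left(b: bytearray) -> int:
    """Return and remove the first byte, like b.pop(0), but via del b[0]."""
    first = b[0]   # raises IndexError on an empty bytearray, just like pop
    del b[0]       # advances the start pointer instead of shifting the buffer
    return first

b = bytearray(b'hello')
assert fast_pop_left(b) == ord('h')
assert b == bytearray(b'ello')
```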

Relevant code:

The bytearray_pop_impl function always calls

    memmove(buf + index, buf + index + 1, n - index);

The bytearray_setslice_linear function is called for del b[0] with lo == 0, hi == 1, bytes_len == 0. It reaches this code (with growth == -1):

if (lo == 0) {
    /* Shrink the buffer by advancing its logical start */
    self->ob_start -= growth;
    /*
      0   lo               hi             old_size
      |   |<----avail----->|<-----tail------>|
      |      |<-bytes_len->|<-----tail------>|
      0    new_lo         new_hi          new_size
    */
}
else {
    /*
      0   lo               hi               old_size
      |   |<----avail----->|<-----tomove------>|
      |   |<-bytes_len->|<-----tomove------>|
      0   lo         new_hi              new_size
    */
    memmove(buf + lo + bytes_len, buf + hi,
            Py_SIZE(self) - hi);
}


  •

    Good answer. Do you know why the developers chose not to increment the pointer for b.pop(0) as well?

    – Jérôme Richard

    yesterday

  •

    @JérômeRichard I've seen it many times. They don't want to optimize uncommon cases. Both to avoid slowing down common cases even slightly (might be a net loss overall!), and to avoid having extra code that needs to be written, tested, reviewed, maintained, read, may introduce errors, and may complicate things. See the original discussion of the slicing optimization for a perfect example of all that.

    – Kelly Bundy

    yesterday

  •

    Someone had use cases for the slicing optimization (FIFO buffers), and it was somewhat of a fight. Amusingly at some point they spoke of "popping" but weren't talking about pop() but about removing a slice. They never spoke about popping single bytes, and I do believe that is really uncommon for bytearrays. Even from the end. Especially when the array is as huge as mine. And my speed ratio is also much smaller when the array is much smaller.

    – Kelly Bundy

    yesterday


  •

    So I suspect nobody ever even desired bytearray's pop(0) to be optimized. I didn't, either, I have no real use for it, I only stumbled upon this when looking for something else. And if I did have a use for it, I could most likely use del instead and would just do that.

    – Kelly Bundy

    yesterday


  •

    @PM2Ring Hmm, I don't think so. We know why slice deletion got implemented, that's in the linked original discussion. I suspect (didn't check) that del with single index got optimized as a side effect. And optimizing pop was never even mentioned, at least in that discussion.

    – Kelly Bundy

    yesterday



I have to admit, I was very surprised by the timings myself. After convincing myself that they were in fact correct, I took a dive into the CPython source code, and I think I found the answer: CPython optimizes del bytearr[0:x] by just incrementing the pointer to the start of the array:

    if (growth < 0) {
        if (!_canresize(self))
            return -1;

        if (lo == 0) {
            /* Shrink the buffer by advancing its logical start */
            self->ob_start -= growth;

You can find the del bytearray[...] logic here (implemented via bytearray_setslice, with values being NULL), which in turn calls bytearray_setslice_linear, which contains the above optimization.

For comparison, bytearray.pop does NOT implement this optimization; see here in the source code.
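A small sketch of the practical consequence, assuming you want to discard a run of leading bytes: one slice deletion hits the pointer-advance fast path a single time, while a pop loop shifts the remaining buffer on every iteration, yet both leave the same contents behind.

```python
a = bytearray(range(256)) * 4
b = bytearray(a)

# Fast path: one slice deletion, which just advances the start pointer
# (plus an occasional shrink when the buffer gets half empty).
del a[0:100]

# Slow path: 100 separate pops, each one a memmove of the remaining buffer.
for _ in range(100):
    b.pop(0)

assert a == b
```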
