Consider the following brief numpy session showcasing the uint64 data type:
import numpy as np
a = np.zeros(1,np.uint64)
a
# array([0], dtype=uint64)
a[0] -= 1
a
# array([18446744073709551615], dtype=uint64)
# this is 0xffff ffff ffff ffff, as expected
a[0] -= 1
a
# array([0], dtype=uint64)
# what the heck?
I’m utterly confused by this last output.
I would expect 0xFFFF'FFFF'FFFF'FFFE.
What exactly is going on here?
My setup:
>>> sys.platform
'linux'
>>> sys.version
'3.10.5 (main, Jul 20 2022, 08:58:47) [GCC 7.5.0]'
>>> np.version.version
'1.23.1'
2 Answers
By default, NumPy converts Python int objects to numpy.int_, a signed integer dtype corresponding to C long. (This decision was made back in the early days, when Python int also corresponded to C long.)

There is no integer dtype big enough to hold all values of both numpy.uint64 and numpy.int_, so operations between numpy.uint64 scalars and Python int objects produce float64 results instead of integer results. (Operations between uint64 arrays and Python ints may behave differently, since in array operations the int is converted to a dtype based on its value; but a[0] is a scalar.)

Your first subtraction produces a float64 with value -1, and your second subtraction produces a float64 with value 2**64 (since float64 doesn't have enough precision to perform the subtraction exactly). Both of these values are out of range for uint64 dtype, so converting back to uint64 for the assignment to a[0] produces undefined behavior (inherited from C – NumPy just uses a C cast).

On your machine this happened to produce wraparound behavior, so -1 wrapped around to 18446744073709551615 and 2**64 wrapped around to 0, but that's not a guarantee; you might see different behavior on other setups, and people in the comments did.
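To make the promotion visible, here is a minimal sketch (against the NumPy 1.x rules described above; NumPy 2.0's NEP 50 promotion rules change this behavior):

import numpy as np
a = np.zeros(1, np.uint64)
type(a[0] - 1)
# <class 'numpy.float64'> -- a uint64 scalar mixed with a Python int promotes to float64
a[0] - 1
# -1.0
type(a[0] - np.uint64(1))
# <class 'numpy.uint64'> -- keeping both operands uint64 stays integral
a[0] - np.uint64(1)
# 18446744073709551615, with a RuntimeWarning about overflow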
I am curious, as someone who only uses numpy occasionally: does the community feel this is a defect? This is incredibly surprising, since most languages with explicitly-specified fixed-width types have their behaviour defined similarly or identically to C.
– Chuu, 22 hours ago
@Chuu: I dunno about the general community, but I'd personally prefer if mixing uint64 and signed integers produced uint64 output instead of float64 output. (C handles this specific case better, but it's got its own issues – for example, integer arithmetic on two unsigned operands can produce signed overflow and undefined behavior in C, due to how unsigned types smaller than int get promoted to signed int. I'm not sure if NumPy has any protections in place to avoid this issue.)
– user2357112, 22 hours ago
Isn’t inheriting UB from C rather worrying, given that the C standard allows a program that invokes UB to do literally anything? Admittedly, most (all?) of the fun stuff that has surfaced lately, like not generating any code at all for an execution path that always invokes UB, hinges on the UB being detected at compile time, which this won’t be, but still…
– Ture Pålsson, 21 hours ago
@TurePålsson: "Isn’t inheriting UB from C rather worrying" – absolutely! The behavior we're seeing here already sucks, but it could be arbitrarily worse some day, and if it does get worse, there's no telling how long the problem might pass unnoticed.
– user235711220 hours ago
@TurePålsson: Yup. Fortunately, real-world C implementations will produce a value for out-of-range FP-to-int conversions (at least when the value being converted is a run-time variable that might be in range). Some compilers even go as far as officially defining the behaviour, such as MSVC: devblogs.microsoft.com/cppblog/… (article re: changing the out-of-range conversion result for FP-to-unsigned to match AVX-512 instructions, which produce all-ones, 0xfff… in that case).
– Peter Cordes, 17 hours ago
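To see the platform dependence discussed in these comments, you can perform the questionable cast directly; a small sketch (the outputs shown are just what a setup like the OP's x86-64 Linux box happened to produce, not guaranteed values):

import numpy as np
# Both inputs are outside uint64's range, so the C cast NumPy applies
# leaves the result undefined; other machines may differ or raise.
np.float64(-1.0).astype(np.uint64)
# 18446744073709551615 (not guaranteed)
np.float64(2**64).astype(np.uint64)
# 0 (not guaranteed)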
a[0] - 1 is 1.8446744073709552e+19, a numpy.float64. That can’t retain all the precision, so its value is 18446744073709551616 = 2**64. Which, when written back into a with dtype np.uint64, becomes 0.
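The precision claim is easy to check directly; a small sketch (float64 has a 53-bit significand, so integers just below 2**64 are rounded to multiples of 2048):

import numpy as np
float(2**64 - 1) == 2**64
# True -- 18446744073709551615 is not representable and rounds up to 2**64
np.float64(2**64 - 1) - 1 == 2**64
# True -- subtracting 1 is lost to rounding, so the result is still 2**64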
NumPy floating->integer conversion with a floating point value out of the integer type's range is undefined behavior (inherited from C), so there's no guarantee the result is actually 0 – it could be anything.
– user2357112, yesterday
@user2357112 Yeah, I had actually written a second paragraph about that, also noting harold's different result, but then decided to keep this brief, just explaining the observed behavior of the OP.
– Kelly Bundy, 23 hours ago
Comments on the question:

Interestingly, I get an OverflowError: long too big to convert for the second decrement. What version of numpy is this, and in what environment?
– Dan, yesterday
Does it matter if you make the 1 an np array of uint64 type also?
– yesterday
Same as Dan, I get OverflowError: Python int too large to convert to C long
– yesterday
b = np.zeros(1, np.uint64); b[0] = 1; a - b gives, as one may expect, array([18446744073709551614], dtype=uint64). Similarly, a[0] -= np.uint64(1) also works. I suspect the -= is allowing a conversion to the right-hand-side value type.
– yesterday
I'm seeing the same issue as OP. Running 3.11 on macOS: a[0] -= 1 leaves the value at zero, while a -= 1 sets the value to FF..FF.
– yesterday
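A short session reproducing the scalar-versus-array distinction drawn in these comments (a sketch under NumPy 1.x promotion rules; the scalar path's result is platform-dependent, as noted in the answers):

import numpy as np
a = np.zeros(1, np.uint64)
a -= 1     # array op: the Python int is converted by value, so the math stays in uint64
a
# array([18446744073709551615], dtype=uint64)
a[0] -= 1  # scalar op: goes through float64, and the cast back is undefined behavior
a
# array([0], dtype=uint64) -- on the OP's machine; some other setups raise OverflowError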