Consider the following brief numpy session showcasing the uint64 data type:
import numpy as np
a = np.zeros(1,np.uint64)
a
# array([0], dtype=uint64)
a[0] -= 1
a
# array([18446744073709551615], dtype=uint64)
# this is 0xffff ffff ffff ffff, as expected
a[0] -= 1
a
# array([0], dtype=uint64)
# what the heck?
I’m utterly confused by this last output.
I would expect 0xFFFF'FFFF'FFFF'FFFE.
What exactly is going on here?
My setup:
>>> sys.platform
'linux'
>>> sys.version
'3.10.5 (main, Jul 20 2022, 08:58:47) [GCC 7.5.0]'
>>> np.version.version
'1.23.1'
2 Answers
By default, NumPy converts Python int objects to numpy.int_, a signed integer dtype corresponding to C long. (This decision was made back in the early days, when Python int also corresponded to C long.)

There is no integer dtype big enough to hold all values of both numpy.uint64 and numpy.int_, so operations between numpy.uint64 scalars and Python int objects produce float64 results instead of integer results. (Operations between uint64 arrays and Python ints may behave differently, since in array operations the int is converted to a dtype based on its value; but a[0] is a scalar.)

Your first subtraction produces a float64 with value -1, and your second subtraction produces a float64 with value 2**64 (since float64 doesn't have enough precision to perform the subtraction exactly). Both of these values are out of range for uint64 dtype, so converting back to uint64 for the assignment to a[0] produces undefined behavior (inherited from C – NumPy just uses a C cast).

On your machine this happened to produce wraparound behavior, so -1 wrapped around to 18446744073709551615 and 2**64 wrapped around to 0, but that's not a guarantee; you might see different behavior on other setups, and people in the comments did.
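To make the promotion visible, here is a minimal sketch (against the NumPy 1.x rules described above; NumPy 2.0's NEP 50 promotion rules change this behavior):

import numpy as np
a = np.zeros(1, np.uint64)
type(a[0] - 1)
# <class 'numpy.float64'> -- a uint64 scalar mixed with a Python int promotes to float64
a[0] - 1
# -1.0
type(a[0] - np.uint64(1))
# <class 'numpy.uint64'> -- keeping both operands uint64 stays integral
a[0] - np.uint64(1)
# 18446744073709551615, with a RuntimeWarning about overflow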
I am curious, as someone who only uses numpy occasionally: does the community feel this is a defect? This is incredibly surprising, since most languages with explicitly-specified fixed-width types have their behaviour defined similarly or identically to C.
– Chuu, 22 hours ago
@Chuu: I dunno about the general community, but I'd personally prefer if mixing uint64 and signed integers produced uint64 output instead of float64 output. (C handles this specific case better, but it's got its own issues – for example, integer arithmetic on two unsigned operands can produce signed overflow and undefined behavior in C, due to how unsigned types smaller than int get promoted to signed int. I'm not sure if NumPy has any protections in place to avoid this issue.)
– user2357112, 22 hours ago
Isn’t inheriting UB from C rather worrying, given that the C standard allows a program that invokes UB to do literally anything? Admittedly, most (all?) of the fun stuff that has surfaced lately, like not generating any code at all for an execution path that always invokes UB, hinges on the UB being detected at compile time, which this won’t be, but still…
– Ture Pålsson, 21 hours ago
@TurePålsson: "Isn’t inheriting UB from C rather worrying" – absolutely! The behavior we're seeing here already sucks, but it could be arbitrarily worse some day, and if it does get worse, there's no telling how long the problem might pass unnoticed.
– user235711220 hours ago
@TurePålsson: Yup. Fortunately, real-world C implementations will produce a value for out-of-range FP-to-int conversions (at least when the value being converted is a run-time variable that might be in range). Some compilers even go as far as officially defining the behaviour, such as MSVC: devblogs.microsoft.com/cppblog/… (article re: changing the out-of-range conversion result for FP-to-unsigned to match AVX-512 instructions, which produce all-ones, 0xfff… in that case).
– Peter Cordes, 17 hours ago
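To see the platform dependence discussed in these comments, you can perform the questionable cast directly; a small sketch (the outputs shown are just what a setup like the OP's x86-64 Linux box happened to produce, not guaranteed values):

import numpy as np
# Both inputs are outside uint64's range, so the C cast NumPy applies
# leaves the result undefined; other machines may differ or raise.
np.float64(-1.0).astype(np.uint64)
# 18446744073709551615 (not guaranteed)
np.float64(2**64).astype(np.uint64)
# 0 (not guaranteed)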
a[0] - 1 is 1.8446744073709552e+19, a numpy.float64. That can’t retain all the precision, so its value is 18446744073709551616 = 2**64. Which, when written back into a with dtype np.uint64, becomes 0.
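The precision claim is easy to check directly; a small sketch (float64 has a 53-bit significand, so integers just below 2**64 are rounded to multiples of 2048):

import numpy as np
float(2**64 - 1) == 2**64
# True -- 18446744073709551615 is not representable and rounds up to 2**64
np.float64(2**64 - 1) - 1 == 2**64
# True -- subtracting 1 is lost to rounding, so the result is still 2**64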
NumPy floating->integer conversion with a floating point value out of the integer type's range is undefined behavior (inherited from C), so there's no guarantee the result is actually 0 – it could be anything.
– user2357112, yesterday
@user2357112 Yeah, I had actually written a second paragraph about that, also noting harold's different result, but then decided to keep this brief, just explaining the observed behavior of the OP.
– Kelly Bundy, 23 hours ago
Comments on the question:

Interestingly, I get an OverflowError: long too big to convert for the second decrement. What version of numpy is this, and in what environment?
– Dan, yesterday
Does it matter if you make the 1 an np array of uint64 type also?
– yesterday
Same as Dan, I get OverflowError: Python int too large to convert to C long
– yesterday
b = np.zeros(1, np.uint64); b[0] = 1; a - b gives, as one may expect, array([18446744073709551614], dtype=uint64). Similarly, a[0] -= np.uint64(1) also works. I suspect the -= is allowing a conversion to the right-hand-side value type.
– yesterday
I'm seeing the same issue as OP. Running 3.11 on macOS: a[0] -= 1 leaves the value at zero, while a -= 1 sets the value to FF..FF.
– yesterday
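A short session reproducing the scalar-versus-array distinction drawn in these comments (a sketch under NumPy 1.x promotion rules; the scalar path's result is platform-dependent, as noted in the answers):

import numpy as np
a = np.zeros(1, np.uint64)
a -= 1     # array op: the Python int is converted by value, so the math stays in uint64
a
# array([18446744073709551615], dtype=uint64)
a[0] -= 1  # scalar op: goes through float64, and the cast back is undefined behavior
a
# array([0], dtype=uint64) -- on the OP's machine; some other setups raise OverflowError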