I want to use the following method from the MemoryExtensions class (System namespace):
public static int ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo? culture)
My question now is: am I always safe when the destination span has the same length as the source span? For example:
destination = stackalloc char[source.Length];
If not, can someone provide an example of a string that converts to a longer string when calling ToUpper on it (including which culture)?
2 Answers
MemoryExtensions.ToUpper returns -1 if the destination is too small.
The source code for ToUpper has this gem:
// Assuming that changing case does not affect length
if (destination.Length < source.Length)
return -1;
There is no other point where -1 is returned; the function finishes with return source.Length;
So they’ve assumed it can’t happen. Whether they’re right is another question: if you find a counter-example I suggest you file a bug report on GitHub.
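One way to go looking for such a counter-example is to brute-force every Unicode scalar value through string.ToUpper for a given culture and record any whose UTF-16 length changes. This is a rough sketch under a few assumptions: it only tests single code points (not multi-character sequences), it uses the string overload because that overload sizes its own result (the span overload cannot report a longer result other than by returning -1), and FindLengthChanges is a hypothetical name. Given the documented current behaviour, it would be expected to come back empty on today's runtimes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class CaseLengthScan
{
    // Returns every Unicode scalar value whose upper-cased form (via string.ToUpper
    // with the given culture) has a different UTF-16 length than the input.
    public static List<int> FindLengthChanges(CultureInfo culture)
    {
        var changed = new List<int>();
        for (int cp = 0; cp <= 0x10FFFF; cp++)
        {
            // Surrogate code points are not scalar values; ConvertFromUtf32 would throw.
            if (cp >= 0xD800 && cp <= 0xDFFF)
                continue;

            // One char for BMP code points, a surrogate pair (two chars) above U+FFFF.
            string s = char.ConvertFromUtf32(cp);
            if (s.ToUpper(culture).Length != s.Length)
                changed.Add(cp);
        }
        return changed;
    }
}
```

To try other cultures, pass e.g. CultureInfo.GetCultureInfo("tr-TR") or CultureInfo.GetCultureInfo("de-DE") and print the count of the returned list.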
The docs for TextInfo (used later on in the code) say:
The returned string might differ in length from the input string. For more information on casing, refer to the Unicode Technical Report #21 "Case Mappings," published by the Unicode Consortium (https://www.unicode.org/). The current implementation preserves the length of the string. However, this behavior is not guaranteed and could change in future implementations.
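Since the docs only describe the current behaviour, a caller could check the return value instead of assuming it. The following is a minimal sketch, not a pattern taken from the framework: ToUpperChecked is a hypothetical helper, and the 256-char stackalloc threshold is an arbitrary assumption. With a destination of the same length, the current implementation never returns -1, so the fallback branch only matters if that behaviour ever changes.

```csharp
using System;
using System.Globalization;

static class CaseHelpers
{
    // Hypothetical helper: upper-cases `source` via the span API, but does not
    // trust that the result fits. If ToUpper reports -1 (destination too small),
    // it falls back to string.ToUpper, which allocates a correctly sized result.
    public static string ToUpperChecked(ReadOnlySpan<char> source, CultureInfo? culture)
    {
        // Small inputs go on the stack; larger ones use a heap array (threshold is arbitrary).
        Span<char> buffer = source.Length <= 256
            ? stackalloc char[source.Length]
            : new char[source.Length];

        int written = source.ToUpper(buffer, culture);
        if (written < 0)
        {
            // Only reachable if a future implementation lets casing change the length.
            return source.ToString().ToUpper(culture ?? CultureInfo.CurrentCulture);
        }
        return new string(buffer.Slice(0, written));
    }
}
```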
What the docs say about it: The returned string might differ in length from the input string. For more information on casing, refer to the Unicode Technical Report #21 "Case Mappings," published by the Unicode Consortium (unicode.org). The current implementation preserves the length of the string. However, this behavior is not guaranteed and could change in future implementations. – Ben Voigt
True, but I think at that point there would have to be a new API surface that returns the required length, as this assumption is made in a number of places. There isn't even such a function buried privately in TextInfo and associated classes. – Charlieface
I want to say yes, but I know some character sets/cultures do weird things when certain letters are capitalized. I'd expect you'd be fine 99.99%+ of the time, but that extra 0.01% could be killer. Or maybe you are perfectly fine… I'm not an expert here, and I'm interested to see if anyone could prove otherwise.
The one example I could think of that might do this was German "viel spaß", possibly turning the ß character into SS, but it turns out that it is left alone by string.ToUpper(new CultureInfo("de-DE")).
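That German check can be reproduced with something like the sketch below; Check is a hypothetical helper and de-DE is simply the culture from the comment. It compares lengths rather than asserting a particular output string, since casing results may differ between globalization backends (NLS on Windows/.NET Framework, ICU elsewhere).

```csharp
using System;
using System.Globalization;

static class EszettCheck
{
    // Upper-cases `text` with the named culture and returns the result together
    // with both lengths, so any ß -> SS expansion would show up as After > Before.
    public static (string Upper, int Before, int After) Check(string text, string cultureName)
    {
        var culture = CultureInfo.GetCultureInfo(cultureName);
        string upper = text.ToUpper(culture);
        return (upper, text.Length, upper.Length);
    }
}
```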
In UTF-8, characters can range from 1 to 4 bytes, with mostly international characters taking up the 4 bytes. You should look into .GetByteCount(). This site has a table of the different 1- to 4-byte UTF-8 characters. That being said, .NET uses UTF-16 encoding by default, which is 2 to 4 bytes per character. So I am guessing that the space needed would be source.Length * 4 at most.
ß could also turn into ẞ, the former being 2 UTF-8 bytes and the latter 3 UTF-8 bytes, though as @500-InternalServerError pointed out, the German culture appears to just leave ß alone.
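The byte counts in these comments are about encodings, whereas stackalloc char[source.Length] is sized in UTF-16 code units, which is what string.Length counts. A small sketch (Measure is a hypothetical helper) makes the distinction concrete:

```csharp
using System;
using System.Text;

static class CharVsByteCounts
{
    // Chars: UTF-16 code units (what string.Length and stackalloc char[n] count).
    // Utf8Bytes / Utf16Bytes: encoded size of the same text in each encoding.
    public static (int Chars, int Utf8Bytes, int Utf16Bytes) Measure(string s) =>
        (s.Length, Encoding.UTF8.GetByteCount(s), Encoding.Unicode.GetByteCount(s));
}
```

Measure("ß") gives (1, 2, 2) and Measure("\u1E9E") gives (1, 3, 2): ß and ẞ differ in UTF-8 size but both occupy a single char, so a ß to ẞ mapping would not change the span length. A supplementary character such as "\U0001F600" gives (2, 4, 4), because it is stored as a surrogate pair.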
@Narish stackalloc char[source.Length] allocates source.Length elements of your type, i.e. source.Length * sizeof(char) bytes, which in this case is 2 * source.Length bytes.
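A minimal sketch confirming that arithmetic, using MemoryMarshal.AsBytes to view the char buffer as raw bytes (StackallocSize.Measure is a hypothetical helper):

```csharp
using System;
using System.Runtime.InteropServices;

static class StackallocSize
{
    // stackalloc char[length] reserves `length` chars; each char is a 2-byte
    // UTF-16 code unit, so the buffer spans length * sizeof(char) bytes.
    public static (int Elements, int Bytes) Measure(int length)
    {
        Span<char> buffer = stackalloc char[length];
        return (buffer.Length, MemoryMarshal.AsBytes(buffer).Length);
    }
}
```

Measure(10) returns (10, 20).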