Is there a possibility when calling .ToUpper() that the new string requires more memory?


I want to use the following method from the MemoryExtensions class (System namespace):

public static int ToUpper(this ReadOnlySpan<char> source, Span<char> destination, CultureInfo? culture)

My question now is: am I always safe if the destination span has the same length as the source span? For example:

Span<char> destination = stackalloc char[source.Length];

If not, can someone provide an example of a string that becomes longer when ToUpper is called on it (and say which culture)?
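Here is a minimal sketch of the pattern in question. It assumes .NET 5 or later for the top-level statements, and the helper name UpperInto is hypothetical. It uppercases into a stack buffer of the same length. It also shows that a buffer one char too short gets -1 back instead of an exception:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: uppercases into a caller-supplied buffer and returns
// the number of chars written, or -1 if the destination is too small.
static int UpperInto(string text, Span<char> destination, CultureInfo culture) =>
    text.AsSpan().ToUpper(destination, culture);

string input = "hello";

// Same length as the source, as in the question.
Span<char> sameLength = stackalloc char[input.Length];
int written = UpperInto(input, sameLength, CultureInfo.InvariantCulture);
Console.WriteLine($"{written}: {new string(sameLength.Slice(0, written))}"); // 5: HELLO

// One char short: the method signals failure with -1 rather than throwing.
Span<char> tooShort = stackalloc char[input.Length - 1];
Console.WriteLine(UpperInto(input, tooShort, CultureInfo.InvariantCulture)); // -1
```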


  • I want to say yes, but I know some character sets/cultures do weird things when certain letters are capitalized. I'd expect you'd be fine 99.99%+ of the time, but that extra 0.01% could be killer. Or maybe you are perfectly fine… I'm not an expert here, and I'm interested to see if anyone could prove otherwise.

    – Joel Coehoorn, 8 hours ago


  • The one example I could think of that might do this was the German "viel spaß", where the ß character could turn into SS. It turns out, though, that string.ToUpper(new CultureInfo("de-DE")) leaves it alone.

    – 500 – Internal Server Error, 8 hours ago

  • In UTF-8, a character can take 1 to 4 bytes, and non-ASCII characters take 2 to 4. You should look into .GetByteCount(). This site has a table of the different 1- to 4-byte UTF-8 characters. That said, .NET uses UTF-16 encoding by default, which is 2 or 4 bytes per character. So my guess is that the buffer would need to be source.Length * 4 at most.

    – Narish, 7 hours ago

  • ß could also turn into ẞ. The former takes 2 UTF-8 bytes and the latter takes 3. However, as @500-InternalServerError pointed out, the German culture appears to just ignore ß.

    – Skgland, 7 hours ago


  • @Narish stackalloc allocates that many elements of your type (sizeof(T) bytes each), which in this case is 2 * source.Length bytes.

    – Charlieface, 6 hours ago
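The comments mix up two units. A Span<char> or stackalloc char[] buffer is measured in UTF-16 code units, not in UTF-8 bytes. Here is a small sketch that compares both counts for ß, ẞ and SS. It assumes .NET 5 or later, and the helper Sizes is hypothetical:

```csharp
using System;
using System.Text;

// Hypothetical helper: UTF-16 length (what a char buffer must hold) versus UTF-8 byte count.
static (int Utf16Chars, int Utf8Bytes) Sizes(string s) =>
    (s.Length, Encoding.UTF8.GetByteCount(s));

// ß (U+00DF), capital ẞ (U+1E9E) and the two-letter "SS".
foreach (var (label, value) in new[] { ("U+00DF", "\u00DF"), ("U+1E9E", "\u1E9E"), ("SS", "SS") })
{
    var (chars, bytes) = Sizes(value);
    Console.WriteLine($"{label}: {chars} UTF-16 char(s), {bytes} UTF-8 byte(s)");
}

// A char is one 2-byte UTF-16 code unit, so stackalloc char[n] reserves 2 * n bytes.
Console.WriteLine(sizeof(char)); // 2
```

So a mapping from ß to ẞ would still fit in a char buffer of the same length, even though its UTF-8 size grows. A mapping from ß to SS would not fit.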

2 Answers

MemoryExtensions.ToUpper returns -1 if the destination is too small.

The source code for ToUpper has this gem:

            // Assuming that changing case does not affect length
            if (destination.Length < source.Length)
                return -1;

There is no other point where -1 is returned; the function finishes with return source.Length;
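You may not want to rely on that assumption. Here is a defensive sketch that tries the span API first and falls back to the allocating TextInfo.ToUpper(string) when it gets -1. It assumes .NET 5 or later. ToUpperSafe and the 256-char stack limit are hypothetical choices:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: span API first, allocating fallback second.
static string ToUpperSafe(string text, CultureInfo culture)
{
    // Hypothetical limit so very long inputs do not risk a stack overflow.
    const int MaxStackChars = 256;

    if (text.Length <= MaxStackChars)
    {
        Span<char> buffer = stackalloc char[text.Length];
        int written = text.AsSpan().ToUpper(buffer, culture);
        if (written >= 0)
            return new string(buffer.Slice(0, written));
    }

    // Reached when the span API returned -1 or the input was too long for the stack.
    // The string-returning overload does not need the caller to predict the output length.
    return culture.TextInfo.ToUpper(text);
}

Console.WriteLine(ToUpperSafe("hello", CultureInfo.InvariantCulture)); // HELLO
Console.WriteLine(ToUpperSafe(new string('x', 300), CultureInfo.InvariantCulture).Length); // 300
```

Under the implementation quoted above, a buffer of the same length never yields -1. The fallback only matters for long inputs, or if that behavior ever changes.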

So they’ve assumed it can’t happen. Whether they’re right is another question: if you find a counterexample, I suggest you file a bug report on GitHub.

The docs for TextInfo (used later on in the code) say:

The returned string might differ in length from the input string. For more information on casing, refer to the Unicode Technical Report #21 "Case Mappings," published by the Unicode Consortium (https://www.unicode.org/). The current implementation preserves the length of the string. However, this behavior is not guaranteed and could change in future implementations.
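One way to look for a counterexample is a brute-force scan. This sketch uppercases every single UTF-16 code unit with TextInfo.ToUpper(string) and reports any change in length. It assumes .NET 5 or later. CountLengthChanges and the culture list are arbitrary, hypothetical choices. The scan does not cover surrogate pairs or multi-character sequences. Per the documentation above, the current implementation should report zero:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: counts single code units whose uppercase form has a different length.
static int CountLengthChanges(CultureInfo culture)
{
    TextInfo textInfo = culture.TextInfo;
    int changes = 0;
    for (int c = 0; c <= char.MaxValue; c++)
    {
        string s = ((char)c).ToString();
        if (textInfo.ToUpper(s).Length != s.Length)
        {
            Console.WriteLine($"{culture.Name}: U+{c:X4} changes length");
            changes++;
        }
    }
    return changes;
}

// "" is the invariant culture; the others are an arbitrary sample.
foreach (string name in new[] { "", "en-US", "de-DE", "tr-TR", "el-GR" })
{
    CultureInfo culture;
    try { culture = CultureInfo.GetCultureInfo(name); }
    catch (CultureNotFoundException) { continue; } // e.g. in globalization-invariant mode

    string label = name == "" ? "invariant" : name;
    Console.WriteLine($"{label}: {CountLengthChanges(culture)} length change(s)");
}
```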


  • What the doc says about it: The returned string might differ in length from the input string. For more information on casing, refer to the Unicode Technical Report #21 "Case Mappings," published by the Unicode Consortium (unicode.org). The current implementation preserves the length of the string. However, this behavior is not guaranteed and could change in future implementations.

    – Ben Voigt, 6 hours ago

  • True, but I think at that point there would have to be a new API that returns the required length, because this assumption is made in a number of places. There isn't even such a function buried privately in TextInfo and its associated classes.

    – Charlieface, 5 hours ago




You did not make it clear why you are writing assembler code in C#.

If you want to translate it into C#, it will look like this:

string source = GetSourceString(...);
string destination = source.ToUpper();

– bpellett


