I have functions like the following:
const char* get_message() {
    return "This is a constant message, will NOT change forever!";
}
const char* get_message2() {
    return "message2";
}
And I'm planning to use them everywhere in my app, even in different threads.
I'm wondering about the lifetime of these strings, i.e. whether it's safe to use these const char* strings outside the function get_message.
I guess that a hard-coded const char* string will be compiled into the code segment of an app instead of the data segment, so maybe it is safe to use them as above?
4 Answers
To give an answer from the standard, "message" is a string literal, and string literals have static lifetime, which means that the object (the char const[] which contains the characters) has a lifetime for the entire program. (It's a bit more complicated for objects with non-trivial constructors or destructors.) So pointers to it will be valid for the lifetime of the program.
Yes, it is safe to do that. Your assumptions are correct.
- It is not quite right: the code segment is .text, but this string constant will be compiled into the .rodata section of the ELF file. – Drazen Grasovec, 13 hours ago
- @DrazenGrasovec Yes, and in PE files it will usually be the .rdata segment. But this is a technical detail that is not really helpful in the context of the question (in my opinion). – Christian Halaszovich, 13 hours ago
- @DrazenGrasovec The standard doesn't speak about code segments in binary files, which is an implementation detail. It does however say that string literals have static storage duration and your code is therefore safe, no matter where the actual string literals end up. – Ted Lyngmo, 6 hours ago
The short answer is that the string literal "message2" will exist in memory as long as the process does, but in the .rodata section (assuming we are talking about an ELF file).
We return a pointer to a string constant, but as we will see later, no separate memory is defined anywhere to store this const char * pointer, and there is no need for it, because the address of the string is calculated in code and returned in register $rax every time the function is called.
But let's take a look at what happens in gdb.
We put a breakpoint in our function returning a pointer to the constant string, and look at the assembly code and the process map.
The code gets this string in the following instruction:
0x000055555555514a <+8>: lea 0xeb3(%rip),%rax # 0x555555556004
What this instruction does is calculate the address of "message2".
Here we see what PIC (position-independent code) means.
The address of the "message2" string is not hardcoded as an absolute address; it is calculated relative to the address of the next instruction, using the hardcoded offset 0xeb3 (0x555555555151 + 0xeb3), and put in register rax.
The purpose of relative addressing (current address +/- offset) is that the process will always get the right address of "message2", no matter where in memory it is loaded.
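The address arithmetic can be checked with a few lines of C++. This is only a sketch of the calculation using the values from the gdb session above; the names are made up for illustration, and the 7-byte instruction length is an assumption based on the usual encoding of this form of lea on x86-64.

```cpp
#include <cstdint>

// Values copied from the gdb output above.
constexpr std::uint64_t lea_address  = 0x000055555555514a; // where the lea instruction starts
constexpr std::uint64_t displacement = 0xeb3;              // offset encoded in the instruction

// Assumption: this lea is 7 bytes long (REX.W prefix, opcode, ModRM, 32-bit displacement).
constexpr std::uint64_t lea_length = 7;

// While the lea executes, %rip already holds the address of the next instruction.
constexpr std::uint64_t next_instruction = lea_address + lea_length;

// Hypothetical helper: the target of a RIP-relative operand.
constexpr std::uint64_t rip_relative_target(std::uint64_t next_rip, std::uint64_t disp) {
    return next_rip + disp;
}

static_assert(next_instruction == 0x0000555555555151, "address of the next instruction");
static_assert(rip_relative_target(next_instruction, displacement) == 0x0000555555556004,
              "address of \"message2\" as shown by gdb");
```

If the binary is loaded at a different base address, lea_address changes, but the displacement stays the same, so the sum still lands on the string.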
So here we see that the const char * you asked about doesn't actually exist in memory, because the address is calculated "on the fly" and returned in $rax.
We have the address in $rax:
(gdb) i r $rax
rax 0x555555556004 93824992239620
And it holds the address of "message2":
(gdb) x/s 0x555555556004
0x555555556004: "message2"
Now let's see where address 0x555555556004 is in the process address map:
0x555555556000 0x555555557000 0x1000 0x2000 r--p /home/drazen/proba/main
So this mapping is not executable and not writable, just readable and private (r--p), which makes sense as this is not a shared library.
When we check with readelf, it shows that the string is in the .rodata section of the ELF file:
drazen@HP-ProBook-640G1:~/proba$ readelf -x .rodata main
Hex dump of section '.rodata':
0x00002000 01000200 6d657373 61676532 00 ....message2.
So the answer is that this string is not placed in the code segment .text of the ELF file but in the read-only data section .rodata; and yes, it will exist as long as the process exists in memory.
And just to add a small detail: this constant string is returned to the main() function by address, of course, not on the stack but in register rax:
(gdb) i r
rax 0x555555556004 93824992239620
rbx 0x0
Hope it helps!
- Technically you are right. But in the context of the question it doesn't really matter. Also your answer is very platform specific and not necessarily true for all platforms. – Christian Halaszovich, 13 hours ago
- Please don't post pictures of code/data. – Ted Lyngmo, 13 hours ago
- You can't judge whether some code is legal from the assembly alone. UB could give you one assembly on one compiler, and break things on another. – HolyBlackCat, 12 hours ago
- Well, the question was whether this string is hard-coded into the code segment, and technically it isn't, because the code segment is where the instructions are, and this constant string is not encoded as part of an assembly instruction. It is just plain data, so it is placed in the .rodata section. I haven't tried this on different platforms, but I am sure it will be placed in .rodata there as well. – Drazen Grasovec, 12 hours ago
- @DrazenGrasovec It doesn't matter where it is placed. The standard guarantees static lifetime, so the compiler must do something to ensure static lifetime. Different linkers and different architectures have different notions of segments. I've used compilers which placed string literals in the same segment as code, and even ones that placed it in the same segment as other data (and which allowed overwriting it!). – James Kanze, 12 hours ago
I'm wondering about the life time of these strings, i.e. whether it's safe to use these const char* string out of the function get_message.
A quick look at the standard then.
Evaluating a string-literal results in a string literal object with static storage duration, initialized from the given characters as specified above. Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified. [Note: The effect of attempting to modify a string-literal is undefined. —end note]
(ISO/IEC JTC1 SC22 WG21 N4860, section 5.13.5 [String literals])
So yes. After the function evaluates the string literal and returns a const char* to it, the standard guarantees that the string literal object has static storage duration, so the pointer stays valid.
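One practical consequence of the quoted paragraph: since it is unspecified whether two evaluations of a string literal yield the same object, callers should compare the characters rather than the returned addresses. A minimal sketch, assuming C++17 for std::string_view; same_text is a hypothetical helper, and constexpr is added to get_message2 only so the checks can run at compile time.

```cpp
#include <string_view>

// Same function as in the question; constexpr is an addition for this sketch.
constexpr const char* get_message2() { return "message2"; }

// Hypothetical helper: compares the characters, not the addresses.
// Whether get_message2() == "message2" holds as a pointer comparison
// is unspecified, so do not rely on it.
constexpr bool same_text(std::string_view a, std::string_view b) {
    return a == b;
}

static_assert(same_text(get_message2(), "message2"), "same characters");
static_assert(!same_text(get_message2(), "message"), "different characters");
```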
Literal strings like that have a lifetime of the whole program. But note that a const char * can point to non-literal strings as well, and the lifetime of those strings depends on what they're pointing to. – 14 hours ago
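To illustrate the difference this comment describes, here is a small sketch. All function names are hypothetical, constexpr is used only so the two safe cases can be checked at compile time, and the unsafe case is left commented out on purpose.

```cpp
// Safe: the pointer refers to a string literal, which has static storage duration.
constexpr const char* literal_message() { return "message2"; }

// Safe: a namespace-scope array also has static storage duration.
constexpr char static_buffer[] = "static buffer";
constexpr const char* static_message() { return static_buffer; }

// NOT safe: the array is a local copy of the literal and is destroyed when
// the function returns, so the caller would get a dangling pointer.
// const char* dangling_message() {
//     char local[] = "local copy";
//     return local;
// }

static_assert(literal_message()[0] == 'm', "literal is readable after the call");
static_assert(static_message()[0] == 's', "static array is readable after the call");
```

All three functions have the same return type, so the signature alone does not tell the caller which case applies.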
Obligatory nitpick: string literals have type const char [N], not const char *. – 14 hours ago
I would suggest auto& instead of const char* to not needlessly lose type information. It will also make functions requiring arrays of known bound happy, and you can use range-based for loops on the returned C strings if needed. – 14 hours ago
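A sketch of what the auto& suggestion could look like, assuming C++14 or later. count_non_null_chars is a hypothetical helper, and constexpr is used only so the claims can be checked at compile time.

```cpp
#include <cstddef>
#include <type_traits>

// auto& deduces "reference to const char[9]": 8 characters plus the terminator.
constexpr auto& get_message2() { return "message2"; }

static_assert(std::is_same<decltype(get_message2()), const char (&)[9]>::value,
              "the array bound is part of the return type");
static_assert(sizeof(get_message2()) == 9, "sizeof sees the whole array");

// Hypothetical helper: a range-based for loop works on the returned array.
// Note that it also visits the trailing '\0'.
constexpr std::size_t count_non_null_chars() {
    std::size_t n = 0;
    for (char c : get_message2()) {
        if (c != '\0') ++n;
    }
    return n;
}

static_assert(count_non_null_chars() == 8, "\"message2\" has 8 characters");
```

The result still converts implicitly to const char*, so existing callers that expect a pointer should keep working.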
Simple solution for strings in C++: use std::string for all your strings. Then the lifetime issue is sidestepped. There are exceptions (as for any strict rule) where std::string_view might be a better choice, but if you only use std::string you can't go wrong. – 13 hours ago
This is the bad thing about "C"-style coding and why C++ has containers (like std::vector, std::string) and smart pointers like std::unique_ptr and std::shared_ptr: all these types help with describing the lifetime of objects. Your constant object is probably best modeled using a std::string_view, and a message that can change using std::string. – 13 hours ago
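A rough sketch of the split suggested in the last two comments, assuming C++17. make_greeting is a hypothetical example of a message that can change; it is not from the question.

```cpp
#include <string>
#include <string_view>

// Fixed message: a view over a string literal. The view does not own the
// characters, but the literal lives for the whole program, so this is safe.
constexpr std::string_view get_message2() { return "message2"; }

// Changing message: std::string owns its characters, so the caller gets an
// independent copy and there is no lifetime to track.
std::string make_greeting(std::string_view name) {
    std::string result = "hello, ";
    result.append(name);
    return result;
}

static_assert(get_message2().size() == 8, "length is known without strlen");
```

One caveat: a std::string_view is only as safe as whatever it points at. Returning a view of a local std::string would dangle just like a raw pointer would.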