String literals in GDScript

Godot Version

4.6

Question

How are string literals handled memory-wise in GDScript?

For example, if I do something like:

# arr is of type PackedStringArray
arr.append("0")
arr.append("1")
arr.append("0")

Is there only one instance of “0”, with the second append just referencing it, or are there two copies of it?

If we simply take two identical strings and use them in different parts of the code, then since they’re allocated on the heap, I believe Godot will keep only a single instance of that string, and your other code will point to it.

But I think in the case of PackedStringArrays, Godot will prioritize a contiguous area in memory for speed, so your “0” string might be duplicated for faster access.

So what happens with strings isn’t universal; it depends heavily on the context you use them in. If I’m wrong, I do hope someone will correct me!

I do not think the GDScript compiler is performing optimizations like this (nothing close to the level you would expect from state-of-the-art compilers like GCC or LLVM). I wouldn’t worry about it though. I know RAM is getting expensive, but we’re talking about a few bytes here. Even a few kilobytes of string literals will be absolutely dwarfed in size by one typical texture or vertex buffer.

If you do have very large strings and you are worried about them being duplicated (e.g. pages and pages of dialog trees, or a dozen configuration files for some kind of simulation), I would consider storing them as resources rather than baking them into the scripts themselves as constants.
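As a rough sketch of that idea, assuming Godot 4 and a plain text file shipped with the project: the path `res://dialog/intro.txt` and the helper name `get_dialog_text` are made up for illustration, and in an exported build a `.txt` file typically has to be added to the export filters to be included.

```gdscript
extends Node

# Hypothetical path; replace it with a real text file in your project.
const DIALOG_PATH := "res://dialog/intro.txt"

var _dialog_text: String = ""

func get_dialog_text() -> String:
	# Load lazily, so the text only occupies memory once something asks for it.
	if _dialog_text.is_empty() and FileAccess.file_exists(DIALOG_PATH):
		_dialog_text = FileAccess.get_file_as_string(DIALOG_PATH)
	# Returns an empty string if the file is missing.
	return _dialog_text
```

This keeps the large text out of the script's constants and lets you decide when it is loaded and when it is released.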

Godot is written in C++ under the hood, so you’d be surprised how much optimization is actually happening behind the scenes. It’s a game engine, so it needs every bit of performance squeezed out, and strings are quite heavy at that low level and need all the optimization they can get.

Yes. The engine itself goes through great efforts not to leave any performance on the table negligently, but the technology isn’t magic. The optimizations performed by the C++ compiler can only work on invariants which are known at compile time. They function as a series of mathematical proofs which can identify when a complex piece of code can be replaced by something simpler which produces an equivalent outcome. It is like taking a few steps to simplify an algebraic expression before solving an equation. You can substitute 2*2 with 4, but 2x must remain 2x.

Those C++ optimizations could make the GDScript bytecode compiler and interpreter run faster, for instance, or use less memory, but they aren’t going to trickle through the GDScript compiler and change its behavior such that it starts doing dead code elimination, loop unrolling, tail-call elimination, or de-duplication of constants. These types of optimizations need to be implemented manually (the low-hanging fruit may already be, but this is a very complex subject and at a certain point you begin to wonder whether you’re better off actually designing GDScript as a front-end for GCC or LLVM).

Things are a little bit different with the Godot shading language. Instead of being compiled to bytecode, it is compiled to GLSL and then to SPIR-V using glslang. In this case, it does benefit directly from whatever optimizations are implemented downstream, in the shader toolchain and in the GPU driver’s compiler (some of which are built on LLVM).

The String type itself is well optimized, making use of copy-on-write and of a separate StringName type for hash-based comparisons and look-ups, but I’m not confident two identical string literals in GDScript will be de-duplicated automatically. If you created one string from a literal and copied it, the copy would share the original’s data (even at runtime). Either way, the performance savings we’re talking about here are absolutely negligible. String literals can only be created at compile time. It isn’t something which is going to be happening at the bottom of a loop thousands of times per frame. It is not a concern unless someone starts building an even higher-level language which outputs GDScript.
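To make the copy-on-write point concrete, here is a small sketch assuming Godot 4 GDScript. The sharing itself isn’t observable from script, so the comments about shared data reflect my understanding of the engine rather than something the code proves; `copy_then_modify` is a made-up name.

```gdscript
extends Node

func copy_then_modify() -> Array:
	var original := "hello"
	# Assigning is cheap: the copy is expected to share the original's data.
	var copy := original
	# Writing to the copy gives it its own data; the original is untouched.
	copy += " world"
	return [original, copy]

func _ready() -> void:
	print(copy_then_modify())
```

From the script’s point of view the two variables behave as independent values; the copy-on-write part only affects when memory is actually duplicated.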

Looks like this optimization is implemented. I must throw in the disclaimer that the GDScript implementation is somewhat sprawling and I may not be seeing the whole picture here, but in GDScriptCompiler::_parse_expression(...), deep in that monster of a switch statement, there is a case for GDScriptParser::Node::LITERAL which calls add_constant(), which calls add_or_get_constant(), which is implemented in GDScriptByteCodeGenerator by calling get_constant_pos(). That function looks the constant up in a map keyed by value (the value of a string literal in our case) and returns its position, creating an entry if it doesn’t exist.

Identical string literals should return the same address, but I’m not sure what the scope is. I’m assuming the scope is one script (rather than every script in the project, or a local scope like within one function).
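The lookup described above can be sketched in GDScript itself. This is a toy model only, not the engine’s code: `ConstantPool` and its field are invented for illustration, and only the method name `get_constant_pos` is borrowed from the C++ source.

```gdscript
extends Node

# Illustrative only: a toy constant pool, not Godot's actual implementation.
class ConstantPool:
	# Maps a constant's value to its position in the pool.
	var _index_by_value: Dictionary = {}

	func get_constant_pos(value: Variant) -> int:
		# Reuse the existing slot when an equal value was already added.
		if _index_by_value.has(value):
			return _index_by_value[value]
		# Otherwise assign the next free position.
		var pos := _index_by_value.size()
		_index_by_value[value] = pos
		return pos

func _ready() -> void:
	var pool := ConstantPool.new()
	print(pool.get_constant_pos("0"))  # 0
	print(pool.get_constant_pos("1"))  # 1
	print(pool.get_constant_pos("0"))  # 0 again: the literal is de-duplicated
```

How far the de-duplication reaches depends on how long one such pool lives, which is exactly the scope question raised above.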

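If you want control over this, use a StringName. Note that a PackedStringArray stores Strings, so the elements are converted back on append; an Array[StringName] keeps them as StringNames.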

arr.append(&"0")
arr.append(&"1")
arr.append(&"0")

From the docs:

Two StringNames with the same value are the same object. Comparing them is extremely fast compared to regular Strings.
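A quick way to check which container actually keeps StringNames, as a sketch assuming Godot 4 (the variable names are arbitrary):

```gdscript
extends Node

func _ready() -> void:
	# A typed Array keeps its elements as StringNames.
	var names: Array[StringName] = [&"0", &"1", &"0"]
	print(names[0] == names[2])  # true
	print(typeof(names[0]) == TYPE_STRING_NAME)  # true

	# A PackedStringArray stores Strings, so a StringName is converted on append.
	var packed := PackedStringArray()
	packed.append(&"0")
	print(typeof(packed[0]) == TYPE_STRING)  # true
```

So the StringName route only helps if the container holds StringNames; with a PackedStringArray you end up with ordinary Strings again.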

I would be careful about assumptions like this. As you have seen digging into the code, they have done a lot. But I have also seen GDScript become more efficient than the C# implementation in the last two years. It used to be that if you wanted processing speed in your game for certain things, especially math, you used C#. Now GDScript is not only comparable, in many cases it is even faster than C# in critical areas, because the language is optimized for running games, and C# is a general-purpose language that, while optimized, is not specifically made for running Godot games.

I expect that trend to continue, so the fact that an optimization doesn’t exist today is no reason to assume that will still be true in the next version(s), which come out at a very fast pace.

The scope is the whole project if you use a StringName per the docs.

Afaik, the two “0” literals will likely point to the same data within the same script file, but be two different strings if they happen to be in different files, unless you explicitly use the StringName prefix: &"0". I’m not sure if Godot will try to do this automatically under the hood for plain string literals project-wide. It might.

In general, strings are handled similarly to Python. They behave like immutable values, and a copy-on-write strategy is used when passing them around. So, including StringNames, quite a bit of performance optimization is happening there.
