Variable being dropped possibly due to memory leak

Godot Version

v4.2.2.stable.official [15073afe3]

Question

I will preface this by saying that I am a hobbyist who has been learning as I go, so I do not have any background in computer science.

I am attempting to create a game that involves hundreds of ships moving cargo back and forth between different space stations. A system spawner node procedurally creates a set number of solar systems that each have one or two space stations. For the prototype a ship spawner node creates one ship for each space station. Ships are activated one at a time to spread out the processing load. When a ship is first activated a task is created in the WorkerThreadPool to sort through the closest neighboring space stations and find the trade route with the best utility score for the ship’s home station. The trade route utility calculations are handled on a separate thread through the WorkerThreadPool to limit the impact on the main thread. Once a good trade route has been determined, the ship sets a navigation target for the neighboring station with the best trade route. The ship travels to the station, makes the first part of the trade, and then travels back to the home station to complete the trade.

The space stations are slowly orbiting around a star, and so in order to navigate to a station each ship must access the current global position of their target station at regular intervals. I have set it up so that this is only happening roughly once every ten physics frames to limit the lag. Because I am building this as I learn Godot, the implementation that I first got to work involved setting up an autoload station manager node with a dictionary that contains data about every station (station name, station nodepath, station position). A signal goes out to ping the stations to update their positions in the dictionary and then the ships read the dictionary to get their target station’s position. I think I now have an idea for a better implementation, but I want to figure out what is causing the current issue before I devote the time to doing a deep refactoring of my code.

Currently, this implementation runs for several minutes but inevitably errors out with "Invalid get index of type ‘String’ (on base: ‘Dictionary’)". If the error occurred immediately then I would know that the issue was clearly with the code, but on the most recent test the code chunk in question executed successfully over 1,300 times before the error occurred. The error always occurs when a ship is attempting to access the dictionary on the autoload station manager node. The ship calls a function that takes the station name (which is stored as a string locally on the ship node) and looks it up in the dictionary on the station manager node (where the station name is the key for a nested dictionary that contains the additional pieces of information about the station).

When this error is thrown I check the debugger and find that the variable holding the ship’s current station is empty. However, using print() I have confirmed that the variable is not empty at several points leading up to the point where the error occurs. In fact, I’ve set it up so that the function that accesses the dictionary should not run at all if the ship’s station name variable doesn’t contain a legitimate station name.

While trying to debug this I noticed that the static memory slowly ticks up while the game is running and the error tends to happen as the static memory gets close to the static max. In the most recent incident the error occurred when the static memory was at 202.6 MiB and the static max was at 210.2 MiB. The computer being used has 32 GB of RAM, so I would assume that the issue is not caused by insufficient RAM, but I am not familiar enough with memory management to say that with confidence. So, currently my best explanation is that the error is caused by a memory leak, potentially because I have been using a large number of arrays and dictionaries. For additional context, at the point of error for the most recent attempt there were 18,628 objects, 38 resources, 12,297 nodes, and 0 orphan nodes.

I’d be happy to provide additional information, but I’m not sure what other information would be useful. Thank you for any help you can provide.

The way you describe this, could it be an order-of-operations problem?
Is it possible that there is a situation where you change the target station, therefore changing the name, and the routine is called while the target name is empty?

Something that might help you track this error down is using .get() for Dictionary access:

var a = {'a': 0}

a['b']            # Raises a runtime error: the key 'b' does not exist
a.get('b', -1)    # Returns the default `-1` instead of raising an error

Then put a breakpoint on a check for that default value, so the debugger pauses at the moment the lookup fails.
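
Roughly, that check could look like the sketch below. The `stations` dictionary, the `get_station_position` function, the "position" key, and the use of Vector2 are all assumptions for illustration, since I don't know how your station manager is actually laid out. `breakpoint` is a GDScript keyword that pauses execution in the debugger.

```gdscript
extends Node

# Hypothetical layout: station name -> nested Dictionary with station info.
var stations: Dictionary = {}

func get_station_position(station_name: String) -> Vector2:
	# get() returns null when the key is missing instead of raising an error.
	var entry = stations.get(station_name)
	if entry == null:
		push_error("Unknown station name: '%s'" % station_name)
		breakpoint  # Pause here so you can inspect the call stack and the ship.
		return Vector2.ZERO
	# Assumes each entry stores its position under a "position" key.
	return entry.get("position", Vector2.ZERO)
```

When the breakpoint triggers, the stack trace should show which code path called the lookup with an empty or stale name.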


Interesting theory! I’ll test that out later today and report back.

Thank you!

This smells to me like a concurrency problem; it sounds like you’re using threads to access these dictionaries. Are you protecting them?

Quoth the docs:

In GDScript, reading and writing elements from multiple threads is OK, but anything that changes the container size (resizing, adding or removing elements) requires locking a mutex.

(Thread-safe APIs — Godot Engine (stable) documentation in English)
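
If it does turn out to be the threads, a minimal sketch of protecting a shared dictionary with a Mutex might look like this. The names (`_stations`, `set_station`, `remove_station`, `get_station`) are made up for the example and are not from your project; it assumes every access goes through these functions.

```gdscript
extends Node

# Hypothetical shared state, read by worker tasks and written by the main thread.
var _stations: Dictionary = {}
var _mutex := Mutex.new()

func set_station(station_name: String, data: Dictionary) -> void:
	# Adding a key can resize the container, so it must be locked.
	_mutex.lock()
	_stations[station_name] = data
	_mutex.unlock()

func remove_station(station_name: String) -> void:
	# Removing a key also changes the container size.
	_mutex.lock()
	_stations.erase(station_name)
	_mutex.unlock()

func get_station(station_name: String) -> Dictionary:
	# Return a copy so callers never hold a reference into the shared data.
	_mutex.lock()
	var data: Dictionary = _stations.get(station_name, {}).duplicate()
	_mutex.unlock()
	return data
```

Locking the reads as well is more conservative than the docs strictly require, but it keeps the rule simple: nothing touches the dictionary without holding the mutex.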


This was exactly the issue. While reviewing my code to see if using a getter would fix the error I found something I had missed before.

Short version: I had implemented a feature to check for ships that had gone idle and tell them to schedule a task, but I didn’t put in anything to prevent it from scheduling a task while a task was already being processed. That wasn’t a problem originally, but then I added a feature to have ships wait briefly before leaving a station to simulate the time it takes to load / unload cargo. So, a ship would start the undock process, then wait for a bit, and in some rare cases while a ship was waiting to undock the idle checker would tell it to undock again.

So far it seems like fixing that will fix the problem, but I’ll have to do some more refactoring before I can say for sure.
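
For anyone who runs into the same thing, one common way to guard against this kind of re-entry is a simple busy flag. This is only a sketch with made-up names (`_is_undocking`, `is_idle`, `undock`, `load_time`), not my actual code:

```gdscript
extends Node2D

# True while an undock sequence is already waiting on its timer.
var _is_undocking := false

func is_idle() -> bool:
	# The idle checker should treat a ship that is mid-undock as busy.
	return not _is_undocking

func undock(load_time: float) -> void:
	# Ignore repeat requests that arrive while the first one is still awaiting.
	if _is_undocking:
		return
	_is_undocking = true
	# Simulate the time it takes to load / unload cargo.
	await get_tree().create_timer(load_time).timeout
	_is_undocking = false
	# Pick the next navigation target here.
```

The point is that the flag is set before the await, so a second call made during the wait returns immediately instead of starting a second undock.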

Thanks for your help!

You were basically right, although instead of it being a problem with the threads it was a problem with a poorly thought-out use of await and call_deferred.

Thanks for your help!