"Random" infinite freezing without errors - a tool for tracking internal processes?

Can you post a self-contained minimal snippet of code that replicates this in a fresh project? So we can copy-paste and run it, and verify that it indeed breaks the engine internals as you describe.

If I could reproduce the error, I could solve it, so no, I can’t.

Judging from your previous post, you seem to be quite confident about how the problem is triggered. Why not just do that outside of the specific context of your 36000 lines of code? That should replicate it if your theory is correct. Otherwise it’s very reasonable to assume that it’s just plain old bugs somewhere in your code causing it. Because if I do exactly what you described here:

I can’t get the engine to freeze. So obviously, that’s not “all”.

You got the response you got because of the information you gave. You tell us you’ve only ever programmed with GDScript. Your answers sound like they’re coming from an LLM, which is what

And so these people helped you solve the problem then?

It’s not simple. You think it’s simple because you understand what all your variable names are referencing.

You talk as if you understand how a null check works on a compiler level, and yet you didn’t even know you were iterating over an Array.

Clearly your ego is hurt, and you’re trying to prop it up for some strange reason by using the LLM that helped you code this monstrosity. There is no way you’re going to impress us with your knowledge or ability, because it’s like listening to a new computer science graduate in an interview who has no clue how much they don’t know.

Yes, so that makes sense, and the name makes sense in that context - but again you didn’t provide enough context for us to figure that out.

ECS doesn’t work well without the System part. But you do you.

That’s on you. I am not responsible for your feelings. Perhaps I shouldn’t have taken the time to explain to you why we needed more info, and just told you. Seems like you fixed it on your own anyway.

You seem to think we owe you something. As if our time isn’t valuable, and only yours is. That because we volunteered to answer your question we had better do it your way or you’re not going to ask anymore. If that’s your attitude for getting help from people who are not paid to help you, but do it because they like helping, then by all means please do not come back. We don’t need ungrateful people who treat us like LLMs here.

When you ask for advice, you don’t get to decide what’s pertinent to your question. You can’t ask for help and be the expert. That’s just life. Deal with it.

On your way out, I suggest you take a look at this thread: Post-mortem of my failed attempt to vibe-code a metroidvania game

It’s obvious my ego is hurt, because I don’t like it when people treat me condescendingly based on their experience and point out that they’re significantly more experienced than me. I’m not saying you owe me anything, but my feelings are based on how I perceive your writing and what you’re putting into it. Regardless of what you mean, for many people, mentioning your experience along with a few pieces of “advice on the topic” sounds patronizing. It’s generally helpful not to do that.

This is very subjective, of course, but I don’t want to hear how experienced you are; I want to know the root of the problem. Is this treating someone like an LLM? In my opinion, no. I don’t want to waste your time or mine figuring out whose architecture is better, who’s more experienced, who understands code better, which naming is better, which patterns are correct and which are not, and so on. (But maybe I didn’t quite understand what you meant about LLMs; English is not my first language.)

Of course, all of these things (naming, architecture, etc.) make sense in the general context, and of course, bad design can and will lead to errors, but the problem I came here with today is very specific. Most of your recommendations boil down to the idea that to solve my problem I need to completely refactor my game, which is simply not true, because everything in my game works except for this one spot and a very specific bug that doesn’t even occur every time.

Obviously, I understand code much worse than you do, and that’s fair enough. Otherwise, I wouldn’t be here. But I just can’t understand why you don’t explain it in layman’s terms. For example, why not just say something like, “The compiler’s null check is affected by many things, including how you call an object or what state it is in. Here are some resources on C++ or GDScript memory management in Godot that you can read. Here is the documentation on the compilation process.”

Is that a general answer? Absolutely. But it’s much more to the point than “rewrite half the game and don’t name variables that way.” Personally, I prefer exactly these kinds of answers, although of course I didn’t say which answers I prefer, so that’s my fault.

Of course, you’re free to answer however you want. I’m just explaining my reaction, because I don’t like how passive-aggressive the tone of this dialogue is, on both my part and yours, so I’d like to clarify that point.

Anyway, thanks for taking the time to answer my question. I’ve fixed this and am thinking about refactoring the Things module.
P.S. Oh, I see what you meant about LLMs. I use a translator, so my messages probably sound more formal: I can’t discuss such complex technical details in my “standard” English without the risk of losing a lot of the details.

That’s the problem. I can’t reproduce this error anywhere except in this one place, even though the code here is literally identical to the code elsewhere. I literally have a second, identical loop that works perfectly. So I assumed the problem was deeper and wouldn’t have a simple solution. Note that the main topic was about finding debugging tools to at least try to figure out what the problem was.

The debugging tools are: the debugger, the print console and your brains. Learn to use all of them proficiently because that’s all we have and will have for the foreseeable future.

Let’s see the code of that exact loop.

Or did you fix it? It’s not clear from what you wrote. If you did fix it, let’s hear the determined cause and the solution; it may help someone else in the future.

I really did fix it. Solution: manually remove the Sprite2D reference from the array (with erase()) before calling queue_free() on it.

If I don’t do this, any attempt to access this Sprite2D while iterating over the array after queue_free() freezes the game permanently.
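A minimal sketch of that ordering, assuming (as in the registration code shown further down in the thread) that each pool is a Dictionary used as a set of nodes. `ExplicitFreeBody` and `free_from()` are hypothetical names for illustration, not part of the project; the point is only that every reference is erased before `queue_free()` is scheduled.

```gdscript
class_name ExplicitFreeBody
extends Sprite2D

# Hypothetical helper: erase this node from every pool first, then free it.
# Each pool is assumed to be a Dictionary used as a set (node -> true),
# the role tile.objects_pool plays in the thread.
func free_from(pools: Array) -> void:
	for pool in pools:
		pool.erase(self)
	# Deletion is scheduled only after no pool can reach this node anymore.
	queue_free()
```

Dictionaries are passed by reference in GDScript, so erasing from the Dictionary inside the loop updates the tile’s own pool, not a copy.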

I thought the problem was checking reference validity, not calling queue_free() on it. If you’re saying that calling is_instance_valid() on an invalid instance can cause the engine to freeze, you most certainly don’t understand what is happening in your code.

The problem is that the array contains null instead of a Sprite2D reference (because the object has been freed with queue_free()), and accessing this null in any way causes the program to freeze. Usually, null does not behave this way.

But isn’t is_instance_valid() meant to check whether an object exists at all? The documentation explicitly states that it is used to check for previously freed objects. If that’s not the case, I’d appreciate it if you could explain what’s going on.

I understand it like this: we have objects in an array, and we check each object. If the object is null, that’s unpleasant, but we can move on; that’s what usually happens.
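For reference, `is_instance_valid()` returns `false` both for `null` and for an Object that has already been freed, and it is designed to be safe to call on such a value, so a skip-and-continue loop like the one described is a sound pattern. The sketch below goes one step further and prunes the stale keys; `PoolUtils` and `prune_and_get_valid()` are hypothetical names. The erasing is deferred until after the loop, because Godot’s documentation warns against erasing Dictionary entries while iterating over that Dictionary.

```gdscript
class_name PoolUtils

# Hypothetical helper: returns the still-valid keys of a Dictionary used as
# a set, and erases keys that are null or point to freed objects.
static func prune_and_get_valid(pool: Dictionary) -> Array:
	var valid: Array = []
	var stale: Array = []
	for obj in pool:
		if is_instance_valid(obj):
			valid.append(obj)
		else:
			stale.append(obj)  # only stored and compared, never dereferenced
	# Erase after iterating, never during the loop above.
	for obj in stale:
		pool.erase(obj)
	return valid
```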

Exactly, that’s why your conclusion doesn’t make sense. You are most certainly misinterpreting something.

I could explain if I saw the actual code that supposedly causes this, and all the relevant context. Since you don’t want to show the code, we can’t get far.

Keep in mind that these kinds of “illogical” bugs tend to signal deeper problems with your code, and that your supposed fix may not be the fix at all: those problems will continue to cause strange behavior if the root causes are not found and addressed.

Okay, I’m actually really interested in understanding this, so if you don’t mind, I’ll try to show you in more detail.

Again, I can’t show you the entire code (unless I send the scripts in their entirety), but I can show you the entire loop and explain everything.

So, we have the following hierarchy: the ThingBody class is an extended Sprite2D that represents an interactive Thing object in the world. Each ThingBody contains a Thing resource, which holds the ThingBody’s position in space, its name, and other data; all Things are stored in the location data and used to create ThingBodies.

When a ThingBody is placed in the world, it is bound to the TileWorldData of the nearest tile. TileWorldData is a resource that contains a lot of information about the tile at that position. It has an objects_pool that contains references to all the ThingBodies on that tile.

#ThingBody.gd
func _register_on_tile(tile: TileWorldData):
    if not tile.objects_pool:
        tile.objects_pool = {}
    tile.objects_pool[self] = true
    tile.setup_thing(thing)
    if not thing.on_tiles_dict:
        thing.on_tiles_dict = {}
    thing.on_tiles_dict[tile] = true

Also, although there is a setup_thing() method, it is the ThingBody, not the Thing itself, that goes into the tile’s objects_pool, so the tile stores a reference to the Sprite2D (ThingBody).
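A hypothetical alternative to remembering `tile.objects_pool.erase(self)` by hand in `thing_free()`: let the node remove itself from every pool it registered in at the moment it is actually deleted. This sketch uses `NOTIFICATION_PREDELETE`, which Godot sends to an Object right before it is freed; `SelfCleaningBody`, `register_in()` and `_pools` are illustrative names, not the project’s API.

```gdscript
class_name SelfCleaningBody
extends Sprite2D

# Hypothetical: pools (Dictionaries used as sets, like tile.objects_pool)
# that currently hold a reference to this node.
var _pools: Array[Dictionary] = []

func register_in(pool: Dictionary) -> void:
	pool[self] = true
	_pools.append(pool)

func _notification(what: int) -> void:
	# PREDELETE arrives right before the object is really freed, whether
	# that happens through free() or through queue_free() at frame end.
	if what == NOTIFICATION_PREDELETE:
		for pool in _pools:
			pool.erase(self)
		_pools.clear()
```

One design note: between `queue_free()` and the end of the frame the node is still valid and still in its pools, so code that must ignore such nodes can additionally check `is_queued_for_deletion()`.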

After that, the ThingBody becomes available for interaction. When a player interacts with an object, the modules are activated.

Each Thing has an array of modules(ThingModule), where each module is a resource with encapsulated logic and related data. In the case of the current problem, we are working with the TM_Item module. When someone interacts with ThingBody, the following code is called:

func use(user : EntityBody):
    if thing:
        await thing.execute_modules(user,self)

Here, we call the action on all of the Thing’s modules. This is necessary because each interactive Thing is assembled or generated from modules, so a single object can have the logic of an inventory, an item, or anything else; the logic is described in the modules themselves.
In this case, await is needed for specific modules that can trigger animations or delays. Here is the Thing call:

# Thing.gd
func execute_modules(user : EntityBody, new_body : ThingBody):
	if modules.size() > 0:
		for module : ThingModule in modules:
			await module.execute_action(user, new_body)

In this code, we check if the player can pick up an item, and if they can, we remove the ThingBody from the ground because the player has picked up the item:

# TM_Item.gd (ThingModule)
func execute_action(user : EntityBody, body : ThingBody):
	if not user or not user.entity or not body.thing or not item:
		return
	var inv = user.entity.inventory_core
	if not inv:
		return
	if inv.add_item(InventoryCore.GET_TYPE.STASH_CORE, item):
		body.thing_free()

The player successfully picks up the item, after which we remove the ThingBody from the world inside ThingBody:

# ThingBody.gd
func thing_free():
    if thing:
        if thing.on_tiles_dict.size() > 0:
            for tile: TileWorldData in thing.on_tiles_dict.keys():
                tile.remove_thing(thing)
                tile.objects_pool.erase(self)
        var sec_data = Cursor.location_manager.sector_data
        if sec_data:
            sec_data.remove_thing(thing)
        if Cursor.hovered_thing == thing:
            Cursor.hovered_thing = null
    queue_free()

In this code, we remove the object from all the places where it was previously added. Initially, I didn’t have the line tile.objects_pool.erase(self) because I had forgotten about it, and this is a crucial point. Today, adding this line solved the problem, but let’s go further.

Where does the problem itself appear? In the tile validation method. UiPlayerOverlay.gd has a method that collects information about all ThingBodies on a tile and displays it on the screen, allowing the player to interact with any ThingBody. I have cut sections 2 and 3 of this method, as they relate to other checks and do not concern Thing at all.

# Public - Object list
func update_object_list(list : Dictionary):
	# UiPlayerOverlay.gd
	if updating_list:
		return
	updating_list = true
	UtilityGod.child_clear(object_list_holder)
	UtilityGod.child_clear(action_list_holder)
	if !list.has("ground_data"):
		return
	# 1. Ground tiles
	var tile : TileWorldData = list.get("ground_data")
	var obj_button : Button = object_button_prefub.instantiate()
	obj_button.icon = ResourceGod.get_object_list_icon("ground")
	if !tile:
		return
	# 4. Things
	if !tile.objects_pool.is_empty():
		for obj in tile.objects_pool:
			if !is_instance_valid(obj):
				continue
				if !obj.thing:
					continue
				if is_instance_valid(obj.thing):
					var n_obj_button : Button = object_button_prefub.instantiate()
					n_obj_button.icon = ResourceGod.get_object_list_icon("thing")
					n_obj_button.text = obj.thing.t_name
					object_list_holder.add_child(n_obj_button)
					if !n_obj_button.pressed.is_connected(select_object_from_list):
						n_obj_button.pressed.connect(select_object_from_list.bind(n_obj_button,obj,{"name" : n_obj_button.text}))
	updating_list = false

So, in this method, we access TileWorldData and its objects_pool, which, as we remember, contains references to all the ThingBodies on that tile. The error occurred on the line if !is_instance_valid(obj).

Initially, this check wasn’t there, and I sometimes got the debug error “this object previously freed”. This was because the thing_free() method didn’t have the line tile.objects_pool.erase(self), but at the time I didn’t think about it because I was sure I had added that line (which I hadn’t).

Then I added the line if !is_instance_valid(obj):, reasoning as follows: “the is_instance_valid method checks whether the object exists, so if the object was previously freed for some strange reason, the system will simply ignore it.”

However, instead of skipping this already freed ThingBody, for some reason my entire game would freeze indefinitely, with “window is not responding.” The debugger and breakpoints showed that the point at which the program completely stops is this check: !is_instance_valid(obj):. The same applies to the check if obj == null, and to any other attempt to access this obj.

My main and only question is: why? Why didn’t the continue work? The debugger shows me that obj is null, so why doesn’t it just skip it? Note that the update_object_list method doesn’t have any await, coroutines, or anything else. This method is called as follows:

# TileCore.gd
func show_tile_data(tile : TileWorldData):
    # Init
    var interact_list_to_show : Dictionary = {}
    # Ground
    interact_list_to_show.set("ground_data",tile)
    # Active
    var active_tile = active_tiles.get(tile.map_pos)
    if active_tile:
        interact_list_to_show.set("interact_data",active_tile)
    # Show
    if control_entity.entity.ui_core: control_entity.entity.ui_core.player_ui_overlay.update_object_list(interact_list_to_show)
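Separately from the freeze, one thing stands out in the posted `update_object_list()`: it sets `updating_list = true` and then has two early `return`s (missing `"ground_data"` and a null `tile`) that never set it back to `false`, so after either one every later call exits at the first check and the list stops refreshing. A minimal sketch of a guard that is always reset, moving the body into a helper; `ObjectListUpdater`, `_fill_object_list()` and `filled_count` are hypothetical stand-ins for the real UI code.

```gdscript
class_name ObjectListUpdater
extends RefCounted

var updating_list := false
var filled_count := 0  # hypothetical: counts successful fills, for illustration

func update_object_list(list: Dictionary) -> void:
	if updating_list:
		return
	updating_list = true
	_fill_object_list(list)
	# Reached on every path: early returns inside the helper only leave the helper.
	updating_list = false

func _fill_object_list(list: Dictionary) -> void:
	if not list.has("ground_data"):
		return
	var tile = list.get("ground_data")
	if tile == null:
		return
	filled_count += 1  # stands in for building the object buttons
```

This relies on the helper containing no `await`; if it ever did, the flag would be reset before the helper finished.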

So I’m assuming one of two things: either I completely misunderstand what null is and how is_instance_valid works (which is strange because I use it a lot and it usually works), or there’s something else going on. At the moment, I’m 100% certain that the error is in my code, but I can’t even guess what it is, even though I’ve managed to fix it.

What’s with this indentation after the first continue? Is this correct? Put a print statement just before that continue and print obj and tile.objects_pool. What’s the printout?

The first continue is never reached, so the code after is_instance_valid is never executed; the freeze occurs exactly on the first is_instance_valid line.

Then do printouts just before if !is_instance_valid(obj): i.e. at the very beginning of the loop body.
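A small sketch of the suggested printout, as a hypothetical static helper `PoolDebug.dump_pool()`. It only passes each entry to `print()` and `is_instance_valid()`, both of which are safe on a freed object (in Godot 4 a freed object prints as `<Freed Object>`), and it never reads a property of an entry.

```gdscript
class_name PoolDebug

# Hypothetical diagnostic helper: prints one line per pool entry and
# returns how many entries are null or point to freed objects.
static func dump_pool(pool: Dictionary) -> int:
	var invalid := 0
	for obj in pool:
		var valid := is_instance_valid(obj)
		if not valid:
			invalid += 1
		print("entry=", obj, " valid=", valid, " pool_size=", pool.size())
	return invalid
```

Calling it as the very first statement inside the loop’s function (before any other access to the entries) shows what the pool really contains at the moment of the freeze.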

If the indentation in that loop is as you posted it, then that loop does precisely nothing.
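In the loop as posted, the lines after the first `continue` are nested inside the `if !is_instance_valid(obj):` block, after the `continue`, so they could never execute and no button is ever added. A sketch of the presumably intended structure, with each guard at the loop-body level; it collects names instead of building buttons, and `ObjectListBuilder`/`collect_names()` are hypothetical. It assumes, as in the thread, that valid entries expose a `thing` whose value has a `t_name`.

```gdscript
class_name ObjectListBuilder

# Hypothetical sketch of the presumably intended loop: every guard sits at
# the loop-body level, so valid entries fall through to the final append.
# Assumes valid entries have a `thing` property whose value has `t_name`.
static func collect_names(pool: Dictionary) -> Array:
	var names: Array = []
	for obj in pool:
		if not is_instance_valid(obj):
			continue  # null or freed: skip without touching it
		if not is_instance_valid(obj.thing):
			continue  # no Thing attached, or it was freed
		names.append(obj.thing.t_name)  # real code would instantiate a button here
	return names
```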

It looks like the freeze occurs when you call it like this (Pic 1), but instead of completing the whole loop (it should just skip the body if the check is false), it freezes. Maybe this is the expected behavior?

    # 4. Things
    if !tile.objects_pool.is_empty():
        for obj in tile.objects_pool:
            if is_instance_valid(obj):
                if !obj.thing:
                    continue

After adding print() before is_instance_valid(), the freeze no longer appears, and in the debugger, I see the following:

Where did the first continue disappear to? Which code is relevant: the one you posted or the one shown in your screenshot? If you post one piece of code and run another, this whole exchange doesn’t make much sense.

I assume the original version I had was just if is_instance_valid(obj): without continue, since the freeze does not occur with the continue, as in the screenshot above. This problem turned out to be confusing for me, so it’s not surprising that I got confused. If the code looks like this:

    # 4. Things
    if !tile.objects_pool.is_empty():
        for obj in tile.objects_pool:
            if is_instance_valid(obj):
                if !obj.thing:
                    continue

Then the freeze occurs. Perhaps this is just a big blunder on my part? If so, I apologize for wasting your time. I simply assumed that in the version with if is_instance_valid(obj): the loop would simply move on.

That’s what I mean when I say freeze.

Even without continue, it’s not easy to reproduce the error, as it doesn’t always occur; it seems to happen randomly when picking up objects and exploring tiles in combination.