Godot Version
Godot 4.6
Question
I am currently working on a GDExtension in C++ that allows me to train vision-based RL agents in Python. I am doing this because I want to create simulations that make use of compute shaders for important parts of the simulation dynamics (e.g. a simulation where you have to clean up dirt particles).
My question is what the best approach would be to handle the idle and physics loops in this context. The goal is deterministic, step-by-step control from Python with maximum throughput: each time Python sends an action, Godot should advance exactly one physics step and one idle/render step, then return an observation. This implies a fixed delta (decoupling simulation time from real time) and a 1:1 ratio of idle to physics steps.
The existing approach I’ve seen, used by the godot_rl_agents plugin, is to crank up Engine.physics_ticks_per_second and Engine.time_scale:
Engine.physics_ticks_per_second = _get_speedup() * 60
Engine.time_scale = _get_speedup() * 1.0
This works to some extent, but it feels fragile: the speedup factor has to be tuned manually per simulation and per device, and pushing it too far risks instability.
Is there a cleaner way to achieve this in a C++ GDExtension? If not, what is the closest/best alternative?
More Context
To give more context, the project currently works as follows. I have a node called HPAMasterNode at the root of my project. It takes a scene file (d_env_scene) as input and creates N subviewports, each containing an instance of this scene, with own_world_3d set to true. Later, I want to collect all textures on the GPU so I can efficiently write them to RAM in a single call. I am using POSIX shared memory and semaphores for the interface between Python and Godot. Here is a snippet of my code to clarify:
void HPAMasterNode::_init_envs() {
    if (d_env_scene.is_null())
        return;

    for (int idx = 0; idx != d_num_envs; ++idx) {
        // Create subviewport:
        SubViewport *subview = memnew(SubViewport);
        subview->set_size(d_obs_res);
        subview->set_update_mode(SubViewport::UPDATE_ALWAYS);
        subview->set_use_own_world_3d(true);

        // Instantiate simulation scene:
        Node *scene_inst = d_env_scene->instantiate();
        subview->add_child(scene_inst);
        add_child(subview);

        // Create sprite to display the scene for now.
        // TODO: this is just for testing. Textures should get collected on
        // the GPU and written to RAM in a single batch (and then placed
        // in shared memory).
        Sprite2D *sprite = memnew(Sprite2D);
        sprite->set_texture(subview->get_texture());
        sprite->set_centered(false);
        sprite->set_position(Vector2(idx * d_obs_res.x, 0));
        add_child(sprite);
    }
}
I already implemented a shared-memory interface, but I still need a way to write the observation data back to RAM efficiently. I may also need multithreading to increase throughput.
Thanks in advance for your advice!
Vincent