Godot Version
Godot 4.6
Question
I am currently working on a GDExtension in C++ that allows me to train vision-based RL agents in Python. I am doing this because I want to create simulations that make use of compute shaders for important parts of the simulation dynamics (e.g. a simulation where you have to clean up dirt particles).
My question is what the best approach would be to handle the idle and physics loops in this context. The goal is deterministic, step-by-step control from Python with maximum throughput: each time Python sends an action, Godot should advance exactly one physics step and one idle/render step, then return an observation. This implies a fixed delta (decoupling simulation time from real time) and a 1:1 ratio of idle to physics steps.
The existing approach I’ve seen, used by the godot_rl_agents plugin, is to crank up Engine.physics_ticks_per_second and Engine.time_scale:
Engine.physics_ticks_per_second = _get_speedup() * 60
Engine.time_scale = _get_speedup() * 1.0
This works to some extent, but it feels fragile: the speedup factor has to be tuned manually per simulation and per device, and pushing it too far risks instability.
Is there a cleaner way to achieve this in a C++ GDExtension? If not, what is the closest/best alternative?
More Context
To give more context, the project currently works as follows. I have a node called HPAMasterNode at the root of my project. It takes a scene file (d_env_scene) as input and creates N subviewports, each containing an instance of this scene, with own_world_3d set to true. Later, I want to collect all textures on the GPU so I can efficiently write them to RAM in a single call. I am using POSIX shared memory and semaphores for the interface between Python and Godot. Here is a snippet of my code to clarify:
void HPAMasterNode::_init_envs() {
    if (d_env_scene.is_null())
        return;

    for (int idx = 0; idx != d_num_envs; ++idx) {
        // Create subviewport:
        SubViewport *subview = memnew(SubViewport);
        subview->set_size(d_obs_res);
        subview->set_update_mode(SubViewport::UPDATE_ALWAYS);
        subview->set_use_own_world_3d(true);

        // Instantiate simulation scene:
        Node *scene_inst = d_env_scene->instantiate();
        subview->add_child(scene_inst);
        add_child(subview);

        // Create sprite to display the scene for now.
        // TODO: this is just for testing. Textures should get collected on
        // the GPU and written to RAM in a single batch (and then placed
        // in shared memory).
        Sprite2D *sprite = memnew(Sprite2D);
        sprite->set_texture(subview->get_texture());
        sprite->set_centered(false);
        sprite->set_position(Vector2(idx * d_obs_res.x, 0));
        add_child(sprite);
    }
}
I already implemented a shared-memory interface, but I still need a way to write the observation data back to RAM efficiently. I may also need multithreading to increase throughput.
Thanks in advance for your advice!
Vincent