Physics callback performance/threading

Godot Version

4.3.beta2

Issue

TLDR:
Running the _physics_process() callback on many objects stalls _process(). Although each physics frame leaves plenty of time for the physics calculations, the stall makes visuals (notably the camera) stutter badly because frame times become inconsistent. How can I fix or work around this?

Full:
I’m creating a 2D game with a lot of dynamic objects (moving and static units), and the entirety of their logic occurs in _physics_process(), since I want it to occur in a constant and (mostly) deterministic manner. I use _process() only for logic-independent code, such as moving the top-down camera, so that the movement of the view can take advantage of higher frame rates and occur as smoothly as possible.

I’m aiming for hundreds of medium-complexity units, but I ran into a peculiar performance issue. At first, the physics engine struggled to keep up since I was calling move_and_slide() on too many physics objects at once. This was rather easy to address, since I don’t need the game logic to update 60 times per second. I dropped the physics rate to 10-30 ticks per second, depending on the selected game speed, which now gives the physics engine more than enough time to process all of the data before the next physics frame. Combined with the new physics interpolation feature, this works really well, with smooth, consistent movement for the units.

However, this is where the main issue comes up. Although I have a surplus of physics time, the actual _physics_process() callbacks of ~100 units still take a long while: the profiler shows ~10ms of physics time. Logic-wise this isn’t a problem, since the physics frame time is 33.3ms (at 30 ticks per second) and there is still plenty of time to do all the calculations. But since _process() and _physics_process() both run on the main thread, it appears _process() is stalled until the physics callbacks have finished.

This leads to a situation where the average frame rate is misleadingly good, but every fifth or so frame has a sudden spike in frame time, going from ~2ms to ~15ms. The result is an awful stuttering in the camera movement, when it needs to be as smooth as possible.
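
To make the spike pattern concrete, here is a rough model of a single-threaded loop with a fixed-timestep accumulator, written in Python purely as neutral pseudocode. It is a simplified sketch, not Godot's actual scheduler: the function name, the 2ms render cost, and especially the 144 Hz frame rate are assumptions picked only because they reproduce a tick on roughly every fifth frame.

```python
def simulate_frames(n_frames, render_ms=2.0, physics_ms=10.0,
                    tick_hz=30, display_hz=144):
    """Return the cost (ms) of each frame in a single-threaded main loop.

    Hypothetical model: every frame pays render_ms, and any frame in which
    a physics tick falls due also pays physics_ms on the same thread.
    """
    tick_interval = 1000.0 / tick_hz       # 33.3 ms at 30 ticks per second
    frame_interval = 1000.0 / display_hz   # assumed frame pacing
    accumulator = 0.0
    frame_times = []
    for _ in range(n_frames):
        accumulator += frame_interval
        cost = render_ms
        # Physics callbacks run before this frame's _process-style work,
        # so their cost is added to the frame that triggers them.
        while accumulator >= tick_interval:
            accumulator -= tick_interval
            cost += physics_ms
        frame_times.append(cost)
    return frame_times


if __name__ == "__main__":
    frames = simulate_frames(10)
    spikes = [i for i, t in enumerate(frames) if t > 2.0]
    print("frame times (ms):", frames)
    print("spike frames:", spikes)
    print("average (ms):", sum(frames) / len(frames))
```

With these assumed numbers, the first ten frames cost 2ms each except frames 4 and 9, which cost 12ms. The average stays low (4ms), which is why the average frame rate looks fine while the camera still stutters.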

Any suggestions on how to fix or work around this?

I’d think the best option would be to have _process() of the critical nodes work in parallel, but I’m not sure that’s even possible as I get an error that canvas_transform can only be called in the main thread, meaning I’d have to wait for the physics callback to finish either way. The units whose callback stalls the main thread also can’t be transferred to another thread, as then it seems they aren’t allowed to access the physics server.
My other consideration would be to process a limited number of units each physics callback, stalling _process() for much less, but I’d prefer to avoid that as it wouldn’t play nicely with some other logic, including the current physics interpolation.
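
As a rough illustration of that round-robin idea, the sketch below splits the units into slices and updates one slice per physics tick. It is Python used as neutral pseudocode, and `units_for_tick`, `effective_delta`, and `n_slices` are hypothetical names, not Godot API.

```python
def units_for_tick(units, tick, n_slices):
    """Return the units whose turn it is on this tick (round-robin slices)."""
    return units[tick % n_slices::n_slices]


def effective_delta(tick_delta, n_slices):
    """Each unit is updated only every n_slices ticks, so its logic must
    advance by n_slices tick lengths per update."""
    return tick_delta * n_slices


if __name__ == "__main__":
    units = list(range(100))   # stand-ins for ~100 unit nodes
    n_slices = 4
    for tick in range(n_slices):
        batch = units_for_tick(units, tick, n_slices)
        print(f"tick {tick}: updating {len(batch)} units")
    print("per-unit delta (s):", effective_delta(1 / 30, n_slices))
```

This cuts the per-tick stall to roughly a quarter, but each unit then effectively updates at a quarter of the tick rate, which is exactly why it clashes with logic that expects every unit to move on every tick, physics interpolation included.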

Profiler data (note the inconsistent frame time spikes):