Godot Version
4.6 beta 3
Question
Skip ahead to "Start reading here" below if you want to skip the project explanation.
I’m working on a simple particle simulation using verlet integration running on the GPU using compute shaders.
The setup is pretty simple, currently with just 2 compute shader passes (plus a rendering pass, but that’s not relevant for this problem):
The first is a space binning pass.
The second is a pass in which all collisions and dynamics are calculated for each particle.
To improve the simulation quality, I implemented sub-stepping (several simulation steps within a single frame) by dispatching the second pass multiple times.
Actually, in my current implementation this second pass also has 2 sub-passes, one for forces and gravity integration and one for collision detection between particles. So I dispatch the same compute shader twice for every sub-step in the simulation, and a parameter in the push constants decides which sub-pass runs in each dispatch, implemented with a simple if statement.
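To show what I mean by the sub-pass parameter, here is a rough sketch of how the push constant data could be packed on the CPU side. It's plain Python rather than my actual Godot code, and the field layout (`sub_pass`, `particle_count`, `dt`) and the constant names are made up for illustration. The padding to 16 bytes is an assumption based on what Godot has required in my experience.

```python
import struct

# Hypothetical selector values read by the shader's if statement.
SUB_PASS_INTEGRATE = 0  # forces and gravity integration
SUB_PASS_COLLIDE = 1    # collision detection between particles


def pack_push_constants(sub_pass: int, particle_count: int, dt: float) -> bytes:
    """Pack an illustrative push constant block.

    Layout (assumed, little-endian): uint sub_pass, uint particle_count,
    float dt, then 4 padding bytes so the block is 16 bytes long.
    """
    return struct.pack("<IIf4x", sub_pass, particle_count, dt)


# One sub-step means two dispatches of the same shader, differing only
# in the selector stored in the push constants.
substep_blocks = [
    pack_push_constants(SUB_PASS_INTEGRATE, 100_000, 1.0 / 240.0),
    pack_push_constants(SUB_PASS_COLLIDE, 100_000, 1.0 / 240.0),
]
```

The point is just that both dispatches share the pipeline and uniform set, and only these few bytes change between them.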
It seems a bit unoptimized to carry the whole logic and memory footprint of collision detection in a pass as simple as gravity integration, but the simulation actually runs pretty well and can handle hundreds of thousands of particles while still being surprisingly stable. I would also be curious to know how much having two sub-passes affects performance.
The problem is that I now want to implement constraints between particles, using another pass that runs once per constraint rather than once per particle.
Start reading here to skip all of the project explanation
So what I need to do is run multiple alternating compute shader passes, all in the same compute list (excluding the binning pass; ignore this parenthesis if you skipped the explanation). That is:
pass 1
then pass 2
then pass 3
then again pass 1
then pass 2
then pass 3
then again pass 1
then pass 2
then pass 3
etc., running a set of 3 distinct passes multiple times.
Now, from what I understand, the only way to do this is to:
- bind compute pipeline for pass 1
- bind uniform set for pass 1
- set push constants for pass 1
- dispatch
- add barrier
- bind compute pipeline for pass 2
- bind uniform set for pass 2
- set push constants for pass 2
- dispatch
- add barrier
- bind compute pipeline for pass 3
- bind uniform set for pass 3
- set push constants for pass 3
- dispatch
- add barrier
And then to repeat all of this over and over again.
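Written as a loop, the recording I have in mind looks roughly like the sketch below. It's a plain Python model of the command order only, not actual RenderingDevice code: `ComputePass`, `record_substeps` and the command labels are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputePass:
    """Stand-in for one pass (its pipeline, uniform set and push constants)."""
    name: str


def record_substeps(passes, substeps):
    """Return the flat list of commands recorded for `substeps` repetitions."""
    commands = []
    for _ in range(substeps):
        for p in passes:
            commands.append(("bind_pipeline", p.name))
            commands.append(("bind_uniform_set", p.name))
            commands.append(("set_push_constants", p.name))
            commands.append(("dispatch", p.name))
            # Barrier so the next pass sees this pass's writes.
            commands.append(("barrier", p.name))
    return commands


passes = [ComputePass("pass 1"), ComputePass("pass 2"), ComputePass("pass 3")]
commands = record_substeps(passes, substeps=4)
dispatch_order = [name for kind, name in commands if kind == "dispatch"]
```

So on the CPU side it is only a nested loop, but every sub-step still records three pipeline binds and three uniform set binds, which is the part I'd like to avoid if possible.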
I guess that would work, but it seems really overcomplicated to me and leaves a lot of room for optimization.
So my question is: is this the only way to do it, or is there a better way that avoids all those compute pipeline and uniform set bindings?
If you need to see the code, feel free to ask, and sorry for any grammatical errors.
Thanks in advance for any tips!