Draw Compute Shader As Texture

Godot Version

4.3

Question

Hello, so I have a compute shader that just draws pink pixels:

fill_neon.glsl

#[compute]
#version 450

// Workgroup size of 16x16.
layout(local_size_x = 16, local_size_y = 16) in;

// Bind the output texture (set 0, binding 0).
layout(set = 0, binding = 0, rgba8) uniform writeonly image2D out_image;

void main() {
    ivec2 pos = ivec2(gl_GlobalInvocationID.xy);
    
    // Do a bounds check so we don't write outside the texture.
    ivec2 dims = imageSize(out_image);
    if (pos.x >= dims.x || pos.y >= dims.y) {
        return;
    }
    
    // Write neon pink (RGBA: 1.0, 0.0, 1.0, 1.0).
    imageStore(out_image, pos, vec4(1.0, 0.0, 1.0, 1.0));
}

texture_test.gd

extends Node2D

var rd: RenderingDevice
var output_texture_rid: RID
var shader: RID
var pipeline: RID
var uniform_set: RID

var texture_width = 400
var texture_height = 400

func _ready():
    rd = RenderingServer.create_local_rendering_device()
    
    # Create the output texture.
    var tformat = RDTextureFormat.new()
    tformat.width = texture_width
    tformat.height = texture_height
    tformat.depth = 1
    tformat.format = RenderingDevice.DATA_FORMAT_R8G8B8A8_UNORM
    tformat.usage_bits = RenderingDevice.TEXTURE_USAGE_SAMPLING_BIT | RenderingDevice.TEXTURE_USAGE_STORAGE_BIT | RenderingDevice.TEXTURE_USAGE_CAN_COPY_FROM_BIT
    tformat.mipmaps = 1
    var tview = RDTextureView.new()
    output_texture_rid = rd.texture_create(tformat, tview, [])
    
    # Load and create the compute shader that fills the texture with neon pink.
    var shader_file = load("res://fill_neon.glsl")
    var spirv = shader_file.get_spirv()
    shader = rd.shader_create_from_spirv(spirv)
    pipeline = rd.compute_pipeline_create(shader)
    
    # Bind the output texture to the shader (set 0, binding 0).
    var uni_outimage = RDUniform.new()
    uni_outimage.binding = 0
    uni_outimage.uniform_type = RenderingDevice.UNIFORM_TYPE_IMAGE
    uni_outimage.add_id(output_texture_rid)
    uniform_set = rd.uniform_set_create([uni_outimage], shader, 0)
    
    # Dispatch the compute shader.
    var cl = rd.compute_list_begin()
    rd.compute_list_bind_compute_pipeline(cl, pipeline)
    rd.compute_list_bind_uniform_set(cl, uniform_set, 0)
    
    # Define a workgroup size that matches the shader (16x16) and compute group counts.
    var local_size = 16
    var groups_x = int(ceil(texture_width / float(local_size)))
    var groups_y = int(ceil(texture_height / float(local_size)))
    rd.compute_list_dispatch(cl, groups_x, groups_y, 1)
    rd.compute_list_end()
    rd.submit()
    rd.sync()  # Wait for GPU to finish
    var data: PackedByteArray = rd.texture_get_data(output_texture_rid, 0)

    queue_redraw()

func _draw():
    # Read back the texture and draw it.
    var data: PackedByteArray = rd.texture_get_data(output_texture_rid, 0)
    var img = Image.create_from_data(texture_width, texture_height, false, Image.FORMAT_RGBA8, data)
    var tex = ImageTexture.create_from_image(img)
    
    #img.save_png("res://test_output.png")
    #tex = preload("res://test_output.png")
    
    draw_texture_rect(tex, Rect2(Vector2(50, 50), tex.get_size()), false)

Note that I attached the script to a Node2D.

This successfully creates an image that’s pink, which you can verify by saving it. The issue: drawing it directly doesn’t show the pink image. If you uncomment the save and preload lines in the _draw() function, it works.

So to summarize:

Using the image directly with draw_texture_rect() renders a white rectangle (the default color of draw_texture_rect()), not the pink texture we just computed.

Preloading the image from test_output.png i.e. the saved img works just fine.

I don’t understand what’s going on. My guess is some sync issue but I’m really kinda lost here.

Thanks in advance for any help.

Update: I kinda solved it but I still don’t understand what’s going on.

Basically if we move

    var data: PackedByteArray = rd.texture_get_data(output_texture_rid, 0)
    var img = Image.create_from_data(texture_width, texture_height, false, Image.FORMAT_RGBA8, data)
    var tex = ImageTexture.create_from_image(img)

into _ready(), right after the rd.sync() call, and store the result in a member variable (output_texture here), we can then do:

func _draw():
    draw_texture_rect(output_texture, Rect2(Vector2(50, 50), output_texture.get_size()), false)

and it works. I don’t fully get what’s going on because in my mind, we call the compute shader once. It fills the buffer. The buffer won’t change because we never call the shader again. So shouldn’t I be able to also create the texture in _draw()?

Second question: What’s the most efficient way to render the output of a compute shader visually? My goal is to have a ton of particles flying around.

It may be disposing something between the _ready() call and the _draw() call. Not sure what, though.

There’s Texture2DRD which may be more useful but it only works with the main RenderingDevice that you get from RenderingServer.get_rendering_device(). Something like:

extends Node2D

var rd: RenderingDevice
var output_texture_rid: RID
var shader: RID
var pipeline: RID
var uniform_set: RID

var texture_width = 400
var texture_height = 400

var texture:Texture2DRD

func _ready():
    rd = RenderingServer.get_rendering_device()

    # Create the output texture.
    #...
    
    # Create the Texture2DRD here and assign the rid
    texture = Texture2DRD.new()
    texture.texture_rd_rid = output_texture_rid
    
    # Load and create the compute shader that fills the texture with neon pink.
    #...
    
    # Bind the output texture to the shader (set 0, binding 0).
    #...
    
    # Dispatch the compute shader.
    #...

    queue_redraw()

    
func _draw():
    draw_texture_rect(texture, Rect2(Vector2(50, 50), texture.get_size()), false)

I’m not sure if you can somehow share a texture between the main rendering device and a local one.

Edit:

This seems to work:

	var main_rd = RenderingServer.get_rendering_device()
	var main_rid = main_rd.texture_create(tformat, tview)

	texture = Texture2DRD.new()
	texture.texture_rd_rid = main_rid

	output_texture_rid = rd.texture_create_from_extension(RenderingDevice.TEXTURE_TYPE_2D,
		tformat.format,
		tformat.samples,
		tformat.usage_bits,
		main_rd.get_driver_resource(RenderingDevice.DRIVER_RESOURCE_TEXTURE, main_rid, 0),
		tformat.width,
		tformat.height,
		tformat.depth,
		tformat.array_layers)

Basically you create the texture in the main rendering device and you pass it to your local rendering device by “creating” a texture using the native texture handle from the texture in the main rendering device.

Yes, someone just told me basically the same thing you wrote! Thanks!

Do you know of any other way of making it even more performant?

I’m not sure, sorry. This is probably the most performant way already.

So I was able to use the global RenderingDevice and create a Texture2DRD and simply do

    var tview: RDTextureView = RDTextureView.new()
    output_texture_rid = rd.texture_create(tformat, tview, [])
    display_texture.texture_rd_rid = output_texture_rid

and then:

func _draw():
    draw_texture_rect(display_texture, Rect2(Vector2(0.0, 0.0), display_texture.get_size()), false)

but today I wasn’t able to open my project anymore. I was on 4.3 so I just downloaded 4.4 and it opened again but I kept getting this error:

E 0:00:10:408   world.gd:142 @ _process(): Only local devices can submit and sync.
  <C++ Error>   Condition "is_main_instance" is true.
  <C++ Source>  servers/rendering/rendering_device.cpp:6232 @ submit()
  <Stack Trace> world.gd:142 @ _process()

The docs for compute shaders talk about using a local one but then the docs also say that the global RD is the one that writes to the texture - so now I actually see the issue the poster above mentioned. What’s the most performant way of solving this?

I also get this error if I set everything to a local RenderingDevice:

E 0:00:00:449   world.gd:89 @ set_up_draw_shader(): Condition "!RD::get_singleton()->texture_is_valid(p_texture_rd_rid)" is true.
  <C++ Source>  scene/resources/texture_rd.cpp:90 @ _set_texture_rd_rid()
  <Stack Trace> world.gd:89 @ set_up_draw_shader()
                world.gd:123 @ _ready()

Edit: This might be an example that does what I want. I hope it’s still up to date: godot-demo-projects/compute/texture at master · godotengine/godot-demo-projects · GitHub

You don’t need to submit() and sync() in the main RenderingDevice as it will take care of that itself automatically. If you want to use the main RenderingDevice then you can remove those calls.
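A minimal sketch of what that could look like, assuming the fill_neon.glsl shader from the first post sits at res://fill_neon.glsl. The group_count() helper and the display_texture name are only for illustration. Note that this calls the main RenderingDevice straight from the script, which is only a safe assumption when rendering is not running on a separate thread.

```gdscript
extends Node2D

const LOCAL_SIZE := 16  # Must match local_size_x/y in the shader.
const TEX_SIZE := 400

var rd: RenderingDevice
var output_texture_rid: RID
var shader: RID
var pipeline: RID
var uniform_set: RID
var display_texture := Texture2DRD.new()

# Hypothetical helper: workgroups needed to cover `size` pixels, rounded up.
static func group_count(size: int, local_size: int) -> int:
    return ceili(size / float(local_size))

func _ready() -> void:
    # Main device, not a local one, so Texture2DRD can see the texture.
    rd = RenderingServer.get_rendering_device()

    var tformat := RDTextureFormat.new()
    tformat.width = TEX_SIZE
    tformat.height = TEX_SIZE
    tformat.format = RenderingDevice.DATA_FORMAT_R8G8B8A8_UNORM
    tformat.usage_bits = RenderingDevice.TEXTURE_USAGE_SAMPLING_BIT | RenderingDevice.TEXTURE_USAGE_STORAGE_BIT | RenderingDevice.TEXTURE_USAGE_CAN_COPY_FROM_BIT
    output_texture_rid = rd.texture_create(tformat, RDTextureView.new(), [])
    display_texture.texture_rd_rid = output_texture_rid

    var shader_file: RDShaderFile = load("res://fill_neon.glsl")
    shader = rd.shader_create_from_spirv(shader_file.get_spirv())
    pipeline = rd.compute_pipeline_create(shader)

    var uni := RDUniform.new()
    uni.binding = 0
    uni.uniform_type = RenderingDevice.UNIFORM_TYPE_IMAGE
    uni.add_id(output_texture_rid)
    uniform_set = rd.uniform_set_create([uni], shader, 0)

    var groups := group_count(TEX_SIZE, LOCAL_SIZE)
    var cl := rd.compute_list_begin()
    rd.compute_list_bind_compute_pipeline(cl, pipeline)
    rd.compute_list_bind_uniform_set(cl, uniform_set, 0)
    rd.compute_list_dispatch(cl, groups, groups, 1)
    rd.compute_list_end()
    # No rd.submit() / rd.sync() here: those are only valid on a local device.

    queue_redraw()

func _draw() -> void:
    draw_texture_rect(display_texture, Rect2(Vector2.ZERO, display_texture.get_size()), false)
```

Nothing is read back to the CPU here, the texture stays on the GPU and is sampled directly when drawing. Freeing the RIDs on exit is left out of the sketch.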

The docs talk about creating a buffer, modifying it in a compute shader, and reading it back on the CPU, which is why they use a local RenderingDevice.

If you want to share a texture between RenderingDevices then you could use the texture_create_from_extension() snippet I posted above.

I guess that as long as you don’t try to write to the same texture from different RenderingDevices it should be okay.

The demo uses the main RenderingDevice, so if that effect is similar to what you want to achieve then use it.

I don’t know which method is the most performant one.

Yes, the sync() and submit() calls are still in there from when I first played around. I was gonna investigate them later because I thought they weren’t needed and might slow things down a bit, but overall wouldn’t hurt.

I don’t wanna write to the same texture from different devices, I just wanna make sure I don’t copy anything around between “CPU Memory” and “GPU Memory”.

Thanks, I’ll try both and later on might just make a performance measurement/try to understand what’s going on under the hood here.

So I read a bit more and basically the global RD runs in its own thread, the rendering thread, while the scripts one usually writes run in the main thread, unless coded otherwise.

So we can use RenderingServer.call_on_render_thread() to queue function calls, and the global RD should only be used inside those queued calls.
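As a rough sketch of that pattern (the function names _init_compute() and _dispatch_compute() and the fixed group counts are made up for illustration, and the actual resource setup is left as a comment), it could look something like this:

```gdscript
extends Node2D

var rd: RenderingDevice
var pipeline: RID
var uniform_set: RID
var groups_x := 25  # Assumed: 400 pixels / 16 threads per group.
var groups_y := 25

func _ready() -> void:
    # Queue the one-time setup so it runs on the render thread.
    RenderingServer.call_on_render_thread(_init_compute)

func _process(_delta: float) -> void:
    # Queue one dispatch per frame; it also runs on the render thread.
    RenderingServer.call_on_render_thread(_dispatch_compute)

func _init_compute() -> void:
    rd = RenderingServer.get_rendering_device()
    # Create the texture, shader, pipeline and uniform set here,
    # the same way as in the original script, using this rd.

func _dispatch_compute() -> void:
    # Skip until the setup above has actually created a pipeline.
    if rd == null or not pipeline.is_valid():
        return
    var cl := rd.compute_list_begin()
    rd.compute_list_bind_compute_pipeline(cl, pipeline)
    rd.compute_list_bind_uniform_set(cl, uniform_set, 0)
    rd.compute_list_dispatch(cl, groups_x, groups_y, 1)
    rd.compute_list_end()
    # No submit()/sync(): the main device handles that itself.
```

The point is just that every call on the global RD sits inside a function passed to call_on_render_thread(), so none of them run directly from _ready() or _process().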

So I think your example isn’t really thread-safe, whereas the linked one should be? That might be a difference?

Maybe, I’m not an expert on the subject. I’d go with the demo example just to be sure though.