RenderingDevice.create_local_rendering_device() crashes the game for 1 (or more) players

Godot Version

4.6.1

Question

I am integrating Conway’s Game of Life into my idle game. Here’s a gif of it working (with a performance setting on, so it appears to be skipping cycles): https://imgur.com/eWb1hcz

This works flawlessly on my end, as well as on my cheap laptop which uses Intel graphics.

There is one (1) user whose game crashes after 8 physics frames. That is interesting because the ONLY time anything happens is on the first frame, and on every 180th frame after that.

I was able to solve the crash by changing this:

var renderer: RenderingDevice = RenderingServer.create_local_rendering_device()

into:

var renderer: RenderingDevice = RenderingServer.get_rendering_device()

However, a 2nd user reported a crash as well. So it seems that both result in a crash for different users.

Below is the GDScript which runs the shader. The crashes never occur inside any of these functions, only in _physics_process() after roughly 8 frames. The key areas are _ready(), _run_cycle(), and _update_display().

class_name GameOfLife
extends MarginContainer


signal cycle_completed(cells_alive: int)

const MAX_GRID_SIZE: Vector2i = Vector2i(100, 100)

static var instance: GameOfLife

static var active_grid_size: Vector2i = Vector2i(10, 10)

static var renderer: RenderingDevice
static var shader: RID
static var pipeline: RID

var texture_a: RID
var texture_b: RID
var current_texture: RID
var next_texture: RID
var alive_counter_buffer: RID

var uniform_set_a: RID
var uniform_set_b: RID
var current_uniform_set: RID
var next_uniform_set: RID

var cells_alive: int = 0

var cycle_duration_ticks: int = 180 ## Default: 180 (3 sec)
var tick_counter: int = 0

var texture_rect: TextureRect
var current_image_texture: ImageTexture
var is_display_enabled: bool = false

## The number of update_display calls ignored
var skip_display_updates: int = 0
var cycle_count: int = 0


#region Ready


func _ready():
	instance = self
	
	set_physics_process(false)
	
	renderer = RenderingServer.get_rendering_device()
	assert(renderer != null, "Gotta be on Forward+ bro")
	if not renderer:
		printerr("GoL - Failed to create RenderingDevice")
	
	var shader_file: RDShaderFile = load("uid://c1io0vqeq8pmu")
	var shader_spirv: RDShaderSPIRV = shader_file.get_spirv()
	shader = renderer.shader_create_from_spirv(shader_spirv)
	pipeline = renderer.compute_pipeline_create(shader)
	
	texture_a = _create_game_texture()
	texture_b = _create_game_texture()
	
	current_texture = texture_a
	next_texture = texture_b
	
	alive_counter_buffer = _create_alive_counter_buffer()
	
	_clear_grid()
	
	uniform_set_a = _create_uniform_set(texture_a, texture_b)
	uniform_set_b = _create_uniform_set(texture_b, texture_a)
	
	current_uniform_set = uniform_set_a
	next_uniform_set = uniform_set_b
	
	Main.done.became_true.connect(set_physics_process.bind(true))
	
	_dev_test()


func _dev_test() -> void:
	instance.cycle_duration_ticks = 2
	instance.set_grid_size(Vector2i(100, 100))
	await Main.await_done(1.0)
	while true:
		instance._randomize_grid(0.3, 1)
		await Utility.timer(16.0)


func _create_game_texture() -> RID:
	var fmt := RDTextureFormat.new()
	fmt.width = MAX_GRID_SIZE.x
	fmt.height = MAX_GRID_SIZE.y
	fmt.format = RenderingDevice.DATA_FORMAT_R8_UINT
	fmt.usage_bits = RenderingDevice.TEXTURE_USAGE_STORAGE_BIT | \
					 RenderingDevice.TEXTURE_USAGE_CAN_UPDATE_BIT | \
					 RenderingDevice.TEXTURE_USAGE_CAN_COPY_FROM_BIT
	return renderer.texture_create(fmt, RDTextureView.new(), [])


func _create_alive_counter_buffer() -> RID:
	var buffer_data := PackedInt32Array([0])
	var buffer_bytes: PackedByteArray = buffer_data.to_byte_array()
	return renderer.storage_buffer_create(
			buffer_bytes.size(), buffer_bytes)


func _clear_grid() -> void:
	var data := PackedByteArray()
	data.resize(MAX_GRID_SIZE.x * MAX_GRID_SIZE.y)
	data.fill(0)
	renderer.texture_update(current_texture, 0, data)
	renderer.texture_update(next_texture, 0, data)


func _create_uniform_set(read_tex: RID, write_tex: RID) -> RID:
	const SHADER_SET: int = 0
	
	var uniforms: Array[RDUniform] = []
	
	var uniform_read := RDUniform.new()
	uniform_read.uniform_type = RenderingDevice.UNIFORM_TYPE_IMAGE
	uniform_read.binding = 0
	uniform_read.add_id(read_tex)
	uniforms.append(uniform_read)
	
	var uniform_write := RDUniform.new()
	uniform_write.uniform_type = RenderingDevice.UNIFORM_TYPE_IMAGE
	uniform_write.binding = 1
	uniform_write.add_id(write_tex)
	uniforms.append(uniform_write)
	
	var uniform_buffer := RDUniform.new()
	uniform_buffer.uniform_type = RenderingDevice.UNIFORM_TYPE_STORAGE_BUFFER
	uniform_buffer.binding = 2
	uniform_buffer.add_id(alive_counter_buffer)
	uniforms.append(uniform_buffer)
	
	return renderer.uniform_set_create(uniforms, shader, SHADER_SET)


#endregion


#region Cycle


func _physics_process(_delta: float) -> void:
	if tick_counter == 0:
		_run_cycle()
	tick_counter = wrapi(tick_counter + 1, 0, cycle_duration_ticks)


func _run_cycle() -> void:
	var zero_data: PackedByteArray = PackedInt32Array([0]).to_byte_array()
	renderer.buffer_update(
			alive_counter_buffer, 0, zero_data.size(), zero_data)
	
	var push_constant: PackedByteArray = PackedInt32Array([
			active_grid_size.x, active_grid_size.y, 0, 0]).to_byte_array()
	
	var compute_list: int = renderer.compute_list_begin()
	renderer.compute_list_bind_compute_pipeline(compute_list, pipeline)
	renderer.compute_list_bind_uniform_set(
			compute_list, current_uniform_set, 0)
	renderer.compute_list_set_push_constant(
			compute_list, push_constant, push_constant.size())
	
	var x_groups: int = ceili(active_grid_size.x / 8.0)
	var y_groups: int = ceili(active_grid_size.y / 8.0)
	renderer.compute_list_dispatch(compute_list, x_groups, y_groups, 1)
	renderer.compute_list_end()
	
	# NOTE - This only works when renderer is a local RD, and isn't needed anyway.
	#renderer.submit()
	#renderer.sync()
	
	var alive_bytes: PackedByteArray = renderer.buffer_get_data(
			alive_counter_buffer)
	var alive_array: PackedInt32Array = alive_bytes.to_int32_array()
	cells_alive = alive_array[0]
	
	var temp: RID = current_texture
	current_texture = next_texture
	next_texture = temp
	
	var temp_uniform: RID = current_uniform_set
	current_uniform_set = next_uniform_set
	next_uniform_set = temp_uniform
	
	cycle_count = wrapi(cycle_count + 1, 0, skip_display_updates + 1)
	if cycle_count == 0:
		_update_display()
	
	cycle_completed.emit(cells_alive)


func _update_display():
	if not is_display_enabled or not texture_rect:
		return
	
	if not current_texture.is_valid():
		return
	
	# Read the current texture back from the GPU
	var byte_data: PackedByteArray = renderer.texture_get_data(
			current_texture, 0)
	
	if byte_data.is_empty():
		return
	
	var rgb_data := PackedByteArray()
	rgb_data.resize(active_grid_size.x * active_grid_size.y * 3)
	
	for y: int in range(active_grid_size.y):
		for x: int in range(active_grid_size.x):
			var src_idx: int = y * MAX_GRID_SIZE.x + x
			var dst_idx: int = (y * active_grid_size.x + x) * 3
			
			var age: int = byte_data[src_idx]
			var color: Color = _get_age_color(age)
			
			rgb_data[dst_idx] = int(color.r * 255)
			rgb_data[dst_idx + 1] = int(color.g * 255)
			rgb_data[dst_idx + 2] = int(color.b * 255)
	
	var img: Image = Image.create_from_data(
			active_grid_size.x, active_grid_size.y,
			false, Image.FORMAT_RGB8, rgb_data)
	current_image_texture.set_image(img)


func _get_age_color(age: int) -> Color:
	if age == 0:
		return Color(0.15, 0.15, 0.15)
	
	var t: float = float(age) / 255.0
	
	if t < 0.16:
		var local_t: float = t / 0.16
		return Color(1.0, 1.0, 1.0).lerp(Color(0.0, 0.5, 1.0), local_t)
	elif t < 0.33:
		var local_t: float = (t - 0.16) / 0.17
		return Color(0.0, 0.5, 1.0).lerp(Color(0.0, 1.0, 1.0), local_t)
	elif t < 0.50:
		var local_t: float = (t - 0.33) / 0.17
		return Color(0.0, 1.0, 1.0).lerp(Color(0.0, 1.0, 0.0), local_t)
	elif t < 0.66:
		var local_t: float = (t - 0.50) / 0.16
		return Color(0.0, 1.0, 0.0).lerp(Color(1.0, 1.0, 0.0), local_t)
	elif t < 0.83:
		var local_t: float = (t - 0.66) / 0.17
		return Color(1.0, 1.0, 0.0).lerp(Color(1.0, 0.5, 0.0), local_t)
	else:
		var local_t: float = (t - 0.83) / 0.17
		return Color(1.0, 0.5, 0.0).lerp(Color(1.0, 0.0, 1.0), local_t)


#endregion


#region Control


func _enable_display():
	if is_display_enabled:
		return
	
	is_display_enabled = true
	
	texture_rect = TextureRect.new()
	add_child(texture_rect)
	texture_rect.expand_mode = TextureRect.EXPAND_IGNORE_SIZE
	texture_rect.stretch_mode = TextureRect.STRETCH_KEEP_ASPECT_CENTERED
	
	current_image_texture = ImageTexture.new()
	texture_rect.texture = current_image_texture
	
	_update_display()


func _disable_display():
	if not is_display_enabled:
		return
	
	is_display_enabled = false
	
	if texture_rect:
		texture_rect.queue_free()
		texture_rect = null
	current_image_texture = null


func set_grid_size(_size: Vector2i):
	active_grid_size.x = clampi(_size.x, 1, MAX_GRID_SIZE.x)
	active_grid_size.y = clampi(_size.y, 1, MAX_GRID_SIZE.y)
	_update_display()


func set_cycle_duration(_ticks: int):
	cycle_duration_ticks = _ticks


func set_display_update_frequency(skip_cycles: int):
	skip_display_updates = max(0, skip_cycles)


func place_cell(x: int, y: int, age: int = 1):
	if x < 0 or x >= active_grid_size.x or y < 0 or y >= active_grid_size.y:
		return
	
	var byte_data: PackedByteArray = renderer.texture_get_data(current_texture, 0)
	var idx: int = y * MAX_GRID_SIZE.x + x
	byte_data[idx] = clampi(age, 0, 255)
	renderer.texture_update(current_texture, 0, byte_data)
	renderer.texture_update(next_texture, 0, byte_data)
	_update_display()


func _randomize_grid(alive_probability: float = 0.3, max_age: int = 1):
	var byte_data: PackedByteArray = renderer.texture_get_data(
			current_texture, 0)
	
	for y in range(active_grid_size.y):
		for x in range(active_grid_size.x):
			var idx: int = y * MAX_GRID_SIZE.x + x
			if randf() < alive_probability:
				byte_data[idx] = randi_range(1, max_age)
			else:
				byte_data[idx] = 0
	
	renderer.texture_update(current_texture, 0, byte_data)
	renderer.texture_update(next_texture, 0, byte_data)
	
	_update_display()


#endregion


#region Signals


func _on_visibility_changed() -> void:
	if is_visible_in_tree():
		_enable_display()
	else:
		_disable_display()


#endregion

And here is the shader. Again, I’ll note that this works as expected on my two systems. I would not expect this to be the cause of the crash. But I’m a noob.

#[compute]
#version 450

layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
layout(push_constant) uniform PushConstants {
    int active_width;
    int active_height;
} push;

layout(set = 0, binding = 0, r8ui) uniform restrict readonly uimage2D input_texture;
layout(set = 0, binding = 1, r8ui) uniform restrict writeonly uimage2D output_texture;
layout(set = 0, binding = 2) buffer AliveCounter {
    uint alive_count;
};

int count_neighbors(ivec2 pos, ivec2 active_size) {
    int count = 0;
    
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0) continue;
            
            // Wrap around edges using ACTIVE grid size
            ivec2 neighbor_pos = ivec2(
                (pos.x + dx + active_size.x) % active_size.x,
                (pos.y + dy + active_size.y) % active_size.y
            );
            
            uint neighbor_age = imageLoad(input_texture, neighbor_pos).r;
            if (neighbor_age > 0u) {
                count++;
            }
        }
    }
    
    return count;
}

void main() {
    ivec2 pos = ivec2(gl_GlobalInvocationID.xy);
    ivec2 active_size = ivec2(push.active_width, push.active_height);
    
    if (pos.x >= active_size.x || pos.y >= active_size.y)
        return;
    
    int neighbors = count_neighbors(pos, active_size);
    uint current_age = imageLoad(input_texture, pos).r;
    bool is_alive = current_age > 0u;

    uint next_age = 0u;
    
    if (is_alive) {
        atomicAdd(alive_count, 1u);

        // Cell should survive if it has 2 or 3 neighbors
        if (neighbors == 2 || neighbors == 3) {
            // Cell survives
            next_age = min(current_age + 1u, 255u);
        } else {
            // Cell dies
            next_age = 0u;
        }
    } else {
        // Dead cell becomes alive if it has exactly 3 neighbors
        if (neighbors == 3) {
            next_age = 1u;
        }
    }
    
    imageStore(output_texture, pos, uvec4(next_age, 0u, 0u, 0u));
}

Because the crash appears delayed (nothing happens on the 2nd through 8th physics frames), I suspect something asynchronous is going on. I just don't know what.

Why are you applying a shader directly to the renderer? This code is way overcomplicated and looks like AI-generated code.

I took a look at the shader and it could just as easily be applied to a TextureRect unless I’m missing something.

I can apply it to TextureRect? I can skip using a rendering device??

Yes.

Again, it looks like you got bad advice from an LLM. I recommend not using LLMs to understand Godot, because they don’t.

Try applying the shader to a ShaderMaterial and attaching that to a TextureRect if there’s an image involved, or just a ColorRect if not. (Set its anchors to Full Rect if you want to cover the screen.) Then change your references to renderer to the rect Control.


You’re right, this was originally generated by Claude.

GLSL shaders cannot be applied to ShaderMaterials, it seems; it has to be a gdshader. However, I got a GDScript version working. The grid will only go up to 100x100, so that's not bad for GDScript. I guess a shader isn't even needed to begin with. I thought relying on the video card would be crucial, so I didn't see any problem with using RenderingDevice.

So my current solution was to scrap it all and do it in GDScript, and it's working well even in Compatibility. Thanks for your replies.


Correct.

Yeah, you can draw directly to a Node, so I didn’t think a shader was needed for what you were doing, but I couldn’t be sure.

Looks like this isn’t true for the purposes of the Game of Life. RenderingDevice is required in order to store state between frames when using shaders. I was able to find another user who created a project using the same strategy.

-

It looks like processing a 100x100 grid 60 times per second isn't feasible in GDScript. Using AI again, I got a version that uses WorkerThreadPool to process the grid in chunks, which is neat. However, it still isn't fast enough.
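For anyone curious what the CPU version actually has to compute every cycle, here is a language-neutral Python sketch (not my actual GDScript) of the same per-cell rules the compute shader implements: wrap-around neighbor counting, survival on 2 or 3 neighbors, birth on exactly 3, an age value capped at 255, and a live-cell count per cycle:

```python
def step(grid):
    """Advance one Game of Life cycle; returns (next_grid, cells_alive).

    Mirrors the shader: a cell's value is its age (0 = dead), edges wrap
    around, and cells_alive counts live cells in the *current* grid, like
    the shader's atomicAdd on the alive counter.
    """
    h = len(grid)
    w = len(grid[0])
    nxt = [[0] * w for _ in range(h)]
    alive = 0
    for y in range(h):
        for x in range(w):
            # Count live neighbors with toroidal wrap-around.
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dx == 0 and dy == 0:
                        continue
                    if grid[(y + dy) % h][(x + dx) % w] > 0:
                        neighbors += 1
            age = grid[y][x]
            if age > 0:
                alive += 1
                if neighbors in (2, 3):
                    nxt[y][x] = min(age + 1, 255)  # survives, ages up to 255
            elif neighbors == 3:
                nxt[y][x] = 1  # birth
    return nxt, alive
```

With a horizontal blinker on a 5x5 grid, one `step` returns a live count of 3 and flips it to a vertical blinker, with the center cell aged to 2. For 100x100 this inner loop runs 10,000 times per cycle, which is exactly the work the GPU parallelizes.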

Here are my ultimate findings:

  1. Using GDScript: 150-160 FPS, but it takes 3-4 physics frames to process the entire grid.
  2. Using Vulkan and get_rendering_device(): 120-130 FPS, but certainly processes the grid 60 times per second.
  3. Using Vulkan and create_local_rendering_device(): 165+ FPS, but it crashes the game for at least 1 user.

By the way, option #2 actually does not crash the game for that 2nd user. False alarm.

I understand that AI is troublesome, but its first proposed method (create_local_rendering_device()) remains the best option for speed and performance. It doesn't know everything, but it doesn't know nothing either.

This has been frustrating because of that 1 user whose game crashes. Why does create_local_rendering_device() result in a crash?

If you're on Windows and 4.6, try switching between the Vulkan and D3D12 drivers in Project Settings > Rendering > Rendering Device. Otherwise, run the Vulkan validation layers on the problematic hardware.
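If it helps, one common way to get validation output is to enable the Khronos validation layer via an environment variable before launching the game, assuming the Vulkan SDK (which provides the layer) is installed on the affected machine. The binary name below is a placeholder:

```shell
# Enable the Khronos validation layer for the Vulkan instance, then launch
# the game so layer messages print alongside Godot's own console output.
# "MyGame" is a placeholder for the exported binary.
#
# Windows (cmd):
#   set VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation
#   MyGame.exe --verbose
#
# Linux/macOS:
export VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation
./MyGame --verbose
```

Any misuse of the RenderingDevice API (invalid barriers, bad descriptor bindings, etc.) should then show up as validation errors in the console instead of a silent crash.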

There’s still a possibility that your generated code is just buggy.


I disagree.

Take a look at my game, Katamari Mech Spacey, and look at the background, which is a shader that redraws every frame. It's very complex and has no performance issues.

An LLM literally knows nothing, by definition. It does not have the ability to know. It is a pattern-matching database: it produces output based on observed patterns. Just today I read an article about a judge in India who used an LLM to write a judgement, and it fabricated 4 previous judgements to support the ruling.

There are tons of stories like this, and yet people continue to believe that LLMs would never make a mistake like that with them. LLMs that know anything about Godot only know about code; they never give answers involving nodes and built-in functionality. And even then, they cannot tell what in their training data is good vs. bad, and they confuse what other languages can do with what GDScript can do.

This sentence indicates to me that you are trying to rebuild your own game engine, even if you do not know that is what you are doing. No shader needs access to a buffer in that way. A shader literally processes every pixel on screen every frame simultaneously using the GPU. You are trying to cram all the processing onto the CPU, and it is failing because GPUs are dedicated hardware meant to deal with frame buffering and with processing every pixel on the screen every frame.

If your statement were objectively true, that would mean that a single shader on a 3D object would bring a game to its knees.

If it works for you, great. A working game is the goal.

But the LLM that you relied on to create this, along with every other LLM, is going to scan this and process it as a good solution. This thread is literally contributing to LLMs getting worse every generation as they consume the bad outputs of other AIs.

Ultimately, what you are doing is not using Godot the way it is intended, because an LLM told you this is the way to solve your problem. A well-written shader would run on a single node. Alternatively, you could build the entire system with node objects and leverage everything Godot gives you. To see a performance hit like the one you are describing using nodes, you'd need tens of thousands of them rendering on the screen at the same time.

I’m not arguing that your experience is not valid, just that you are trying to put a screw in the wall by hammering it in with the handle of a power drill.


Switching to DirectX 12 actually improved performance of #2 (using the global renderer) by about 40 FPS, so I'm totally satisfied with that. It will still run on Vulkan on Linux, and I will keep the GDScript version as a fallback in case the game crashes for anyone, though hopefully it won't.

Fair enough, I don’t know wtf I’m doing when it comes to shaders and RenderingDevice. I’m just trying to get Game of Life going. I don’t care how it happens, as long as the FPS doesn’t tank, the game doesn’t crash, and I get my little version of Game of Life running.

Idk if the shader in your game does anything besides looking cool, but the shader for GoL tracks the age of cells and reports how many cells are alive in each cycle; then in GDScript I add to a certain resource based on how many live cells there are, and color the cells based on their age.

For the second time, I'm satisfied with an answer. We'll see if I'll be back in a few days!