Implementing per-texel ambient occlusion

Godot Version

4.4 stable

Question

I recently saw a few people implementing per-texel shadows in Godot, and I think the effect looks great, but I want to take it a step further. This video (https://www.youtube.com/watch?v=Ijnjp31oKYU) on YouTube also calculates ambient occlusion per texel, in Unity, and I have been trying to recreate it in Godot, but I feel I am a bit in over my head, lol. (You can really see the ambient occlusion at around 25 seconds into the video; notice the shadows behind the bookshelf as they move it.)

I've tried a couple of things so far, like calculating the ambient occlusion in world space so that I could round the output to the nearest texel, but I was getting lots of artifacts and a weird line along the x-axis of the world (or maybe it was the z-axis). I think the artifacts had something to do with me getting the conversions between screen space and world space wrong.

The closest I've gotten so far is a standard SSAO implementation that I learned about from learnopengl.com, with one change: I convert the screen UV to world coordinates using the depth buffer, round those coordinates to my desired texel density, and then convert the result back to screen space. I know this isn't technically correct, but it's the closest I've gotten to a solution. Figuring out where the nearest texel is and rounding to it needs a more sophisticated implementation, but right now I'm just trying to understand what I need for the actual effect. For example, would it actually be better to calculate AO in world space instead of screen space? And if so, what is the proper way of doing that? Unfortunately, I haven't found many resources online.
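To make that round trip concrete, here is a minimal sketch of just the snapping step with the AO stripped out. It assumes the Forward+ or Mobile renderer (where the sampled depth can be used directly as NDC z) and a full-screen quad mesh like the one in Godot's advanced post-processing tutorial. `texels_per_unit` is a hypothetical parameter name, not something from the video.

```gdshader
shader_type spatial;
render_mode unshaded, fog_disabled;

uniform sampler2D depth_texture : hint_depth_texture, filter_nearest, repeat_disable;
// Hypothetical parameter: how many texels fit into one world unit.
uniform float texels_per_unit = 32.0;

void vertex() {
	// Stretch the quad over the whole screen (Godot 4.3+ reversed-Z convention).
	POSITION = vec4(VERTEX.xy, 1.0, 1.0);
}

void fragment() {
	float depth = texture(depth_texture, SCREEN_UV).r;

	// Screen UV + depth -> view space.
	vec4 view_pos = INV_PROJECTION_MATRIX * vec4(SCREEN_UV * 2.0 - 1.0, depth, 1.0);
	view_pos.xyz /= view_pos.w;

	// View space -> world space.
	vec3 world = (INV_VIEW_MATRIX * vec4(view_pos.xyz, 1.0)).xyz;

	// Snap to the nearest point of a uniform world-space grid.
	vec3 snapped_world = round(world * texels_per_unit) / texels_per_unit;

	// World space -> clip space -> screen UV of the snapped point.
	vec4 clip = PROJECTION_MATRIX * (VIEW_MATRIX * vec4(snapped_world, 1.0));
	vec2 snapped_uv = clip.xy / clip.w * 0.5 + 0.5;

	// Debug view: every pixel that snapped to the same grid point gets the same color.
	ALBEDO = vec3(snapped_uv, 0.0);
}
```

If the conversions are consistent, the debug output should show stable blocks that stick to the geometry when the camera moves. Note that this grid is aligned to the world axes, not to each mesh's actual texture texels, which is the part that still needs the more sophisticated approach.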

So if this is something anyone has done before, I would really appreciate some pointers on how to implement this properly and make the effect look as good as it does in the video above.

Here is the code I have so far. Sorry, it isn't very good and it's a bit messy, but I am just trying to get a prototype of the effect going so I can understand the basic concept and improve it from there. If you couldn't tell, graphics programming is still new to me, lol.

shader_type spatial;
render_mode unshaded, fog_disabled;

uniform sampler2D screen_texture: hint_screen_texture, source_color, filter_nearest, repeat_disable;
uniform sampler2D depth_texture: hint_depth_texture, filter_nearest, repeat_disable;
// No source_color on these two: they hold vectors, not colors, so they must not be sRGB-converted.
uniform sampler2D normal_texture: hint_normal_roughness_texture, filter_nearest, repeat_disable;
uniform sampler2D noise_texture: filter_nearest, repeat_enable;

uniform int kernel_size = 64;
uniform float radius = 0.2;
uniform float bias = 0.025;
uniform float intensity = 1.0;

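// Despite the name, this returns the VIEW-space position reconstructed from the depth buffer.
// Assumes Forward+/Mobile, where the sampled depth can be used directly as NDC z.
vec3 world_pos(vec2 uv, mat4 inv_proj) {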
	vec4 clip_pos = vec4(uv * 2.0 - 1.0, texture(depth_texture, uv).x, 1.0);
	vec4 view_pos = inv_proj * clip_pos;
	return view_pos.xyz / view_pos.w;
}

vec3 get_normal(vec2 uv) {
	vec3 normal = texture(normal_texture, uv).rgb;
	return (normal - 0.5) * 2.0;
}

vec2 screen_pos(vec3 wpos, mat4 view, mat4 proj) {
	vec4 vpos = (view * vec4(wpos, 1.0));
	vec4 cpos = proj * vec4(vpos.xyz, 1.0);
	vec2 ndc = cpos.xy / cpos.w;
	return ndc.xy * 0.5 + 0.5;
}

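// Takes the INVERSE view and projection matrices and returns a world-space position.
vec3 screen_to_world(vec2 uv, float depth, mat4 inv_view, mat4 inv_proj) {
	vec4 ndc = vec4(uv * 2.0 - 1.0, depth, 1.0);
	vec4 vpos = inv_proj * ndc;
	vpos.xyz /= vpos.w;
	return (inv_view * vec4(vpos.xyz, 1.0)).xyz;
}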

void vertex() {
	POSITION = vec4(VERTEX.xy, 1.0, 1.0);
}

void fragment() {
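	// Reconstruct the world position of this pixel, snap it to a 1/32-unit grid,
	// then reproject the snapped point to find the screen UV it corresponds to.
	vec3 wpos = screen_to_world(SCREEN_UV, texture(depth_texture, SCREEN_UV).r, INV_VIEW_MATRIX, INV_PROJECTION_MATRIX);
	vec2 uv = screen_pos(round(wpos * 32.0) / 32.0, VIEW_MATRIX, PROJECTION_MATRIX);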
	const vec2 noise_scale = VIEWPORT_SIZE / 4.0;
	
	vec3 frag_pos = world_pos(uv, INV_PROJECTION_MATRIX);
	// The buffer stores view-space normals packed into [0, 1]; unpack them to [-1, 1].
	vec3 normal = normalize(get_normal(uv));
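	// Assumes the noise texture stores values in [0, 1]; remap to a random rotation in the tangent plane.
	vec3 random_vec = vec3(texture(noise_texture, uv * noise_scale).rg * 2.0 - 1.0, 0.0);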
	
	vec3 tangent = normalize(random_vec - normal * dot(random_vec, normal));
	vec3 bitangent = cross(normal, tangent);
	mat3 TBN = mat3(tangent, bitangent, normal);
	
	float occlusion = 0.0;
	for (int i = 0; i < kernel_size; i++) {
		float j = float(i);
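		// abs() on z keeps every sample in the hemisphere above the surface instead of the full sphere.
		vec3 rand = normalize(vec3(sin(j), cos(j), abs(sin(j * 0.5))));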
		
		vec3 sample = TBN * rand;
		sample = frag_pos + sample * radius;
		
		vec4 offset = vec4(sample, 1.0);
		offset = PROJECTION_MATRIX * offset;
		offset.xyz /= offset.w;
		// Only x and y are remapped to UVs; z is not used below.
		offset.xy = offset.xy * 0.5 + 0.5;
		
		float sample_depth = world_pos(offset.xy, INV_PROJECTION_MATRIX).z;
		// max() avoids a division by zero when the two depths are equal.
		float range_check = smoothstep(0.0, 1.0, radius / max(abs(frag_pos.z - sample_depth), 0.0001));
		occlusion += (sample_depth >= sample.z + bias ? intensity : 0.0) * range_check;
	}
	
	ALBEDO = vec3(1.0 - occlusion / float(kernel_size));
}

And here is a screenshot of what it currently looks like. As you can see, there are still quite a few artifacts, and in motion there's a lot of jitter as well.