Shader performance when accessing uniform textures

Godot Version

v4.3.stable.official.77dcf97d8

Question

To experiment with custom post-processing shaders, I implemented very naïve “pixelate” and “blur” shaders.
I apply them by adding a CanvasLayer to my 3D scene, with a ColorRect that has the shader in its material.

My pixelate shader works fine and does not noticeably affect performance.
The blur shader, though, is so costly that the frame rate immediately drops to 30 FPS, even on a recent GPU.
The shader runs a loop of 100 iterations (a 10×10 kernel), and each iteration calls “texture()”. That call seems to be the culprit: commenting it out solves the performance issue. It didn’t seem that costly to me, but apparently I was mistaken.
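To put the 100 calls in perspective, here is a back-of-the-envelope count of texture fetches per frame. It is only a sketch and rests on two assumptions not stated in the question: a hypothetical 1920×1080 viewport fully covered by the ColorRect, and a pixelate shader that makes a single texture() call per pixel. The function name `fetches_per_frame` is made up for illustration.

```python
def fetches_per_frame(width: int, height: int, fetches_per_pixel: int) -> int:
    """Texture fetches issued per frame if the fragment shader runs once per pixel."""
    return width * height * fetches_per_pixel


# Hypothetical resolution; substitute the actual viewport size.
WIDTH, HEIGHT = 1920, 1080

pixelate = fetches_per_frame(WIDTH, HEIGHT, 1)        # assumed: one texture() call per pixel
blur = fetches_per_frame(WIDTH, HEIGHT, 10 * 10)      # 10 x 10 kernel from the blur shader

print(f"pixelate: {pixelate:,} fetches/frame")        # pixelate: 2,073,600 fetches/frame
print(f"blur:     {blur:,} fetches/frame")            # blur:     207,360,000 fetches/frame
print(f"blur at 60 FPS: {blur * 60:,} fetches/s")     # blur at 60 FPS: 12,441,600,000 fetches/s
```

The absolute numbers matter less than the ratio: under these assumptions the blur issues 100 times as many fetches as a one-fetch shader, which is consistent with it dominating the frame time.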

In your experience, how much “complexity” can a shader afford? Are there better ways to access uniform textures?

I know there are far better (and documented) ways to implement blur; I just want to understand what is costly and what is not.

Here is the blur shader:

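shader_type canvas_item;

// Kernel size in pixels: 10 x 10 = 100 texture() calls per fragment.
const ivec2 size = ivec2(10, 10);

uniform sampler2D screen_texture : hint_screen_texture, repeat_disable, filter_nearest;

void fragment() {
	vec4 color = vec4(0.0);
	// Offsets run from -size / 2 to size / 2 - 1 (-5..4 here), so the kernel
	// is shifted half a pixel toward negative x and y.
	for (int x = -size.x / 2; x < size.x / 2; x++) {
		for (int y = -size.y / 2; y < size.y / 2; y++) {
			vec2 offset = vec2(float(x), float(y));
			// One texture fetch per kernel cell: this is the expensive part.
			// SCREEN_UV addresses the screen texture; UV only matches it
			// when the ColorRect covers the whole viewport.
			color += texture(screen_texture, SCREEN_UV + SCREEN_PIXEL_SIZE * offset);
		}
	}
	// Average of all sampled pixels (a box blur).
	color /= float(size.x * size.y);

	COLOR = color;
}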
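The loop bounds can be checked outside Godot by mirroring them in Python. This sketch only covers the even size used here (10), where `-size / 2` in the shader and `-(size // 2)` in Python both give -5; the helper name `kernel_offsets` is hypothetical. It lists the offsets visited along one axis, the resulting number of texture() calls, and the mean offset.

```python
def kernel_offsets(size: int) -> list[int]:
    """Offsets visited by `for (int x = -size / 2; x < size / 2; x++)` for an even size."""
    half = size // 2
    return list(range(-half, half))


offsets = kernel_offsets(10)
calls_per_fragment = len(offsets) ** 2          # x and y loops are nested
mean_offset = sum(offsets) / len(offsets)

print(offsets)              # [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
print(calls_per_fragment)   # 100
print(mean_offset)          # -0.5
```

A mean offset of -0.5 on each axis means the average is centered half a pixel away from the current pixel, so the blurred image shifts slightly. Looping with `x <= size.x / 2` would center it, but that gives an 11×11 kernel with 121 fetches, 21 more than now.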

I found this thread explaining that texture reads are indeed relatively expensive:
https://www.reddit.com/r/opengl/comments/kpv9rv/how_expensive_are_texture_reads/

In one of the responses, user Corysama explains that the GPU can pipeline some of these calls, so it does not have to wait for each texture() call to finish before issuing the next one.
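The pipelining idea can be illustrated with a deliberately simplified model. The numbers below are hypothetical (a fetch latency of 400 cycles, one new fetch issued every 4 cycles); real values vary by GPU and are not taken from the thread. The model compares fetches that must wait for each other with independent fetches that can overlap. In the blur loop, every UV is computed from loop counters rather than from an earlier texture() result, so its fetches are independent in this sense.

```python
def dependent_cycles(n: int, latency: int) -> int:
    """Each fetch needs the previous result (e.g. its UV comes from an earlier texture() call)."""
    return n * latency


def pipelined_cycles(n: int, latency: int, issue_interval: int) -> int:
    """Independent fetches are issued back to back; only the last one's latency is fully exposed."""
    if n == 0:
        return 0
    return (n - 1) * issue_interval + latency


# Hypothetical figures for illustration only.
LATENCY = 400         # cycles until a fetch result is available
ISSUE_INTERVAL = 4    # cycles between issuing consecutive independent fetches
N = 100               # texture() calls per fragment in the blur shader

print(dependent_cycles(N, LATENCY))                   # 40000
print(pipelined_cycles(N, LATENCY, ISSUE_INTERVAL))   # 796
```

Even in the overlapped case the cost still grows linearly with the number of fetches, and real GPUs also hide latency by switching between groups of pixels. The model only shows why independent fetches are cheaper than dependent ones, not how a given shader will perform.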

If anyone has information on whether and how a gdshader can take advantage of this, or any tricks for reducing texture-access overhead, it would be much appreciated.