Batching Problems: Expected 9 draw calls, got 100

Godot Version

4.3-stable

Question

Hi,

I have been chasing down an issue for hours involving the number of draw calls for a mesh that should have some degree of batching with the Forward+ renderer.

I’m rendering 524 trees, each with a Y-billboarded QuadMesh that fades in once the tree is more than 10 m away from the camera. This all works well, but during optimization I was surprised by the number of draw calls that were popping up for the QuadMesh.

All details are below, but basically, I have a bunch of QuadMeshes being used as billboards (each on a MeshInstance3D). They are part of a “tree” packed scene, and there are 9 different StandardMaterial3Ds in use. I’m expecting 9 draw calls, and I’m getting 100.

Here is the scene with virtually everything stripped out except the sky:

At this point, Godot is reporting 118 draw calls.

If I hide the billboard meshes, the scene becomes empty, and there’s just a single draw call. My “reference” scene has 1 draw call, 1 object, and 12692 primitives. With the billboards in I’m getting around 100 draw calls, 15486 primitives, and around 1400 objects.

The primitive count is in the ballpark for what’s expected for 524 quad meshes (the delta is about 3000 primitives). The draw calls seem way off to me.

For the billboard quad meshes, there are 9 variant materials.

The mesh is the same every time, and I am careful that the mesh and material are not duplicated. They are set procedurally in code via a dictionary and a lookup key:


const BILLBOARD_MATS = {
	"0_0": preload("res://prefabs/tree/billboards/mats3/billboard_green_density0.tres"),
	"0_1": preload("res://prefabs/tree/billboards/mats3/billboard_green_density1.tres"),
	# ... 7 more times
}

And then during the _ready() function, this is called:

func load_billboard() -> void:
	var key := str(state)+"_"+str(density)
	branches_billboard.mesh = BILLBOARD_QUAD_MESH
	var mat: StandardMaterial3D = BILLBOARD_MATS[key]
	branches_billboard.set_surface_override_material(0, mat)

branches_billboard is just a MeshInstance3D.

The materials are all StandardMaterial3Ds, but here are the settings:

  • Transparency: Depth Pre-Pass
  • Cull Mode: Disabled
  • Shading Mode: Per-Vertex
  • Albedo Color: f2f2f2
  • Albedo Texture: (all use the same branch_billboard.png)
  • UV settings are unique for each material
  • Shadows: Disable Receive Shadows
  • Mode: Y-Billboard
  • Keep Scale: On

Some of the settings I’m just experimenting with. The texture looks like this:

Basically just shift UV coords to get whatever tree is desired.

Am I severely misunderstanding how Forward+ batches draw calls? Are there other limitations I am unaware of?

I know how to use MultiMeshInstance3D, but I’d rather them stay as individual MeshInstance3Ds in this case.

Hi.
I can’t tell you how this specific number of draw calls comes together … but …

Have you checked that every mesh and material is actually an instance and not a copy? (You said the UVs are unique for each material, so those are not instances.)
If you are using a light, shadow rendering adds a draw call for every scene object for every light that casts shadows.
If you are using a directional light with PSSM shadows (the default), you can have additional draw calls for every split.
The depth pre-pass must be a separate draw call (I guess) but doesn’t seem to be counted by the debugger monitor, and the same seems to go for transparent objects (which are rendered in the transparency pass).

… and by the way: it’s very helpful that most changes in the editor are applied directly to the running game. Therefore you can switch things around and directly measure the difference in draw calls.

Thanks for responding. As for instance vs. copy - I’m attaching the materials and meshes via GDScript, and I’m not using the duplicate() method, so presumably each one is just an instance of the underlying mesh/material.
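
One way to double-check the instance vs. copy question from script is to count the distinct mesh and material RIDs in the running scene. The following is only a minimal sketch: the function name is hypothetical, it assumes it is called on the scene root after all trees are ready, and it counts every MeshInstance3D under that root (not only the billboards), so the totals include any other meshes that are visible.

```gdscript
# Hypothetical debugging helper, not part of the original project.
# Counts distinct mesh and material resources used by all MeshInstance3D
# nodes below `root`. Shared resources report the same RID.
func count_unique_mesh_resources(root: Node) -> void:
	var mesh_rids := {}
	var material_rids := {}
	# owned = false so nodes added from script (without an owner) are included.
	for node in root.find_children("*", "MeshInstance3D", true, false):
		var mi := node as MeshInstance3D
		if mi.mesh == null:
			continue
		mesh_rids[mi.mesh.get_rid()] = true
		# Surface 0 only: the override material if set, else the mesh's own.
		var mat := mi.get_active_material(0)
		if mat != null:
			material_rids[mat.get_rid()] = true
	print("unique meshes: ", mesh_rids.size(),
		", unique materials: ", material_rids.size())
```

If the billboards really share their resources, the material count should not grow with the number of trees.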

Maybe I should say that I haven’t had issues batching things in the past.

I almost need to reach out to someone who works on rendering in the Godot engine to figure out if there’s any code that sorts the draw calls (kind of like this article: Order your graphics draw calls around! – realtimecollisiondetection.net – the blog)

If not, then maybe Forward+ can only do some intelligent batching when drawn objects happen to be next to each other in the scene tree.

In any case, I’m going to do some isolated testing in a new project and report back, in case it’s useful to anyone else.

And for context: I definitely need to optimize. I’m already at a million triangles with my terrain system and my tree system. It’s a core mechanic that the terrain is highly tessellated for snow deformation, and that each tree is unique and each branch is interactable… so it is a tall order.

First update:

The basic test passes, and works as expected. I’m generating a grid of 5x5 plane meshes, with the material set based on the x index. As expected, only 5 draw calls:

func _ready() -> void:
	
	for c in get_children(): c.free()
	
	var plane = PlaneMesh.new()
	plane.size = Vector2(1,1)
	
	var m1 := StandardMaterial3D.new()
	m1.albedo_color = Color.RED
	
	var m2 := StandardMaterial3D.new()
	m2.albedo_color = Color.BLUE
	
	var m3 := StandardMaterial3D.new()
	m3.albedo_color = Color.GREEN
	
	var m4 := StandardMaterial3D.new()
	m4.albedo_color = Color.PURPLE
	
	var m5 := StandardMaterial3D.new()
	m5.albedo_color = Color.CYAN
	
	var x_mat: Array[StandardMaterial3D] = [m1, m2, m3, m4, m5]
	
	for x in range(5):
		for y in range(5):
			var m := MeshInstance3D.new()
			m.name = "MeshInstance" + str(x) + str(y)
			add_child(m)
			m.mesh = plane
			m.set_surface_override_material(0, x_mat[x])
			m.owner = get_tree().edited_scene_root
			m.position = Vector3(x*2.0, 0, y*2.0)

Works with each mesh being a “SpecialMesh” packed scene as well. So at this point I have no idea why the batching is not happening in my game project.

SpecialMesh.gd:

@tool
class_name SpecialMesh
extends MeshInstance3D

const x_mat := [
	preload("res://mats/m1.tres"),
	preload("res://mats/m2.tres"),
	preload("res://mats/m3.tres"),
	preload("res://mats/m4.tres"),
	preload("res://mats/m5.tres")
]
const NEW_PLANE_MESH = preload("res://new_plane_mesh.tres")

@export var x_index := -1

func _ready() -> void:
	mesh = NEW_PLANE_MESH
	set_surface_override_material(0, x_mat[x_index])
	print(x_mat[x_index])

And the main script:

@tool
extends Node3D
const MESH_INSTANCE_00 = preload("res://mesh_instance_00.tscn")

func _ready() -> void:
	for c in get_children(): c.free()
	
	for x in range(5):
		for y in range(5):
			var m: SpecialMesh = MESH_INSTANCE_00.instantiate()
			m.name = "MeshInstance" + str(x) + str(y)
			m.position = Vector3(x*2.0, 0, y*2.0)
			m.x_index = x
			add_child(m)
			
			m.owner = get_tree().edited_scene_root

Same result with QuadMesh (as opposed to PlaneMesh). I’ll have to do some more digging in my own project:

Also, switching the x and y indices so that the draw calls are not in the “correct” order in the scene tree doesn’t reproduce the issue either - the renderer batches them properly.

Updating again: I found this in the 3.5 docs (which may or may not be relevant anymore), on “item re-ordering”. Optimization using batching — Godot Engine (3.5) documentation in English

I think it’s possible that, since the QuadMesh is nested so deeply within its packed scene, the item re-ordering lookahead value is too small for my project.

I’m going to consider re-architecting my packed scene first, and then tweak the project setting after. I’ll report back if I find anything.

The batching documentation in 3.x only applies to 2D rendering, not 3D. In Godot 3.x, there was no form of automatic batching (i.e. instancing) in 3D.


I believe you should do your testing using transparency as in your tree sprites.
Pretty sure you’ll find that’s where all those draw calls are getting added.
Cheers !

If you want to dive really deep into the rendering process, maybe this tool will help you:

https://renderdoc.org/

With this you can capture the rendering of a frame and go through it step by step.


Thank you everyone for your replies.

@OleNic Just tried it. Unfortunately it still hasn’t isolated the issue I’m seeing. In my test project, I have the exact same settings on the StandardMaterial3D as for my tree billboards in my actual game (including transparency), and the draw call count is 5 in the editor, as expected:

I think the editor is somehow lying about draw calls. The same scene, when not run in the editor, reports 51 draw calls.
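
To compare the editor numbers with the running game, the counters can also be read from script. The sketch below is an assumption about how one might do that, not something from the test project: a hypothetical debug node that prints, once per second, the draw calls of the main viewport split into the visible pass and the shadow pass, next to the total from the Performance monitor.

```gdscript
extends Node

# Hypothetical debug node: add it anywhere in the running scene.
var _elapsed := 0.0

func _process(delta: float) -> void:
	# Only print about once per second to keep the output readable.
	_elapsed += delta
	if _elapsed < 1.0:
		return
	_elapsed = 0.0

	var vp := get_viewport().get_viewport_rid()
	# Draw calls issued for the visible (color) pass of this viewport.
	var visible_calls := RenderingServer.viewport_get_render_info(
		vp,
		RenderingServer.VIEWPORT_RENDER_INFO_TYPE_VISIBLE,
		RenderingServer.VIEWPORT_RENDER_INFO_DRAW_CALLS_IN_FRAME)
	# Draw calls issued while rendering shadow maps for this viewport.
	var shadow_calls := RenderingServer.viewport_get_render_info(
		vp,
		RenderingServer.VIEWPORT_RENDER_INFO_TYPE_SHADOW,
		RenderingServer.VIEWPORT_RENDER_INFO_DRAW_CALLS_IN_FRAME)
	# Same value the debugger monitor shows as total draw calls.
	var total := Performance.get_monitor(
		Performance.RENDER_TOTAL_DRAW_CALLS_IN_FRAME)
	print("visible: ", visible_calls,
		" shadow: ", shadow_calls,
		" total: ", total)
```

If the shadow number accounts for most of the gap between the expected and the reported count, shadow passes are the place to look first.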

Transparency doesn’t seem to affect this. In fact, none of the StandardMaterial3D properties affect it. In this screenshot, it is literally just a StandardMaterial3D with a different background for the 5 materials.

In theory this should be 5 draw calls. But supposedly the engine is creating many more.

@klaas It’s funny, I’ve used RenderDoc when working on custom OpenGL stuff, but it did not occur to me to use it here… That’s where I need to go next. If the statistics are incorrect for draw calls either in the editor or at runtime, I need to get to the bottom of it.

Sorry for spamming this with messages - but @klaas, I think you were right regarding shadows. I think that’s the main source of the extra draw calls. I need to figure out how to optimize that, because my current scheme with shadows is costing me 4x the draw calls. There’s still something unaccounted for in my original problem (100 draw calls vs 9), but I think this helps me understand where the bulk of the issue comes from. Thanks everyone.
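
For anyone landing here with the same problem, two knobs that might reduce the shadow cost are sketched below. Both helper names are hypothetical, and whether either change is visually acceptable depends on the project: the first stops a billboard from being drawn into shadow maps at all, the second makes a directional light render a single shadow map instead of several PSSM splits.

```gdscript
# Hypothetical helpers, shown as a sketch only.

# Exclude a billboard from all shadow passes. `billboard` is assumed to be
# the MeshInstance3D that load_billboard() configures in the original post.
func disable_billboard_shadows(billboard: MeshInstance3D) -> void:
	billboard.cast_shadow = GeometryInstance3D.SHADOW_CASTING_SETTING_OFF

# Use one orthogonal shadow map instead of 2 or 4 PSSM splits.
# This trades shadow quality at a distance for fewer shadow passes.
func use_single_shadow_split(sun: DirectionalLight3D) -> void:
	sun.directional_shadow_mode = DirectionalLight3D.SHADOW_ORTHOGONAL
```

Since most editor changes are applied to the running game, the same properties can also be toggled in the inspector first to measure the difference before committing to code.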