Performance Issues with lots of TileMapLayers for Large 3D-Stacked Worlds

Godot Version

godot_v4.3-stable_linux

Question

I’m working on a tile-based sandbox world in Godot where the map scales in 3D — essentially multiple Z-levels stacked vertically. I want the world to be scalable in size, and also want to allow the player to view it from a height (like a top-down or isometric view with height).

Right now I’m using multiple TileMapLayer nodes, one per Z-level, but performance drops significantly as the world grows or the number of layers increases. For example, I tested a grid of 256x256 tiles per layer with 16 layers stacked: performance was acceptable but starting to lag. However, when I tried more Z-levels (16x16 tiles per layer, but 4096 Z-levels deep), CPU usage spiked and Godot basically froze. The total number of tiles was the same in both cases (1,048,576), but the layout and distribution affected performance a lot.

I think I’m starting to realize that what I’m doing might be fundamentally contradictory… Maybe I’m wrong in my approach?

What is the optimal rendering strategy for this kind of setup?

I’d try using a GridMap.


Thanks for the answer! I’ll try to write a demo and test how it performs.

From what I understand, GridMap is more voxel-based, right? So instead of using a TileSet like in TileMap, I’ll need to create a library of mesh resources?

Also, GridMap is 3D, correct? I was hoping to squeeze out more FPS by using 2D rendering and integer-based logic. Is there a trick to make it behave more like 2D by using flat (plane) meshes for all the resources?

I’ve actually already implemented custom navigation logic, so the main bottleneck I’m facing is the rendering performance.


No. A voxel is a 3D pixel. A GridMap is a list of 3D objects that is stored in 3D space. Similar in some ways, but not a Voxel per se.

Yes. You literally drop all the items into a scene and then go to Scene → Export As… → MeshLibrary… You should save the scene you made the MeshLibrary from in case you want to update it later, but you do not need to export the scene itself as part of your project. (You do still need to export the models as part of your project.)

As for the models themselves, you can just create MeshInstance3D nodes with a PlaneMesh as the mesh.

Yes, you can use them all as planes. I recommended it because you’re trying to track your information in 3-dimensions, and a GridMap does that for you without additional work.

Here are a few pieces of advice for performance:

  1. Upgrade. You are running Godot 4.3 which means you are two production versions behind. 4.4.1-stable has been out for almost 4 months now, and 4.5 is about to come out of beta. I would recommend copying your project folder to a new location and upgrading that copy. Then you can measure performance and see what happens.
  2. Try out the NavigationRegion2D/3D nodes and see if they are potentially more performant than yours. The reason for this is that they are written in C++, a compiled language, and yours is written in GDScript, an interpreted language.
  3. Use the .NET (Mono) version for Linux. (You’ll also want VSCode on your box if your preferred IDE doesn’t support C#.) Then just copy your project and open the copy up in the .NET version. Then you can change the pieces that are giving you performance issues into C# code. They then will be compiled and the performance will likely improve significantly.

Thanks for the detailed answer — I really appreciate it!

You misunderstood my setup a bit, though. I actually tagged the post with cpp because I’m using godot-cpp and writing C++ modules compiled externally (through VSCode). My idea is that, by giving myself some constraints, I can write code that vectorizes well — which, based on my benchmarks, has a significant impact on performance.

As for the pathfinding system — I already implemented a basic one, and I’m planning to extend it with a cached hierarchical A* approach (zonal cache). But that’s not the main topic of the post.

When it comes to interaction with C++, I’ve got a decent grasp. However, the rendering pipeline is still a bit of a black box for me — I don’t fully understand what’s considered “best practice” when it comes to things like visibility culling, and how exactly I am supposed to skip drawing of something.

To expand on my original problem — logic (in C++) and UI (in GDScript) are both working fine and performant. But the rendering side of things is where I’m struggling to get efficient results for a scalable world with multiple Z-layers.

Thanks again for the tip about updating — I’ll definitely try testing performance on the newer version.


I don’t think TileMapLayer really does much to optimize for the stacking case, so if you’ve got 4096 of them and they’re all drawing, you’re probably asking the GPU to fill the equivalent of 4096 full-frame images every buffer swap. That’s not going to be cheap even on a top-end GPU. I’m not sure how TileMapLayer handles draw order, but if there’s any translucency it will probably be drawing back to front, so you won’t even get any benefit from depth testing.

You probably want to consider a scheme that does less overdraw.


I tried GridMap — I liked the baseline FPS (~120) even with a tall tower of cubes. But the FPS dropped proportionally with tower height, which was disappointing. I realized I needed culling, but didn’t like the memory model: GridMap feels like a collection, and removing/hiding meshes doesn’t actually free or skip them — they’re still part of the internal structure. Rebuilding the entire thing seemed costly.

Next, I tried chunking (x^3) and manually building meshes — surprisingly solid results. Similar baseline FPS to GridMap, maybe lower memory usage (though I might be wrong there).

Then I moved to mesh instancing per chunk. This gave a noticeable boost. After that, I got an extra ~20% by doing zero-copy updates: mesh_instance.set_count(n^3) per chunk, then doing set_visible(actual_count).

Culling was still a bit tricky in this setup…

Eventually, I figured out a nice trick: I split the world into a grid of vertical columns, and assigned a MultiMesh to each column. Since it’s orthographic, I can scan top-down and call one set_instance_count(n^2) per column, and FPS became infinite. (I think this setup would also work well with a thread pool.)
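
A minimal sketch of the top-down scan described above, in plain C++ with no Godot types. It assumes a camera looking straight down and that every non-air block is opaque. `Column`, `VisibleBlock` and `collect_top_blocks` are hypothetical names made up for illustration, not the code actually used in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical storage for one vertical column of the world:
// size_x * size_z cells, each `height` blocks tall. 0 means "air".
struct Column {
    int size_x;
    int size_z;
    int height;
    std::vector<std::uint8_t> blocks; // index = (y * size_z + z) * size_x + x

    std::uint8_t at(int x, int y, int z) const {
        return blocks[(static_cast<std::size_t>(y) * size_z + z) * size_x + x];
    }
};

// One block that is actually visible, i.e. one instance to draw.
struct VisibleBlock {
    int x, y, z;
    std::uint8_t id;
};

// For every (x, z) cell, walk down from the top and keep only the first
// non-air block. Seen from straight above, nothing below it is visible,
// so at most size_x * size_z instances are produced per column.
std::vector<VisibleBlock> collect_top_blocks(const Column &col) {
    assert(col.blocks.size() ==
           static_cast<std::size_t>(col.size_x) * col.size_z * col.height);

    std::vector<VisibleBlock> out;
    out.reserve(static_cast<std::size_t>(col.size_x) * col.size_z);

    for (int z = 0; z < col.size_z; ++z) {
        for (int x = 0; x < col.size_x; ++x) {
            for (int y = col.height - 1; y >= 0; --y) {
                const std::uint8_t id = col.at(x, y, z);
                if (id != 0) {
                    out.push_back({x, y, z, id});
                    break; // everything lower in this cell is hidden
                }
            }
        }
    }
    return out;
}
```

The length of the returned list is the number that would go into the column's visible instance count. With a tilted isometric camera the side faces of lower blocks can show, so the scan would have to keep more than one block per cell.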

(Using a pool allocator for chunks was also crucial.)

Later, I might add a bitwise compressed format for chunks to speed up traversal. And maybe explore extending the zero-copy approach via a custom Vulkan pipeline if I ever dig into that.
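
One possible shape for such a bitwise format, sketched under assumptions: each (x, z) cell of a chunk keeps a 64-bit mask where bit y is set when the block at height y is opaque, so the top-most opaque block is the highest set bit. `ColumnMask` and the helper names are hypothetical, and chunks taller than 64 blocks would need several words per cell.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-cell occupancy mask: bit y == 1 means "opaque at height y".
using ColumnMask = std::uint64_t;

inline void set_opaque(ColumnMask &mask, int y) {
    assert(y >= 0 && y < 64);
    mask |= (ColumnMask{1} << y);
}

inline void clear_opaque(ColumnMask &mask, int y) {
    assert(y >= 0 && y < 64);
    mask &= ~(ColumnMask{1} << y);
}

// Returns the height of the top-most opaque block, or -1 if the cell is empty.
// A portable shift loop is used here; compiler intrinsics or C++20's
// std::bit_width can do the same in a single instruction.
inline int top_opaque_y(ColumnMask mask) {
    if (mask == 0) {
        return -1;
    }
    int y = 0;
    while ((mask >>= 1) != 0) {
        ++y;
    }
    return y;
}
```

The point of the mask is that an empty cell is rejected with one comparison and a full scan never touches the block data itself.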

Hope this helps someone!


I conducted further investigation and optimized the rendering bottlenecks using my custom approach. I noticed that opaqueblocksBitY (culling with scanning bit representation) didn’t bring a significant performance gain. The actual bottleneck was in the set_instance_* calls.

What helped was reducing external function calls using the following pattern:

multimesh->set_instance_count(instance_count);
multimesh->set_buffer(buffer);
multimesh->set_visible_instance_count(instance_count);

The PackedFloat32Array buffer needs to be filled manually — it’s not exactly zero-copy, but it’s quite efficient in terms of reducing the number of calls.

Having direct control over the output buffer also opens the door to vectorization.
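
A minimal sketch of filling such a buffer by hand, using `std::vector<float>` as a stand-in for the PackedFloat32Array. It assumes the 3D-transform-only layout (no color or custom data), which as far as I know is 12 floats per instance, stored as three rows of a 3x4 matrix with the origin component last in each row; check the MultiMesh documentation for your Godot version. `Vec3`, `kFloatsPerInstance` and `write_transforms` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Assumed stride: one 3x4 transform per instance, no color, no custom data.
constexpr std::size_t kFloatsPerInstance = 12;

// Writes one translation-only transform per position into a preallocated
// flat buffer and returns how many instances were written. The buffer is
// sized once for the maximum instance count and reused between updates.
std::size_t write_transforms(const std::vector<Vec3> &positions,
                             std::vector<float> &buffer) {
    assert(buffer.size() >= positions.size() * kFloatsPerInstance);

    for (std::size_t i = 0; i < positions.size(); ++i) {
        float *t = buffer.data() + i * kFloatsPerInstance;
        const Vec3 &p = positions[i];
        // Row 0: basis.x.x, basis.y.x, basis.z.x, origin.x
        t[0] = 1.0f; t[1] = 0.0f; t[2] = 0.0f;  t[3] = p.x;
        // Row 1: basis.x.y, basis.y.y, basis.z.y, origin.y
        t[4] = 0.0f; t[5] = 1.0f; t[6] = 0.0f;  t[7] = p.y;
        // Row 2: basis.x.z, basis.y.z, basis.z.z, origin.z
        t[8] = 0.0f; t[9] = 0.0f; t[10] = 1.0f; t[11] = p.z;
    }
    return positions.size();
}
```

The return value is what would be passed to set_visible_instance_count after the buffer has been handed over with set_buffer. Because every instance is a fixed-stride run of floats, the inner loop is a good candidate for vectorization.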
