Editor performance on startup when handling a large number of assets

Godot Version

4.3.0

Introduction

I’m trying to load some reverse-engineered files from a game inside Godot.
My problem is that this game has a great many compiled binary assets, and most of them reference other files by relative path (a model references meshes, which in turn reference mesh buffers and materials; the materials reference shader params & shaders; there are also references to textures somewhere, etc…).
In the end, even though I haven’t finished putting all the resources of the game in my assets folder, Godot ends up with a really poor startup time in Editor mode, so I did a bit of profiling to understand why before continuing.

Profiling Result

72 444 files to scan at startup
3.00 GB on disk (2.88GB real usage)

Measurements vary between 70s and 110s for EditorFileSystem::_first_scan_filesystem & EditorFileSystem::_first_scan_process_scripts

VerySleepy profiling results:

94s load

EditorFileSystem::_first_scan_process_scripts is called recursively on each folder,
but it also does a ResourceLoader::get_resource_type(path) for each file,
which itself does a loader->get_resource_type(path) for each loader (17 in total according to my debugger)

And the two most expensive ones are:

  • ResourceFormatLoaderBinary::get_resource_type
    • FileAccess::open(p_path, READ);
  • ResourceFormatImporter::get_resource_type
    • FileAccess::open(p_path + ".import", READ)

Both result in FileAccessWindows::open_internal taking most of the time.
The true culprit is the Windows Kernel API, with FindFirstFileW taking half the time inclusive and half the time exclusive (maybe other syscalls not measured by VerySleepy?)
In my VerySleepy capture I also noticed other Windows Kernel API calls that seem less impactful, but that later turned out to affect performance as well:

  • CreateFileW
    • 19.10s
    • from ResourceFormatImporter::_get_path_and_type opening
  • FindClose
    • 6.90s
    • from ResourceFormatLoaderBinary::get_resource_type
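
To make the shape of that scan concrete, here is a minimal C++ sketch of the cost model only. It is not Godot's code: `LoaderModel`, `count_open_attempts` and `make_expensive_loaders` are hypothetical names, and the assumption is that two of the registered loaders each need one file-open attempt per scanned file, whether or not the file is in a format they handle.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a resource loader: it reports how many file-open
// attempts it needs to answer "what type is this path?".
struct LoaderModel {
    std::string name;
    std::function<int(const std::string &)> opens_needed;
};

// Counts the open attempts made by a scan that asks every loader about every
// file (the nested loop described in the profiling notes).
std::size_t count_open_attempts(const std::vector<std::string> &files,
                                const std::vector<LoaderModel> &loaders) {
    std::size_t opens = 0;
    for (const std::string &path : files) {
        for (const LoaderModel &loader : loaders) {
            opens += static_cast<std::size_t>(loader.opens_needed(path));
        }
    }
    return opens;
}

// Two modelled loaders (an assumption): one opens the file itself, the other
// probes for a sibling "<path>.import". Both pay for the open even when the
// answer ends up being "not my format".
std::vector<LoaderModel> make_expensive_loaders() {
    return {
        {"binary", [](const std::string &) { return 1; }},
        {"importer", [](const std::string &) { return 1; }},
    };
}
```

Under that assumption, 72 444 files lead to 144 888 open attempts during the first scan, each one going through the Windows calls listed above.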

I had some trouble evaluating the overall timings with VerySleepy, and I didn’t want to spend more time just to get a good flame graph out of it, so I installed Superluminal Performance to double-check my hypothesis. It gives me an immediate flame graph of the kind I’m used to.

Superluminal Performance profiling results

70s load

With 70s in EditorFileSystem::_first_scan_filesystem and then EditorFileSystem::_first_scan_process_scripts, we do spend most of it in the get_resource_type of ResourceFormatLoaderBinary and ResourceFormatImporter

  • ResourceFormatImporter::_get_path_and_type spends all its time in FileAccessWindows::open_internal, and only because of the subsequent syscalls
    Superluminal_3

  • ResourceFormatLoaderBinary::get_resource_type does spend some time in ProjectSettings::localize_path and ResourceLoader::recognize, but those are cheap (even though recognize does syscall reads, it only takes ~800ms out of our 70s), so again, the true culprit is FileAccessWindows::open_internal
    Superluminal_2

And so: FileAccessWindows::open_internal ends up using ~64s, which is the majority of the time, and every single hot spot inside it is a Windows syscall.

  • 57% (~36s) on FindFirstFileW
  • 14% (~9s) on CreateFileW for its _wstat call
  • 11% (~7s) on CreateFileW for its _wfsopen call
  • 8% (~5s) on FindClose

Superluminal_4

My hypothesis was correct, so we can assume that in the end most of the time is spent in Windows system calls here and there.
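
As a rough sanity check on those numbers, the per-file cost can be worked out with a trivial helper. `ms_per_item` is just an illustrative name, and "two opens per file" is an assumption based on the two expensive loaders identified earlier.

```cpp
// Average cost per item, in milliseconds, given a total in seconds.
double ms_per_item(double total_seconds, double item_count) {
    return total_seconds * 1000.0 / item_count;
}
```

With the ~64s measured in open_internal and 72 444 files, `ms_per_item(64.0, 72444.0)` comes out a bit under 0.9 ms per file, so roughly 0.44 ms per open attempt if each file is opened twice. Each call is cheap on its own; the cost is in how many of them are made.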

Remarks

For a UX point

Even though as a programmer I’m certainly the last one who should give advice on that matter, I want to point out that I was disturbed when I ran into this problem. While this loading is happening, the editor is stuck on the startup screen and frozen (Windows says it’s not responding if you click somewhere), and during that time even verbose output doesn’t give any insight into what is happening. I had to rely on some basic knowledge and then on my experience to guess, and then confirm, that it wasn’t frozen but working on something. (Non-programmers might someday reach that point with a large project, maybe, dunno, but as-is I think they will be surprised, and it could become a friction point :person_shrugging: )

For the performance point

I observed that CPU usage is also fairly low, around 5% (at least on my CPU) according to Windows Task Manager, which looks like a single-threaded load (it was hard to tell from VerySleepy and I’m still not used to reading WPA, but Superluminal confirmed it).

Disk usage seems low, but then Windows may have cached some of it. Besides, when doing a full WPA trace I can plot more data, and it is accessing the disk. However, it shouldn’t be a problem from a disk throughput perspective.
If I look at File I/O, I can find every operation, and that’s where it may be struggling.



Conclusion

I just wanted to share my observations, because as an engine programmer working on those subjects, I know it’s good to get data :smile:

I know that, obviously, Godot shouldn’t be modified to suit my specific needs. However, I do think that if the engine keeps growing its user base, it will face such scenarios more and more, and people will need support for such questions, so this is kind of a preview and hopefully a documentation post.
I also understand that premature optimization isn’t good, and that the way the engine currently does things is the result of a lot of design experience and care from people with a much clearer and broader understanding of it than mine.

However, I still think that:

  • First, all those syscalls to the Windows API aren’t good; it’s usually a given that doing many syscalls in a row is bad in any situation. (And yeah, Windows is really doing its thing under the hood to make them that expensive.)
  • Second, all those syscalls are for things that Godot can’t use: half of the opens are for (path + '.import') files that don’t exist, and the others open binary files only to check their type and conclude that Godot can’t handle them anyway. (That performance cost is there whether or not I have the GDExtension activated; the expensive scan runs first, and it’s long.)

I’m wondering how it would do with pngs and other files that it can support. From my observations I suppose it should be a little bit better (half of it, anyway?), because the loop over the resource loaders wouldn’t be that bad and wouldn’t reach ResourceFormatLoaderBinary each time. But I don’t know what results we would get for ResourceFormatImporter, which would have to handle the assets[].import files.

Also, all of that scanning happens in EditorFileSystem::_first_scan_process_scripts.
The comment on the first call states that

// This loads the global class names from the scripts and ensures that even if the
// global_script_class_cache.cfg was missing or invalid, the global class names are valid in ScriptServer.

And I can’t tell whether it’s by design that this happens.
There could be scripts in a custom binary format, there could be many things, but do we have to parse all the files, even the non-script ones?

Or couldn’t we hint which files to search through for that operation? Or does the initial first scan have intended side effects? Or could we just skip the binary resource loader check?
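
To illustrate what "hinting" could look like, here is a small sketch that decides from the file name alone whether a path is worth probing, before any file is opened. It is only an illustration: `extension_of` and `worth_probing` are hypothetical helpers, and the set of script extensions is something the caller would have to supply.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>

// Returns the lower-cased extension of a path without the dot, or "" if the
// last path component has no dot.
std::string extension_of(const std::string &path) {
    const std::size_t slash = path.find_last_of("/\\");
    const std::size_t dot = path.find_last_of('.');
    if (dot == std::string::npos ||
        (slash != std::string::npos && dot < slash)) {
        return "";
    }
    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return ext;
}

// True only when the extension is in the caller-supplied set, so every other
// file is skipped without a single open attempt.
bool worth_probing(const std::string &path,
                   const std::unordered_set<std::string> &script_extensions) {
    return script_extensions.count(extension_of(path)) > 0;
}
```

For example, with `{"gd", "cs"}` as the set, `res://player.gd` would be probed and `res://assets/model.bin` would not. Whether the first scan can legitimately skip files this way depends on the side effects asked about above.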
Since this happens before my GDExtension is loaded, even a GDExtension that handles those custom files wouldn’t change anything and the cost would still be there, so what would be appropriate?

In my case, I do know that I only have static data in those files, so another way to look at it is that it would be my responsibility to tell Godot to ignore them?
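
On telling Godot to ignore files: the editor skips any folder that contains an empty file named `.gdignore`. A minimal sketch for dropping that marker into a folder follows; `mark_folder_ignored` is a hypothetical helper name and it assumes C++17.

```cpp
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Creates an empty ".gdignore" marker in `folder` if it is not there yet.
// Returns true if the marker exists afterwards.
bool mark_folder_ignored(const fs::path &folder) {
    const fs::path marker = folder / ".gdignore";
    if (!fs::exists(marker)) {
        std::ofstream create(marker); // opening for writing creates an empty file
    }
    return fs::exists(marker);
}
```

The catch is that files in an ignored folder are no longer treated as project resources, so whether a custom loader can still reach them the way it needs to is something to verify before relying on this.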

Solutions ?

And thus, in that situation, with a great many assets to handle and no desire to delete them, what could I (and others) do to mitigate this problem?

Pack all files in a single binary archive? (Some games do that, and the one I’m working with also does it for other files, the ones I’m currently missing.)
Implement a custom binary loader for that archive? With a compression algorithm? With a custom virtual file system? Would it work if I want to use the files in editor mode / tool mode? Can it be done in a GDExtension, or would it need to be a module?
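
For reference, the core of the single-archive idea fits in a few lines: one blob holding every file's bytes, plus an index from relative path to offset and size. This is an in-memory sketch only, with hypothetical names (`PackedArchive`, `add`, `read`); it says nothing about how such an archive would be hooked into Godot, and it leaves out compression and on-disk serialization.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A toy archive: all file contents concatenated in one blob, with an index
// that maps each relative path to its location inside the blob.
struct PackedArchive {
    struct Entry {
        std::size_t offset;
        std::size_t size;
    };

    std::map<std::string, Entry> index;
    std::vector<std::uint8_t> blob;

    // Appends `data` to the blob and records where it landed. Adding the same
    // path twice repoints the index and leaves the old bytes unused.
    void add(const std::string &path, const std::vector<std::uint8_t> &data) {
        index[path] = Entry{blob.size(), data.size()};
        blob.insert(blob.end(), data.begin(), data.end());
    }

    // Copies the bytes stored for `path` into `out`. Returns false if the
    // path is not in the index.
    bool read(const std::string &path, std::vector<std::uint8_t> &out) const {
        const auto it = index.find(path);
        if (it == index.end()) {
            return false;
        }
        const auto first =
            blob.begin() + static_cast<std::ptrdiff_t>(it->second.offset);
        out.assign(first, first + static_cast<std::ptrdiff_t>(it->second.size));
        return true;
    }
};
```

With the blob stored as one file on disk, the operating system would see a single open instead of one per asset, and relative references between assets become index lookups.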

What could be done to implement such a thing in Godot? Any classes / documentation pages to look up?

Or are there any other ways that I haven’t thought of yet?

Thank you for reading all of that. :bowing_man:

(I was initially thinking of asking on Discord, but the more I prepared those data points, the more I thought that it should be shared here for reference)


tl;dr (partially)
I can’t agree that there is a problem when you have 3GB of files and the Editor loads in 90s. I’ve seen a big open 2D project that takes more than 2 minutes to load. Maybe if you put the project on an SSD it will load faster.
In theory, given how Windows works, the second opening should be faster, because Windows caches files.

See PRs for this specific issue.
Fix slow editor load on large projects by Hilderin · Pull Request #95672 · godotengine/godot · GitHub and Fix slow editor load on large projects (v2) by Hilderin · Pull Request #95678 · godotengine/godot · GitHub

Great benchmarking and debugging, very detailed. Check out this PR that addresses a lot of what you found: Fix slow editor load on large projects (v2) by Hilderin · Pull Request #95678 · godotengine/godot · GitHub
Feel free to test and suggest any other optimizations or create a PR if you’d like to contribute optimizing the Editor/Engine.