Godot Version
4.3.0
Introduction
I’m trying to implement support for some reverse-engineered files from a game inside Godot.
My problem is that this game has a great many compiled binary assets, and most of the time they also reference other files by relative path (for example, a model references meshes, which in turn reference mesh buffers and materials; those last ones reference shader params and shaders; there are also references to textures somewhere, etc.).
In the end, even though I haven’t finished putting all the resources of the game in my assets folder, Godot ends up with a really poor startup time in Editor mode, so I did a bit of profiling to understand why before continuing.
Profiling Result
72 444 files to scan at startup
3.00 GB on disk (2.88GB real usage)
Measurements vary between 70s and 110s for EditorFileSystem::_first_scan_filesystem and EditorFileSystem::_first_scan_process_scripts
VerySleepy profiling results:
94s load
EditorFileSystem::_first_scan_process_scripts is called recursively on each folder, but it also calls ResourceLoader::get_resource_type(path) for each file, which itself calls loader->get_resource_type(path) for each loader (17 in total according to my debugger).
The two most expensive ones are:
- ResourceFormatLoaderBinary::get_resource_type, through FileAccess::open(p_path, READ)
- ResourceFormatImporter::get_resource_type, through FileAccess::open(p_path + ".import", READ)
Both end up in FileAccessWindows::open_internal, which takes most of the time.
The true culprit is the Windows kernel API call FindFirstFileW, taking half the time inclusive and half the time exclusive (maybe other syscalls not measured by VerySleepy?).
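To make that cost structure concrete, here is a minimal C++ model of it. It is not Godot's code: `LoaderModel`, `count_open_attempts` and `make_observed_loaders` are hypothetical names, and it assumes the worst case where no loader recognizes the file (so every loader gets asked) and where only the two expensive loaders touch the disk.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a resource format loader (not Godot's real class).
struct LoaderModel {
    std::string name;
    bool opens_file; // true if its get_resource_type() has to open a file on disk
};

// Worst case: no loader recognizes the file, so each file is offered to every
// loader, and every loader that needs to open something does so once.
std::uint64_t count_open_attempts(std::uint64_t file_count,
                                  const std::vector<LoaderModel>& loaders) {
    assert(!loaders.empty());
    std::uint64_t opens_per_file = 0;
    for (const LoaderModel& loader : loaders) {
        if (loader.opens_file) {
            ++opens_per_file;
        }
    }
    return file_count * opens_per_file;
}

// The 17 loaders seen in the debugger. Only the two expensive ones are modeled
// as opening a file; the other 15 are ASSUMED to be cheap for this sketch.
std::vector<LoaderModel> make_observed_loaders() {
    std::vector<LoaderModel> loaders(15, LoaderModel{"other", false});
    loaders.push_back({"ResourceFormatLoaderBinary", true}); // opens p_path
    loaders.push_back({"ResourceFormatImporter", true});     // opens p_path + ".import"
    return loaders;
}
```

With the 72 444 files of this project, `count_open_attempts(72444, make_observed_loaders())` gives 144 888 open attempts, before counting whatever the other loaders might do.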
I also noticed other Windows kernel API calls in my VerySleepy capture. They seemed less impactful at first, but later turned out to affect performance as well:
- CreateFileW: 19.10s, from the file opening in ResourceFormatImporter::_get_path_and_type
- FindClose: 6.90s, from ResourceFormatLoaderBinary::get_resource_type
I had some trouble evaluating the overall timings, and since I had already spent too much time and didn’t want to spend more to get a good flame graph, I installed Superluminal Performance to double-check my hypothesis with a tool that gives an immediate flame graph, which I’m accustomed to.
Superluminal Performance profiling results
70s load
With 70s in EditorFileSystem::_first_scan_filesystem and then EditorFileSystem::_first_scan_process_scripts, we do spend most of it in the get_resource_type of ResourceFormatLoaderBinary and ResourceFormatImporter:
- ResourceFormatImporter::_get_path_and_type is spending all its time in FileAccessWindows::open_internal, and only because of the subsequent syscalls
- ResourceFormatLoaderBinary::get_resource_type does spend some time in ProjectSettings::localize_path and ResourceLoader::recognize, but those are well managed (even though recognize is doing syscall reads, it only takes ~800ms compared to our 70s of functions), so again, the true culprit is FileAccessWindows::open_internal in the end
And so FileAccessWindows::open_internal ends up using ~64s, which is the majority of the time, and every single hot spot in it is a Windows syscall:
- 57% (~36s) on FindFirstFileW
- 14% (~9s) on CreateFileW for its _wstat call
- 11% (~7s) on CreateFileW for its _wfsopen call
- 8% (~5s) on FindClose
My hypothesis was correct, so we can assume that, in the end, most of the time is spent in Windows system calls here and there.
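As a sanity check on those figures, here is a small C++ sketch of the arithmetic. The helper names are hypothetical, and it assumes the percentages are relative to the ~64s spent in FileAccessWindows::open_internal.

```cpp
#include <cassert>
#include <cstdint>

// Seconds attributed to a call that takes `percent` of `total_seconds`.
double seconds_for_share(double total_seconds, double percent) {
    return total_seconds * percent / 100.0;
}

// Average wall-clock cost of the scan per file, in milliseconds.
double average_ms_per_file(double total_seconds, std::uint64_t file_count) {
    assert(file_count > 0);
    return total_seconds * 1000.0 / static_cast<double>(file_count);
}
```

`seconds_for_share(64.0, 57.0)` comes out at about 36.5s for FindFirstFileW, which matches the capture, and the four calls listed add up to 90% of open_internal. `average_ms_per_file(70.0, 72444)` is a little under 1ms per file, so no single call is slow; it is the number of calls that hurts.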
Remarks
From a UX point of view
Even though, as a programmer, I’m certainly the last one who should give advice on that matter, I want to point out that I was disturbed when I ran into this problem. While this loading is happening, the editor is stuck on the startup screen and frozen (Windows says it’s not responding if you click somewhere), and during that time even verbose output gives no hint as to what is happening. I had to rely on some basic knowledge and then on my experience to guess, and then confirm, that it wasn’t frozen but working on something. (Non-programmers might someday reach that point with a large project, maybe, dunno, but as-is I think they will be surprised, and it could become a friction point.)
From a performance point of view
I observed that CPU usage is also a fairly low 5% (at least on my CPU) according to Windows Task Manager, which appears to correspond to a single-threaded load (it was hard to tell from VerySleepy and I’m still not accustomed to reading WPA, but Superluminal confirmed it).
Disk usage seems low, but then Windows may have cached some of it. Besides, when doing a full WPA trace, I can plot more data, and it is accessing the disk. However, it shouldn’t be a problem from a disk throughput perspective.
If I look at File I/O, I can see every operation, and that’s where it may be struggling.
Conclusion
I just wanted to share my observations, because as an engine programmer working on those subjects, I know it’s good to get data
I know that, obviously, Godot shouldn’t be modified to suit my specific needs, however I do think that if the engine keeps growing in user base, it will be faced with such scenarios more and more, and they’ll need support for such questions, so this is kind of a preview and hopefully a documentation post.
I also understand that premature optimization isn’t good when programming, and that the way the engine does things today is the result of a lot of design experience and care from people with a much clearer and broader understanding of it than mine.
However, I still think that:
- First, all those syscalls to the Windows API aren’t good; it’s usually a given that, in any situation, doing many syscalls in a row isn’t good. (And yes, Windows is really doing its thing under the hood to make them that expensive.)
- Second, all those syscalls are for things that Godot can’t support: half of the opens are for (path + '.import') files that don’t exist, and the others are for binary files that it opens just to check their type, only to conclude that it can’t handle them anyway. (That performance cost is there whether I have activated the GDExtension or not; the editor starts with this expensive scan first, and it’s long.)
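Purely as an illustration of the second point, here is a minimal C++17 sketch of a different access pattern: enumerate a folder once, then answer "does this file have a .import sidecar?" from memory instead of trying to open path + ".import" for every file. It uses the standard std::filesystem library rather than Godot's own file classes, the function names are hypothetical, and whether this is actually cheaper on Windows would have to be measured.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <unordered_set>
#include <vector>

namespace fs = std::filesystem;

// Collects the names of the regular files of one folder in a single
// enumeration pass (no per-file open).
std::unordered_set<std::string> list_file_names(const fs::path& dir) {
    assert(fs::is_directory(dir));
    std::unordered_set<std::string> names;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file()) {
            names.insert(entry.path().filename().string());
        }
    }
    return names;
}

// Returns the names that have a "<name>.import" sidecar in the same listing.
// This is a pure in-memory lookup: missing sidecars cost no syscall at all.
std::vector<std::string> files_with_import_sidecar(
        const std::unordered_set<std::string>& names) {
    std::vector<std::string> result;
    for (const std::string& name : names) {
        if (names.count(name + ".import") != 0) {
            result.push_back(name);
        }
    }
    return result;
}
```

For a folder listing of `a.png`, `a.png.import` and `b.bin`, only `a.png` is returned, and `b.bin` never triggers a failed open.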
I’m wondering how it would do with PNGs and other files that it can support. From my observations, I suppose it should be a little better (half of it anyway?), because the loop over the resource loaders wouldn’t be that bad and wouldn’t reach ResourceFormatLoaderBinary every time. But I don’t know what results we would get for ResourceFormatImporter, which would have to handle the assets[].import files.
Also, all of that scanning happens in EditorFileSystem::_first_scan_process_scripts.
The comment on the first call states that:
// This loads the global class names from the scripts and ensures that even if the
// global_script_class_cache.cfg was missing or invalid, the global class names are valid in ScriptServer.
And I can’t tell whether this is by design or not.
There could be scripts in a custom binary format, there could be many things, but do we have to parse all the files, even the non-script ones?
Or couldn’t we hint which files to search through for that operation? Or does the initial first scan have intended side effects? Or could we just skip the binary resource loader check?
Since this happens before my GDExtension is loaded, even a GDExtension that handles those custom files wouldn’t matter in the end and the cost would still be there, so what would be appropriate?
In my case, I know that I only have static data in those files, so another way to view it is that it would be my responsibility to tell Godot to ignore them?
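A minimal sketch of that "tell Godot to ignore them" idea, assuming Godot's documented `.gdignore` convention (the editor ignores any folder that contains an empty file with that name). The helper name is hypothetical, and I have not measured what this does to the first scan. Files in such a folder also stop showing up in the FileSystem dock and can no longer be loaded as resources, so they would have to be read as raw files.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Creates an empty ".gdignore" marker in `dir` if none exists yet.
// Returns true when the marker exists afterwards.
bool mark_folder_ignored(const fs::path& dir) {
    assert(fs::is_directory(dir));
    const fs::path marker = dir / ".gdignore";
    if (!fs::exists(marker)) {
        std::ofstream file(marker); // opening for writing creates an empty file
    }
    return fs::exists(marker);
}
```

Calling it on the root of the raw asset folder would be enough, since the marker applies to the whole folder it sits in.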
Solutions?
And thus, in that situation, with a great many assets to handle and no desire to delete them, what could I (and others) do to mitigate this problem?
Pack all files in a single binary archive? (Some games do that, and the one I’m working with also does it for other files, the ones I’m currently missing.)
Implement a custom binary loader for that archive? With a compression algorithm? With a custom virtual file system? Would it work if I want to use the files in editor mode / tool mode? Can it be done in GDExtension, or would it need to be a module?
What could be done to implement such a thing in Godot? Any classes / documentation pages to look up?
Or are there any other ways that I haven’t thought of yet?
Thank you for reading all of that.
(I was thinking of asking on Discord initially, but the more I prepared these data points, the more I thought they should be shared here for reference.)