### Godot version
4.2.x master [16a93563bfd3b02ca0a8f6df2026f3a3217f5571]
### System information
Godot v4.2.dev (262d1eaa6) - Ubuntu 20.04.6 LTS (Focal Fossa) - X11 - Vulkan (Forward+) - dedicated AMD Radeon RX 6800 XT - AMD Ryzen 9 5900X 12-Core Processor (24 Threads)
### Issue description
**"OOOFFF"**
That's the only way I can describe this ticket.
It starts as a simple CPU-to-GPU synchronization bug, but there is also another set of wrong assumptions and bugs which, when combined, make synchronization end up being "more or less" correct.
There are so many things going on I don't know where to start, so I'm going to write this in the order in which I discovered these problems.
At worst this is just a waste of VRAM; at best, many of those DEVICE_LOST errors suddenly start making sense.
# A lone validation error that shouldn't have happened
It all started while trying to repro #80286 on AMD, with a random validation error saying `vkDestroyRenderPass` got called on a render pass that was still in use; an error that I could never reproduce again.
So I went looking for `vkDestroyRenderPass` calls in the codebase, and luckily there weren't many.
I'm still evaluating everything to understand what's going on, but it is clear it's **wrong** (there's no way to sugarcoat it).
That's when I noticed the function `RenderingDeviceVulkan::swap_buffers` does the following:
```cpp
void RenderingDeviceVulkan::swap_buffers() {
	context->swap_buffers();
	frame = (frame + 1) % frame_count;
	_begin_frame();
}

// And inside _begin_frame() we call _free_pending_resources(frame):
void RenderingDeviceVulkan::_free_pending_resources(int p_frame) {
	// Free in dependency usage order, so nothing weird happens.
	// Pipelines.
	while (frames[p_frame].render_pipelines_to_dispose_of.front()) {
		RenderPipeline *pipeline = &frames[p_frame].render_pipelines_to_dispose_of.front()->get();
		vkDestroyPipeline(device, pipeline->pipeline, nullptr);
		frames[p_frame].render_pipelines_to_dispose_of.pop_front();
	}
	// ...and lots of other vkDestroy* calls.
}
```
**Where's the wait?**
That's wrong. The code should've been something along the lines of:
```cpp
context->swap_buffers();
frame = (frame + 1) % frame_count;
// Don't free or overwrite anything in frames[frame] until the GPU is done with it.
vkWaitForFences(device, 1, &fences[frame], VK_TRUE, UINT64_MAX);
_begin_frame();
```
I was about to shout 'Eureka!', but I went into `swap_buffers()` just in case there was already a wait there (although that would be some weird math, because it would have to be `vkWaitForFences(device, 1, &fences[frame + 1])`).
That's when it got worse.
# An inverted wait
I couldn't find anything in `swap_buffers`, which left me wondering: where is the `vkWaitForFences` call???
I found it in [VulkanContext::prepare_buffers](https://github.com/godotengine/godot/blob/16a93563bfd3b02ca0a8f6df2026f3a3217f5571/drivers/vulkan/vulkan_context.cpp#L2265-L2266). But sadly it performs the following:
```cpp
// Wait on the fence for frame_index, then immediately reuse it for the new submit...
vkWaitForFences(device, 1, &fences[frame_index], VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &fences[frame_index]);
vkQueueSubmit(graphics_queue, 1, &submit_info, fences[frame_index]);
// ...and only then advance frame_index. The CPU writes for the next frame
// happen before the next wait, so nothing protects them.
frame_index += 1;
frame_index %= FRAME_LAG;
```
**This is wrong.**
In case someone wonders: for data that is used to communicate between CPU and GPU, we are supposed to have at least 2 pools of memory:
CPU writes to pool 0 while the GPU is reading from pool 1, and CPU writes to pool 1 while the GPU is reading from pool 0. If the GPU is still reading from pool 0 and the CPU now wants to write to pool 0 (it's done writing to 1), it needs to stall and wait.
This is a standard double buffer scheme. A triple buffer scheme is the same, but with an extra pool to minimize potential stalls.
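For reference, a correct double-buffered frame loop looks roughly like this (a minimal sketch; `render_one_frame` and its parameters are names I made up for illustration, and the fences are assumed to be created with `VK_FENCE_CREATE_SIGNALED_BIT` so the very first frames don't deadlock):
```cpp
#include <vulkan/vulkan.h>

constexpr uint32_t FRAME_LAG = 2; // Two pools: CPU writes one while the GPU reads the other.

// Hypothetical helper, not Godot code.
void render_one_frame(VkDevice device, VkQueue graphics_queue,
		VkFence fences[FRAME_LAG], const VkSubmitInfo &submit_info,
		uint32_t &frame_index) {
	// Stall until the GPU has finished the submission that last used
	// pool `frame_index`, so the CPU is free to overwrite it.
	vkWaitForFences(device, 1, &fences[frame_index], VK_TRUE, UINT64_MAX);
	vkResetFences(device, 1, &fences[frame_index]);

	// ... CPU fills the buffers of pool `frame_index` here ...

	// Ask the GPU to signal the fence once this frame's work is done.
	vkQueueSubmit(graphics_queue, 1, &submit_info, fences[frame_index]);

	frame_index = (frame_index + 1) % FRAME_LAG;
}
```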
Unfortunately, with Godot's current code, the following sequence is what will happen after running for several frames.
| Action | Remarks |
|-----------------------|---------------------------------------------------|
| vkWaitForFences on 0 | |
| vkResetFences on 0 | |
| vkQueueSubmit on 0 | |
| frame_index = 1; | |
| Fill gpu buffers on 1 | |
| vkWaitForFences on 1 | |
| vkResetFences on 1 | |
| vkQueueSubmit on 1 | |
| frame_index = 0; | |
| Fill gpu buffers on 0 | Oops!! We never waited to see if fence[0] is done |
| vkWaitForFences on 0 | |
| vkResetFences on 0 | |
| vkQueueSubmit on 0 | |
And thus **we're not actually synchronizing anything!** The CPU keeps writing to data that may be in use by the GPU.
A simple possible solution is the following:
```cpp
vkResetFences(device, 1, &fences[frame_index]);
vkQueueSubmit(graphics_queue, 1, &submit_info, fences[frame_index]);
frame_index += 1;
frame_index %= FRAME_LAG;
// Wait on the *other* fence before the CPU touches that pool again.
vkWaitForFences(device, 1, &fences[frame_index], VK_TRUE, UINT64_MAX);
```
For best performance, it's often suggested to delay the `vkWaitForFences()` as much as possible (i.e. until we actually need to write to a buffer from the CPU, destroy a resource, etc.). But given the chaotic complexity of Godot's code, it would seem this is the only safe solution that won't result in more hazards.
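For completeness, the delayed variant would look something along these lines (a hypothetical sketch with made-up names like `FrameSync` and `ensure_pool_available`, not a proposal for Godot):
```cpp
#include <vulkan/vulkan.h>

// After submitting and advancing frame_index, we only mark the wait as
// pending; the actual stall happens lazily, right before the CPU first
// touches that pool again.
struct FrameSync {
	VkFence fence = VK_NULL_HANDLE;
	bool wait_pending = false;
};

void ensure_pool_available(VkDevice device, FrameSync &sync) {
	if (sync.wait_pending) {
		vkWaitForFences(device, 1, &sync.fence, VK_TRUE, UINT64_MAX);
		vkResetFences(device, 1, &sync.fence);
		sync.wait_pending = false;
	}
}
```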
Sadly, the report doesn't end here. There is more. Godot doesn't have 2 pools, it's got 4.
# Wrong assumptions on Swapchains
`RenderingDeviceVulkan::frame_count` is set to the following:
```cpp
// Always need one extra to ensure it's unused at any time, without having to use a fence for this.
frame_count = p_context->get_swapchain_image_count() + 1;
```
> **Note:** frame_count is almost always 4
That comment is HUGE. **This is a false assumption**.
Neither `vkAcquireNextImageKHR` nor `vkQueuePresentKHR` has an obligation to synchronize the CPU.
They use VkSemaphores, which synchronize GPU work with other GPU work (e.g. so that a window isn't presented to the screen while the GPU is still working on it, producing a similar effect to when VSync is off).
However in practice most (all???) drivers do stall the CPU due to limitations in SW or HW architectures. Linux+X11 is notorious for stalling inside `vkAcquireNextImageKHR`, but other drivers prefer to stall inside `vkQueuePresentKHR` instead.
Since we are debunking false assumptions, we also don't have the guarantee that `vkAcquireNextImageKHR` will return swapchain indices in order (e.g. 0, 1, 2, 3, 0, 1, 2, 3). It could return them out of order or even return the same set multiple times (e.g. 2, 0, 2, 0, 1).
We could set frame_count as high as 30, and we still wouldn't have the guarantee that CPU and GPU work will be synchronized. We *need* a fence.
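To make the distinction concrete, here's roughly which primitive synchronizes what (an illustrative fragment; `image_acquired_semaphore` and the surrounding setup are assumed to exist, and error handling is omitted):
```cpp
uint32_t image_index = 0;
// The semaphore only orders GPU work against the presentation engine.
// Nothing here blocks the CPU, and image_index may arrive out of order.
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
		image_acquired_semaphore, VK_NULL_HANDLE, &image_index);

// Only a fence handed to vkQueueSubmit lets the CPU find out when the GPU
// is actually done with a frame's buffers...
vkQueueSubmit(graphics_queue, 1, &submit_info, fences[frame_index]);
// ...and only waiting on it actually stalls the CPU (later, when the pool
// must be reused):
vkWaitForFences(device, 1, &fences[frame_index], VK_TRUE, UINT64_MAX);
```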
## Swapchains should not dictate the number of pools
As I said earlier, on a double buffer scheme we should have 2 pools of memory (`FRAME_LAG = 2`).
But because of the wrong assumption on swapchains, Godot keeps 4 pools of memory (because `frame_count = 4` on most machines).
The swapchain image count should not be used for anything other than tracking swapchain-related data, like the VkRenderPass.
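In other words, the two counts should size two unrelated sets of data. A hypothetical layout (names made up for illustration):
```cpp
#include <vulkan/vulkan.h>

#include <vector>

constexpr uint32_t FRAME_LAG = 2; // Frames in flight, guarded by fences.

struct PerFramePool { // Sized by FRAME_LAG: everything the CPU recycles.
	VkFence fence;
	VkCommandPool command_pool;
	// ...staging buffers, lists of resources pending destruction, etc.
};

struct PerSwapchainImage { // Sized by the count from vkGetSwapchainImagesKHR.
	VkImageView view;
	VkFramebuffer framebuffer;
};

PerFramePool frames[FRAME_LAG];
std::vector<PerSwapchainImage> swapchain_images;
```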
# Combine these 2 bugs and it works
So we have 2 competing implementations and 2 bugs:
- `VulkanContext::prepare_buffers` uses `vkWaitForFences` but the calls are out of order
- This is what's causing the CPU to wait for the GPU.
- This *should* be used to track the 2 pools.
- `VulkanContext` uses the swapchain image count to keep 4 pools instead of 2.
I've run spreadsheet simulations of what happens when our current `vkWaitForFences` code runs with just 2 fences (`FRAME_LAG = 2`) and `frame_count = 4`, and it would seem that this just happens to synchronize correctly, *a happy accident*.
Because Godot will stall when starting the 3rd frame, and it keeps 4 pools of data, the data should be safe for access on the 3rd frame.
However, the complexity of all this is too large, so it's possible that I missed something and we may still get a race condition (that lone `vkDestroyRenderPass` error that sparked this investigation must've come from somewhere!), just a rare one.
I also noticed that if, for some God-forsaken reason, `frame_count = 3` or lower, then we ***will*** have race conditions.
I analyzed the surrounding code behind the `frame_count` setup, and this is possible, but improbable.
We set `desiredNumOfSwapchainImages = 3`, and I don't think there are implementations with `surfCapabilities.maxImageCount = 2`; however, I could be wrong. Also, if the driver returns 2 images from `vkGetSwapchainImagesKHR` despite us asking for at least 3 (which AFAIK would be a driver bug), we're still in trouble.
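For context, the negotiation follows the usual Vulkan pattern (a simplified sketch from memory, not a verbatim quote of Godot's code):
```cpp
uint32_t desiredNumOfSwapchainImages = 3;
if (desiredNumOfSwapchainImages < surfCapabilities.minImageCount) {
	desiredNumOfSwapchainImages = surfCapabilities.minImageCount;
}
// maxImageCount == 0 means the surface imposes no upper limit.
if ((surfCapabilities.maxImageCount > 0) &&
		(desiredNumOfSwapchainImages > surfCapabilities.maxImageCount)) {
	desiredNumOfSwapchainImages = surfCapabilities.maxImageCount;
}
// A maxImageCount of 2 would clamp us down to 2 images, i.e. frame_count = 3.
```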
However it would make sense that those users who can repro `DEVICE_LOST` with high frequency happen to have `frame_count <= 3`.
I didn't analyze what happens when `separate_present_queue == true`. In fact, this path is so rare that I don't think this code has been tested at all, and we have no way yet to verify whether the bug reports involved it.
### Steps to reproduce
This bug is rare to repro, but it should be much easier to trigger if we force `frame_count` to a value lower than 4.
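For example (a local hack for testing only, not a fix), one could hard-code the value where `frame_count` is computed in `RenderingDeviceVulkan`:
```cpp
// Original:
// frame_count = p_context->get_swapchain_image_count() + 1;
// Hypothetical hack to make the race much more likely:
frame_count = 2;
```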
### Minimal reproduction project
None.