COLLABORATION UPDATE: Google & The Forge

Do you feel the ground rumbling? Better Vulkan support incoming! :volcano:

Read more about our concluded collaboration with Google & The Forge on our blog:

If you are currently working on a mobile game, let us know what you think :thought_balloon:


Swag,

Did this improve compatibility at all with the Vulkan Mobile renderer on old(er) devices, or just performance testing on new devices?

And/or how much of this can be backported to the Compatibility renderer?

The goal was to improve performance and API “correctness” for newer devices. I suspect that performance has improved for older devices. It is possible that compatibility has improved slightly for older devices as there were places where we used the Vulkan API incorrectly that are now fixed, but it is unlikely. Realistically most compatibility issues with older devices come from them having very poor Vulkan support. To improve compatibility, we need to create targeted workarounds that avoid known driver bugs. We didn’t do anything like that in this project.

Since this is all Vulkan stuff, none of it can be backported to the Compatibility renderer.


Hi, I have a few questions, if you would be so kind as to answer. Are you happy with what the collaboration yielded in terms of performance gain?

I’ve checked the git link. Sun Temple is a static-geometry level with few texture maps, intended for benchmarking UE4 PBR and mobile features, and the TPS demo is quite a simple, barebones 3D game with very limited graphical fidelity and a few almost static NPCs.

In what way were those two scenes modified/optimized for mobile testing?

Would you say that running those on fairly modern phones and getting below 30 fps is satisfactory performance? Are those percentages (the 10-20% frame time gain mentioned in the article) calculated from the metrics listed on the git page?
Also, are there any other improvements planned to boost 3D mobile performance in the near future besides optimizations to Vulkan? If so, can you mention which?

Much appreciated, thank you!

Yes, I think a 10% - 20% performance gain from a short-term engagement like this is a huge win.

The Sun Temple scene was not modified at all other than switching to the mobile rendering backend. The TPS demo was stripped down to the central core of the scene with the addition of a bunch of particle systems, decals, and animated characters.

The TPS demo, however, was not optimized for mobile, which is rather unfortunate as the scene itself is already very unoptimized. It is not really a suitable example project to showcase performance on mobile. We used it as a “worst case scenario” benchmarking tool and I think it was fine for that. So keep in mind that, for that scene, the important consideration is not the absolute performance numbers but the change in frame time before and after.

No, I wouldn’t say that I am satisfied with the current performance on mobile devices. We have more work ahead of us to reach a level that I am happy with. The performance level is fine for a lot of games already, but I know we can make it much better.

Yes, the percentage values come roughly from the performance metrics on the GitHub page. Depending on exactly what data you look at, we could also claim a higher than 20% improvement, but I focused on GPU frame time as that was the focus of this project.
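For anyone who wants to redo this arithmetic from the published metrics, here is a minimal Python sketch of how a frame time gain can be computed. The function names are my own and the frame times are made-up placeholders, not values from the benchmark page.

```python
def frame_time_gain(before_ms: float, after_ms: float) -> float:
    """Return the relative reduction in frame time, as a percentage."""
    if before_ms <= 0:
        raise ValueError("before_ms must be positive")
    return (before_ms - after_ms) / before_ms * 100.0


def fps_to_frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Hypothetical GPU frame times, for illustration only.
before_ms = 40.0
after_ms = 34.0
print(f"{frame_time_gain(before_ms, after_ms):.1f}% lower GPU frame time")
```

Note that a reduction in frame time and an increase in FPS are not the same percentage: in this made-up case the frame time drops by 15%, while the frame rate goes from 25 fps to about 29.4 fps, which is roughly 17.6% higher.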

Most of our currently planned improvements are on the mobile renderer side rather than on the Vulkan backend side. This project focused on Vulkan because Vulkan-on-mobile expertise is quite rare (which is why we needed The Forge and Google). Off the top of my head, I don’t know of any major optimizations we could make to our Vulkan backend that would benefit mobile.

For the Mobile renderer, I have a long list of items. Here are a few:

  1. Vertex/Fragment barriers (allow overlapping vertex work with previous pass’ fragment work)
  2. VGPR optimizations (reduce VGPR usage in forward shader)
  3. Mediump optimizations (use mediump / half precision everywhere)
  4. Spec constant to reduce light loops
  5. Normals, tangents, shadows: declare mediump in vertex, full precision in fragment (reduces on-chip bandwidth)
  6. Invalidate depth buffer after rendering (reduces VRAM bandwidth)

We will start tackling this list for 4.4, so stay tuned!
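To give a rough sense of scale for the depth buffer item in the list above, here is a back-of-the-envelope sketch in Python. It assumes an uncompressed depth buffer that is written back to memory in full every frame; the resolution, depth format, and frame rate are assumptions picked for illustration, and real GPUs may compress or skip part of this traffic, so treat the result as an upper bound rather than a measurement.

```python
def depth_writeback_bytes_per_second(
    width: int, height: int, bytes_per_pixel: int, fps: int
) -> int:
    """Bytes per second spent writing a full depth buffer back to memory.

    Assumes no compression and one full write-back per frame.
    """
    return width * height * bytes_per_pixel * fps


# Assumed values: 1920x1080 target, 32-bit depth (4 bytes per pixel), 60 fps.
total = depth_writeback_bytes_per_second(1920, 1080, 4, 60)
print(f"{total / 1_000_000:.1f} MB/s of depth write-back that could be avoided")
```

Under these assumptions, the depth buffer alone accounts for just under 500 MB/s of write traffic that is never read again once the frame is finished.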


Thank you for such a detailed and in-depth answer; I really appreciate it. It’s encouraging to know that there’s room for further performance improvement on mobile devices and that you’re working on it. I’m eagerly looking forward to 4.4. On a side note, it would be great to have the TPS demo polished and optimized for both desktop and mobile. This would serve as an excellent reference point for the community, showcasing what’s possible and optimal, and also as a valuable learning resource for those starting out.

Ya, that would be amazing!
