Improve performance of spotlight-based fake GI on mobile

Hi all. I’m working on recreating a piece of video art from the late 70s in VR with Godot. I have a simplified version of the project here for your perusal.

Here you can see a screenshot from this project.

[upl-image-preview url=https://godotforums.org/assets/files/2024-01-23/1706014119-688281-image.png]

Because I need to target standalone headsets, I figured all the fancy GI techniques are off-limits. I’m also testing on my phone (a Pixel 7) for good measure. I haven’t received my dev headset yet, so I don’t know what it’ll be, which means I have to aim for the greatest common feature set; that also keeps deployment simple.

My approach to fake GI is to place a very wide spotlight in front of each “screen” (the actual piece has three), sample the video texture at five points, average them, and use that average as the light’s colour. The final effect is good enough when you have three screens on three of the four walls, so I’m happy with this.

What I’m not happy with is the performance. When testing on my laptop (16" MBP M1 Max), I get a very commendable 250 fps, but if I turn the sampling off I shoot up to around 1200 (vsync off). On my phone, fake GI off yields about 75 fps, but fake GI on drops it to about 20 fps. The way I do the sampling is to pull the texture out of the GPU’s memory into the CPU’s memory, do the sampling on the CPU, and assign the colour to the light. It’s been pointed out to me that this is Very Bad Indeed™ – I did suspect so, but hey, it’s worth being told off sometimes.
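
For reference, here is roughly what that readback approach looks like in GDScript. This is a minimal sketch, assuming a VideoStreamPlayer and a SpotLight3D under hypothetical node paths; the real project’s node names and exact sample points will differ:

```gdscript
# Sketch of the per-frame CPU sampling described above; node paths are hypothetical.
@onready var player: VideoStreamPlayer = $Screen/VideoStreamPlayer
@onready var fill_light: SpotLight3D = $Screen/FakeGISpot

func _process(_delta: float) -> void:
    var tex := player.get_video_texture()
    if tex == null:
        return
    # This readback from GPU memory into CPU memory is the expensive part.
    var img := tex.get_image()
    var w := img.get_width()
    var h := img.get_height()
    # Five sample points: the centre plus the four quarter points.
    var points := [
        Vector2i(w, h) / 2,          # centre
        Vector2i(w, h) / 4,          # top-left quarter point
        Vector2i(3 * w, h) / 4,      # top-right
        Vector2i(w, 3 * h) / 4,      # bottom-left
        Vector2i(3 * w, 3 * h) / 4,  # bottom-right
    ]
    var sum := Color(0, 0, 0)
    for p in points:
        sum += img.get_pixelv(p)
    fill_light.light_color = sum / float(points.size())
```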

One thing I tried is to throttle the sampling using a looping Timer, and that does make things slightly better (I get about 50 fps on the phone) but the stutter makes this approach undesirable. You can try the effect yourself by pressing t while running the project, or tapping the screen of your phone.
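
The throttling itself is just a looping Timer driving the sampling function instead of _process. A minimal sketch, with _sample_and_apply standing in for whatever function does the readback and averaging:

```gdscript
# Hypothetical throttle: run the sampling from a looping Timer instead of _process.
func _ready() -> void:
    var sample_timer := Timer.new()
    sample_timer.wait_time = 0.1  # seconds between samples
    sample_timer.timeout.connect(_sample_and_apply)
    add_child(sample_timer)
    sample_timer.start()

func _sample_and_apply() -> void:
    pass  # do the five-point sample and assign the light colour here
```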

I’ve received a number of suggestions, including:

  • Using shaders: I’m not sure how to use shaders for this. I’m only familiar with visual shaders and I don’t really know how compute shaders work. My feeling is that I’d need a shader that takes in the texture, does the sample-and-average, and spits out a single colour value. I’m barely familiar with how fragment shaders work, so I can’t see this as a sensible approach, but maybe other types of shaders could help here? Are they even supported on Android?
  • Ray casting to get the colour: I’m not sure how this would work. I assume it means using a fragment shader on the viewport to do the casting, but that wouldn’t be a good idea because I’d lose access to the screens when they move out of view. Alternatively, it would have to be done entirely on the CPU, casting rays from somewhere in the room and sampling the surface colours of the “screens”. I can’t see how much of a performance advantage this would give me but, if it avoids grabbing the entire texture, it could potentially work. I’ve never done ray casting, though, so I’d still be dead in the water.
  • Using reflection probes: unless they have an API to give me an average colour, this would still require grabbing the probe’s texture, right?
  • Using mipmaps: this would make sense if these were static pictures (at which point I could just bake the GI and be done with it), but they’re videos so I’m guessing no mipmaps are generated.

The mipmaps suggestion gave me an idea: create a severely scaled-down version of each video, load it alongside the high-res version, and do the sampling on the much smaller texture. I haven’t tried this yet but, assuming that reading back a, say, 16x9 image from the GPU is cheaper than a 1920x1080 one, I’d then just have to trust the two VideoStreamPlayers to stay in sync for long enough.
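
A sketch of what I have in mind, with hypothetical node names (the tiny player would use a pre-scaled re-encode of the same video):

```gdscript
# Sketch of the low-res duplicate idea: both players start together, the big one
# is shown on the screen mesh, and only the tiny one is ever read back.
@onready var hires_player: VideoStreamPlayer = $Screen/VideoStreamPlayer       # e.g. 1920x1080
@onready var lowres_player: VideoStreamPlayer = $Screen/VideoStreamPlayerTiny  # e.g. 16x9 re-encode

func _ready() -> void:
    hires_player.play()
    lowres_player.play()  # and hope the two stay in sync for long enough

# Average every pixel of the tiny texture; call this from _process or a throttled Timer.
func _sample_average() -> Color:
    var tex := lowres_player.get_video_texture()
    if tex == null:
        return Color.BLACK
    var img := tex.get_image()
    var sum := Color(0, 0, 0)
    for y in img.get_height():
        for x in img.get_width():
            sum += img.get_pixel(x, y)
    return sum / float(img.get_width() * img.get_height())
```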

I may be approaching this entirely wrong, so I’m very open to any suggestions.

EDIT The scaled-down video idea does make things a bit better, but I’m still at around 45 fps (up from 20) with a single video, and I can’t imagine that three would be any better. The timer throttling gets me to 60 fps on mobile with a 1/10 s interval and one video, so that could potentially work.

EDIT 2 With three videos and the pre-scaled-down duplicates I can get up to about 50 fps on my phone, and around 360 on my laptop. I just have to hope that a VR headset has better hardware than my phone, or I won’t be hitting 90 any time soon with this approach.

Why don’t you precalculate the mean color for all the frames before playing the video, put that in an array, and just reference it each frame? Like light.color = your_mean_color_array[video_current_frame]

1 Like

That’s what I’m doing now; the pre-processing is a pain, but if this is the only solution, I’ll take it.

Picking the right value is a bit more complex than that, though, as the VideoStreamPlayer only gives you a stream position in seconds, so you have to calculate the frame first.

How is pre-processing a pain? You already have the processing function; just run it over the whole video once instead of every so many frames.
And it’s not complex to get the frame. You just store the start time in a var, then each frame take the current time - start time and multiply that by the video’s framerate (i.e. divide it by the frame duration). Godot plays videos in real time, not as frame-by-frame animations.
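
In GDScript terms, that lookup could look something like the sketch below. The framerate constant, node paths, and the precomputed array are placeholders, not project code:

```gdscript
# Hypothetical per-frame lookup into a precomputed array of mean colours.
const VIDEO_FPS := 25.0  # whatever the source videos were encoded at

@onready var player: VideoStreamPlayer = $Screen/VideoStreamPlayer
@onready var fill_light: SpotLight3D = $Screen/FakeGISpot

var mean_colors: Array[Color] = []  # filled from the pre-processed data

func _process(_delta: float) -> void:
    if mean_colors.is_empty() or not player.is_playing():
        return
    var frame := int(player.stream_position * VIDEO_FPS)
    frame = clampi(frame, 0, mean_colors.size() - 1)
    fill_light.light_color = mean_colors[frame]
```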

1 Like

I would second pre-processing but I’m not sure there’s a way to seek with the VideoStreamPlayer according to the stream_position property docs:

Changing this value won’t have any effect as seeking is not implemented yet, except in video formats implemented by a GDExtension add-on.

Which sounds like you may need to pre-process in an external program or maybe an editor script if it’s possible to play videos in the editor. I wonder if maybe using the timer trick with interpolated values could work.

As @bepis says, Godot’s VideoStreamPlayer can’t seek by time or frame, so I have to do the pre-processing externally. I’m doing it with Octave at the moment, which is a bit of a pain: Octave is slow at chewing through thousands of frames, and it doesn’t know how to output Godot-friendly data, so there’s that extra step on top.

The next problem I’m running into is that the colour values in the array and the video playback don’t seem to be synchronised, and I can’t quite figure out why. I might combine the array solution with the timer trick to throttle things down in case there’s still some costly computation getting in the way, although I can’t see what it would be at this point.

I imagine you could use the stream player’s stream_position to map the current video time to the array index. You might also need to factor in the playback/capture framerate.

I got interested and threw together this JavaScript implementation for averaging the colors of every nth video frame. I’m pretty sure it only works in Firefox, but it will save the RGB values as a JSON file, which should be easy enough to import or create a Godot resource from.
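
For what it’s worth, loading something like that back into Godot could look like the sketch below, assuming the JSON is an array of [r, g, b] triples with 0-255 channel values; adjust to whatever the export script actually writes:

```gdscript
# Hypothetical loader for a JSON file shaped like [[r, g, b], [r, g, b], ...].
func load_mean_colors(path: String) -> Array[Color]:
    var colors: Array[Color] = []
    var data = JSON.parse_string(FileAccess.get_file_as_string(path))
    if data is Array:
        for rgb in data:
            colors.append(Color8(int(rgb[0]), int(rgb[1]), int(rgb[2])))
    return colors
```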

2 Likes

Yes, mapping the stream_position to the frame while accounting for the frame rate is not a problem; the real issue is that my Octave solution is slow. The other issue I’m having is that Theora is a bit of a mess: if I run my Octave script on my OGV files, quite a few frames get dropped for reasons I can’t see, which results in a shorter array that then goes out of sync with the video in Godot. If I run the script on the original MP4 files instead, no problem, I get the exact number of frames. Octave uses ffmpeg behind the scenes and I’m also using ffmpeg to convert between formats, so I’m really not sure where the problem might be.

I’ll give your solution a shot, thanks!

1 Like

ffmpeg should also be able to output the average color array you need, but I don’t dare figure out the command for that

2 Likes

I know this is mostly solved, but I wanted to show what I came up with to test the little script I posted before. I added a “perceived brightness” value for each frame to also modulate the light_energy, and the effect is pretty cool overall.

1 Like

Had a tiny spark of insight on this, so I may as well add it here:
you don’t need the complete array of frame-by-frame computed values, you just need “enough” of them if you lerp the actual value between data points.
So say you are at frame current_frame and you have data points at prev_point and next_point: you can lerp(prev_point.color, next_point.color, (current_frame - prev_point.frame) / (next_point.frame - prev_point.frame)) to get the interpolated color value for the light (or color and intensity, however you set it up).
Don’t know if it helps you here, but it may help someone else doing something similar.
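
Something like this in GDScript, as a sketch (the keyframe structure is just an assumption):

```gdscript
# Hypothetical sparse keyframes, sorted by frame; each entry is {"frame": int, "color": Color}.
var keyframes: Array[Dictionary] = []

func color_at_frame(current_frame: int) -> Color:
    if keyframes.is_empty():
        return Color.BLACK
    # Find the two keyframes that bracket the current frame and lerp between them.
    for i in range(keyframes.size() - 1):
        var prev_point: Dictionary = keyframes[i]
        var next_point: Dictionary = keyframes[i + 1]
        if current_frame <= int(next_point.frame):
            var c0: Color = prev_point.color
            var c1: Color = next_point.color
            var span := int(next_point.frame) - int(prev_point.frame)
            var t := float(current_frame - int(prev_point.frame)) / float(span)
            return c0.lerp(c1, clampf(t, 0.0, 1.0))
    # Past the last keyframe: hold its colour.
    var last: Color = keyframes[-1].color
    return last
```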

OK, long story short: I ran this on an Oculus Quest 1 and, lo and behold, I have to give up on Godot for this particular application unless I find a way of playing videos in a codec the Quest supports in hardware – namely, I think, H.264 and H.265.

My three-screen setup was running at about 6-8 fps, and the profiler showed something like 250 ms of processing time without there being anything in any of my _process or _physics_process functions. I acted on a hunch and removed the video streams altogether, and there you have it: the scene ran at a much more reasonable 60 fps. I can only imagine this is down to the lack of hardware support for Theora.

1 Like

but it does… ???

That’s not what I’m saying, though. I know Godot supports Theora, but apparently the Quest 1 doesn’t decode Theora in hardware, hence the abysmal performance, which shoots back up the moment I disable video playback. If I can’t get Godot to play videos in other codecs, then it’s game over.

Bummer. I think this is the relevant proposal for improving Godot’s video player:

1 Like

Oh, I never saw this one. Still, doesn’t look like it’s going to get done any time soon :frowning:

1 Like

Yeah… It’s good that there’s a solid proposal, but it may not get much traction until there’s more demand, unfortunately.

There’s no hardware out there that supports hardware-accelerated Theora decoding, but its CPU requirements are pretty low nowadays. This is because Theora is really designed as a competitor to MPEG-2 as opposed to H.264 (VP8 was competing with that instead). Alas, VP8 and VP9 have proven difficult to maintain in Godot, so support for these formats was removed in Godot 4.0.

That said, the Quest 1 is pretty outdated by now – even the Quest 2 has a lot of trouble keeping up with recent VR games. The generational gap between each Quest device is massive compared to smartphones. There’s a reason the Quest 3 is legitimately considered a game changer, while we never hear about a specific new smartphone flagship being a game changer these days…

1 Like

I hear everything you say and I agree, but: 1. I’m a noob in this space, so I don’t know what the state of the art in VR hardware is at the moment, and since this is for an art piece that should potentially be enjoyable by many, I can’t put too many restrictions on the hardware, I’m afraid; and 2. this headset is a hand-me-down I got from work, and I’d need a stronger case than Theora to convince them to buy me a Quest 3, considering VR isn’t our main focus, and especially if they can make a case for me to use another engine where I can play MPEG video. I’m investigating Unreal at the moment, but it’s a nightmare on my Mac; I’ll have to re-evaluate on Windows once I get back to the office. I’d rather avoid Unity like the plague.

OK, so here’s an interesting one. I was looking at my project running on the Quest 1, disabling things and undoing non-default changes I might have made while keeping an eye on the profiler, because that Frame Time > Process Time sitting at 260 ms with nothing in my scripts didn’t quite make sense, especially after disabling the Theora videos and removing them from the scene altogether.

Well, I got to the spotlights that fake the GI and, lo and behold, turning the SpotLight3Ds off dropped that Process Time down to 40~60 ms, which is still a bit higher than I’d like for an essentially empty scene, but way more reasonable.

So I replaced the spotlights with OmniLight3Ds and the performance hit came back, but to a much lesser extent: now I get about 100 ms instead of 260.

Does this make sense? I thought spotlights were relatively cheap to render even on hardware older than 2019…