Improve performance of spotlight-based fake GI on mobile

Hi all. I’m working on recreating a piece of video art from the late 70s in VR with Godot. I have a simplified version of the project here for your perusal.

Here you can see a screenshot from this project.

[upl-image-preview url=https://godotforums.org/assets/files/2024-01-23/1706014119-688281-image.png]

Because I need to target standalone headsets, I figured all the fancy GI techniques are off-limits. I’m also testing on my phone (a Pixel 7) for good measure. I haven’t received my dev headset yet, so I don’t know what it’ll be, which means I have to aim for the greatest common feature set; that also keeps deployment simple.

My approach to fake GI is to place a very wide spotlight in front of each “screen” (the actual piece has three), sample the video texture at five points, average them, and use that average as the light’s colour. The final effect is good enough when you have three screens on three of the four walls, so I’m happy with this.

What I’m not happy with is the performance. When testing on my laptop (16" MBP M1 Max), I get a very commendable 250 fps, but if I turn the sampling off I shoot up to around 1200 (vsync off). On my phone, fake GI off yields about 75 fps, but fake GI on drops it to about 20 fps. The way I do the sampling is to pull the texture out of the GPU’s memory into the CPU’s memory, do the sampling on the CPU, and assign the colour to the light. It’s been pointed out to me that this is Very Bad Indeed™ – I did suspect so, but hey, it’s worth being told off sometimes.
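
For reference, here is roughly what that readback approach looks like in GDScript. This is a minimal sketch, assuming a VideoStreamPlayer and a SpotLight3D under hypothetical node paths; the real project’s node names and exact sample points will differ:

```gdscript
# Sketch of the per-frame CPU sampling described above; node paths are hypothetical.
@onready var player: VideoStreamPlayer = $Screen/VideoStreamPlayer
@onready var fill_light: SpotLight3D = $Screen/FakeGISpot

func _process(_delta: float) -> void:
    var tex := player.get_video_texture()
    if tex == null:
        return
    # This readback from GPU memory into CPU memory is the expensive part.
    var img := tex.get_image()
    var w := img.get_width()
    var h := img.get_height()
    # Five sample points: the centre plus the four quarter points.
    var points := [
        Vector2i(w, h) / 2,          # centre
        Vector2i(w, h) / 4,          # top-left quarter point
        Vector2i(3 * w, h) / 4,      # top-right
        Vector2i(w, 3 * h) / 4,      # bottom-left
        Vector2i(3 * w, 3 * h) / 4,  # bottom-right
    ]
    var sum := Color(0, 0, 0)
    for p in points:
        sum += img.get_pixelv(p)
    fill_light.light_color = sum / float(points.size())
```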

One thing I tried is to throttle the sampling using a looping Timer, and that does make things slightly better (I get about 50 fps on the phone) but the stutter makes this approach undesirable. You can try the effect yourself by pressing t while running the project, or tapping the screen of your phone.
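
The throttling itself is just a looping Timer driving the sampling function instead of _process. A minimal sketch, with _sample_and_apply standing in for whatever function does the readback and averaging:

```gdscript
# Hypothetical throttle: run the sampling from a looping Timer instead of _process.
func _ready() -> void:
    var sample_timer := Timer.new()
    sample_timer.wait_time = 0.1  # seconds between samples
    sample_timer.timeout.connect(_sample_and_apply)
    add_child(sample_timer)
    sample_timer.start()

func _sample_and_apply() -> void:
    pass  # do the five-point sample and assign the light colour here
```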

I’ve received a number of suggestions, including:

  • Using shaders: I’m not sure how to use shaders for this. I’m only familiar with visual shaders and I don’t really know how compute shaders work. My feeling is that I’d need a shader that takes in the texture, does the sample-and-average, and spits out a single colour value. I’m barely familiar with how fragment shaders work, so I can’t see this as a sensible approach, but maybe other types of shaders could help here? Are they even supported on Android?
  • Ray casting to get the colour: I’m not sure how this would work. I assume it means using a fragment shader on the viewport to do the casting, but that wouldn’t be a good idea because I’d lose access to the screens when they move out of view. Alternatively, it would have to be done entirely on the CPU, casting rays from somewhere in the room and sampling the surface colours of the “screens”. I can’t see how much of a performance advantage this would give me but, if it avoids grabbing the entire texture, it could potentially work. I’ve never done ray casting, though, so I’d still be dead in the water.
  • Using reflection probes: unless they have an API to give me an average colour, this would still require grabbing the probe’s texture, right?
  • Using mipmaps: this would make sense if these were static pictures (at which point I could just bake the GI and be done with it), but they’re videos so I’m guessing no mipmaps are generated.

The mipmaps suggestion gave me an idea: create a severely scaled-down version of each video, load it alongside the high-res version, and do the sampling on the much smaller texture. I haven’t tried this yet but, assuming that reading back a, say, 16x9 image from the GPU is cheaper than a 1920x1080 one, I’d then just have to trust the two VideoStreamPlayers to stay in sync for long enough.
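
A sketch of what I have in mind, with hypothetical node names (the tiny player would use a pre-scaled re-encode of the same video):

```gdscript
# Sketch of the low-res duplicate idea: both players start together, the big one
# is shown on the screen mesh, and only the tiny one is ever read back.
@onready var hires_player: VideoStreamPlayer = $Screen/VideoStreamPlayer       # e.g. 1920x1080
@onready var lowres_player: VideoStreamPlayer = $Screen/VideoStreamPlayerTiny  # e.g. 16x9 re-encode

func _ready() -> void:
    hires_player.play()
    lowres_player.play()  # and hope the two stay in sync for long enough

# Average every pixel of the tiny texture; call this from _process or a throttled Timer.
func _sample_average() -> Color:
    var tex := lowres_player.get_video_texture()
    if tex == null:
        return Color.BLACK
    var img := tex.get_image()
    var sum := Color(0, 0, 0)
    for y in img.get_height():
        for x in img.get_width():
            sum += img.get_pixel(x, y)
    return sum / float(img.get_width() * img.get_height())
```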

I may be approaching this entirely wrong, so I’m very open to any suggestions.

EDIT The scaled-down video idea does make things a bit better, but I’m still at around 45 fps (up from 20) with a single video, and I can’t imagine that three would be any better. The timer throttling gets me to 60 fps on mobile with a 1/10 s interval and one video, so that could potentially work.

EDIT 2 With three videos and the pre-scaled-down duplicates I can get up to about 50 fps on my phone, and around 360 on my laptop. I just have to hope that a VR headset has better hardware than my phone, or I won’t be hitting 90 any time soon with this approach.

Why don’t you precalculate the mean color for all the frames before playing the video, put that in an array, and just reference it each frame? Like light.color = your_mean_color_array[video_current_frame]

1 Like

That’s what I’m doing now; the pre-processing is a pain, but if this is the only solution, I’ll take it.

Picking the right value is a bit more complex than that, though, as the VideoStreamPlayer only gives you a stream position in seconds, so you have to calculate the frame first.

How is pre-processing a pain? You already have the processing function; just run it over the whole video once instead of every so many frames.
And it’s not complex to get the frame. You just store the start time in a var, then each frame take the current time - start time and multiply that by the video’s framerate (i.e. divide it by the frame duration). Godot plays videos in real time, not as frame-by-frame animations.
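
In GDScript terms, that lookup could look something like the sketch below. The framerate constant, node paths, and the precomputed array are placeholders, not project code:

```gdscript
# Hypothetical per-frame lookup into a precomputed array of mean colours.
const VIDEO_FPS := 25.0  # whatever the source videos were encoded at

@onready var player: VideoStreamPlayer = $Screen/VideoStreamPlayer
@onready var fill_light: SpotLight3D = $Screen/FakeGISpot

var mean_colors: Array[Color] = []  # filled from the pre-processed data

func _process(_delta: float) -> void:
    if mean_colors.is_empty() or not player.is_playing():
        return
    var frame := int(player.stream_position * VIDEO_FPS)
    frame = clampi(frame, 0, mean_colors.size() - 1)
    fill_light.light_color = mean_colors[frame]
```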

1 Like

I would second pre-processing but I’m not sure there’s a way to seek with the VideoStreamPlayer according to the stream_position property docs:

Changing this value won’t have any effect as seeking is not implemented yet, except in video formats implemented by a GDExtension add-on.

Which sounds like you may need to pre-process in an external program or maybe an editor script if it’s possible to play videos in the editor. I wonder if maybe using the timer trick with interpolated values could work.

As @bepis says, Godot’s VideoStreamPlayer can’t seek by time or frame, so I have to do the pre-processing externally. I’m doing it with Octave at the moment, which is a bit of a pain: Octave is slow at chewing through thousands of frames, and it doesn’t know how to output Godot-friendly data, so there’s that extra step on top.

The next problem I’m running into is that the colour values in the array and the video playback don’t seem to be synchronised, and I can’t quite figure out why. I might combine the array solution with the timer trick to throttle things down in case there’s still some costly computation getting in the way, although I can’t see what it would be at this point.

I imagine you could use the stream player’s stream_position to map the current video time to the array index. You might also need to factor in the playback/capture framerate.

I got interested and threw together this JavaScript implementation for averaging the colors of every nth video frame. I’m pretty sure it only works in Firefox, but it will save the RGB values as a JSON file, which should be easy enough to import or create a Godot resource from.
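
For what it’s worth, loading something like that back into Godot could look like the sketch below, assuming the JSON is an array of [r, g, b] triples with 0-255 channel values; adjust to whatever the export script actually writes:

```gdscript
# Hypothetical loader for a JSON file shaped like [[r, g, b], [r, g, b], ...].
func load_mean_colors(path: String) -> Array[Color]:
    var colors: Array[Color] = []
    var data = JSON.parse_string(FileAccess.get_file_as_string(path))
    if data is Array:
        for rgb in data:
            colors.append(Color8(int(rgb[0]), int(rgb[1]), int(rgb[2])))
    return colors
```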

2 Likes

Yes, mapping the stream_position to the frame while accounting for the frame rate is not a problem; the real issue is that my Octave solution is slow. The other issue I’m having is that Theora is a bit of a mess: if I run my Octave script on my OGV files, quite a few frames get dropped for reasons I can’t see, which results in a shorter array that then goes out of sync with the video in Godot. If I run the script on the original MP4 files instead, no problem, I get the exact number of frames. Octave uses ffmpeg behind the scenes and I’m also using ffmpeg to convert between formats, so I’m really not sure where the problem might be.

I’ll give your solution a shot, thanks!

1 Like

ffmpeg should also be able to output the average color array you need, but I don’t dare figure out the command for that

2 Likes

I know this is mostly solved, but I wanted to show what I came up with to test the little script I posted before. I added a “perceived brightness” value for each frame to also modulate the light_energy, and the effect is pretty cool overall.

1 Like

Had a tiny spark of insight on this, so I may as well add it here:
you don’t need the complete array of frame-by-frame computed values, you just need “enough” of them if you lerp the actual value between data points.
So say you are at frame current_frame and you have data points at prev_point and next_point: you can lerp(prev_point.color, next_point.color, (current_frame - prev_point.frame) / (next_point.frame - prev_point.frame)) to get the interpolated color value for the light (or color and intensity, however you set it up).
Don’t know if it helps you here, but it may help someone else doing something similar.
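
Something like this in GDScript, as a sketch (the keyframe structure is just an assumption):

```gdscript
# Hypothetical sparse keyframes, sorted by frame; each entry is {"frame": int, "color": Color}.
var keyframes: Array[Dictionary] = []

func color_at_frame(current_frame: int) -> Color:
    if keyframes.is_empty():
        return Color.BLACK
    # Find the two keyframes that bracket the current frame and lerp between them.
    for i in range(keyframes.size() - 1):
        var prev_point: Dictionary = keyframes[i]
        var next_point: Dictionary = keyframes[i + 1]
        if current_frame <= int(next_point.frame):
            var c0: Color = prev_point.color
            var c1: Color = next_point.color
            var span := int(next_point.frame) - int(prev_point.frame)
            var t := float(current_frame - int(prev_point.frame)) / float(span)
            return c0.lerp(c1, clampf(t, 0.0, 1.0))
    # Past the last keyframe: hold its colour.
    var last: Color = keyframes[-1].color
    return last
```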

OK, long story short: I ran this on an Oculus Quest 1 and, lo and behold, I have to give up on Godot for this particular application unless I find a way of playing videos in a codec the Quest supports in hardware – namely, I think, H.264 and H.265.

My three-screen setup was running at about 6-8 fps, and the profiler showed something like 250 ms of processing time without there being anything in any of my _process or _physics_process functions. I acted on a hunch and removed the video streams altogether, and there you have it: the scene ran at a much more reasonable 60 fps. I can only imagine this is down to the lack of hardware support for Theora.

1 Like

but it does… ???

That’s not what I’m saying, though. I know Godot supports Theora, but apparently the Quest 1 doesn’t decode Theora in hardware, hence the abysmal performance, which shoots back up the moment I disable video playback. If I can’t get Godot to play videos in other codecs, then it’s game over.

Bummer. I think this is the relevant proposal for improving Godot’s video player:

1 Like

Oh, I never saw this one. Still, doesn’t look like it’s going to get done any time soon :frowning:

1 Like

Yeah… It’s good that there’s a solid proposal, but it may not get much traction until there’s more demand, unfortunately.

There’s no hardware out there that supports hardware-accelerated Theora decoding, but its CPU requirements are pretty low nowadays. This is because Theora is really designed as a competitor to MPEG-2 as opposed to H.264 (VP8 was competing with that instead). Alas, VP8 and VP9 have proven difficult to maintain in Godot, so support for these formats was removed in Godot 4.0.

That said, the Quest 1 is pretty outdated by now – even the Quest 2 has a lot of trouble keeping up with recent VR games. The generational gap between each Quest device is massive compared to smartphones. There’s a reason the Quest 3 is legitimately considered a game changer, while we never hear about a specific new smartphone flagship being a game changer these days…

1 Like

I hear everything you say and I agree, but: 1. I’m a noob in this space, so I don’t know what the state of the art in VR hardware is at the moment, and since this is for an art piece that should potentially be enjoyable by many, I can’t put too many restrictions on the hardware, I’m afraid; and 2. this headset is a hand-me-down I got from work, and I’d need a stronger case than Theora to convince them to buy me a Quest 3, considering VR isn’t our main focus, and especially if they can make a case for me to use another engine where I can play MPEG video. I’m investigating Unreal at the moment, but it’s a nightmare on my Mac; I’ll have to re-evaluate on Windows once I get back to the office. I’d rather avoid Unity like the plague.

OK, so here’s an interesting one. I was looking at my project running on the Quest 1, disabling things and undoing non-default changes I might have made while keeping an eye on the profiler, because that Frame Time > Process Time sitting at 260 ms with nothing in my scripts didn’t quite make sense, especially after disabling the Theora videos and removing them from the scene altogether.

Well, I got to the spotlights that fake the GI and, lo and behold, turning the SpotLight3Ds off dropped that Process Time down to 40~60 ms, which is still a bit higher than I’d like for an essentially empty scene, but way more reasonable.

So I replaced the spotlights with OmniLight3Ds and the performance hit came back, but to a much lesser extent: now I get about 100 ms instead of 260.

Does this make sense? I thought spotlights were relatively cheap to render even on hardware older than 2019…