Why is OpenSimplexNoise.get_noise_2d so slow compared with the native implementation?

:information_source: Attention Topic was automatically imported from the old Question2Answer platform.
:bust_in_silhouette: Asked By Jumer


I tried to generate an image manually from OpenSimplexNoise by calling OpenSimplexNoise.get_noise_2d and noticed that it was noticeably slower than using OpenSimplexNoise.get_image directly. Of course, some difference is expected, because one runs directly in the C++ engine while my implementation runs in GDScript. But the difference is too large to be explained by that fact alone.

I’ve checked the source code: OpenSimplexNoise.get_image is essentially my implementation (it calls get_noise_2d for every pixel).

In this short snippet I compare the times of the two actions:

var noise = OpenSimplexNoise.new()

var t1 = OS.get_ticks_msec()
noise.get_image(1000, 1000)
var t2 = OS.get_ticks_msec()

for x in 1000:
	for y in 1000:
		noise.get_noise_2d(x, y)
var t3 = OS.get_ticks_msec()

print("Native Image: %d\nCreated Image: %d" % [t2 - t1, t3 - t2])

The output on my crappy laptop (with similar results on other machines) was:

Native Image: 212
Created Image: 533

Even though the second case doesn’t pay the cost of actually creating the image, it takes about 2.5 times as long as the native version.

I need to generate the image the second way for my project, because I need the noise at very large coordinates and would have to generate a prohibitively big image to use the first method. So is there any workaround for this? Where is the efficiency being lost?

:bust_in_silhouette: Reply From: Stormfyre

I am a bit new to Godot, but I have been studying programming for years.
My best guess is that it’s function-call overhead. Try getting one line 1000 pixels long, 1000 times, and see if there is an improvement.
Unless your size is truly massive, the memory for one line should not be bad.
The number of calls crossing into native code would drop from O(n^2) to O(n). I have not seen the source, but my guess is the native code optimizes out that per-call overhead.
Sadly, looking at the API, there is no way to pick which line you get. Consider going native for time-critical code.

:bust_in_silhouette: Reply From: omggomb

Maybe you can implement your own C++ get_image which only uses an Array or Pool*Array. A little more context would be helpful. What kind of size are we talking about? Do you need to create the noise once or continuously?

In the first example you cross the border between GDScript and C++ roughly two times: once to call get_image and once to get the reference to the image back.

In the second example you cross that border 2 * 1000 * 1000 times.

Any time data has to be shoveled between languages, performance will suffer. It’s not about GDScript being slower, it’s about marshalling stuff between languages. That overhead will accumulate.
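As an illustration of cutting down those crossings, here is a hedged sketch (Godot 3.x) that pays the border cost once by pulling the raw bytes out of the generated image and indexing them locally; indexing a PoolByteArray still isn’t free, but it avoids a native method call per pixel. The [-1, 1] remapping is an assumption about how get_image packs its 8-bit grayscale values:

```gdscript
extends Node

# Sketch: fetch all noise data in one engine call, then read it locally
# instead of making a million get_noise_2d() calls across the border.
var noise := OpenSimplexNoise.new()
var width := 1000
var bytes: PoolByteArray

func _ready() -> void:
	bytes = noise.get_image(width, width).get_data()  # one crossing, ~1 MB

func noise_at(x: int, y: int) -> float:
	# get_image() produces 8-bit grayscale; assuming noise was remapped
	# from [-1, 1] to [0, 255], this maps it back.
	return bytes[y * width + x] / 255.0 * 2.0 - 1.0
```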

Yes, that is what I was thinking when I said that GDScript is slower, but I didn’t quite explain myself.

The size is potentially infinite, because I’m trying to generate a world dynamically depending on the position of the player (Minecraft-like), so using get_image is not an option.

Maybe I’ll consider doing something in GDNative; would the communication problem be reduced there? I’ve never done it and would have to read the documentation first.

Jumer | 2021-04-15 00:33

Can’t you just use a texture and treat it as repeating? Or maybe you can layer multiple textures on top of each other.

So get_noise(20000, 20000) becomes <image>.get_pixel(20000 % img_width, 20000 % img_height).
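A hedged sketch of that wrapping lookup (Godot 3.2+; the 256×256 size is just an example, and posmod() is used instead of % so negative coordinates wrap correctly):

```gdscript
extends Node

var noise := OpenSimplexNoise.new()
var img: Image = noise.get_image(256, 256)  # example size

func sample_wrapped(x: int, y: int) -> float:
	img.lock()  # get_pixel() needs a locked image in Godot 3
	var v := img.get_pixel(posmod(x, img.get_width()), posmod(y, img.get_height())).r
	img.unlock()
	return v
```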

omggomb | 2021-04-17 07:06

:bust_in_silhouette: Reply From: Mario

I didn’t perform any actual benchmarking on this, but looking at the engine’s source code, it’s easy to make a few assumptions explaining this with both function-call overhead and memory usage. If we look at the first few lines of the implementation of get_image(), we’ll see that the generated/passed image data is a (grayscale) image using 1 byte per pixel:

Ref<Image> OpenSimplexNoise::get_image(int p_width, int p_height) const {
    Vector<uint8_t> data;
    data.resize(p_width * p_height);

    uint8_t *wd8 = data.ptrw();

Doing the simple multiplication (and ignoring extra overhead from classes/references), this means we’ll be working with roughly 1 MB of data allocated in one go (this is an oversimplification, though). To actually fill the data, the C++ part will call the very same get_noise_2d() function a million times, too. However, the GDScript side will never see those 1 MB, only the reference (4 or 8 bytes depending on architecture).

Now if you go with the million calls from GDScript, you should expect a difference from memory alone, ignoring the extra call overhead. get_noise_2d() returns a float. While the size of this datatype might change between implementations/platforms, you can generally assume that nowadays it’s at least 32 bits (4 bytes) long. This means we’re passing 4 bytes per “pixel”, resulting in 4 MB of data.

So data wise – only talking about the information moving between the raw C++ part and the script runtime – we’re looking at a factor of 1 million (4 MB vs. 4 bytes). As such, having an actual time factor of only around 2.5 isn’t that bad overall (mostly thanks to fast memory these days, I guess).

But besides that, this totally sounds like a typical case of “you’re (probably) doing it wrong”. If you’re trying to generate a huge or nearly infinite terrain, you should definitely split it up into individual “chunks”, generating them on the fly and only while needed (e.g. close to the player). This will significantly reduce generation time, lower overall memory usage and improve your framerate, too.
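A minimal sketch of that chunked, on-demand approach (Godot 3.x); CHUNK_SIZE, LOAD_RADIUS, the "Player" node path, and the _generate_chunk() helper are all illustrative assumptions, not engine API:

```gdscript
extends Node2D

const CHUNK_SIZE := 100
const LOAD_RADIUS := 2

var loaded := {}  # chunk coordinate (Vector2) -> generated chunk data

func _process(_delta: float) -> void:
	var player_chunk: Vector2 = ($Player.position / CHUNK_SIZE).floor()
	# Load every chunk inside the radius around the player...
	for dy in range(-LOAD_RADIUS, LOAD_RADIUS + 1):
		for dx in range(-LOAD_RADIUS, LOAD_RADIUS + 1):
			var cpos := player_chunk + Vector2(dx, dy)
			if not loaded.has(cpos):
				loaded[cpos] = _generate_chunk(cpos)
	# ...and drop the ones that fell outside it.
	for cpos in loaded.keys():
		var d: Vector2 = (cpos - player_chunk).abs()
		if d.x > LOAD_RADIUS or d.y > LOAD_RADIUS:
			loaded.erase(cpos)

func _generate_chunk(cpos: Vector2) -> Array:
	return []  # placeholder; real generation would sample noise here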

If this doesn’t apply to you and you’re trying to do something else, I’m still sticking with “you’re (probably) doing it wrong”, but we’d need a few more details.

But get_image also calls get_noise_2d 1000 × 1000 times, and it returns a float, so that memory difference isn’t quite right: it happens both in my GDScript snippet and in the source. Still, I reckon, as @omggomb pointed out, that the communication between native code and GDScript is enough to explain the difference.

About the second question, I’m open to suggestions. To be fair, I haven’t done much research on good techniques, just experimented on my own. Currently it works like this: chunks are generated and deleted dynamically depending on the position of the player. Each chunk has a position and an OpenSimplexNoise generator. If each chunk is 100 pixels in size, chunk (0, 0) calls OpenSimplexNoise.get_noise_2d(x, y) for coordinates (0, 0) to (100, 100), but chunk (1, 0) for (100, 0) to (200, 100).
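A minimal sketch of that per-chunk sampling (Godot 3.x; CHUNK_SIZE and the Array return shape are illustrative). Because every chunk samples the same generator at world coordinates, the noise stays continuous across chunk borders:

```gdscript
extends Node

const CHUNK_SIZE := 100
var noise := OpenSimplexNoise.new()  # one shared generator

func generate_chunk(cx: int, cy: int) -> Array:
	var values := []
	for y in CHUNK_SIZE:
		for x in CHUNK_SIZE:
			# Chunk (1, 0) samples x in [100, 200), as described above.
			values.append(noise.get_noise_2d(cx * CHUNK_SIZE + x, cy * CHUNK_SIZE + y))
	return values
```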

That’s the reason I can’t use the OpenSimplexNoise.get_image function: the same noise would just repeat in every chunk with the same noise properties. I could generate a 200 × 200 image and use only the part I’m interested in, but this would grow very quickly as I move away from the origin.

One workaround I’ve been considering is to give every chunk a different seed, generate each image with get_image, and average the borders with the neighbours so the edges of each chunk look smooth.

Jumer | 2021-04-15 00:30