Data-oriented design, Object-oriented programming, and GDScript

I’ve become a bit more obsessed with performance lately, and since there is no thread dedicated to this topic in regards to GDScript currently I’ve made a thread here for it.

This thread will be a place where people can posts their benchmarks related to data-oriented design (vs, and, or) object-oriented programming principles. The benchmarks will be simple and focused on one thing at a time.
The goal of this thread is to uncover many simple ways which when combined and used throughout a project will improve performance.

Each benchmark posted will get linked in the table below so it’ll be easy to look them up between any eventual discussions that will occur in the thread.

3 Likes

Reserving this for the future, in case there is a post text limit.

This post was intended to benchmark the performance difference between pulling a member variable that represents a ‘state’ of an object from memory and using register-register operators to get the state at run-time. The benchmark was flawed however and instead it has been converted to measure only the performance impact of getting a state from pulling a member variable from memory vs getting a state during runtime by checking a related member variable which would imply the state under some condition. To see the original post, you can click the orange pencil and browse the history, however the old benchmark will be redone some other way eventually.

To infer a state at runtime or use a state member variable

This benchmark has two classes which are identical, differing only in that the class Ant has a bool to store a state whether it is dead or not, - which the class Termite do not have. The termite objects must infer this state by comparing its health member variable to a value which infers the state of the bool at runtime.

  • In benchmark_1, it is tested how long it will take to fetch the bool stored in an Ant object an n amount of times.
  • In benchmark_2, it is tested how long it will take to calculate its supposed state by checking if its health is equal to 0 an n amount of times.

In addition to the benchmark being run 100 times, the scene itself is also run 10 times to find an average between those runs.

Benchmark run 1

Settings:
Benchmark cycles: 100,
Array length: 10_000

Results:

  1. benchmark_1: 164, benchmark_2: 167, difference:-3
  2. benchmark_1: 159, benchmark_2: 171, difference:-12
  3. benchmark_1: 132, benchmark_2: 153, difference:-21
  4. benchmark_1: 162, benchmark_2: 176, difference:-14
  5. benchmark_1: 148, benchmark_2: 142, difference:6
  6. benchmark_1: 124, benchmark_2: 130, difference:-6
  7. benchmark_1: 129, benchmark_2: 137, difference:-8
  8. benchmark_1: 163, benchmark_2: 169, difference:-6
  9. benchmark_1: 171, benchmark_2: 166, difference:5,
  10. benchmark_1: 180, benchmark_2: 193, difference:-13

Total difference: -72

Conclusion:
Benchmark_1 is on average 7ms faster than benchmark_2.
The result shows that looking up the bool in memory is some times faster and some times slower than inferring it at runtime. That the tests favors benchmark 1 in this case could be by chance.

Benchmark run 2

Settings:
Benchmark cycles: 100,
Array length: 100_000

Extreme example. This benchmark takes 5 seconds to run. At this point you are not getting frames per second anymore anyway.

Benchmark cycle:

  1. benchmark_1: 2643, benchmark_2: 2887, difference:-244
  2. benchmark_1: 2614, benchmark_2: 2912, difference:-298
  3. benchmark_1: 2628, benchmark_2: 2900, difference:-272
  4. benchmark_1: 2633, benchmark_2: 2946, difference:-313
  5. benchmark_1: 2660, benchmark_2: 2921, difference:-261

Total difference: -1388ms
Conclusion: Benchmark_1 is 277ms faster on average.

Conclusion

The results show that fetching a bool from memory is some time a bit faster and some time a bit slower than using register-register operators at runtime to infer the same state. This could indicate that classes may get away with having less member variables in some cases if desired without having a very noticeable impact on performance.

Code

extends Node2D

# This benchmark measures the performance difference between storing a state as a 
# bool in memory 
# vs 
# using math to calculate it at runtime instead

@export var benchmark_cycles: int = 100
@export var length: int = 1_000

var benchmark_1_results: Dictionary
var benchmark_2_results: Dictionary
var ants: Array[Ant]
var termites: Array[Termite]
var results: Dictionary


func _ready() -> void:
	for i in length: 
		ants.append(Ant.new())
	for i in length: 
		termites.append(Termite.new())

	for i in benchmark_cycles: 
		benchmark_1_results[i] = benchmark_1(i)
		benchmark_2_results[i] = benchmark_2(i)

	measure_results()


func measure_results() -> void:
	var sum_1: float = 0.0
	var sum_2: float = 0.0
	
	for key in benchmark_1_results:
		sum_1 += benchmark_1_results[key]
	for key in benchmark_2_results:
		sum_2 += benchmark_2_results[key]
	
	print("benchmark_1: " + str(sum_1))
	print("benchmark_2: " + str(sum_2))
	print("difference:" + str(sum_1 - sum_2))


func benchmark_1(cycle: int) -> float: 
	var time_start_benchmark: float = Time.get_ticks_msec()
	for ant in ants:
		var task: bool = ant.dead
	return Time.get_ticks_msec() - time_start_benchmark


func benchmark_2(cycle: int) -> float: 
	var time_start_benchmark: float = Time.get_ticks_msec()
	for termite in termites:
		var task: bool = termite.health == 0
	return Time.get_ticks_msec() - time_start_benchmark


class Ant:
	var health: int = 0
	var dead: bool = true


class Termite:
	var health: int = 0

1 Like

The results show that fetching from memory is faster than using register-register operators.

I am out of my depth here for certain, but surely this cannot be true.

These results are a bit unexpected

That is an understatement to be sure! I will come back to this later when I have more time but just wanted to say that surely there must be an error here somewhere. Surely. I will see if I can find one when I have time, but to me this just seems wrong.

It could be that the compiler is doing something far cleverer than imagined, or that the operation is just too simple and not realistic for comparison. I mean over 5s operation with just 277ms difference suggests to me that whatever is going on under the hood is not a register-register vs RAM comparison at all.

Interesting thread though, look forward to seeing more.

1 Like

Ah, turns out I did ‘a stupid’. I am fetching from memory in both cases with ant.dead in benchmark_1 and termite.health in benchmark_2. Well, I’ll need to redo the test in another way.

Edit:
I’ve repurposed the original post now to just be a benchmark on how much of a performance loss it could or could not be to infer a state at runtime rather than storing a state as a member variable.
Not a very useful benchmark, but at least I don’t have to throw it all out. The original post is accessible by clicking the orange pencil if anyone wants to look at it.

1 Like

Edit1: Removed some unused lines of code

Array of Classes vs Class of Arrays

What is this?

The title would in other cases be Array of Structs vs Struct of Arrays but since GDscript don’t have structs I use class in this benchmark. But as I write this, I realize using a Resource would probably be closer to a struct and probably faster too, - but I’ll leave that benchmark for another time as this benchmark could still be useful for those who needs to use classes over resources for any reason.

There are some examples of this online if you want to read or watch some videos on this topic, but to give a brief overview of this concept:
Instead of having an array holding large objects which consists of many members that needs to be looked through in memory even though you only want to extract one member variable from that object, - you create multiple arrays where each array element points to the same individual member variable (or individual components of member variables) across many objects, - for instance the Position property of multiple Node2D scene objects. The main benefit of this concept is speed, and it is particularly useful when batching many objects in a for loop or similar.

In this example I use arrays that points to the properties of the objects, rather than breaking down the data further for simplicity sake. There may or may not be further benefits of taking this further.

Benchmark

Overview

This benchmark instantiates scene node objects in the amount of the property ‘length’. Benchmark_1 goes through an array ant_objects which holds Node2D objects and adjusts each object’s position.y. Benchmark_2 goes through an array objects within a Termite object which holds the position properties of many Node2D objects and adjusts each property’s (position.)y element.

I’ve tried keeping the syntax similar in benchmark_1 and benchmark_2, but there is another for loop commented out if you want to switch it and see if there is any difference yourself.

Settings

Benchmark cycles: 100
Array length: 1000

In other words, there are 1000 objects in each array that gets processed each time benchmark_1 and _2 runs, - and this is repeated 100 times. The results shown in ms is the sum of running benchmark_1 and benchmark_2 100 times, so the actual processing time is somewhere around 100 times shorter.

Conclusion

The benchmark is run 10 times to find an average.

  1. benchmark_1: 27, benchmark_2: 8, difference:19
  2. benchmark_1: 25, benchmark_2: 10, difference:15
  3. benchmark_1: 29, benchmark_2: 7, difference:22
  4. benchmark_1: 27, benchmark_2: 10, difference:17
  5. benchmark_1: 32, benchmark_2: 7, difference:25
  6. benchmark_1: 28, benchmark_2: 8, difference:20
  7. benchmark_1: 29, benchmark_2: 9, difference:20
  8. benchmark_1: 28, benchmark_2: 10, difference:18
  9. benchmark_1: 31, benchmark_2: 6, difference:25
  10. benchmark_1: 30, benchmark_2: 8, difference:22

(sum of difference) / amount of benchmark runs =
203 / 10 = 20.3ms.

In summary, the class of array benchmark is somewhere around 3 times faster than the array of classes.

Code

extends Node2D

@export var benchmark_cycles: int = 100
@export var length: int = 1_000

var benchmark_1_results: Dictionary
var benchmark_2_results: Dictionary
var termites: Termites
var node_termites: Node2D
var node_ants: Node2D
var ant_objects: Array


func _ready() -> void:
	termites = Termites.new()
	node_termites = Node2D.new()
	node_ants = Node2D.new()
	add_child(node_termites)
	add_child(node_ants)

	for i in length: 
		var ant: Node2D = Node2D.new()
		node_ants.add_child(ant)
	ant_objects = node_ants.get_children()

	for i in length: 
		var termite: Node2D = Node2D.new()
		node_termites.add_child(termite)
		termites.positions.append(termite.position)
	termites.number_of_termites = length

	for i in benchmark_cycles: 
		benchmark_1_results[i] = benchmark_1()
		benchmark_2_results[i] = benchmark_2()

	measure_results()


func measure_results() -> void:
	var sum_1: float = 0.0
	var sum_2: float = 0.0
	
	for key in benchmark_1_results:
		sum_1 += benchmark_1_results[key]
	for key in benchmark_2_results:
		sum_2 += benchmark_2_results[key]
	
	print("benchmark_1: " + str(sum_1))
	print("benchmark_2: " + str(sum_2))
	print("difference:" + str(sum_1 - sum_2))


func benchmark_1() -> float: 
	var time_start_benchmark: float = Time.get_ticks_msec()
	#for ant in ant_objects:
		#ant.position.y = 10
	for i in ant_objects.size():
		ant_objects[i].position.y = 10
	return Time.get_ticks_msec() - time_start_benchmark


func benchmark_2() -> float: 
	var time_start_benchmark: float = Time.get_ticks_msec()
	for i in termites.number_of_termites: 
		termites.positions[i].y = 20
	return Time.get_ticks_msec() - time_start_benchmark


class Termites:
	var positions: Array[Vector2]
	var number_of_termites: int

2 Likes

Function object call vs calling member function

What is this?

With GDScript you can store functions as Callable objects into variables. It seems like there could be performance benefits to call function directly in variables than calling the member function of a object. This benchmark checks that.

Benchmark

Overview

This benchmark instantiates scene node objects in the amount of the property ‘length’. Benchmark_1 loops though each of those objects and calls their perform() method. Benchmark_2 loops through an array holding Callables to the same perform() methods and calls them.

Settings

Benchmark cycles: 100
Array length: 1000

There are 1000 objects and the benchmark_1 & _2 are repeated 100 times. The results shown in ms is the sum of running benchmark_1 and benchmark_2 those 100 times.

Results

The benchmark itself is run 10 times to find an average.

benchmark_1: 39.883ms, benchmark_2: 40.504ms, difference: -0.621ms
benchmark_1: 41.367ms, benchmark_2: 41.948ms, difference: -0.581ms
benchmark_1: 40.771ms, benchmark_2: 41.087ms, difference: -0.316ms
benchmark_1: 39.968ms, benchmark_2: 40.44ms, difference: -0.47199999999999ms
benchmark_1: 40.945ms, benchmark_2: 41.703ms, difference: -0.758ms
benchmark_1: 41.01ms, benchmark_2: 41.483ms, difference: -0.473ms
benchmark_1: 41.04ms, benchmark_2: 41.68ms, difference: -0.64ms
benchmark_1: 40.736ms, benchmark_2: 41.348ms, difference: -0.612ms
benchmark_1: 41.434ms, benchmark_2: 41.906ms, difference: -0.472ms
benchmark_1: 41.772ms, benchmark_2: 42.265ms, difference: -0.493ms

(sum of difference) / (amount of benchmark runs) = average.
-5.4ms / 10 = 0.54ms.

Conclusion

In this particular benchmark, calling the method through the object itself is faster than the same method stored in a variable, but not by a lot. Storing a function call as a variable and calling it like in benchmark_2 may have syntax benefits in some scenarios. In most examples it probably won’t make a noticeable difference performance wise.

Code

extends Node2D

@export var benchmark_cycles: int = 100
@export var length: int = 1_000

class Animal:
	var identifier: String
	var name: String
	var favorite_word: String
	var fear: int = 0
	var happiness: int = 0
	var age: int = 0
	var tiredness: int = 0
	var health: int = 0
	var stamina: int = 0 
	var hunger: int = 0 
	func perform() -> bool:
		return true

class Dog extends Animal:
	var need_to_poo: int = 0

class EvilDog extends Dog:
	var lust_for_human_blood: int = 0

var beings: Array[Animal]
var performs: Array[Callable]

var benchmark_1_results: Dictionary
var benchmark_2_results: Dictionary
var results: Dictionary

func _ready() -> void:
	for i in length: 
		beings.append(EvilDog.new())

	for being in beings:
		performs.append(being.perform)

	for i in benchmark_cycles: 
		benchmark_1_results[i] = benchmark_1()
		benchmark_2_results[i] = benchmark_2()

	measure_results()


func benchmark_1() -> float:
	var time_start_benchmark: float = Time.get_ticks_usec()
	for being: Animal in beings: 
		being.perform()
	return Time.get_ticks_usec() - time_start_benchmark


func benchmark_2() -> float:
	var time_start_benchmark: float = Time.get_ticks_usec()
	for perform: Callable in performs: 
		perform.call()
	return Time.get_ticks_usec() - time_start_benchmark


func measure_results() -> void:
	var sum_1: float = 0.0
	var sum_2: float = 0.0
	
	for key in benchmark_1_results:
		sum_1 += benchmark_1_results[key]
	for key in benchmark_2_results:
		sum_2 += benchmark_2_results[key]
	
	sum_1 = sum_1 / 1000
	sum_2 = sum_2 / 1000
	
	print("benchmark_1: " + str(sum_1) + "ms")
	print("benchmark_2: " + str(sum_2) + "ms") 
	print("difference: " + str(sum_1 - sum_2) + "ms")

1 Like

Pop_back (swapback-esque) vs remove_at for arrays

What is this?

Arrays get slower to manipulate the bigger they are. In scenarios where an element must be removed from its array and the array needs to have its other items shifted down can get the cpu stuck at that task for some time. In cases where the order of the array elements do not matter, a swapback can be used. A swapback takes the item at the back of the array and puts it in the position of the element which is about to be removed, and then removes the array element at the back of the array. Thus, the array do not need to shift its elements, it only needs to move an element and remove a slot at the back. This should have performance benefits, and this benchmark tests that.

The array class in GDScript to not have a swapback method, but a swapback can be mimicked with a pop_back. It is possible that benchmark_2 could have been faster with a proper swapback method.

Benchmark

Overview

This benchmark instantiates scene node objects in the amount of the property ‘length’ into the array array and swapback_array. Benchmark_1 and Benchmark_2 runs the amount of times of benchmark_cycles and stores the nanoseconds into a dictionary so the sum of the runs can be found.

Settings

Benchmark cycles: 100
Array length: 1000

There are 1000 objects and the benchmark_1 & _2 are repeated 100 times. The results shown in ms is the sum of running benchmark_1 and benchmark_2 those 100 times.

Results

The benchmark itself is run 10 times to find an average.

benchmark_1: 0.072ms, benchmark_2: 0.016ms, difference: 0.056ms
benchmark_1: 0.096ms, benchmark_2: 0.013ms, difference: 0.083ms
benchmark_1: 0.081ms, benchmark_2: 0.013ms, difference: 0.068ms
benchmark_1: 0.08ms, benchmark_2: 0.008ms, difference: 0.072ms
benchmark_1: 0.093ms, benchmark_2: 0.011ms, difference: 0.082ms
benchmark_1: 0.076ms, benchmark_2: 0.011ms, difference: 0.065ms
benchmark_1: 0.081ms, benchmark_2: 0.013ms, difference: 0.068ms
benchmark_1: 0.081ms, benchmark_2: 0.004ms, difference: 0.077ms
benchmark_1: 0.081ms, benchmark_2: 0.004ms, difference: 0.077ms
benchmark_1: 0.081ms, benchmark_2: 0.012ms, difference: 0.069ms

(sum of difference) / (amount of benchmark runs) = average.
0.717ms / 10 = 0.0717ms.

Conclusion

In scenarios where the order of the array do not matter, using a swapback in an array of 1000 elements has between a 6 to 10 times performance benefit.

Code

extends Node2D

@export var benchmark_cycles: int = 100
@export var length: int = 1_000

var benchmark_1_results: Dictionary
var benchmark_2_results: Dictionary
var results: Dictionary
var array: Array
var swapback_array: Array


func _ready() -> void:
	for i in length:
		array.append(0)
		swapback_array.append(0)
	for i in benchmark_cycles: 
		benchmark_1_results[i] = benchmark_1(i)
		benchmark_2_results[i] = benchmark_2(i)

	measure_results()


func measure_results() -> void:
	var sum_1: float = 0.0
	var sum_2: float = 0.0
	
	for key in benchmark_1_results:
		sum_1 += benchmark_1_results[key]
	for key in benchmark_2_results:
		sum_2 += benchmark_2_results[key]
	
	sum_1 = sum_1 / 1000
	sum_2 = sum_2 / 1000
	
	print("benchmark_1: " + str(sum_1) + "ms")
	print("benchmark_2: " + str(sum_2) + "ms") 
	print("difference: " + str(sum_1 - sum_2) + "ms")


func benchmark_1(i) -> float: 
	var time_start_benchmark: float = Time.get_ticks_usec()
	array.remove_at(i)
	return Time.get_ticks_usec() - time_start_benchmark


func benchmark_2(i) -> float: 
	var time_start_benchmark: float = Time.get_ticks_usec()
	swapback_array[i] = swapback_array.pop_back()
	return Time.get_ticks_usec() - time_start_benchmark

2 Likes