Optimize Node3D using Top Level property

The problem

Node3D has a property called top_level. It is false by default; when enabled, it prevents a node from inheriting its parent’s transformation. Usually this property is neither needed nor wanted: a common way to compose 3D objects is to nest 3D nodes that have to move together.

image

Internally, every time the transform of the topmost node (Character3D) changes, all child nodes have to update their transforms too, relative to their parent, and its parent, and its parent… As you nest nodes, this becomes a long chain of matrix multiplications that can get really taxing.
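
To make that cost concrete, here is a minimal sketch of how a nested node's global transform is built from its ancestors. It is not engine code: `Affine1D`, `Node`, `global_of` and `compositions_for_mesh` are hypothetical names, and a 1D scale-plus-offset transform stands in for Transform3D so the arithmetic stays readable. The only assumption is that a child's global transform is its parent's global transform composed with its own local one, unless the child is top level.

```cpp
#include <cassert>

// Simplified stand-in for Transform3D: maps x to scale * x + offset.
struct Affine1D {
    double scale;
    double offset;
};

// Compose parent * child: the child is applied first, then the parent.
Affine1D compose(const Affine1D &parent, const Affine1D &child) {
    return { parent.scale * child.scale, parent.scale * child.offset + parent.offset };
}

// Hypothetical node: a local transform, an optional parent, a top_level flag.
struct Node {
    Affine1D local{ 1.0, 0.0 };
    const Node *parent = nullptr;
    bool top_level = false;
};

// Resolve a node's global transform and count the compositions it took.
Affine1D global_of(const Node &node, int &compositions) {
    if (node.parent == nullptr || node.top_level) {
        return node.local; // Nothing to inherit: local is already global.
    }
    const Affine1D parent_global = global_of(*node.parent, compositions);
    ++compositions;
    return compose(parent_global, node.local);
}

// World -> Character3D -> Mesh; returns how many compositions the mesh needs.
int compositions_for_mesh(bool mesh_top_level) {
    Node world, character, mesh;
    world.local = { 1.0, 10.0 };
    character.local = { 2.0, 1.0 };
    character.parent = &world;
    mesh.local = { 1.0, 0.5 };
    mesh.parent = &character;
    mesh.top_level = mesh_top_level;

    int compositions = 0;
    const Affine1D global = global_of(mesh, compositions);
    // Inherited: 2 * 0.5 + 11 = 12. Top level: the local offset is used as-is.
    assert(global.offset == (mesh_top_level ? 0.5 : 12.0));
    return compositions;
}
```

In this toy tree the mesh needs two compositions when it inherits and none when it is top level. Whether that difference matters in a real project depends on how often the transforms are actually read.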

What do we do about it

Tying 3D nodes together does not require inheritance. Usually, you just need child nodes to share the exact same global_transform as their parent. There’s no need for inheritance, because that adds a matrix multiplication, even if a trivial one. What you’re actually looking for is setting their transform directly to that of their parent.

Update the bare minimum. For symmetrical or fixed shapes, updating the rotation may not be necessary. If rotation is only needed visually, rotate just the mesh node without triggering updates across the whole tree. This becomes relevant in multiplayer games.

Be clever about node structuring. The result in the image above is the same as in the image below. The difference is that in the former the mesh node’s global transform chains 3 matrices, instead of 2 as in the latter.

image

Also make sure to set scene roots as top level if you organize your SceneTree in a similar way to this: you know the World node is not going to move, but its transform is still being combined with the others.

Don’t transform twice. It is common practice to offset a node’s transform after it has already been set by built-in processes. If you know you’re going to end up transforming a node yourself, as may be the case for cameras or meshes, set the transform directly and skip the default transformations.

Use fewer 3D nodes. Godot’s node philosophy is unavoidable and pleasant to work with, but (in my opinion) designed for 2D, segmented & predictable games. Use fewer 3D nodes whenever you get the chance and it does not compromise ease of use.


Source

I was scouring the internet looking for optimization techniques for 3D Godot and came across this blog post by Dre Dyson. Although the blog is geared towards network costs, it’s still useful advice for multiplayer games, even if P2P, and for CPU savings.

This is an interesting theory, but do you have any numbers to back it up? Moving the root node of a Node2D or Node3D scene causes issues anyway, so one would not do that. So what performance benefits are you seeing?

Unfortunately, I do not have any data of my own, as this is mostly a transcription of Dre’s blog, which does offer its own data. As the last, subjective point suggests, I’ve cultivated a skepticism for Godot’s node philosophy (not for nodes themselves), so this might be biased.

The article claims a ~35% reduction of “compute time” after applying the methods mentioned in this post. This is clearly data for a server-authoritative game hosted on AWS, which makes every millisecond saved important. Although it might not be as important for a more local game, it can still prove useful, especially if you’re working with multiplayer or complex scenarios.

I would love to share data I can gather as soon as I do, but it won’t really happen until I get a fully fleshed out game done. I’ve got the feeling this is not about “sheer numbers” as much as it is about how entangled things end up getting.

I’m gonna be honest. I don’t trust that blog. It’s full of intrusive ads, which is a cash grab. If they were really consulting on games, they shouldn’t need the cash. None of those numbers are backed up in the article.

Also, the formatting was wonky. It was an unformatted Markup Document. LinkedIn tells us he’s unemployed. GitHub tells us he got his Master’s in computer science 2 years ago, which translates to virtually no practical experience.

I’m not saying that he’s wrong, just that there’s enough suspicious stuff there for me to not take unsupported data at face value.

Having said that, I do agree that the Rig on a 3D character should be rotating independently of the CharacterBody3D it sits under. But that’s a basic game design thing that should always be done. It’s the be-all-end all of network optimization.

Could you elaborate on the benefits of doing this? I currently rotate the CharacterBody3D as it made implementing root motion easier. Now I’m wondering if I stumbled across a foot gun.

So here’s a setup for a CharacterBody3D.

In this screenshot, the node labeled Root is what I would typically call the Rig node. (The node name was tied to the animations, and I didn’t want to deal with that, hence the name.) Its facing is the only thing that matters. But keep in mind that the only thing you’re doing is rotating it.

Position is still managed by the CharacterBody3D. So, if you separate out just the rotation, you can still move the CharacterBody3D and your root motion should be fine.

Indeed.

That’s the solution that they get to, yes. The blog talks about consulting on the issues of an existing game with a presumably bad architecture. In that sense, it is reasonable that a more experienced developer would already know about this.

My main objective with this post, as stated in the title, was to shed light on the top level property and its relevance to how transformations are calculated. I unwittingly turned it into a 3D node optimization post when it should have stayed focused on this property that, I think, can easily be missed.

I may have grown tired lately of squishing my game and got too entangled with my own ideas :sweat_smile:

But why is rotating Root preferred to rotating the CharacterBody3D? How does it improve network performance?

The improvement comes from the fact that rotating the CharacterBody3D forces all children to also rotate, leading to many calculations. Oftentimes, rotation does not affect how a body interacts with the world. Think of capsules, commonly used for characters: if you only rotate around the Y axis, nothing really changes.

image

Collisions may not change, but you want your visuals to reflect your character turning. Instead of also rotating the collision shape, which is useless, just rotate the mesh, saving two calculations (for Character3D and CharacterShape).

image

As for the network gains, it really depends on your architecture. There’s some bad phrasing here: it’s not “network” performance as in ping or bandwidth but “cloud” performance as in computation time and costs. Regardless, it can still be useful for P2P or offline games. Sometimes, position may change without rotation doing so, or vice versa, and you can save on that data.
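
As a rough illustration of saving on that data, the sketch below flags which parts of a body's state changed between two snapshots and sizes the update accordingly. Everything here is hypothetical (`BodyState`, `StateDelta`, `diff_state`, `payload_size`, and the one-byte header); it is not how Godot's networking serializes anything, just one way to picture sending position and rotation separately.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical replicated state: a position and a yaw angle in radians.
struct BodyState {
    float x, y, z;
    float yaw;
};

// Hypothetical update record: flags say which parts actually changed.
struct StateDelta {
    bool has_position;
    bool has_yaw;
    BodyState state;
};

// Compare two snapshots exactly and flag only the components that differ.
StateDelta diff_state(const BodyState &previous, const BodyState &current) {
    StateDelta delta;
    delta.has_position = previous.x != current.x || previous.y != current.y || previous.z != current.z;
    delta.has_yaw = previous.yaw != current.yaw;
    delta.state = current;
    return delta;
}

// Payload size if only the flagged parts are written after a one-byte header.
std::size_t payload_size(const StateDelta &delta) {
    std::size_t size = 1; // Header byte carrying the two flags.
    if (delta.has_position) {
        size += 3 * sizeof(float);
    }
    if (delta.has_yaw) {
        size += sizeof(float);
    }
    return size;
}

// Example: a player turns on the spot, so only the yaw needs to be sent.
std::size_t payload_for_turn_in_place() {
    const BodyState before{ 1.0f, 0.0f, 2.0f, 0.0f };
    const BodyState after{ 1.0f, 0.0f, 2.0f, 1.5f };
    const StateDelta delta = diff_state(before, after);
    assert(!delta.has_position && delta.has_yaw);
    return payload_size(delta);
}
```

A player who only turns costs one float plus the header in this model, instead of four floats. A real game would likely compare against a tolerance rather than exact equality.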

Again, I wanted to make this post to raise awareness about a fact of working with matrices (Transform3D). You have to understand the system and be clever about it; it’s not a general optimization.

I think these are hardly going to be a bottleneck in most games. If your game is CPU bound, then it’s probably something else rather than a simple node tree. Godot easily processes hundreds of skeletal animations per frame, and those are much more intensive than the depth of most node trees.

Even the deeper node trees are just chaining matrix multiplications in C++. That’s simple, and hardly a problem for a modern multi-threaded processor. Honestly, the overhead is negligible even on older hardware, or on things like the Pi Zero, which can run Minecraft-like graphics. Even if it were a problem, I would say the convenience of the hierarchy’s math and the tidiness of the scene are a good trade-off against performance.

About the characters: I just wonder whether to implement a different controller, like a rigid body, or to collide with the skeleton’s bounding bones as for a ragdoll. That would help with situations where the arm goes through the wall, but I’m sure it’s possible with a bit of focus and concentration.

Gotcha, that makes sense. Guess I’ll add reworking my rotations to the backlog :sweat_smile:.

I had a look at the article, and there’s some code worth mentioning … I am not a great expert, but as far as I can see Node3D is added to the ClassDB in ‘register_scene_types.cpp’, and this allows GDScript to use the Node3D type.

godot/scene/register_scene_types.cpp at master · godotengine/godot · GitHub

So the code for Node3D is in the C++ class.

godot/scene/3d/node_3d.cpp at master · godotengine/godot · GitHub

So the article claims that all the child nodes must be updated because the global position has changed. Here is the comment at the top of node_3d.cpp.

/*

 possible algorithms:

 Algorithm 1: (current)

 definition of invalidation: global is invalid

 1) If a node sets a LOCAL, it produces an invalidation of everything above
 .  a) If above is invalid, don't keep invalidating upwards
 2) If a node sets a GLOBAL, it is converted to LOCAL (and forces validation of everything pending below)

 drawback: setting/reading globals is useful and used very often, and using affine inverses is slow

---

 Algorithm 2: (no longer current)

 definition of invalidation: NONE dirty, LOCAL dirty, GLOBAL dirty

 1) If a node sets a LOCAL, it must climb the tree and set it as GLOBAL dirty
 .  a) marking GLOBALs as dirty up all the tree must be done always
 2) If a node sets a GLOBAL, it marks local as dirty, and that's all?

 //is clearing the dirty state correct in this case?

 drawback: setting a local down the tree forces many tree walks often

--

future: no idea

 */

The comment says that if the Node3D sets a GLOBAL, it’s converted to LOCAL and forces VALIDATION of everything PENDING below.

And conversion to local uses the Affine inverse.

Vector3 Node3D::to_local(Vector3 p_global) const {
	ERR_READ_THREAD_GUARD_V(Vector3());
	return get_global_transform().affine_inverse().xform(p_global);
}

And for example, rotate_x actually sets the transform….

void Node3D::rotate_x(real_t p_angle) {
	ERR_THREAD_GUARD;
	Transform3D t = get_transform();
	t.basis.rotate(Vector3(1, 0, 0), p_angle);
	set_transform(t);
}

So the function set_global_transform() will convert to a local …

void Node3D::set_global_transform(const Transform3D &p_transform) {
	ERR_THREAD_GUARD;
	Transform3D xform = (data.parent && !data.top_level)
			? data.parent->get_global_transform().affine_inverse() * p_transform
			: p_transform;

	set_transform(xform);
}
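
The conversion in set_global_transform() can be checked with small numbers. The sketch below is an illustration only, using a 1D scale-plus-offset transform in place of Transform3D; `Affine1D`, `compose`, `affine_inverse` (a free function here) and `local_for_global` are hypothetical names rather than engine API.

```cpp
#include <cassert>

// Simplified stand-in for Transform3D: maps x to scale * x + offset.
struct Affine1D {
    double scale;
    double offset;
};

// Compose a * b: b is applied first, then a.
Affine1D compose(const Affine1D &a, const Affine1D &b) {
    return { a.scale * b.scale, a.scale * b.offset + a.offset };
}

// Inverse of x -> scale * x + offset, valid only for a non-zero scale.
Affine1D affine_inverse(const Affine1D &t) {
    assert(t.scale != 0.0);
    return { 1.0 / t.scale, -t.offset / t.scale };
}

// Mirror of the branch in set_global_transform(): a node with a parent that
// is not top level stores parent_global^-1 * wanted_global as its local.
Affine1D local_for_global(const Affine1D &wanted_global, const Affine1D *parent_global, bool top_level) {
    if (parent_global != nullptr && !top_level) {
        return compose(affine_inverse(*parent_global), wanted_global);
    }
    return wanted_global; // No inheritance: the global is stored directly.
}
```

With a parent of scale 2 and offset 3, asking for a global of scale 4 and offset 11 stores a local of scale 2 and offset 4; composing the parent with that local gives the requested global back. A top-level node skips the inverse entirely.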

And set transform …

void Node3D::set_transform(const Transform3D &p_transform) {
	ERR_THREAD_GUARD;
	data.local_transform = p_transform;
	_replace_dirty_mask(DIRTY_EULER_ROTATION_AND_SCALE); // Make rot/scale dirty.

	_propagate_transform_changed(this);
	if (data.notify_local_transform) {
		notification(NOTIFICATION_LOCAL_TRANSFORM_CHANGED);
	}
	fti_notify_node_changed();
}

Sets flags and calls to propagate the changes ….

void Node3D::_propagate_transform_changed(Node3D *p_origin) {
	if (!is_inside_tree()) {
		return;
	}

	for (uint32_t n = 0; n < data.node3d_children.size(); n++) {
		Node3D *s = data.node3d_children[n];

		// Don't propagate to a toplevel.
		if (!s->data.top_level) {
			s->_propagate_transform_changed(p_origin);
		}
	}....//

So I cut the rest because it’s a long read.

Basically, each child node calls the function and avoids top_levels.

#ifdef TOOLS_ENABLED
	if ((!data.gizmos.is_empty() || data.notify_transform) && !data.ignore_notification && !xform_change.in_list()) {
#else
	if (data.notify_transform && !data.ignore_notification && !xform_change.in_list()) {
#endif
		// SceneTree::xform_change_list is not thread safe to modify, and is read by the main thread when processings are done.
		if (Thread::is_main_thread()) {
			get_tree()->xform_change_list.add(&xform_change);
		} else {
			// For any threaded-processed node, add it to xform_change_list on the main thread in a deferred manner.
			callable_mp(this, &Node3D::_propagate_transform_changed_deferred).call_deferred();
		}
	}
	_set_dirty_bits(DIRTY_GLOBAL_TRANSFORM | DIRTY_GLOBAL_INTERPOLATED_TRANSFORM);
}
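
A stripped-down model of that propagation step may help to see what top_level changes. This is a sketch under assumptions, not the engine's code: `Node` here is a hypothetical struct, and the in-tree check, the notification list and the threading branch are all left out. It only counts how many nodes get their global transform flagged as dirty.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node: only the fields the propagation step cares about.
struct Node {
    bool top_level = false;
    bool global_dirty = false;
    std::vector<Node *> children;
};

// Mark a node and its descendants dirty, skipping top-level subtrees.
// Returns how many nodes were marked.
int propagate_transform_changed(Node &node) {
    int marked = 1;
    for (Node *child : node.children) {
        assert(child != nullptr);
        if (!child->top_level) { // Don't propagate to a top-level child.
            marked += propagate_transform_changed(*child);
        }
    }
    node.global_dirty = true;
    return marked;
}

// Character3D with a collision shape and a mesh as children.
// Returns how many nodes a change on the body marks as dirty.
int marked_when_body_moves(bool mesh_top_level) {
    Node body, shape, mesh;
    mesh.top_level = mesh_top_level;
    body.children = { &shape, &mesh };
    return propagate_transform_changed(body);
}
```

In this model, moving the body marks three nodes, or two if the mesh is top level, while changing a leaf such as the mesh marks only that one node. Marking is just a flag write; the matrix work happens later, when a global transform is read.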

The tail of _propagate_transform_changed() just adds the xform change to a list. The list is only mentioned 4 times in scene_tree.cpp, and it appears to be used at the beginning of process() with

void SceneTree::flush_transform_notifications() {
	_THREAD_SAFE_METHOD_

	SelfList<Node> *n = xform_change_list.first();
	while (n) {
		Node *node = n->self();
		SelfList<Node> *nx = n->next();
		xform_change_list.remove(n);
		n = nx;
		node->notification(NOTIFICATION_TRANSFORM_CHANGED);
	}
}
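
The effect of the `!xform_change.in_list()` check together with this flush can be sketched as follows. `Node` and `ChangeList` are hypothetical stand-ins, and a plain vector replaces the engine's intrusive SelfList, so this only models the bookkeeping: a node is queued once however often it changes, and receives one notification per flush.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node that remembers whether it is already queued.
struct Node {
    bool in_change_list = false;
    int notifications_received = 0;
};

// Hypothetical stand-in for the scene tree's transform change list.
struct ChangeList {
    std::vector<Node *> pending;

    // Queue a node once, however many times its transform changes.
    void add(Node &node) {
        if (!node.in_change_list) {
            node.in_change_list = true;
            pending.push_back(&node);
        }
    }

    // Deliver one notification per queued node, then empty the list.
    int flush() {
        int delivered = 0;
        for (Node *node : pending) {
            assert(node != nullptr);
            node->in_change_list = false;
            ++node->notifications_received;
            ++delivered;
        }
        pending.clear();
        return delivered;
    }
};
```

Unlike the engine's loop, this model does not support a notification handler re-queueing nodes during the flush; it is only meant to show that repeated changes between flushes collapse into a single entry.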

Worth noting too that get_transform() performs the calculation if the flags have been set …

Transform3D Node3D::get_transform() const {
	ERR_READ_THREAD_GUARD_V(Transform3D());
	if (_test_dirty_bits(DIRTY_LOCAL_TRANSFORM)) {
		// This update can happen if needed over multiple threads.
		_update_local_transform();
	}

	return data.local_transform;
}

And in global transform …

Transform3D Node3D::get_global_transform() const {
	ERR_FAIL_COND_V(!is_inside_tree(), Transform3D());

	/* Due to how threads work at scene level, while this global transform won't be able to be changed from outside a thread,
	 * it is possible that multiple threads can access it while it's dirty from previous work. Due to this, we must ensure that
	 * the dirty/update process is thread safe by utilizing atomic copies.
	 */

	uint32_t dirty = _read_dirty_mask();
	if (dirty & DIRTY_GLOBAL_TRANSFORM) {
		if (dirty & DIRTY_LOCAL_TRANSFORM) {
			_update_local_transform(); // Update local transform atomically.
		}

		Transform3D new_global;
		if (data.parent && !data.top_level) {
			new_global = data.parent->get_global_transform() * data.local_transform;
		} else {
			new_global = data.local_transform;
		}

		if (data.disable_scale) {
			new_global.basis.orthonormalize();
		}

		data.global_transform = new_global;
		_clear_dirty_bits(DIRTY_GLOBAL_TRANSFORM);
	}

	return data.global_transform;
}

get_global_transform() checks the DIRTY_GLOBAL_TRANSFORM flag, which was not set directly in set_transform(…) above, so quickly going to node_3d.h reveals a large comment that explains the situation …

// For the sake of ease of use, Node3D can operate with Transforms (Basis+Origin), Quaternion/Scale and Euler Rotation/Scale.
	// Transform and Quaternion are stored in data.local_transform Basis (so quaternion is not really stored, but converted back/forth from 3x3 matrix on demand).
	// Euler needs to be kept separate because converting to Basis and back may result in a different vector (which is troublesome for users
	// editing in the inspector, not only because of the numerical precision loss but because they expect these rotations to be consistent, or support
	// "redundant" rotations for animation interpolation, like going from 0 to 720 degrees).
	//
	// As such, the system works in a way where if the local transform is set (via transform/basis/quaternion), the EULER rotation and scale becomes dirty.
	// It will remain dirty until reading back is attempted (for performance reasons). Likewise, if the Euler rotation scale are set, the local transform
	// will become dirty (and again, will not become valid again until read).
	//
	// All this is transparent from outside the Node3D API, which allows everything to works by calling these functions in exchange.
	//
	// Additionally, setting either transform, quaternion, Euler rotation or scale makes the global transform dirty, which will be updated when read again.
	//
	// NOTE: Again, RotationEditMode is _independent_ of this mechanism, it is only meant to expose the right set of properties for editing (editor) and saving
	// (to scene, in order to keep the same values and avoid data loss on conversions). It has zero influence in the logic described above.
	enum TransformDirty {
		DIRTY_NONE = 0,
		DIRTY_EULER_ROTATION_AND_SCALE = 1,
		DIRTY_LOCAL_TRANSFORM = 2,
		DIRTY_GLOBAL_TRANSFORM = 4,
		DIRTY_GLOBAL_INTERPOLATED_TRANSFORM = 8,
	};

So requesting the rotations after using set_transform causes the update.
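
The same write-cheap, read-lazy pattern applies to the global transform in get_global_transform(). Below is a minimal model of it, assuming a 1D scale-plus-offset transform instead of Transform3D. `Affine1D`, `Node`, `recomputations` and `recomputations_after_burst` are hypothetical names, and propagating the dirty flag from a parent to its children is deliberately left out, so only the child's transform is changed here.

```cpp
#include <cassert>

// Simplified stand-in for Transform3D: maps x to scale * x + offset.
struct Affine1D {
    double scale;
    double offset;
};

// Hypothetical node with a cached global transform and a dirty flag.
struct Node {
    Affine1D local{ 1.0, 0.0 };
    Affine1D cached_global{ 1.0, 0.0 };
    Node *parent = nullptr;
    bool top_level = false;
    bool global_dirty = true;
    int recomputations = 0;

    // Writing is cheap: store the value and flag the cache as stale.
    void set_transform(const Affine1D &t) {
        local = t;
        global_dirty = true;
    }

    // Reading pays for the composition, but only when the cache is stale.
    Affine1D get_global_transform() {
        if (global_dirty) {
            if (parent != nullptr && !top_level) {
                const Affine1D p = parent->get_global_transform();
                cached_global = { p.scale * local.scale, p.scale * local.offset + p.offset };
            } else {
                cached_global = local;
            }
            global_dirty = false;
            ++recomputations;
        }
        return cached_global;
    }
};

// Three writes followed by two reads cost a single recomputation.
int recomputations_after_burst() {
    Node parent, child;
    parent.set_transform({ 2.0, 3.0 });
    child.parent = &parent;
    child.set_transform({ 1.0, 1.0 });
    child.set_transform({ 1.0, 2.0 });
    child.set_transform({ 1.0, 4.0 });
    const Affine1D first = child.get_global_transform();
    const Affine1D second = child.get_global_transform();
    assert(first.offset == second.offset); // 2 * 4 + 3 = 11 both times.
    return child.recomputations;
}
```

If nothing reads the global transform, no composition happens at all in this model, which is consistent with the header comment's statement that the global transform is updated when read again.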

The blogger’s problem on the web server was probably a large flood of entries in the xform_change_list that gets processed at the start of each call to process() in the function flush_transform_notifications().

(I guess that is where SSE etc. would be handy, or maybe the web server needs many more threads.)

Thank you @pizza_delivery_man for delivering with your dive in the source code. It was an assumption of mine that maybe someone had already accounted for this, and indeed. Godot really seems to have put care in how the tree is traversed to make nodes efficient.

Although your suggestion is plausible, it could also be that this whole issue has been reframed by the blogger as a completely different thing? It still makes sense for me that this does not prevent other processes from being affected. top_level may actually make a difference, not because of expensive intrinsic calculations but because the transformations get accounted by the physics process or network synchronization if the setup allows for it.

I get the feeling this may be more situational than general, in the sense that performance gains actually occur only under specific conditions, like players not moving but rotating to signal, shoot, interact… If the performance measure is AWS costs, it’s easy to see how this may be less of a benchmark and more of an overall showcase of improvements based on real sessions. All of which raises the question of whether any of this is impactful for most games.

Yeah, it’s worth checking out, but I think maybe the blogger’s server is the wrong type … I don’t want to repeat the article.

In the above article, the section on ‘types’ lists some pros and cons. The type of server for a large online game is likely a dedicated server, so how was that handled?

The blogger says calculating the transforms of all 300+ players makes a difference, so clearly the server was running the game and updating all the clients, i.e. are the clients providing input and then requesting transform information from the server? If all 300 player positions have to go through the server, which computes all the transforms, then it does sound a bit slow … What do the clients do with the stream of position and rotation information that comes back? I am guessing it’s the same as what the server does. The answer could be that the server should instead process positions and Euler vectors (2 vectors of data from each client) to check interactions using a different process … Then the client should just set the transform of the parent node; the child nodes get updated automatically anyway before render. Why would they try to process something that doesn’t work on a local simulation or test?

The story is slightly inconsistent: the blog says the server had the problem, but clients have to update the characters too. For larger masses of transforms, Wikipedia says UDP is often used, in strategy games for example; then of course updating a flyweight representation of the enemy characters makes real sense.
