Adapting MCTS algorithm to GDscript

Godot Version

4.2

Question

In this article, there is a MCTS (Monte Carlo Tree Search) implementation in python. I think I was able to adapt to GDscript, but I need help to get a better understanding of this particular method:

def best_child(self, c_param=0.1):
    
    choices_weights = [(c.q() / c.n()) + c_param * np.sqrt((2 * np.log(self.n()) / c.n())) for c in self.children]
    return self.children[np.argmax(choices_weights)]

Since I’m not used to python, I’m not sure if I was able to adapt it correctly to gdscript:

func get_best_child(exploration_parameter:float = 0.1) -> MCTSNode:
	var choices_weights:int
	for _c in children:
		var c:MCTSNode = _c
		choices_weights = (c.get_winning_score()/c.get_visits()) + exploration_parameter * sqrt((2 * log(get_visits()) / c.get_visits()))
	return children[max(choices_weights)]

Is that right? Is this method running a for loop with all children from children and then calculating whatever choices_weights would be? When I run the code, it gives me an error in the return, saying that max() requires 2 arguments. What would be a similar method in gdscript to argmax()?

NOTE: I changed the name of some methods to be more readable to me.

np.argmax documentation:

Returns the indices of the maximum values along an axis.

So that code:

def best_child(self, c_param=0.1):
    choices_weights = [(c.q() / c.n()) + c_param * np.sqrt((2 * np.log(self.n()) / c.n())) for c in self.children]
    return self.children[np.argmax(choices_weights)]

It computes the value of choice_weight for EACH item, and then it picks the children with the highest weight. (you may Google “python list comprehension” if you care about the details).

You’d have something like this in memory:

children = [A, B, C, D]
choices_weights = [0.5, 0.2, 0.8, 0.1]

np.argmax(choices_weights) => 2 (third item in the weights array)
children[np.arg....] => C

It seems like your code just keeps ONE weight (the latest), so calling max on that value fails, what would be the answer to “give me the max between 5”?

It’s a very pythonic piece of code, in Godot, to achieve the same result, it might make sense to write something like (pseudocode):

current_weight = -inf
current_children = null
for c in children:
    weight = (c.get_winning_score....)
    if weight > current_weight:
         current_children = c
         current_weight = weight
return c

Not that it should be critical on a first write, but it has the benefit of not allocating another array, and going through the children once instead of twice.

Seems like there was a discussion about implementing something similar to np.argmax, but I don’t think it went through.

1 Like

Oh, so choices_weights is actually an Array? That for loop is running like “inside” the array (within the brackets) and appending a value in each run? That’s why the python code is using that argmax method to get the higher value and use its index to get the child node?

1 Like

You got it! That construct is called a list comprehension in Python and some other languages.

1 Like

Great! Happy to learn a new thing today, thanks for the explanation! One more thing: in your pseudocode you use -INF, which is negative infinity, but I don’t know what’s the use on this case. Why use -INF? What effect does it do to the code?

1 Like

Happy to help :pray:

I’ll let you figure out an answer to your last question; you’ll learn a more important skill if you ask yourself “Which value should I use here?”

If your answer is not -INF and your code works, that’s a perfect solution; it just means you’re less lazy than I am :wink:

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.