Why is my recursive function messing up?

Godot Version

4.4

Question

Hi, I think I’m very close to figuring out this dang parser, but I’m having trouble with it branching down. My script is as follows and SHOULD place dictionaries within the dictionary of the previous choice, but right now it’s basically creating a new dictionary for each line.
-> choice
\ttab
\ttab 2
\ttab 3
-> choice check same line
\ttabariono
\ttaby
\t -> choice 2
\t \ttab 4
\t \ttab 5
\t -> choice 3
\t \ttab 6
\t \ttab 7

^ script

Output:
8: [{ 8: ["-> choice", [1]], 9: ["\ttab", 0], 10: ["\ttab 2", 0], 11: ["\ttab 3", 0] }, [1]], 12: [{ 12: ["-> choice check same line", [1]], 13: ["\ttabariono", 0], 14: ["\ttaby", 0], 15: [{ 15: [{ 15: ["\t -> choice 2", [1]], 16: ["\t\ttab 4", 0], 17: ["\t\ttab 5", 0] }, 1], 16: [{ 16: ["\t -> choice 2", [1]], 17: ["\t\ttab 5", 0] }, 1], 17: [{ 17: ["\t -> choice 2", [1]] }, 1], 18: [{ 18: ["\t -> choice 2", [1]], 19: ["\t\ttab 6", 0], 20: ["\t\ttab 7", 0] }, 3], 19: [{ 19: ["\t -> choice 2", [1]], 20: ["\t\ttab 7", 0] }, 3] }, 1] }, [1]],

The issue starts at the nested 16: entry, where it creates choice 2 again but with 16: as the key instead.

Thank you so much for the help. I know I could probably be saving myself a headache with JSON files, but I’m new to coding (well, anything) and really want to learn how to do this for any future thing I create.

CODE:

func startNest(ID, choiceName, passedIndent) -> Dictionary:
	var nextLineNumber = ID + 1
	var nextText = allScript[ID + 1]
	var assignedLine = assignLineType(allScript[ID])
	var oldIndent: int = passedIndent
	var indent: int
	var text = choiceName
	var FINALRESULTS = {} #need to place the results in this
	var choName = choiceName #makes the name changable through itterations
	
	while nextText.contains("\t") == true: #while there is a possibility of the next tab having a \t
		#there cannot be a choice without a \t under it wheather its a dialog, jump, or detour IT NEEDS TO BE THERE
		#somthing is off with dialog creator you need to get the new indent somehow
		#if indent/ new indent is greater than or equal to passed in indent
		if text.contains("->") == true: #is it a choice?
			var regex = RegEx.new() #gets indents if its a choice
			regex.compile("\t")
			var result = regex.search(text)
			if result != null:
				var string = result.get_string()
				indent = string.length()
			else:
				indent = 0
			
			if indent == oldIndent:
				if indent == 0: #breaks the loop if its the first iteration of the choice - for the original create dict script
					FINALRESULTS = dialogChoice(ID, choName, indent)
					break
				else:
					FINALRESULTS[ID] = [dialogChoice(ID, choName, indent), assignedLineType]
					#update everthing INCLUDING ID, TEXT AND CHONAME NOT NEXT NEXT ONES !!!!!!!!!!
					oldIndent = indent
					choiceName = nextText
					ID = nextLineNumber
					nextLineNumber += 1
					nextText = allScript[nextLineNumber]
			elif indent > oldIndent:
				continue
			else:
				nextLineNumber += 1
				nextText = allScript[nextLineNumber]
				continue
				#continues checking for choices on same or greater level, if it find one on a less it skips it
				
		else: #if its not a choice ignore it and continue to next line
			nextLineNumber += 1
			nextText = allScript[nextLineNumber]
			continue
	return FINALRESULTS
		#looping over different function
		#create dialog branch > creat dialog > when it gets to another branch do the following
		#if tabs > older - start a new nest
		#if its = to older - return back to called function and finish loop- use returned branch as new information
		#if its < than tab back again with the new info
		#if its 0 then end the function and return the dictionary
		
func dialogChoice(ID: int, choiceName, passedIndent: int) -> Dictionary:
	var nextLineNumber: int = ID + 1
	var nextText: String = allScript[ID + 1]
	var text: String = choiceName
	choiceName = {}
	choiceName[ID] = [text, [lineType.CHOICE]]
	dialogCreator(nextLineNumber, nextText, passedIndent, choiceName )
	return choiceName
	
func dialogCreator(ID: int, text: String, passedIndent: int, passedDictionary: Dictionary):
	var assignedLine = assignLineType(allScript[ID])
	var nextLineNumber = ID + 1
	var nextText = allScript[ID + 1]
	var indent: int
	if text.contains("->") == true: #gets indents
		var regex = RegEx.new()
		regex.compile("\t")
		var result = regex.search(text)
		if result != null:
			var string = result.get_string()
			indent = string.length()
		else:
			indent = 0
		if indent > passedIndent:
			passedDictionary[ID] = [startNest(ID, text, indent), assignedLine]
			return  #check for text MAY HAVE TO REMOVE THE FACT THAT ITS A STRING IN FUNCTIONS
		if indent <= passedIndent:
			return 
	var packedStringSize = allScript.size()
	if nextLineNumber < packedStringSize: 
		if text.contains("->") == false: #if its not a choice
			if text.contains("\t") == true: #if it tabbed
				passedDictionary[ID] = [text, assignedLine] #bread and potatoes THIS IS WHAT MAKES THE TEXT
				dialogCreator(nextLineNumber, nextText, passedIndent, passedDictionary)
				return
			else:
				return
		else:
			return passedDictionary
			print("SOMEHOW YOU GOT A CHOICE TO BYPASS THE FIRST IF,,, HOW")

This honestly looks way overengineered for what it’s supposed to do. Could you maybe first explain the actual use case you’re trying to solve with this?
Did you write it yourself, follow a tutorial, or get help from ChatGPT or similar? Code snippets like this are usually hard to follow without thoroughly testing them yourself, so it’ll be hard to give a definite answer on how to fix this particular piece of code, but maybe you’d be willing to accept an alternative solution once we understand your use case.

I suspect part of the problem here is regular expressions. Generally, when you’re parsing something, you want to tokenize it, then parse the stream of tokens. So, with your file above, the tokens would look something like:

CHOICE ARROW
NEWLINE
TAB
"tab"
NEWLINE
TAB
"tab 2"
NEWLINE
TAB
"tab 3"
NEWLINE
CHOICE ARROW
"check same line"
NEWLINE
TAB
"tabariono"
NEWLINE
TAB
"taby"
NEWLINE
[...]

Then, typically, the parser eats one token at a time, potentially with lookahead (there’s a whole rabbit hole here; there are lots of parsing techniques), and processes iteratively through the token stream until it runs out.

Tokenizing is a bit of an art; I included NEWLINE tokens above for context, but you could also declare that (say) not having a newline in a given place is an error, and have the tokenizer error out with a line number if it encounters that.

You’re using regexes on the text, it seems? But bare regexes and things like contains() might grab you something from the middle or the end of the text, so you may be processing things somewhat out of order. That may cause all sorts of fun.
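
One concrete place this seems to bite in the code above, as far as I can tell: a regex compiled from "\t" matches a single tab, so result.get_string().length() comes out as 1 for any line that contains a tab, however deeply it is indented. A minimal sketch of counting only the leading tabs instead (count_leading_tabs is a hypothetical helper name, not something from the original script):

```gdscript
# Hypothetical helper: counts the tab characters at the very start of a line.
# Tabs that appear later in the line are ignored, unlike contains("\t").
func count_leading_tabs(line: String) -> int:
	var depth := 0
	while depth < line.length() and line[depth] == "\t":
		depth += 1
	return depth
```

With this, a line like "\t\ttab 4" reports a depth of 2, and a tab in the middle of a line no longer counts as indentation.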

I can totally accept an alternative solution. I’m trying to make a branching dialog parser. It creates a dictionary with the key being the line number and the value being an array of [text, a line-type value that a different piece of code checks], with text turning into another dictionary when the line is seen as a choice. Honestly, I’m just trying to learn how to do it for future projects because I think this is probably a good skill to know for anything dialog heavy. If you have any alternative solution or something you think I should learn about for this, I would love to hear it.

Ah, I have it so it’s already in an array by line when it’s first made. Right now it’s going through and looking at each element of the array to create the dictionary; the lines are represented by ID. Is this what you mean by tokenizing it, or am I just not understanding it well? I’ve never made a parser before or really touched any of this stuff, so if you have any recommendations for learning those parsing techniques I would be super interested.

Tokenizing is breaking things up into “atomic” things. What that means is kind of up to the person who writes the tokenizer, but the basic idea is to try to take a complicated problem (“I have some text, I want to turn it into something a computer can execute”) and turn it into two simpler problems (“I have some text, I want to turn it into a stream of tokens”, and “I have a stream of tokens, I want to turn that into something a computer can execute”).

So, breaking it up by lines is a good start, but you’d probably find your parser logic simpler to generate if you take each line and split it up into tokens. In your case, it looks to me like the relevant bits are:

  • “-> choice”
  • indenting (indicating scope)
  • text

So, maybe consider a first pass that generates an array. The array would be of tokens, with each token being one of:

  • a specific integer value (something big, maybe 1,000,000) indicating a choice token, probably represented in the code by a const value
  • an integer, representing an indentation depth
  • text strings

So, your example above would produce a list like this:

# assuming "const CHOICE: int = 1000000" in the code somewhere...

[
  CHOICE, 1, "tab", 1, "tab 2", 1, "tab 3", CHOICE, "check same line",
  1, "tabariono", 1, "taby", 1, CHOICE, "2", 2, "tab 4", 2, "tab 5",
  1, CHOICE, "3", 2, "tab 6", 2, "tab 7"
]
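
A first pass that produces a list like the one above could be sketched roughly as follows. This is only an illustration under some assumptions: the function name tokenize and the CHOICE_PREFIX constant are made up here, indentation is assumed to be tabs only, and unindented lines get no depth token (matching the list above).

```gdscript
const CHOICE: int = 1000000
const CHOICE_PREFIX := "-> choice"  # assumed spelling of the choice marker

# Hypothetical tokenizer: turns script lines into a flat Array of tokens.
func tokenize(lines: PackedStringArray) -> Array:
	var tokens: Array = []
	for line: String in lines:
		# count leading tabs to get the indentation depth
		var depth := 0
		while depth < line.length() and line[depth] == "\t":
			depth += 1
		var body := line.substr(depth).strip_edges()
		if body.is_empty():
			continue  # skip blank lines
		if depth > 0:
			tokens.append(depth)
		if body.begins_with(CHOICE_PREFIX):
			tokens.append(CHOICE)
			body = body.substr(CHOICE_PREFIX.length()).strip_edges()
		if not body.is_empty():
			tokens.append(body)
	return tokens
```

Using begins_with() on the stripped line, rather than contains(), means a "->" in the middle of ordinary dialog text is not mistaken for a choice.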

At that point the parse logic gets a little simpler; you can do something like:

func parse(tokens: Array) -> void:
    var cur_scope: int = 0
    for token in tokens:
        # check the type first so a String is never compared against an int
        if typeof(token) == TYPE_INT and token == CHOICE:
            pass # add a choice tag at the current scope
        elif typeof(token) == TYPE_INT: # Scope depth
            var new_scope: int = token
            if new_scope != cur_scope:
                cur_scope = new_scope # deal with scope change
            else:
                pass # scope hasn't changed
        elif typeof(token) == TYPE_STRING:
            pass # append our string on the previous thing, whether it's a text line or a choice tag
        else:
            print("ERROR: Unexpected token \"" + str(token) + "\"")

While I see the quick-writing appeal of nesting choices this way, I found a named page system very helpful in our narrative game Vessels. If it’s your first time writing a domain-specific language, I’d recommend writing your text this way or similar, making control flow very explicit and bound by symbols and names rather than implicitly by indentation.

//declaring a new page starts with #
# start
//this is a comment
-> choice #choice_1
//choices begin with -> and end with a #page_name to read next
-> choice check same line #choice_2

# choice_1
tab
tab2
tab3

# choice_2
tabariono
taby
-> choice 2 #choice_2_2
-> choice 3 #choice_2_3

# choice_2_2
tab 4
tab 5

# choice_2_3
tab6
tab7

Then parsing is filling a Dictionary of pages where each page is an array of text; you could even wait to parse choices until they appear in-game. As much as I don’t want to dump a whole script out of the blue, I also want to prove my theory. Here’s a parser and reader for this format; see if it fits your needs while being legible.

extends Control

@export_multiline var raw_dialogue: String
var parsed_dialogue: Dictionary[String, Array] = {}

func parse_dialogue() -> bool:
	var page_title: String = "start"
	var working_array: Array[String] = []

	var lines := raw_dialogue.split("\n")
	for line in lines:
		if line.begins_with("#"):
			# move working array into the dictionary for the *last* page we were working on
			parsed_dialogue[page_title] = working_array.duplicate()
			# strip symbols and whitespace from the new page title
			page_title = line.substr(1).strip_edges()
			# create a new array for the new page
			working_array = []
		elif line.begins_with("->"):
			if line.find("#") == -1:
				# add your own parse checks
				push_warning("Choice without an assigned page!")
				return false

			# we could process the choice here or when running.
			working_array.append(line)
		elif not (line.begins_with("//") or line.is_empty()):
			# normal text to display
			working_array.append(line)

	# submit the last page
	parsed_dialogue[page_title] = working_array
	return true


# And that's parsing! Now for displaying. A reader could be a separate script
# but it will need to track the page and line it's on to progress.
var line_index: int = 0
var current_page: Array

func start_page(page_name: String) -> void:
	# check before indexing so a missing page fails with a readable message
	assert(parsed_dialogue.has(page_name), "Couldn't find page '%s'" % page_name)
	current_page = parsed_dialogue[page_name]
	line_index = 0
	next_dialogue()


func next_dialogue() -> void:
	# dialogue is over
	if line_index >= current_page.size():
		return

	var line: String = current_page[line_index]
	line_index += 1

	# process choices
	if line.begins_with("->"):
		# slice the page name from `#`
		var page_delimiter: int = line.find("#")
		var goto_page: String = line.substr(page_delimiter+1).strip_edges()
		# slice between -> and # symbols
		var just_text: String = line.substr(2, page_delimiter - 2).strip_edges()

		# creating a button for choices
		var button := Button.new()
		button.text = just_text
		button.pressed.connect(start_page.bind(goto_page))
		add_child(button)
		# assuming only choices will follow other choices we auto-progress dialogue
		next_dialogue()
	else:
		# display normal text as labels
		var label := Label.new()
		label.text = line
		add_child(label)


# Debug reading on-start and on-click
func _ready() -> void:
	parse_dialogue()
	start_page("start")

func _input(event: InputEvent) -> void:
	if event is InputEventMouseButton:
		if event.is_pressed():
			next_dialogue()

OK, so I think I get what you’re saying? The number at the end of the arrays in the original output (1 or 0) is, I believe, what would be the token? (It’s an assigned enum, unless it just needs to be a weird number instead?) Instead, I would let my original dictionary run and tokenize everything with an added value for depth, and then use the depth in that dictionary to see whether something is nested under something else. Or is my issue that I’m getting too stuck on nesting things and dictionaries and should just be using an array instead? I want to use the keys as IDs for which line is happening at any given time, and I have a separate match function for when the ID is updated that tells Godot what to do when an ID is a specific enum like CHOICE, DIALOG, TITLE, etc. Should I be handling the depth issue there?

Also (thank you for answering my questions, by the way; this has been very helpful):

How would I actually go about dealing with this in the function you wrote?
Would I be making a new array with tokens only in that scope and then parsing through those?
Is there some kind of recursion I need to do?
How would I deal with the scope change so that it can go down the branch infinitely?

Ah, I planned on making it so you could jump from one section to another based on a title regardless, as a future thing. This makes sense as a workaround since this is indeed my first time making a domain-specific language. My goal was originally to make the markdown as easy and non-confusing as possible at the expense of a confusing/complex codebase, but you’re right that a named page system is probably way better.

You may well find the best thing to do is take @gertkeno’s scheme here and tweak it for your needs. That said, assuming you want to continue with your scheme, presumably you want the output to look something like:

[
  { "prompt": "", "opts": ["tab", "tab 2", "tab 3"] },
  {
    "prompt": "check same line",
    "opts":
      [
        "tabariono",
        "taby",
        { "prompt": "2", "opts": ["tab 4", "tab 5"] },
        { "prompt": "3", "opts": ["tab 6", "tab 7"] }
      ]
  }
]

As an aside, I think if you’re going to do this, you probably want to nail down what the output looks like so you’re sure your player can do everything it needs to. It’s easy to start at the source end and then realize your system actually can’t generate the data you need.

The way I’d probably deal with scope change would be an array of dictionary references. Whenever I increased scope depth, I’d append a reference to the new dictionary into the array. A sibling would replace the reference at the same position in the array. A decrease in scope would be removing the last element(s) in the array.
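
As a rough illustration of that idea, here is a minimal sketch that builds the nested "prompt"/"opts" structure with an array used as a stack of open choices. The names build_tree and CHOICE_PREFIX are hypothetical, it works straight from the lines rather than from tokens to keep it short, and it assumes tab-only indentation with plain text always sitting under some choice.

```gdscript
const CHOICE_PREFIX := "-> choice"  # assumed spelling of the choice marker

# Hypothetical sketch: stack[d] holds the dictionary of the open choice at depth d.
func build_tree(lines: PackedStringArray) -> Array:
	var root: Array = []
	var stack: Array = []
	for line: String in lines:
		var depth := 0
		while depth < line.length() and line[depth] == "\t":
			depth += 1
		var body := line.substr(depth).strip_edges()
		if body.is_empty():
			continue
		# a decrease in scope (or a sibling) drops the deeper references
		stack.resize(mini(depth, stack.size()))
		if body.begins_with(CHOICE_PREFIX):
			var node := {
				"prompt": body.substr(CHOICE_PREFIX.length()).strip_edges(),
				"opts": [],
			}
			if stack.is_empty():
				root.append(node)
			else:
				stack.back()["opts"].append(node)
			# an increase in scope: this choice is now the innermost open one
			stack.append(node)
		elif stack.is_empty():
			push_warning("Text outside of any choice: " + body)
		else:
			# plain text goes into the innermost open choice
			stack.back()["opts"].append(body)
	return root
```

Because dictionaries are passed by reference, appending to stack.back()["opts"] also updates the copy already stored in the parent, so no recursion is needed and the nesting can go as deep as the indentation does.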

Well, then there is this… and I agree: