Remove duplicate entries from arrays

Is there a built-in function to remove array entries that occur more than once?
If I have:
myarray = [1,2,3,3,4,5,5,6]

is there anything like myarray.deduplicate() capable of returning:

[1,2,3,4,5,6]

I’m thinking it might get slow if I have to iterate big arrays in gdscript directly…?

Unfortunately godot doesnt provide a function for that. You either have to do it with loops or use a dictionary which is properly easier

Or just ensure you don’t put duplicates in the array in the first place by using has.

Array — Godot Engine (stable) documentation in English

This would require to loop over the whole array anyways, thats why dictionaries are preferred, but if you want to use arrays that might be the best solution

You could get what you want only one of

Var the_thing = reference to the thing

Var number_of_entries = 0

For entry in array:

If entry == thing:

array.remove(entry) (i think the function is remove(), but could remember it wrong.

number_of_entries += 1

If number_of entries > 0:

array.append(thing) #to avoid that it gets added if there are 0 of it)

There are probably typos and wrong case etc above but you get the idea. No indentation either. I’m on the phone but you get the idea i think

As others already noted, the best would be to just use a dictionary as a set.

If you’re stuck with the array, removing duplicates using a temp dictionary is probably the fastest way to do it from script:

func array_remove_duplicates(array: Array) -> Array:
	var output := {}
	for element in array:
		if not element in output:
			output[element] = true
	return output.keys()

Thanks all, a lot of useful info.

For context, I’m adding/joining multiple arrays, so I wont’ be able to always guarantee I will avoid duplicates.
Possibly the easiest approach would be to perform the check within the joining function and call it done.

I just hoped there was an easy in-built to spare me that part :slight_smile:

Agreed, dictionary is definitely the way to go.

This is GDScripts weakest link though, arrays, list and hashsets are far superior in C#.

Join arrays using the same dictionary technique as making unique arrays:

func array_union(arr1: Array, arr2: Array) -> Array:
	var dict := {}
	for a in arr1:
		dict[a] = 1
	for a in arr2:
		dict[a] = 1
	return dict.keys()

Is using a dictionary faster than just using .has():

func array_remove_dupes(array:Array)->Array: 
	var output :Array = [] 
	for element in array: 
		if !output.has(element): 
			output.append(element)
	return output

How about using filter?

func array_append_no_dupes(array1:Array, array2:Array)->Array: 
	var output :Array = [] 
	output = array2.filter(func(n):
		if !array1.has(n): 
			return n)
	return array1  + output

Yes. Should be considerably faster. has() is a linear search. It iterates the whole array. The way you’re using it will perform close to O(n²), while dictionary will do it close to O(n).

Yes it is. When optimizing, always measure. Here is a simple test program for testing @sancho2’s, @normalized’s and my unique array functions. Results:

unique_array_normalized:        160.998 ms
unique_array_dizzycaterpillar:  119.757 ms
unique_array_sancho2:           2105.69 ms

I’m quite shocked that @normalized made a newbie mistake of unnecessarily checking if element already exists in the result dictionary :slight_smile:

func _ready() -> void:
	const COUNT := 2000
	const SIZE := 1000
	var test_array: Array[int]
	test_array.resize(SIZE)
	for i in SIZE:
		test_array[i] = randi()
	
	if true:
		var t0 = Time.get_ticks_usec()
		for i in COUNT:
			unique_array_normalized(test_array)
		var t1 = Time.get_ticks_usec()
		prints("unique_array_normalized:       ", (t1 - t0) / 1000.0, "ms")

	if true:
		var t0 = Time.get_ticks_usec()
		for i in COUNT:
			unique_array_dizzycaterpillar(test_array)
		var t1 = Time.get_ticks_usec()
		prints("unique_array_dizzycaterpillar: ", (t1 - t0) / 1000.0, "ms")

	if true:
		var t0 = Time.get_ticks_usec()
		for i in COUNT:
			unique_array_sancho2(test_array)
		var t1 = Time.get_ticks_usec()
		prints("unique_array_sancho2:          ", (t1 - t0) / 1000.0, "ms")


func unique_array_normalized(array: Array) -> Array:
	var output := {}
	for element in array:
		if not element in output:
			output[element] = true
	return output.keys()

func unique_array_dizzycaterpillar(array: Array) -> Array:
	var output := {}
	for element in array:
		output[element] = true
	return output.keys()
	
func unique_array_sancho2(array:Array)->Array: 
	var output :Array = [] 
	for element in array: 
		if !output.has(element): 
			output.append(element)
	return output

And I made a newbie mistake with the test data. In the previous test the array didn’t have any unique values. Here is better test data and the test results:

	const COUNT := 4000
	const SIZE := 2000
	var test_array: Array[int]
	test_array.resize(SIZE)
	for i in SIZE:
		test_array[i] = randi_range(0, 100)
unique_array_normalized:        205.98 ms
unique_array_dizzycaterpillar:  184.523 ms
unique_array_sancho2:           1188.271 ms

A good performance test would contain multiple test arrays, at least:

  • empty array
  • very small array
  • very big array
  • all elements are unique
  • no unique elements
  • 20% unique elements

You meant - yes it is.

Arghh! :see_no_evil_monkey: :hear_no_evil_monkey: :speak_no_evil_monkey: :smiley:

Good job with the benchmarks!