Problem with Unicode text and hackers

My problem is that if my program accepts any kind of character codes passed to the server, they could be used to write machine code programs on the server.

Take the password, for example. Someone could literally paste in a program and pass it as the password. It would then be saved into the user account: a program, a virus, or a backdoor.

I tried searching the manual for ways to detect characters other than English characters, but found nothing.

I fear I may have to reinvent the wheel and make my own.

Godot supports C#. If you need to sanitize your input there are C# libraries that can do that.

The solution to this is to not run passwords as programs, which shouldn’t be difficult as it’s the default behaviour for Strings to not be run as a program of any sort.

This has happened to me before, in the comment section of an article blog site I made. Someone wrote a computer program into a comment and the server saved it to a file. He then somehow had the server run that program, after which he broke the security of the entire shared host and posted very bad things. He went to jail for it; I think he was executed.

Well, that’s a blog site; I assume you didn’t make the blog website in Godot. It sounds like you mismanaged your website’s permissions. With naive security, a <script> tag could bring down your entire website. Most website backends do a ton of work for you, including running scripts, but Godot does very little.

Here, this is what I need. I learned this from the PHP source code, only that version deletes the bad characters and gives you back what’s left.

# Returns false if the string contains an illegal character code.
# Returns true if all characters are legal.
# THIS DOES NOT MEAN BAD CODE CANNOT BE WRITTEN WITH THE ALLOWED CHARACTERS.
# I HAVE HEARD OF PROGRAM SPLICING.
# It gives you an idea of what is needed.
func check_illegal_characters(thestring: String) -> bool:
	# replacen() ignores case and returns a new string, so assign the result back.
	for allowed in "abcdefghijklmnopqrstuvwxyz1234567890_!.,?":
		thestring = thestring.replacen(allowed, "")

	# Anything left over is a character outside the allowed set.
	return thestring.length() == 0

But most computer programs are not text; many are compiled into machine code. And a bash fork bomb :(){ :|:& };: contains none of the replaced characters.

If someone is sending you a payload to execute, they’re not going to send their source code and assume you’ll compile and run it; it’s going to be pre-compiled machine code.

And this is what happened to the server. I didn’t know anything about security at the time, and it was a hacker’s delight.

The opcodes for the CPU are 0-255. This is the very basics of computer machine code. The alphabet characters are actually opcodes for the CPU.

I am of course talking about old stuff, the Intel 8086 CPU.

There are many people who can read and write machine code, all they need do is just see the numbers or symbols.

Opcodes for x86 are not 0-255; they vary in size on that architecture, and most are not printable characters, depending on the opcode’s parameters. The 8086 paved the way for 32-bit x86 and now 64-bit; it is a surprisingly modern processor if we’re talking just about its assembly/machine code. If anyone is looking at machine code, it’s in hexadecimal, not raw strings.

I assure you that if anyone is sending you a virus, it won’t always be in plain text and you won’t know what language it’s in. The only way to avoid this is to never run code from your users. Luckily that is easy with Godot: do not load user-submitted resources (including scenes), and do not run user-submitted scripts.
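To make that concrete, here is a minimal sketch of keeping user text as plain data. It assumes Godot 4 (the FileAccess API; Godot 3 used the File class instead), and the path and the function names save_comment and read_comment are made up for illustration only.

```gdscript
extends Node

# Hypothetical save location, chosen only for this example.
const COMMENT_PATH := "user://comment.txt"

# Write the submitted text to disk as plain data.
func save_comment(text: String) -> bool:
	var file := FileAccess.open(COMMENT_PATH, FileAccess.WRITE)
	if file == null:
		return false
	file.store_string(text)
	file.close()
	return true

# Read it back as a String. Nothing here calls load(), ResourceLoader,
# Expression, or OS.execute(), so the content is never interpreted as code.
func read_comment() -> String:
	if not FileAccess.file_exists(COMMENT_PATH):
		return ""
	return FileAccess.get_file_as_string(COMMENT_PATH)
```

Whatever bytes end up in that file, they only come back as a String. The risk starts when a user-supplied path or file is handed to something that loads resources or runs scripts.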

The 8086 is an 8-bit processor. The 286 was 16-bit. The 386 was 16-bit with some 32-bit instructions, and so was the 486. Then the Pentium happened, which was 32-bit. We now have 64-bit processors. All of them still use the 8-bit machine codes, because that allows them to be backward compatible. The 8086 paved the way for desktop computers to happen. It was what was called a word processor. 8 bits means 256 characters. All machine code programs are plain text. They just have a different file extension, and not many modern text editors will open them correctly, because those are designed for human language. All modern files can be stored with 8-bit character codes. Even if you store them as 32 or 64 bits, you are still writing the 8-bit codes in them. But they are supposed to be read 64 bits at a time, not 8.

Updated filter detection program…

# Returns false if the string contains an illegal character code.
# Returns true if all characters are legal.
# THIS DOES NOT MEAN BAD CODE CANNOT BE WRITTEN WITH THE ALLOWED CHARACTERS.
# I HAVE HEARD OF PROGRAM SPLICING.
# It gives you an idea of what is needed.
func check_illegal_characters(thestring: String) -> bool:
	# replacen() ignores case and returns a new string, so assign the result back.
	for allowed in "abcdefghijklmnopqrstuvwxyz1234567890_!.,?:();{}=+*&^%$#@`\"~-<>/\\|":
		thestring = thestring.replacen(allowed, "")

	# Anything left over is a character outside the allowed set.
	return thestring.length() == 0

I have not tested the source code I gave.

The 8086 is a 16-bit processor. The opcodes themselves are 8 bits long, but with arguments and immediates the entire instruction becomes much larger. Furthermore, printable ASCII characters number fewer than 127; most of a byte’s range isn’t printable. You can open an executable in a text editor, but you will not see many ASCII characters, and the ones you list certainly show up rarely outside of raw .data:/.text: blocks. And if you do intend to block all 0-255 values a byte can hold, then why even accept input? At this rate the only thing someone can submit is whitespace (which happens to actually include a valid opcode).
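If the goal is only to detect characters outside plain English text, a range check on the code points is enough, since printable ASCII runs from 32 (space) to 126 (~). Below is a minimal sketch, assuming Godot 4, where the method is called unicode_at (I believe Godot 3 named it ord_at). The function name is_printable_ascii is made up for this example. It validates input; it is not a security measure by itself.

```gdscript
extends Node

# Returns true if every character is printable ASCII (code points 32..126).
# An empty string has no offending characters, so it also returns true.
func is_printable_ascii(text: String) -> bool:
	for i in text.length():
		var code := text.unicode_at(i)
		if code < 32 or code > 126:
			return false
	return true
```

For example, is_printable_ascii("Hello, world!") returns true, while a string containing "é" or a tab character returns false.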

Here’s an interesting video on making a “printable” program in x86, using only instructions that fit into the ASCII table.


Regardless, this misunderstanding about the printable nature of machine programs doesn’t help with your initial problem. You only need to avoid loading scripts, resources, and scenes submitted by users; all of these can contain a script that can execute anything, even system calls.

No, that is -127 to +128. Unsigned byte is 0-255.

Here is an updated and tested version of the function.

# Checks for illegal character codes: anything that is not an English letter, digit, or punctuation.
# Returns true if there are no illegal codes.
# Returns false if there is an illegal code.
func check_illegal_characters(thestring: String) -> bool:
	# replacen() ignores case and returns a new string, so assign the result back.
	for allowed in "abcdefghijklmnopqrstuvwxyz1234567890_!.,?:();{}=+*&^%$#@`\"~-<>/\\|":
		thestring = thestring.replacen(allowed, "")

	# Anything left over is a character outside the allowed set.
	return thestring.length() == 0

Or is that -128 to +127?
Anyways, what you see are actually power levels given to a capacitor hooked up to the CPU.

But, yeah, for some reason it returns length == 0 even though the string has codes in it.