AI-generated content in the Godot Game Engine codebase

Hello Godot Forum Community,

This is probably a very heavy topic for my first post after my introduction. I’m not sure if this is even the correct category to discuss it in.

I have seen many topics about the use of AI-generated content in the projects that teams and individuals are creating. I understand that this is a decision for the developers of those projects to make for themselves. However, I haven’t seen a topic discussing the inclusion of AI-generated code in the Godot Game Engine itself.

Because Godot is an open source project, many people can contribute to it. I did read that AI-generated code contributions are discouraged, but not outright banned. It would be difficult to confirm whether any given contribution is AI-generated, and the inclusion of third-party dependencies adds another layer of difficulty to that determination.

For full disclosure, I have asked the Godot Foundation about this as well, and I do have a reply from them.

What is the community’s opinion on AI-generated content in the core of the Godot Game Engine?

Should it be allowed or rejected?

If it is allowed, should it be clearly stated somewhere that the engine might, or does, include AI-generated content?

For those developers not wanting to use AI-generated content in their projects, shouldn’t they be made aware if the tool on which they are building their project already includes such content?

There are many reasons why developers might not want to use AI-generated content, and this isn’t a topic for those individual reasons. This is more about being informed about what is in the game engine you are using.

I hope this can be a constructive topic, and please let me know if I have stepped outside of the rules of the forum.

Regards,

AcidicWombat

4 Likes

The “human good, AI bad” metric is as arbitrary as it is disruptive and disingenuous. In the world of reason, what matters is whether the code makes the engine more complete and stable, not what tools were involved in creating it.

Wanting to stay clear of any AI these days is a lost cause. You’d have to go Amish and grow vegetables instead of making games. Are there people who need this spelled out for them? I guess there are.

5 Likes

If it is vetted and not just mindlessly vibe-coded, then why reject AI-generated code? It will get more people to contribute and lower the barrier to entry for understanding how to code. I see it as a good thing.

There is already AI-generated code all over the games industry.

If the engine works as intended and vibe-coded PRs are not just mindlessly merged, I don’t see how it could be a bad thing.

I feel like if you completely reject LLMs from a coding-based workflow, you are not seeing them as a tool. (or you’re a really good dev) (not me :frowning: )

For me, it didn’t make me faster, just more informed about how to approach problems, and it made it easier to figure out what to look for. But I always try to double-check by crawling forums and documentation if I don’t understand the code given to me.

I’ve seen a subset of people online just mindlessly vibe code a project, then try to get the LLM to fix it, and finally post about it online asking for help without even trying to fix it themselves. Don’t do that…

6 Likes

AI code is a bit different from other forms of generative AI, since most code released online is published under licenses that allow its reuse. The main issues I have with other uses of generative AI are the ethical concerns: the fact that the creators of said AI have used training data they don’t have the rights to use, and are campaigning against repercussions for using the work of others without informed consent.

I don’t personally write code with AI. It’s a tool I have no use for in this regard, since I want to learn how to write my own code without assistance. I’m fine with others using AI to assist them in making their projects, and I’m fine with it being used in making Godot Engine under the following constraints:

  • The training data used to train the AI must be publicly available and released with a license that allows public sharing and use.
  • The AI-generated code must adhere to the best practices and guidelines outlined for contributors to Godot’s engine code.
  • Such code must be reviewed by a human before it’s used. Ideally, multiple humans.
    • AI can be used to assist in the code review process, but the code should get at least one pass without AI.
  • No AI-generated bug reports; the tech isn’t quite there yet. It can generate a bug report that sounds real, but it cannot find real bugs.
  • No generative AI should be integrated into Godot Engine itself.
    • As in, the unmodified download for Godot should not come with AI tools to generate art, music, code, or other assets for the user automatically.
  • The Godot Foundation should not endorse or encourage the use of AI in the Godot project; we don’t want people getting the wrong idea.

3 Likes

I don’t know your background, but as someone who has been programming for three decades and working on open source projects for over two decades, I will tell you that it is not difficult to detect LLM-generated code. That’s because it’s sh*tty code. Basically, people using LLMs to code solutions to problems will get hit with requests to fix things in their PRs, requests that will be hard to satisfy if the submitter doesn’t know how to program themselves.

Also, since the Best Practices for Contribution are very clear, we aren’t going to see unsolicited contributions that haven’t already been discussed as new features or bugs. There are human gatekeepers approving code, and a minimum of two approvals is required, including one from the person responsible for that area of the code.

I don’t think this is a real issue.

To be clear, full disclosure would be sharing that response with us.

This thread kinda feels like ignorance or trolling at the moment.

8 Likes

Allowed. As others have pointed out, AI-assisted code is going to become just as normal as “computer-assisted code”.

FOSS projects far and wide have adopted policies regarding the use of AI, and you would be hard-pressed to find any that outright forbid the use of all AI tools.

Full disclosure in pull requests would be welcome, but I don’t strictly find it necessary, except perhaps for legal reasons.

It’s more important to name which particular AI was used. Copilot is under legal scrutiny for having been trained on GPL source code, while OpenAI’s ChatGPT has historically been somewhat at odds with copyright, so I would also wait out any outstanding lawsuits and see which models they apply to. And any AI “made in China” is to be scrutinized both politically and ethically.

To what end?

There’s only one kind of decision “those developers” would make based on this knowledge: whether to use Godot or avoid it.

By definition, “those developers” not inclined to update their toolset will soon be the minority, and eventually become extinct. Same as you can’t cling to Visual Studio 2005 forever, or insist on programming everything in Notepad++.

That’s actually a good point: if one cannot tell whether the code was generated or handcrafted - does it even matter who or what created that code?

The information density in syntactically restricted text is relatively thin compared to images, audio, video, and free-form text (long-form stories as found in books), where it will take much longer before AI can generate content that cannot be told apart from human craftsmanship, even with tooling.

You may only have experience with AI producing that kind of code, or you are basing this on code generated from a single, broad prompt such as “make my player jump”, or perhaps on older models.

If a software engineer works with the AI through precise prompts, iteratively improving the implementation, you definitely cannot tell it apart anymore.

Unless you know the developer and are surprised by the sudden change of code style, formal semantics, work speed, or implementation details - but that’s somewhat beside the point.

5 Likes

To be clear, I inferred from the OP that they were talking about code generated by an LLM, not code created by an experienced developer using an LLM as a tool.

I do agree with this statement:

I do not care how the code was generated, as long as it is good. But I am very finicky when approving code for my own projects. As long as the AI does code coverage as well, and I don’t have a problem with the architecture decisions and cognitive load, I couldn’t care less.

My experience currently with AI generated code is that it increases cognitive load with the code it writes, instead of decreasing it.

5 Likes

We are already at a point where AI-generated art is almost impossible to distinguish (outside of the image’s metadata, which can be faked anyway).

There are plenty of examples of that already.

4 Likes

Thank you everyone for your responses. I was expecting a heated discussion on the topic.

I deliberately left my wording somewhat vague so as to get as many genuine responses as possible, but now I think I should clarify.

Yes, I probably should have just said for disclosure, not for full disclosure. I’m not sure if I can release the email due to some of the content in it.

Yes, legal reasons. This is actually why I created this topic. While going over the subject with lawyers, a particular set of issues kept arising with regard to AI-generated content: Who owns the generated content? Can it be copyrighted? And what happens, once the legal landscape finally settles, if the law ends up going against the users of AI-generated content?

The result was “We would recommend avoiding anything generated by AI until there is a clear legal precedent, or laws have been updated to cover it.”

2 Likes

There’s a really fascinating read on this topic here:

4 Likes

Besides the experience of the programmer using the LLM, what’s the difference here? An LLM could certainly be used to enhance a normal user’s prompt and add more details, putting it on par with what the experienced programmer might prompt. @dragonforge-dev

I would personally be against any inclusion of AI generated code. It also makes me wonder about Steam’s policy of disclosing AI use.

2 Likes

[quote=“CodeSmile, post:6, topic:129639”]

Copilot is under legal scrutiny for having been trained on GPL source code

[/quote]

But this will boil down to the age-old point…

“Humans who are trained on GPL code are welcome to process and generate new code, so why shouldn’t AI?”

And on most decent sci-fi shows, the AI ends up winning by becoming human and earning human rights.

(I was honestly surprised that Commander Data’s positronic brain wasn’t already considered sentient; the Federation appeared socially and philosophically backward on the matter.)

[quote=“ComicallyUnfunny, post:11, topic:129639”]

Besides the experience of the programmer using the LLM, what’s the difference here?

[/quote]

Obviously, IF the AI-generated code is derived from human code, THEN there is no difference from it being taken from web searches…

But if a judge can concede that human-generated code is really no better, then AI code is the same as any other software-generated output used for development.

That could take hundreds of years; maybe Star Trek was right…

But that isn’t what this is a case of… the question of whether the AI can learn to write unique code can be counter-questioned with “can every human programmer always generate unique code?”, and of course the AI devs could be one step ahead here and say that the humans who wrote unique code went through iterations of novelty-seeking coding exploration in their own work, so the AI can be put through the same steps.

It’s a matter of time before they release the continuous “online” learning version that iterates its own code into a compiler and fine-tunes and beautifies the output… it could be set up by anyone, to a limited level, with the current tech.

We already know the Godot team has QA for their pull requests, which can easily prevent copied or accidentally memorised code from entering the codebase.

There’s probably going to be a massive slap-back from the powers that be, saying “we don’t want those humans to be trained on our scientific Python, leagues above them, so we deleted it”.

The key issue here is not AI itself, but responsibility for the code that ends up in Godot’s main code base. If the result meets quality standards, passes review, and does not violate licences, the tool used to create it is of secondary importance.

2 Likes

That’s kind of like saying that, with really high-quality paints and a paintbrush, someone who just started painting can create a realistic oil portrait comparable to one by someone who has been painting for years. It’s not about what you put into the prompt - it’s what you get out of it.

For example, if you ask an LLM how to do something and it tells you that you can use a function that doesn’t exist, what’s the first thing you do? An inexperienced developer is going to copy and paste the entire code example and try to get it to work, then ask the AI why it doesn’t work. An experienced developer is going to look that function up and see if it’s real. Then, if it is, they will write their own code instead of relying on the AI’s example, which is going to be generic example code with no information about the context of the project.

An experienced developer using something like GitHub Copilot is going to ask for help inside their IDE and utilize the LLM within the context of their own code. But, coming back to what the topic is about: the LLM then gets to use all your code as training data. Even if they say they don’t.

7 Likes

In the end, we don’t know which code was written by a human and which by an LLM.

So what are we even talking about?

1 Like

Just like we obviously can’t tell if this picture was created by an LLM or a human?

I’ve answered enough questions on this forum to know when someone is asking for help with AI code. So that’s one of the things I’m talking about.

4 Likes

Note that fully AI-generated content is banned (see this), and any AI assistance in making a PR must be disclosed (just as if you took some part of your PR from an external source, or someone wrote a PR for you, etc.)

Note that that page does not talk about the legal side of LLM-generated contributions. But one of the strong reasons I, as a Godot maintainer, oppose their inclusion in the engine, beyond the reasons listed there, is that we simply do not know yet what the legal situation will be with generated content. I want to avoid us “poisoning” the codebase by adding code that ends up not being copyrightable, and the legal mess that could cause (not to mention the legal side of plagiarism from LLMs, which is also not legally established yet).

7 Likes

I found this interesting discussion of accountability sinks, and I think it’s applicable to this discussion. I believe that large corporations are trying to hide where training data comes from so that they can use LLMs as accountability sinks and say, “Well, we don’t know where we got that, so we can’t be held legally liable for it.”

I’d love for copyleft to apply, but the few court cases out there so far don’t seem promising.

3 Likes

Absolutely. I don’t expect generated content to be given a pass when it comes to copyright; it’s far, far too easy to use it that way. And I don’t see how it’d be possible to prove that someone didn’t use an LLM to write something that violates copyright, so if the verdict is “if the LLM you use violates copyright, you’re not liable”, it would make it ridiculously easy to just ignore copyright.

I’m in favor of less harsh or restrictive copyright, but this one feels pretty obvious (at least that anyone using LLMs or other generative tools has an obligation to ensure the output does not violate copyright).

3 Likes