Offline AI-Powered NPCs Teaching Sustainable Farming — Built with Godot 4.x and Gemma 3n

code.forge.temple · August 7, 2025, 8:24pm

I’m excited to share a project I recently built using Godot 4.x (C#) for the Google Gemma 3n Hackathon. It’s a 2D educational game prototype where NPCs are powered by a local large language model (Google’s Gemma 3n) running via Ollama — completely offline and private.

What it does:

NPCs teach sustainable farming and botany through rich, natural-language dialogue
Runs locally — no cloud or internet connection required
Custom NPC component lets you configure system prompts and the AI model endpoint
Designed as a proof of concept for offline-first AI-powered education

Tech stack:

Godot 4.x (C#)
Ollama local LLM server
Gemma 3n model by Google

Links:

GitHub: https://github.com/code-forge-temple/local-llm-npc
Demo video: https://youtu.be/kGyafSgyRWA

I’d love to hear feedback from the community — especially about:

Ideas for extending the NPC AI to other learning domains
Ways to improve modularity or integration in Godot projects
General thoughts on offline AI and local LLMs in games

Thanks for reading!

dragonforge-dev · August 7, 2025, 11:15pm

This is really cool!

My biggest concern would be AI hallucinations. I think this could be really interesting for bringing NPCs to life in an RPG or farming simulator. It’s an idea I’ve been playing with for about two years, so I’m interested to look into your implementation.

Make all folder names snake_case. (And file names, but I didn’t notice that issue.) This is in the GDScript style guide. Specifically because on Windows assets do not import correctly is there are capital letters in the filepath. It’s a huge PITA.
Figure out how to export the game and hook it up to the LLM - which should have an installer. In other words, one click. Usefulness goes way down if it’s not easy to use - both for users and developers.
Consider making this an add-on. So all the LLM stuff is in the add-on and the game itself is not. In this way you make it much easier for someone to lift out and use the LLM integration and use (and comment on/improve on) what you have built.

TBH I think this is going to be the future for a lot of games that feature a lot of NPC dialogue. It’s a scalable way to add unique NPCs that can keep talking to you as long as you want them to, and to build in-game relationships more organically. So instead of your friendship heart level going up every time you click on someone, it goes up or down based on the things you actually say and the time you take to get to know them.

On the flip side, it’s important to have safeguards. Look at all the stuff that ChaptGPT and Grok are going through.

From a performance, cost and control PoV, it’s much better to have the LLM running on the local machine. No internet latency. No paying for API keys and connections. And you get to put what you want (and don’t want) in the LLM.

code.forge.temple · August 8, 2025, 11:31am

Thanks a lot for the thoughtful feedback!

On hallucinations:

I completely agree — that’s one of my main considerations too. Right now, I’m experimenting with prompt structuring and a constrained conversation system to keep NPC responses aligned with the learning objectives. I’m also exploring lightweight fact-check layers to reduce “creative” but incorrect outputs.

On modularity:

The add-on idea is excellent. I’ve been thinking along similar lines — separating the LLM integration into a clean, reusable Godot add-on would make it easier for others to integrate AI NPCs into their own projects without pulling in my whole game logic. In fact, I’ve already placed the relevant LLM integration code into a dedicated BUNDLE folder, which is intended to become a standalone Godot asset in the future — making it easy to drop into any project.

On naming conventions:

Good point about snake_case for GDScript assets — that’s a best practice I’d follow if this were purely GDScript. Since my game is written in C#, I use PascalCase or camelCase for scripts and classes, following C# conventions. However, for assets and folders that interact with Godot’s import system, I’ll make sure to keep filenames and folder names lowercase with underscores where needed to avoid platform issues.

On deployment:

A one-click installer with the LLM pre-configured is exactly where I want to take this. That would remove most of the friction for both devs and players.

And yes, I also believe local-first LLM NPCs have huge potential for games — both for richer NPC relationships and for privacy/performance reasons. Safeguards are definitely a must, especially for educational contexts.

dragonforge-dev · August 8, 2025, 3:21pm

To be clear, it’s only upon Export that this becomes a problem. I don’t know if you have an exported version of your game on Windows so you may not have run into this problem.

peltodev · August 22, 2025, 3:48pm

I wish Google added desktop support to their local inference solution - it would make everything so much easier…

I’ve been meaning to test if LiteRT could be run on an embedded browser or something as the backend for running these models but haven’t found the time yet

Did you test how inference affects rendering? Any problems with frame times spiking during it?

ximossi · August 22, 2025, 6:41pm

I often struggle with this argument, because it clashes with another futuristic idea of widely available high-speed connections. For years now cloud computing has been expanding to an extreme and many countries either are highly investing in an 5G(+) infrastructure shift or leap-frogging straight to it.

From what I have seen so far affordable internet and, with that, cheap computing power is far closer to reality than local, small LLMs performing in a competitive state.

Could @code.forge.temple maybe shed some light on their experiences with the project in terms of actual performance of the LLM? How much training went into preparing the different topics compared to building the game maybe? Have you considered authoring the content first and use LLM mostly for the topic-recognition part of a conversation?

And, more importantly: when testing this, how did people react to the very bad reaction times? In your demo video you have latencies of 30 seconds and more, no? What hardware did this all run on?

dragonforge-dev · August 23, 2025, 12:39am

Perhaps. If the performance isn’t there, that’s an issue. I haven’t tried to hook one up into a game yet, but I had no trouble running LLMs on my box. But it’s certainly much cheaper for me as an indie developer to not pay for people to use my connection to an LLM service that I have to pay for.

ximossi · August 23, 2025, 8:05am

I agree. I, too, do run local LLMs and other generators here locally. They do utilize my GeForce 3090 quite fine. But when running them on lower spec hardware for testing, their usability falls apart. This is true especially for mobile devices.

And also for chat systems I don’t get the same reply times as with “free” online services.

And just to be clear: I do 100% support local hosting, owning software, hardware and data.

For me the argument is not about which LLM(service) to use for dialogue (i.e.) but weather it is feasible at all. At least, in the near future. Once models are low-cost in terms of energy usage, computational time and size (all relative to current standards) they certainly will be useful for games. But for the time being, I, honestly, don’t see it.

Instead, if harnessing the current capabilities of LLMs is helping you, “bake it”. Use LLMs to write your dialogues and put them in a ordinary state machine for now.

100%. Especially since the market dominance war is still going on but yet we already see the prices rising. From a price point, it is quite easy to judge, i think:

A) Running locally: you shift the price to pay for “your game” fully on the player. Want dynamic dialogue? Spend 2000 $ on decent hardware or leave.

B) Running online: you carry the costs fully. Which might be up-front investment for your hardware or subscriptions to APIs. Or, like more often now, pay-per-use fees which might kill your business quiet fast

I am certain there will be hybrid models of this. Games (or other services) offering LLM supplements through a monthly fee, for example. Imagine selling characters or dialogue options just like many do with cosmetics nowadays.

code.forge.temple · August 23, 2025, 12:27pm

Yeah, I really wish Google would add proper desktop support for local inference too. It would make things a lot more straightforward. Ollama has been a lifesaver for now, but direct integration would be awesome.

I haven’t tried LiteRT yet, but running it in an embedded browser sounds like a clever workaround. If you get a chance to experiment with it, I’d love to hear how it goes!

As for inference and rendering: I did my testing with the Jetson Orin Nano running Ollama as the server, since my desktop PC is too old to run Ollama. The main thing I noticed is a short delay before the NPC responds (a few seconds, depending on the prompt and hardware load). However, the actual game rendering stays smooth—no big frame time spikes or stutters. I’m using async HTTP requests in Godot, so the game loop isn’t blocked while waiting for the AI’s reply.

If you’re on really low-end hardware, you might see longer delays, but the frame rate itself shouldn’t tank.

code.forge.temple · August 23, 2025, 12:46pm

Great questions!

Performance:

On a desktop PC with a modern CPU, response times with Gemma 3n via Ollama should be usually 2–5 seconds per reply. On the Jetson Orin Nano (8GB RAM, SSD, extra swap), it’s slower: 20-30 seconds per response , sometimes longer for complex prompts.

Training/Content:

No custom model training was done. All educational content is generated live by the LLM, guided by a detailed system prompt and topic list (npcBackStory.txt). Most development time went into building the game logic, progress tracking, and prompt engineering (not dataset curation or fine-tuning).
I haven’t implemented a hybrid approach yet, but it’s a great idea: authoring the educational content up front (writing out the lessons and dialogue yourself), and then using the LLM mainly for topic recognition or intent detection. This would help keep responses accurate and consistent, reduce hallucinations, and speed up replies, since the LLM wouldn’t need to generate everything from scratch.
One drawback to this approach is the model’s maximum context window: if you include too much pre-authored content or system instructions, you risk filling up the context window, which can cause the model to “forget” earlier parts of the conversation or reduce its ability to track ongoing dialogue (this could be solved with RAG or by baking the custom educational content into the LLM via fine-tuning if the model does not already cover that content).

LLM as Topic Recognizer:

That’s a great point! The current prototype actually uses a hybrid approach: core educational topics, learning checkpoints, and progress tracking are handled by authored game logic, while the LLM is used for generating natural language responses and adapting to player input. The game tracks which topics have been covered and uses the LLM to recognize topic mentions or learning objectives in the conversation. This setup helps balance flexibility and control, improving reliability and reducing hallucinations, while still allowing for dynamic, engaging dialogue.

User Reactions:

So far, I’ve been the only tester. The latency is definitely noticeable (especially on the Jetson) but for a proof-of-concept focused on privacy and offline use, it’s workable. For casual play, the delay is a barrier, but for educational or accessibility scenarios, it’s still promising. Optimizing for speed is definitely a priority for future iterations!

iOSxcOder · August 23, 2025, 1:12pm

How many Parameters and Quantization does your Gemma3n have ?
Does server runs in terminal version or is it compiled as gdextension ?

Do you utilise Vulkan backend from ollama ?

code.forge.temple · August 23, 2025, 2:05pm

Ollama on Jetson Orin Nano uses CUDA GPU acceleration for inference, but does not use Vulkan for LLM inference.

gemma3n:e4b (the larger model) has ~4 billion parameters, and gemma3n:e2b (the smaller) has ~2 billion.
Both models are quantized for efficient inference (typically 4-bit or 8-bit, depending on the Ollama build and hardware). You can check the quantization details after pulling the model with ollama show gemma3n:e4b.

iOSxcOder · August 23, 2025, 3:14pm

Oh sorry I thought it’s included within Godot project itself

dragonforge-dev · August 23, 2025, 6:02pm

TBH, I never even thought about this for a mobile device. In fact I really only considered this for a desktop game. But you make a good point. Local LLMs aren’t there yet for mobile games or consoles.

I agree with you here. My idea of baking it is slightly different though. I’d like to figure out how to miniaturize the LLM model down to where it can only answer questions as a particular NPC in a game like an RPG where there’s already existing dialogue and a “persona description” to draw inspiration from.

I hadn’t thought of that. That’s actually pretty brilliant, and might be a way forward. Though TBH from an ethical point of view I struggle with microtransactions. Still they may be a necessary evil to get a game off the ground these days.