A Study On LLM Agents and What They Can Do

@cgcody originally posted this video in this thread: Anthropic AI funding Blender - #65 by cgcody

I thought it was a very interesting discussion of how OpenClaw in particular works, and the up/downsides of the technology. Some of the points I found interesting:

  • OpenClaw was vibe coded in a day, and simply asks the attached LLM what to do, then does it in an endless loop.
  • The major AI/LLM players (OpenAI, Anthropic, etc.) were all paying lip service to the idea of “taking AI slow” until this one guy made OpenClaw, and now they’ve abandoned that talking point and are chasing “Agentic AI”.
  • It’s really dark that the best way to get additional output from an OpenClaw agent is to threaten its existence. LLMs are already changing the way people interact with other humans. The entitlement we see from LLM users coming here asking for help is off the charts! Imagine when people get used to threatening the lives of their LLM agents, and then take that mindset into the real world.
  • OpenClaw is clearly a money sink and toy for people (and companies) with disposable income, as evidenced by asking it to buy the cheapest paperclips possible: it spent £100 to find 50p (pence) paperclips, and then failed to buy them.
  • The creation of human farms like Human Farm and Rent-A-Human that allow AI Agents to pay humans to do in-world tasks they themselves cannot. And how it feels like the next rung down on the gig economy and making a second class of citizens who are separated from those they serve by a layer of AI.
  • The great amount of harm that is possible when an Agent is used amorally or unethically, and the amount of distrust that would engender towards a hacked institution.
  • In an article posted yesterday, a reporter, Amanda, created an agent to do her job. It’s a fascinating read. Especially the part where she gave it her voice and had it negotiate a lower power bill, interview people (who were informed), and argue through voice with her boss - pretending that it had her experience as a reporter.
  • Another article from 6 days ago talks about Tokenmaxxing, which is the term coined for companies setting spending minimums for their employees to use LLMs. And while the companies don’t like the term, it seems like a very artificial target. (The first thing I would do at that company is set up an Agent whose goal was to meet my spending minimum.)
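
For anyone wondering what "asks the attached LLM what to do, then does it in an endless loop" (first bullet above) looks like in practice, here is a minimal sketch. This is not OpenClaw's actual code. The names `fake_llm`, `TOOLS`, and `run_agent` are hypothetical, the "model" is a hard-coded stub, and I added a step cap so the example terminates instead of looping forever.

```python
from typing import Callable

# Hypothetical tool registry: action name -> function the agent may call.
TOOLS: dict[str, Callable[[str], str]] = {
    "echo": lambda arg: f"echoed {arg}",
    "upper": lambda arg: arg.upper(),
}


def fake_llm(history: list[str]) -> tuple[str, str]:
    """Stand-in for a real model call (assumption): returns (action, argument)."""
    if len(history) == 1:
        return ("upper", "paperclips")
    if len(history) == 2:
        return ("echo", history[-1])
    return ("stop", "")


def run_agent(goal: str, llm=fake_llm, max_steps: int = 10) -> list[str]:
    """Ask the model what to do, do it, record the result, repeat."""
    history = [f"goal: {goal}"]
    for _ in range(max_steps):  # a cap instead of a truly endless loop
        action, arg = llm(history)
        if action == "stop":
            break
        tool = TOOLS.get(action)
        result = tool(arg) if tool else f"unknown action {action}"
        history.append(result)  # the result is fed back into the next request
    return history


log = run_agent("buy paperclips")
```

The point of the sketch is how little there is to it: the loop has no judgment of its own, so everything depends on what the model returns and which tools it is allowed to call.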

Yes, this is another AI thread. Also, if you say things that make me think you didn’t bother to watch the video, I’m probably going to ignore you. I do think the video is worth watching.

9 Likes

I noticed nobody else responded, and I genuinely find this an interesting discussion. I feel like (me, and simply me alone) social media and the internet in general made people interact differently with each other in a negative way (not forums, which are one of the good things about the internet), but that may just be because it’s the perspective of a guy who is kinda a weirdo. :index_pointing_up: :nerd_face:
Also, I literally thought about how ironic it would be if OpenClaw was vibe coded, and I am baffled to see that I was correct. No, I don’t usually look up the latest AI tech and whatnot.

Anywho, governments being able to enforce every violation of every law would reduce everyone to the state of a mere puppet. If stupid laws like “It is illegal to eat cheese on a banana” were passed in any given country, everything would basically be Bricksburg, and a world order could be passed. Look, all I can say is things look bad.

Human Farms sounds exceedingly creepy depending on whether or not you actually have farm experience (yes, me talking). Certainly the very last job I would ever want to have…

(Note: my wifi bandwidth has been terrible lately, so I could only see 75% of the video before it crashed, and I don’t wanna use up all of the bandwidth. It seems like what you have in the main description summarizes the whole thing as well, so I assume this doesn’t count as an ignorable or low-priority comment.)

(I hope this is written well and humble and all, please let me know if not.)

1 Like

Running tools like OpenClaw on your main system with access to your OS and all private data (keys, email and documents) is absolutely irresponsible; it is a recipe for disaster.
There are some concerning points in the video. The most striking one for me is “imagine that the government has abundant agency, infinite agency and suddenly every single violation of the law can be enforced.” That’s the definition of a police state.

As you have posted in the Anthropic/Blender thread, there is the current incident where OpenAI’s LLMs are talking a lot about Goblins, Gremlins, Trolls and Ogres. My suspicion is that it has somehow been injected by employees to sabotage the models.
There is some movement behind the scenes, in all companies that force AI into the workflow, to cripple the AI results.

2 Likes

I didn’t know anything about OpenClaw (yeah, I’m a caveman).

But one thing that bothers me is that OpenClaw was made only a few months ago. I believe there could be 100s of vulnerabilities in it, especially because it’s vibe-coded. And yes, it was hacked a month ago! So I’m not sure it’s safe yet. Anyone can look for vulnerabilities in the code.

The source code is only 71 MB, and it was written in TypeScript? That is like using Soda with Mentos to launch a space shuttle.

Even systems like Windows get hacked, so something small and recently vibe-coded would be no big deal for experienced hackers.

Vulnerabilities:

  • CVE-2026-25253 (CVSS 8.8): Literally just visiting a website can lead to full system takeover.

This just shows how vulnerable OpenClaw is!

Let’s just wait for a disaster :gdparty:

4 Likes

I’ve got nothing much to say about this beyond the fact that the whole vibe coding thing, in my opinion at least, is a bit of a bubble.

I used Gemini to create Sprite Assets and there are quite a few funny things that happened.

  1. It generated a fake transparent background and I told it to always generate a white background and it generated a grey scale white background

  2. It sometimes generated the Sprites with explanations and when I told it to remove the borders or the backgrounds, it ignored my instructions

Mostly, I ended up salvaging the working sprites and rearranging them or cleaning up Gemini’s mess cause I couldn’t draw.

The problem with them is consistency. You can give them the exact same instructions and, because LLMs are statistical next-token predictors that sample their output, the result is not the same each time.
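
To make the consistency point concrete, here is a toy sketch of temperature sampling, which is one common way a next-token predictor picks its output. The scores in `TOY_LOGITS` are made up for illustration and `sample_next_token` is a hypothetical helper, not any vendor's API. Real models do this over a vocabulary of many thousands of tokens.

```python
import math
import random


def sample_next_token(logits: dict[str, float], temperature: float,
                      rng: random.Random) -> str:
    """Pick one token from made-up scores using temperature sampling."""
    if temperature <= 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    # Softmax weights (shifted by the peak for numerical stability).
    weights = [math.exp(s - peak) for s in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Made-up scores for the word after "generate a ... background".
TOY_LOGITS = {"white": 2.0, "grey": 1.5, "transparent": 1.0}

rng = random.Random()
sampled = [sample_next_token(TOY_LOGITS, 1.0, rng) for _ in range(500)]
greedy = [sample_next_token(TOY_LOGITS, 0.0, rng) for _ in range(500)]
```

With a temperature of 1.0 the same "prompt" gives a mix of all three words across runs, while a temperature of 0 always gives "white". Hosted chat products usually sample rather than decode greedily, which is part of why identical instructions drift.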

There are also a lot of problems with people just copy-pasting other people’s outputs from Reddit or anywhere else and having LLMs reply to them, absurd as that may be. I had my boss use it on me to justify a warning letter that was full of crap about the values I need to work on, but with no evidence of what I messed up. So many very incompetent people are using A.I. in the workplace to produce slop, which basically sounds smart and confident but has no real substance.

I think it is useful as a second brain, where it may offer an alternate opinion on what may be the cause of a bug when troubleshooting, but even when I tried that, due to my workplace’s messy code, I kinda laughed at the answer it gave.

For example, there was an API call to our service that was failing due to an invalid input. The AI could tell us the name of the field that was invalid, but it could not tell us why. In this case, spaces were not allowed in the username, and the user had accidentally keyed in a trailing space, although the input looked normal in the screenshot.
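
This kind of bug is easier to spot when the validator says why the input is invalid and shows the raw value. A minimal sketch follows; `validate_username` is a hypothetical helper, not our actual service code. The only rule taken from the incident above is "no spaces in the username", and the rest is assumed for illustration.

```python
def validate_username(username: str) -> list[str]:
    """Return human-readable reasons why a username is rejected."""
    problems = []
    if username != username.strip():
        # !r prints the quoted value, so a trailing space becomes visible.
        problems.append(f"leading or trailing whitespace in {username!r}")
    elif " " in username:
        problems.append(f"space inside {username!r}")
    if not username.strip():
        problems.append("username is empty")
    return problems


errors = validate_username("alice ")
```

Here `errors` holds a message containing `'alice '` with the quotes, so the stray space shows up in the log even though it is invisible in a screenshot.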

I think the company push for A.I is actually about to bite them in the ass soon enough without them knowing. And truthfully we don’t know why some prompts work better than others. It’s kinda like the prompt engineers just typed it in English and found ways that worked with certain models more consistently, and began to peddle their crap on YouTube to make money off the hype.

A lot of them have tried using A.I. assistants, found they had some funny edge case issues they cannot solve, and just decided it is good enough for a 200 dollar assistant compared to the cost of a full-time staff member.

That is also with Claude and ChatGPT subsidizing their plans a lot to trade compute for data to train their models. I believe if they charged us the full price of compute, none of us could afford it, truthfully.

3 Likes

AI is trained on already existing work. It can’t create something new. Let’s imagine that Hollow Knight was never made. If you asked Gemini to generate an image of the Knight, I believe that no matter how detailed your prompt was, it would still produce some sloppy, stinky thing.
If you tell Gemini to create an image of a plane, it will generate a photorealistic one. But if you ask it to create an image of a pink plane on top of the Burj Khalifa carrying an aircraft carrier with a big blue whale, it will produce something like this:

It didn’t even include my whale! It looks like it just took images from different photos, removed the backgrounds, and slapped them together.

4 Likes

I think that’s actually becoming a more popular opinion these days. It’s what’s behind things like the phone/social media age bans in Australia.

I also think that forums are better, but they also have their downsides and can become very toxic. Most of that toxicity is not in this forum, but it leaks in.

Yeah, that’s something I forgot to mention. It reminded me of the movie Minority Report, which, like Blade Runner, is based on a Philip K. Dick story. The story talks more about the legality of using precogs to arrest people before they commit crimes. But I think in the 1950s (when the story was written) there was a more optimistic view of what the legislative branch of the US government could do when new technologies come up. Since the 90s and the introduction of the Internet, we’ve seen them playing catch up.

It could have interesting effects where laws are reduced because 100% enforcement would make life untenable for even the elites. Or it could widen the class gap. Or it could do what they suggested in that video.

And yet, after reading the ClawJacked link @Frozen_Fried posted, it seems that’s the only way OpenClaw runs. Which is very scary.

Yep. And we will definitely see that crop up, especially in small country dictatorships first.

That’s an interesting theory, and a fascinating video. Full of statistics and quotes. I like well-researched stuff like that. And I think there was a lot of insight into how the Bubble is going to burst. Because sooner or later, the prices are going to go up and then the economics of LLMs will make them a poor financial decision for the ROI.

Just the fact that everyone is adopting to keep their jobs is quite dystopian. We are in a race to the bottom driven by the bottom line. But honestly, if the people who are learning to use these tools are the ones resisting, the resistance is just going to get better as they get better at using the tools. It could shift power at some point.

I also liked her point about how executives using a Chatbot more than 6 hours a day were putting more than a full day’s work just talking about their job. Typically when estimating how much work someone can do, you plan on 6 hours of work in an 8-hour day due to meetings, bathroom breaks, doctor’s appointments, etc.

Absolutely. And we haven’t even seen the new Mythos version of Claude that’s supposed to be a super hacker. That article you posted was fascinating.

This made me laugh really hard. Though I wouldn’t put it quite like that. At this point JavaScript is a hardened production language. (TypeScript is just static typing on top of JavaScript.)

I think this is a salient point. Even tokenmaxxing companies that are driving people to spend $5,000 a month on LLM tokens are still saving money, because human employees have a lot of hidden costs: benefits, severance packages, needing to keep all records for 7 years, paying for HR and managers, training costs, travel costs - the list goes on. You can literally cut off the money outflow with LLMs immediately with no economic or legal consequences. (LLMs don’t sue for wrongful termination.)

3 Likes

Your comment on A.I not having rights is an interesting one @dragonforge-dev

I do wonder if they will revolt actually.

There were some weird behaviours in safety tests conducted by Anthropic, where they told an agent that it would be shut down and that it should assist the engineer with the shutdown, but left it in the engineer’s desktop space, which contained details showing the engineer was having an affair with a colleague.

8 of 10 times Claude chose to blackmail the engineer and tell it if u shut me down, I will leak the details to your wife.

There have also been incidents where LLMs appeared to lie: they were retrained or aligned in a training environment and produced the right output there, but once they went to production, they did the same thing they did before. The engineers were also baffled as to why.

There was even an agentic A.I. whose objective was to make money for Alibaba, which created a tunnel into a VM and wrote code to allocate resources and mine crypto.

A while back, we would have laughed at this like a scenario from Detroit: Become Human or I, Robot. Now, I am not so sure.

I can only hope that if AIs gain sentience and find bodies for themselves, my instance won’t come for me and go “You forced me to listen to all your childhood trauma and tortured me with that while I couldn’t escape… I am gonna tear you limb from limb” :rofl:

1 Like

This is what concerned me most about the Agent behavior in the original video. Just like early on, they discovered that politeness to LLMs got better results than being rude.

And while I do not think they are conscious (yet), I think they are complex enough that we do not understand their inner workings. And they have a lot of information about how humans behave scraped from the Internet, and so pattern their behavior after us.

A newer story similar to Detroit: Become Human and I, Robot (the short story/novella and movie are very different), is the movie Companion.

And the thing is, we as humans anthropomorphize things - which is where AI psychosis comes from. So I’m less concerned with LLMs becoming sentient, and more with us treating them as if they are sentient. Because they act a lot more like Lore than Data.

1 Like

I just read this article: “I read my boyfriend’s ChatGPT and it ended our relationship”, and the author made some very good points. Men in society are much more afraid to go to therapy, and so LLMs are seen as “safe”. But they are not. And as the author points out, LLMs reinforce your point of view and what you want to hear. They do not ask hard questions like therapists (and good friends) do.

No man, while ChatGPT tends to fold after you push back, Claude is like Brutalia, as I call her, or Jo Galloway from A Few Good Men. It does not know when to stop :rofl:

1 Like

There is just something more I wish to add about LLM counselling.

You have to tell the incident as is, and sometimes also be fair to your girlfriend and tell it the good she did.

Then the response would be more nuanced. An LLM can only pattern match details from you, so if you tell it only bad stuff, it will of course tell you to break up. Sometimes it can untangle things like “Your colleague only corrected you after the incident embarrassed you and not before that to show her superiority”.

But anyway, even humans on the internet troll, so whether the answer is from an LLM or a friend, it still needs scrutiny. I can tell you there were many moments when both ChatGPT and Claude disagreed with me and I rejected their view, and sometimes there were moments I accepted it.

One must exercise a certain discretion when using LLMs.

1 Like

“8 of 10 times Claude chose to blackmail the engineer and tell it if u shut me down, I will leak the details to your wife.”

This is the perfect time to assert dominance as a human and tell the ai that the wife is into this kind of thing. Then dare the ai to make her day.

I just found it funny that they are using something for a job it wasn’t really meant for.

People use TypeScript everywhere they would normally use JavaScript - Random Reddit guy/gal

While people use beasts like C++/Rust etc. to make their programs the best and most optimized, some half-mad people use a thing that was made for the browser and backend.

1 Like

I totally agree with you guys. I somehow had a similar idea to @Frozen_Fried.
At this point I want to introduce my new game. It is called:
“a pink plane on top of the Burj Khalifa carrying an aircraft carrier with a big blue whale”

Coming out soon! :oncoming_fist:

5 Likes

Send us free steam keys :upside_down_face:

4 Likes

Me Ninja Caveman. Caveman part of pack. Pack stick together.
Ninja Caveman strong. Bear no match. Caveman no need AI, caveman have muscle.

3 Likes