ChatGPT appears to hallucinate or outright lie about everything

Buttflapper@lemmy.world · 18 days ago

linearchaos@lemmy.world · 17 days ago

I don’t want to sound like an AI fanboy, but it was right. It gave you the minimum requirements for most VR games.

No Man’s Sky’s minimum requirements are a 1060 and 8 gigs of system RAM.

If you tell it it’s wrong when it’s not, it will make s*** up to satisfy your statement. Earlier versions of the AI argued with people, and it became a rather sketchy situation.

Now, if you tell it it’s wrong when it actually is wrong, it has a pretty good chance of coming back with information on why it was wrong and the correct answer.

VinS@sh.itjust.works · 17 days ago

Well, yesterday I asked some questions about the classes in DAoC to help me choose a starter class. It totally failed there, attributing skills to the wrong class. When I poked it about this error, it said: “You are right, class X doesn’t do Mezz, it’s the speciality of class Z.”

But class Z doesn’t do Mezz either… I wanted to save some time. In the end I had to do the job myself because I couldn’t trust anything it said.

linearchaos@lemmy.world · 17 days ago

God, I loved DAoC. Played the hell out of it back in its heyday.

I can’t help but think it would have low confidence on it, though; there’s going to be an extremely limited amount of training data that’s still out there. I’d be interested in seeing how well it fares on World of Warcraft or one of the newer Final Fantasies.

The problem is there’s as much positive confirmation bias as negative. We could probably sit here all day, me telling you all the things it picks up really well for me and you telling me all the things it picks up like crap for you, and we could make guesses, but there’s no way we’d ever actually know.

VinS@sh.itjust.works · 17 days ago

I like it for brainstorming while debugging, finding funny names, creating “where you are the hero” stories for the kids, or other things where it doesn’t matter if it’s hallucinating. I don’t trust it for much more, unfortunately. I’d like to know your use cases where it works; it could open my mind to things I haven’t tried yet.

DAoC is fun. I’m playing on a freeshard (Eden, actually; started one week ago, good community).

linearchaos@lemmy.world · 17 days ago

It always seems to attract the nicest and best people.

I had switched to WoW by the time The Burning Crusade picked up; it might be worth a revisit one day, if for no other reason than to take a tour :)

linearchaos@lemmy.world · 17 days ago

No, you can’t trust AI or Google or anything else on the internet for the most part. It’s just a tool. AI is a little less trustworthy but still a useful tool if you wield it correctly.

some time passes

Heh, I think I found the source of this particular issue. All the original content is gone and the Camelot Herald wiki is incomplete. Even a Google search is turning up poor results.

We need to get something trained on archive.org :)

more time passes

Hmm, even digging around on archive.org, that’s a hard one to find. classes.ofcamelot.com would have had it, but you have to dig through every class.

I think I had it on my old guild site, but it looks like even that is no longer archived.

So sad.

CairhienBookworm@lemmy.world · 17 days ago

I learned early on that you can’t rely on them for factual information, for the reasons you stated.

I use them for creative writing tasks (drafting emails, letters, etc.), generating ideas, and creating Excel formulas, basic Python, VBA functions, etc.

ngwoo@lemmy.world · 17 days ago

OP those minimum requirements are taken directly from the Meta Quest 3 support page.

Dasus@lemmy.world · 18 days ago

“Converted what I said into the truth”

Now, I’m not against the point you’re making in any way; I think the bots are hardcore yes men.

Buut… I have a 1060, I got it around when No Man’s Sky came out, and I did try the game on my 4K LED TV. It did run, but it also stuttered quite a bit.

I’m currently thinking of upgrading my card, as I upgraded the rest of the PC last year. A 3070 is basically what I’m considering, unless I can find a nice 4000-series card with good VRAM.

My point here is that this isn’t the best example you could have given, as I’ve had basically that exact conversation several times in real life, because “it runs” is somewhat subjective.

LLMs obviously have trouble with subjective things, as we humans do too.

But again, I agree with the point you’re trying to make. You can get these bots to say anything. It amused me how much more easily the blocks could be circumvented just by telling them to ignore something or by talking hypothetically. Idk, but very strong text-based erotica, at least, was easy to get out of them last year, which probably should not have been the case.

Red_October@lemmy.world · 17 days ago

Yeah? That’s… how LLMs work. It doesn’t KNOW anything; it’s a glorified autofill. It knows what words look good after what’s already there. It doesn’t care whether anything it’s saying is correct; it doesn’t KNOW if it’s correct. It doesn’t even know what correct is. It isn’t made to lie or to tell the truth; those concepts are completely foreign to its function.

LLMs like ChatGPT are explicitly and only good at composing replies that look good. They are Convincing. That’s it. It will confidently and convincingly make shit up.

breadsmasher@lemmy.world · 18 days ago

I have a vague memory of some lyrics, and I’m trying to find the title of the song they’re from. I’m pretty certain of the band. Google was of no use.

I asked ChatGPT. It gave me a song title. Wasn’t correct. It apologised and gave me a different one - again, incorrect. I asked it to provide the lyrics to the song it had suggested. It gave me the correct lyrics for that song, but randomly inserted the lyrics I had provided into them.

I said it was wrong - it apologised and tried again. Rinse, repeat.

I feel part of the issue is that LLMs feel they have to provide an answer and can’t say they don’t know it. Which highlights a huge limitation of these systems: they can’t know if something is right or wrong. These systems claim they can index and parse vast amounts of data and that you can ask them questions about that data, but fundamentally (imo) they need to be able to say “I don’t have the data to provide that answer.”
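A minimal sketch of the “say I don’t know” behaviour described above, assuming the system can see a probability for each candidate answer: answer only when the top candidate clears a confidence threshold, otherwise refuse. The function name, the 0.6 threshold, and the song probabilities are hypothetical, and this is not how ChatGPT actually decides anything; real model probabilities are often poorly calibrated, so a high score does not guarantee a correct answer.

```python
# Hypothetical "abstain when unsure" rule: not ChatGPT's actual mechanism.
REFUSAL = "I don't have the data to provide that answer."


def answer_or_abstain(candidates: dict[str, float], threshold: float = 0.6) -> str:
    """Return the most probable candidate, or an explicit refusal.

    candidates maps each possible answer to the probability the model
    assigns it (assumed non-negative). threshold is an illustrative value.
    """
    if not candidates:
        return REFUSAL
    best, prob = max(candidates.items(), key=lambda kv: kv[1])
    # Only commit to an answer when the model is clearly favouring one option.
    if prob < threshold:
        return REFUSAL
    return best


# A peaked distribution produces an answer...
print(answer_or_abstain({"Song A": 0.85, "Song B": 0.10, "Song C": 0.05}))  # prints: Song A
# ...while a flat one (the model is effectively guessing) produces a refusal.
print(answer_or_abstain({"Song A": 0.34, "Song B": 0.33, "Song C": 0.33}))
```

The hard part in practice is the input, not the rule: the refusal is only as trustworthy as the probabilities fed into it.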

SlopppyEngineer@lemmy.world · 18 days ago

> they have to provide an answer

Indeed. That’s the G in ChatGPT. It stands for generative. It looks at all the previous words and “predicts” the most likely next word. You could see this very clearly with GPT-2: it just generated good-looking nonsense based on a few words.
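The “look at the previous words and predict the most likely next word” loop can be illustrated with a deliberately tiny stand-in: a bigram counter with greedy decoding. This is a toy sketch under loud assumptions, not how GPT works internally: real GPT models use a transformer over subword tokens, condition on far more than one previous word, and usually sample rather than always taking the top choice. The corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def generate(follows, start: str, max_words: int = 8) -> str:
    """Greedily append the most frequent next word, one word at a time."""
    out = [start]
    for _ in range(max_words):
        candidates = follows.get(out[-1])
        if not candidates:  # this word was never followed by anything: stop
            break
        # Ties go to whichever continuation was seen first.
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


# Hypothetical toy corpus.
corpus = "the game runs fine . the game needs a 1060 . the card runs hot ."
model = train_bigrams(corpus)
print(generate(model, "the"))  # prints: the game runs fine . the game runs fine
```

The output is locally fluent and loops forever on its most common path; nothing in the procedure checks whether “the game runs fine” is true, which is the point the comment above is making.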

Then you have the P in ChatGPT: pre-trained. If it happens to have received training data on what you’re asking, that data is shown. If it’s not trained on that data, it just uses whatever is more likely to appear and generates something that looks good enough for the prompt. It appears to hallucinate, lie, make stuff up.

It’s just how the thing works. There is serious research into fixing this, and a recent paper claimed to have a solution that lets the LLM know when it doesn’t know.

subignition@piefed.social · edit-2 · 18 days ago

~~The “P” is for predictive, not pre-trained. Generative Predictive Text~~

Edit: Nope I was wrong.

explore_broaden@midwest.social · 18 days ago

That’s not right; it’s Generative Pre-trained Transformer.

subignition@piefed.social · 18 days ago

Well today I learned, thanks for the correction.

ThePowerOfGeek@lemmy.world · 18 days ago

I’ve had a similar experience, except in my case I used lyrics from a really obscure song whose writer I knew. I asked ChatGPT, and it gave me completely the wrong artist. When I corrected it, it apologized profusely and agreed with exactly what I had said. Of course, it didn’t remember that correct answer, because it can’t add to or update its data source.

hperrin@lemmy.world · 18 days ago

It’s trained on internet discussions and people on the internet rarely say, “I don’t know”.

bungleofjoy@programming.dev · 18 days ago

LLMs don’t “feel”, “know”, or “understand” anything. They spit out the statistically most significant answer from their dataset; that is all they do.

NuXCOM_90Percent@lemmy.zip · 18 days ago

The issue is: what is right and what is wrong?

“Mondegreens” are so ubiquitous that there are multiple websites dedicated to them. Is it “wrong” to tell someone that the song where Jimi Hendrix talked about kissing a guy is Purple Haze? Even pointing out where in the song that happens has value.

In general, I would prefer it if all AI search engines provided references, even just the top two or three pages. But that gets messy when said reference is telling someone they misunderstood a movie plot or whatever. “The movie where Anthony Hopkins pays Brad Pitt for eternal life using his daughter is Meet Joe Black. Also, you completely missed the point of that movie” is a surefire way to make customers incredibly angry, because we live in bubbles where everything we do or say (or what influencers do or say and we pretend we agree with…) is reinforced, truth or not.

And while it deeply annoys me when I am trying to figure out how to do something in GitLab CI or whatever and get complete nonsense based on a single feature proposal from five years ago? That… isn’t much better than asking for help on a message board where people are going to just ignore the prompt and say whatever they Believe.

In a lot of ways, the backlash against LLMs reminds me of when people get angry at self-checkout lines. People have this memory of a time that never was, when cashiers were amazingly quick baggers and NEVER had to ask for help figuring out whether something was an Anaheim or a Poblano pepper, or had trouble scanning something, and so forth. Same with this idea of a time when search (for anything non-trivial) was super duper easy and perfect, and everyone always got exactly the answer they wanted when they posted on a message board rather than complete nonsense (if they weren’t outright berated for not searching for a post from ten years ago that is irrelevant).

JackGreenEarth@lemm.ee · 18 days ago

It all depends on the training data and preprompt. With the right combination of those, it will admit when it doesn’t know an answer most of the time.

mozz@mbin.grits.dev · 18 days ago

May I offer you a fairly convincing explanation

jeeva@lemmy.world · 18 days ago

I enjoyed reading this, thank you.

subignition@piefed.social · 18 days ago

This is the best article I’ve seen yet on the topic. It does mention the “how” in brief, but this analogy really explains the “why.” Gonna bookmark this in case I ever need to try to save another friend or family member from drinking the Flavor-Aid.

leftzero@lemmynsfw.com · 17 days ago

So, they’ve basically accidentally (or intentionally) made Eliza with extra steps (and many orders of magnitude more energy consumption).

mozz@mbin.grits.dev · 17 days ago

I mean, it’s clearly doing something impressive and useful. It’s just that the thing it’s doing is not intelligence, and dressing it up to convincingly imitate intelligence may not have been good for anyone involved in the whole operation.

leftzero@lemmynsfw.com · 17 days ago

Impressive how…? It’s just statistics-based very slightly fancier autocomplete…

And useful…? It’s utterly useless for anything that requires the text it generates to be reliable and trustworthy… the most it can be somewhat reliably used for is as a somewhat more accurate autocomplete (yet with a higher chance for its mistakes to go unnoticed) and possibly, if trained on a custom dataset, as a non-quest-essential dialogue generator for NPCs in games… in any other use case it’ll inevitably cause more harm than good… and in those two cases the added costs aren’t remotely worth the slight benefits.

It’s just a fancy extremely expensive toy with no real practical uses worth its cost.

The only people it’s useful to are snake oil salesmen and similar scammers (and even then only in the short run, until model collapse makes it even more useless).

All it will have achieved in the end is an increase in enshittification, global warming, and distrust in any future real AI research.

HubertManne@moist.catsweat.com · 17 days ago

I find they all act like yes men. Some do seem to run another search, eliminating results I find suspect, and some just keep replying with the same thing.

thedeadwalking4242@lemmy.world · 17 days ago

You asked a generic machine a generic question and it gave you an extremely generic response. What did you expect? There was no context. It should have asked you more questions about what you’ll be doing.

SynopsisTantilize@lemm.ee · 17 days ago

No, it does this with my code when I ask it to proofread a snippet.

sircac@lemmy.world · 18 days ago

What would you expect from a word predictor? A knife is mostly useless for nailing; you are using them for the wrong purpose…

TimeSquirrel@kbin.melroy.org · 17 days ago

What? You don’t have a set of cutting hammers in the kitchen?

Christer Enfors@lemm.ee · 17 days ago

Lol, of course not.

… my cutting hammers are in the bathroom.

subignition@piefed.social · 17 days ago

Damn, you’re living in the future. I’m still stuck using three shells.

Karyoplasma@discuss.tchncs.de · 16 days ago

Pretty sure my splitting maul would cut my steak, the plate it’s on and the table below. Don’t listen.

Dieva@lemmy.world · 17 days ago

One thing I often do to help with this is ask it to double-check itself. It sounds kind of stupid, but most of the time it works quite well to help cut out hallucinations or factual errors.

vxx@lemmy.world · 17 days ago

I think we shouldn’t expect anything other than language from a language model.

ABCDE@lemmy.world · 18 days ago

Yes and no. 1060 is fine for basic VR stuff. I used my Vive and Quest 2 on one.

finitebanjo@lemmy.world · 17 days ago

For me, it is stupid to expect these machines to work any other way. They’re literally designed to just guess words that make sense in a context, with the whole statement then assembled from these valid tokens and sometimes checked again by… another machine…

It’s always going to be and always has been a bullshit generator.

QuentinQuiver@slrpnk.net · 17 days ago

You can use the RAG tactic to make it more useful. That involves starting with reputable sources as input, which creates an AI character that’s essentially supposed to be an expert in a certain topic.
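RAG here means retrieval-augmented generation: before the model answers, relevant passages are looked up in a trusted collection and pasted into the prompt, so the answer can be grounded in them. A minimal sketch of the retrieval and prompt-building half, assuming plain word overlap as the relevance score (real systems usually use embedding-based vector search) and leaving out the actual model call. The function names, stopword list, and placeholder documents are all hypothetical.

```python
import re

# Hypothetical stopword list so filler words don't count as "relevant".
STOPWORDS = {"a", "an", "the", "what", "are", "is", "for", "of", "from"}


def tokenize(text: str) -> set[str]:
    """Lowercase bag of words minus stopwords; real systems typically embed text instead."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared words with the question; drop ones with no overlap."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return [d for d in ranked[:k] if q & tokenize(d)]


def build_prompt(question: str, documents: list[str]) -> str:
    """Prepend the retrieved passages so the model is asked to answer only from them."""
    context = retrieve(question, documents)
    if not context:
        return f"Say you don't know. Question: {question}"
    sources = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the sources below; say you don't know otherwise.\n"
        f"Sources:\n{sources}\nQuestion: {question}"
    )


# Hypothetical placeholder passages; in practice these would be reputable sources.
docs = [
    "No Man's Sky minimum requirements: <copied from the official store page>",
    "DAoC class guide: <copied from the Camelot Herald>",
    "Purple Haze lyrics: <copied from a licensed lyrics site>",
]
print(build_prompt("What are the minimum requirements for No Man's Sky?", docs))
```

This only changes what the model is shown: the generation step can still misread or misquote the retrieved passages.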

The normal AI system is a scammer who tries to convince others to act like them… just like me and other internet trolls or crazy people. It needs some snark to act like a real person does, but pure snark is quite useless.

Essentially: nonsense in, nonsense out. Or: science books and journals in, sci-fi speculation out.

finitebanjo@lemmy.world · 17 days ago

No, again, because each word is a token which, together with others, makes a phrase, and each phrase is a token that makes a statement. Since these tokens are generated individually, it will never have any real underlying logic. It’s just sentence probability. Even if your sample data is free of nonsense, the LLM will still generate nonsense.

zbyte64@awful.systems · 17 days ago

RAG is a search engine that sometimes summarizes incorrectly and uses 10x the energy. Such a dumb product.

WhyFlip@lemmy.world · 17 days ago

Analytics Engineer