Garbage in, garbage out.
Recycle the garbage that comes out… Still more garbage out.
Shit-fueled ouroboros
You can’t explain it!
I’d be very wary of extrapolating too much from this paper.
Past research along these lines found that a mix of synthetic and organic data was better than organic data alone. A caveat for all the research to date is that it uses shitty cheap models, whose synthetic output shows significant performance degradation compared to SotA models; other research has found notable improvements in smaller models trained on synthetic data from the SotA.
Basically, this is only really saying that models a year or two behind in capability, recursively trained with no additional organic data, will collapse.
It’s not representative of real-world or emerging conditions.
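To make both of those claims concrete, here’s a minimal toy sketch (the Gaussian stand-in, the 2-sigma clip, and all the numbers are my own assumptions, not anything from the paper): treat “a model” as a distribution fit that undersamples its own tails, resample, refit, and repeat.

```python
# Toy model-collapse sketch: a "model" is just a Gaussian fit, and the
# 2-sigma clip mimics generators undersampling their own tails.
# Everything here is a made-up illustration, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=10_000)  # the "organic" data

def run_generations(real_fraction: float, generations: int = 20,
                    n: int = 10_000) -> float:
    """Refit each generation on a blend of fresh organic samples and the
    previous generation's tail-clipped synthetic samples; return final sigma."""
    mu, sigma = real.mean(), real.std()
    for _ in range(generations):
        n_real = int(real_fraction * n)
        synth = rng.normal(mu, sigma, size=n)
        synth = synth[np.abs(synth - mu) < 2 * sigma][: n - n_real]  # tails lost
        mix = np.concatenate([rng.choice(real, n_real, replace=False), synth])
        mu, sigma = mix.mean(), mix.std()
    return sigma

print(f"pure synthetic recursion: sigma ~ {run_generations(0.0):.2f}")  # collapses (~0.08)
print(f"50/50 organic mix:        sigma ~ {run_generations(0.5):.2f}")  # stays near 0.9
```

The clip is the load-bearing assumption: it stands in for generators favoring high-probability outputs, which is exactly the tail-loss mechanism the collapse research describes. Pure synthetic recursion shrinks sigma toward zero; keeping organic data in the mix pins it near the truth.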
Maybe we can use it to train the other AIs to help ourselves.
“On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.” - Charles Babbage
Of course, modern UX design is very much based on getting the right answer with the wrong inputs (autocorrect, etc.).
I believe “robustness” was the term I learned years ago: the ability of a system to gracefully handle user error, make it easy to recover from or fix, clearly communicate what went wrong, etc.
Of course, nothing is ever perfect and humans are very creative at fucking up, and a lot of companies don’t seem to take UX too seriously. Particularly when the devs get tunnel vision and forget about user error being a thing…
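As a minimal sketch of that kind of robustness (the command set and the `resolve` helper are made up for illustration): instead of rejecting a typo’d command outright, fall back to the closest valid one and, failing that, say clearly what went wrong and what would have been valid.

```python
# Minimal sketch of UX "robustness": don't just reject a typo'd command,
# recover by falling back to the closest valid one. Command set is made up.
import difflib

COMMANDS = ["open", "close", "save", "quit"]

def resolve(user_input: str) -> str:
    if user_input in COMMANDS:
        return user_input
    match = difflib.get_close_matches(user_input, COMMANDS, n=1, cutoff=0.6)
    if match:
        return match[0]
    # still communicate clearly what was wrong and what's valid
    raise ValueError(f"unknown command {user_input!r}; valid commands: {COMMANDS}")

print(resolve("sve"))   # -> save
print(resolve("quot"))  # -> quit
```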
The business people adopting AI: “who cares what it’s trained on? It’s intelligent, right? It’ll just sort through the garbage and magically come up with the right answers to everything”
Not so hard to imagine given that these people have always seen technical systems as magic.
The AI art is inbreeding.
Model degeneration is already a well-known phenomenon. The article explains well what’s going on, so I won’t go into details, but note that this happens because the model does not understand what it is outputting - it’s looking for patterns, not for the meaning conveyed by those patterns.
Frankly, at this rate we might as well go with a neuro-symbolic approach.
The issue with your assertion is that people don’t actually work in a similar way. Have you ever met someone who was clearly taught “garbage”?
The issue with your assertion is that people don’t actually work in a similar way.
I’m talking about LLMs, not about people.
I know you are, but the argument that an LLM doesn’t understand context is incorrect. It’s not human-level understanding, but it’s been demonstrated that they do have a level of understanding.
And to be clear, I’m not talking about consciousness or sapience.
I know you are, but the argument that an LLM doesn’t understand context is incorrect
Emphasis mine. I am talking about the textual output. I am not talking about context.
It’s not human-level understanding
Additionally, your obnoxiously insistent comparison between LLMs and human beings boils down to a red herring.
Not wasting my time further with you.
[For others who might be reading this: sorry for the blatantly rude tone but I got little to no patience towards people who distort what others say, like the one above.]
I got little to no patience towards people who distort what others say,
My original reply was meant to be tongue-in-cheek, but I guess I forgot about Poe’s law. I’m not a layman, for the record. I’ve worked with AI for over a decade.
Not wasting my time further with you.
Ditto. Have a nice day.
but it’s been demonstrated that they do have a level of understanding.
Citation needed
Here you go
A better mathematical system of storing words does not mean the LLM understands any of them. It just has a model representing the relations between the words it uses.
If I put 10 minus 8 into my calculator I get 2. The calculator doesn’t actually understand what 2 means, or what subtraction represents; it just runs the commands that give the appropriate output.
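As a toy sketch of that “relations between words” point (the 3-d vectors below are made up for illustration, not real embeddings): the famous king/queen analogy falls out of plain coordinate arithmetic, with no comprehension anywhere in the loop.

```python
# Toy picture of "a model of relations between words": words are vectors,
# relatedness is an angle. These 3-d vectors are made up, not real embeddings.
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The famous analogy falls out of coordinate arithmetic alone:
target = emb["king"] - emb["man"] + emb["woman"]
print(max(emb, key=lambda w: cosine(emb[w], target)))  # queen
```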
That’s a bad analogy, because the calculator wasn’t trained using an artificial neural network literally designed by studying biological brains (aka biological neural networks).
And “understand” doesn’t equate to consciousness or sapience. For example, it is entirely and factually correct to state that an LLM is capable of reasoning. That’s not even up for debate. The accuracy of an LLM’s reasoning capability is one of the fundamental benchmarks used for evaluating its quality.
But that doesn’t mean it’s “thinking” in the way most people consider.
Edit: anyone upvoting this CileTheSane clown is in the same boat, not comprehending how LLMs work.
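On the benchmark point above, here’s a hedged sketch of the exact-match scoring that many reasoning evals boil down to (the questions are made up, and `stub_model` is a canned stand-in for a real inference call):

```python
# Sketch of exact-match scoring, the usual core of a reasoning benchmark.
# stub_model is a canned stand-in for a real LLM call; questions are made up.
problems = [
    ("3 apples, eat 1: how many left?", "2"),
    ("7 * 6 = ?", "42"),
    ("Is 91 prime? yes/no", "no"),
]

def stub_model(prompt: str) -> str:
    canned = {
        "3 apples, eat 1: how many left?": "2",
        "7 * 6 = ?": "42",
        "Is 91 prime? yes/no": "yes",  # a deliberate wrong answer (91 = 7 * 13)
    }
    return canned[prompt]

score = sum(stub_model(q).strip() == a for q, a in problems) / len(problems)
print(f"reasoning accuracy: {score:.0%}")  # 67%
```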
I’m autistic and sometimes I feel like an AI bot spewing out garbage in social situations. If I do what people normally do and make it sound believable, maybe no one will notice.
Well, at archive.org you’ve got a timestamped copy of much of the Web as it existed up until latent-diffusion models arrived. That may not give you access to newer information, but it’s a pretty whopping big chunk of data to work with.
Hopefully archive.org have measures in place to stop people from yanking all their data too quickly. At least not without a hefty donation or something. As a user it can chug a bit, and I’m hoping that’s the rate-limiting I’m talking about and not that they’re swamped.
That would go against the principle of the archive, imo. But regardless, if you take away all means of acquiring data freely, you’re just giving companies like OpenAI and Google, who already have copies of it, an insane advantage.
AI isn’t going away; we need to make sure we have free access to it, so as not to hand our whole economy to a handful of companies.
News at 11.
As junk web pages written by AI proliferate, the models that rely on that data will suffer.
Good.
interdasting
AI making itself sick and worthless after flooding the internet with trash just gives me a warm glow.
AI writing, scraped by AI, producing more AI writing…
So not “gray goo” exactly, but “gray slop”?
People are already comparing older content to Low Background Steel, as it’s uncontaminated.
And they’re overlooking that radionuclide contamination of steel actually isn’t much of a problem anymore: the surge in background radionuclides caused by nuclear testing peaked in 1963 and has since fallen almost back to the original background level.
I guess it’s still a good analogy, though. People bring up Low Background Steel because they think radionuclide contamination is an unsolved problem (despite it having been basically solved), and they bring up “model collapse” because they think it’s an unsolved problem (despite it having been basically solved). It’s like newspaper stories: everyone sees the big scary front-page headline, but nobody pays attention to the little block of text retracting it on page 8.
Oh no, the AI are inbreeding.