• Embargo@lemm.ee · 1 day ago

    Oh no! How will I generate a picture of Sam Altman blowing himself now!?

  • latenightnoir@lemmy.world · 1 day ago

    Sad to see you leave (not really, tho’), love to watch you go!

    Edit: I bet that if any AI-developing company would stop being so damned shady and would just ASK FOR PERMISSION, they’d receive a huge amount of data from all over. There are a lot of people who would like to see AGI become a real thing, but not if it’s being developed by greedy and unscrupulous shitheads. As it stands now, I think the only ones actually doing it for the R&D, and not as eye-candy to glitz away people’s money with aesthetically believable nonsense, are a handful of start-up-likes staffed by (not in a condescending way) kids who’ve yet to have their dreams and idealism trampled.

    • daniskarma@lemmy.dbzer0.com · 21 hours ago

      In Spain we trained an AI using a mix of public datasets available for AI training and public government resources (legislation, congress sessions, etc.). And the AI turned out quite good. Obviously not top of the line, but very good overall.

      It was a public project, not a private company.

    • HakFoo@lemmy.sdf.org · 1 day ago

      But what data would it be?

      Part of the “gobble all the data” perspective is that you need a broad corpus to be meaningfully useful. Not many people are going to give you an $892 billion market cap when your model is a genius about a handful of narrow subjects you could get deep volunteer support on.

      OTOH, maybe there’s a sane business in narrow, siloed (cheap, efficient, with more bounded expectations) AI products: the reinvention of the “expert system” with clear guardrails; the image generator that only does seaside background landscapes but can’t generate a cat to save its life; the LLM that’s a prettified version of a knowledge-base search and NOTHING MORE.

      • latenightnoir@lemmy.world · 21 hours ago

        You’ve highlighted exactly why I also fundamentally disagree with the current trend of all things AI being for-profit. This should be 100% non-profit and driven purely by scientific goals, in which case using copyrighted data wouldn’t even be an issue in the first place… It’d be like literally giving someone access to a public library.

        Edit: but to focus on this specific instance, where we have to deal with the here-and-now, I could see them receiving, say, 60–75% of what they have now, hassle-free. At the very least, and uniformly distributed. Again, AI development isn’t what irks most people; it’s calling plagiarism generators and search-engine fuck-ups AI and selling them back to the very people who generated the databases used for those abhorrences - or, worse, working toward replacing those people entirely with LLMs!

        Train the AI to be factually correct instead and sell it as an easy-to-use knowledge base? Aces! Train the AI to write better code and sell it as an on-board Stack Overflow Jr.? Amazing! Even having it as a mini-assistant on your phone, so that you have someone to pester you to get the damned laundry out of the washing machine before it starts to stink, is a neat thing - but that would require less advertising and shoving it down our throats, and more accepting the fact that you can still do all that with five taps and a couple of alarm entries.

        Edit 2: oh, and another thing which would require a buttload of humility but would alleviate a lot of tension: get it to cite and link to its sources every time! Have it be transformative enough to give you the gist without shifting into plagiarism, then send you to the source for the details!

  • TheBrideWoreCrimson@sopuli.xyz · 16 hours ago

    My main takeaway is that some contrived notion of “national security” has now become an acceptable justification for business decisions in the US.

  • shaggyb@lemmy.world · 22 hours ago

    “How am I supposed to make any money if I can’t steal all of my products to sell back to the world that produced them?”

    Yeah, fuck that. The whole industry deserves to die.

  • Shanmugha@lemmy.world · 13 hours ago

    National security, my ass. More like his window for showing off more dumb “achievements” while getting richer depends on it, and nothing else.

  • JHD@lemmy.world · 23 hours ago

    Strange that no one mentioned OpenAI making money off copyrighted works.

  • SlopppyEngineer@lemmy.world · 24 hours ago

    He’s afraid of losing his little empire.

    OpenAI also had no clue how to recreate the happy little accident that gave them GPT-3. That’s mostly because their whole approach was taking a simple model and brute-forcing it with more data, more power, more nodes, and then even more data and power, until it produced results.

    As expected, this isn’t sustainable. It’s beyond the point of diminishing returns. But Sam here has no idea how to fix that with much better models, so he goes back to the one thing he knows: more data needed, just one more terabyte bro, ignore the copyright!

    And now he’s blaming the Chinese for forcing him to use even more data.

  • Greyfoxsolid@lemmy.world · 4 minutes ago

    Sorry to say, but he’s right. For AI to truly flourish in the West, it needs access to all previously human-made information and media.

  • glitchdx@lemmy.world · 11 hours ago

    The only way this would be OK is if OpenAI were actually open. Make the entire damn thing free and open source, and most of the complaints will go away.

  • Jamdroid@lemmy.world · 14 hours ago

    The AI industry has already broken copyright laws, and it still won’t be actually intelligent for a long time. Just like crypto, this seems like a global scam that has squandered resources chasing the dream of a free workforce. Instead of working together to try to create an AI, there are lots of technology companies all doing the same ineffective bull 🤔

    • Knock_Knock_Lemmy_In@lemmy.world · 13 hours ago

      Oh yes. DeepSeek can quote from copyrighted sources. So can OpenAI models, but they are programmed not to.

      Facebook trained on torrents from Anna’s Archive.

      The copyright horse has left the stable.