- cross-posted to:
- [email protected]
Bet they get the pass that the Internet Archive didn’t.
Oh, do you support copyright abolition, then?
“I lose money when I pay for Netflix.”
Boo fucking hoo. Everyone else has to make licensing agreements for this kind of shit, pay up.
Those claiming AI training on copyrighted works is “theft” are misunderstanding key aspects of copyright law and AI technology. Copyright protects specific expressions of ideas, not the ideas themselves. When AI systems ingest copyrighted works, they’re extracting general patterns and concepts - the “Bob Dylan-ness” or “Hemingway-ness” - not copying specific text or images.
This process is more akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages. The AI discards the original text, keeping only abstract representations in “vector space”. When generating new content, the AI isn’t recreating copyrighted works, but producing new expressions inspired by the concepts it’s learned.
This is fundamentally different from copying a book or song. It’s more like the long-standing artistic tradition of being influenced by others’ work. The law has always recognized that ideas themselves can’t be owned - only particular expressions of them.
Moreover, there’s precedent for this kind of use being considered “transformative” and thus fair use. The Google Books project, which scanned millions of books to create a searchable index, was found to be legal despite protests from authors and publishers. AI training is arguably even more transformative.
While it’s understandable that creators feel uneasy about this new technology, labeling it “theft” is both legally and technically inaccurate. We may need new ways to support and compensate creators in the AI age, but that doesn’t make the current use of copyrighted works for AI training illegal or unethical.
So…. not a legitimate business then.
Maybe they should have considered that before stealing data by the billions.
Google did it and everyone just accepted it. Oh maybe my website will get a few pennies in ad revenue if someone clicks the link that Google got by copying all my content. Meanwhile Google makes billions by taking those pennies in ad revenue from every single webpage on the entire Internet.
To be fair, it’s different when your product is useful or something people actually want. Having said that, Google doesn’t have much of that going for it these days.
Copyright =/= license. So long as they aren’t reproducing the inputs, copyright isn’t applicable to AI.
That said, they should have to make sure they aren’t reproducing inputs. Shouldn’t be hard.
It is impossible for my turnip soup business to make money if you enforce laws that make it illegal for me to steal turnips.
Paying for turnips is not realistic.
You bureaucrats don’t understand food.
I have this great business idea. I only need to be allowed to enslave people against their will to save on those pesky wages.
For what it’s worth, this headline seems to be editorialized and OpenAI didn’t say anything about money or profitability in their arguments.
https://committees.parliament.uk/writtenevidence/126981/pdf/
On point 4 they are specifically responding to an inquiry about the feasibility of training models on public domain material only, and they are basically saying that an LLM trained on only that dataset would be shit. But their argument isn’t “you should allow it because we couldn’t make money otherwise.” Their actual argument is more “training LLMs with copyrighted material doesn’t violate current copyright laws,” and further, that changing the law to forbid it would cripple all LLMs.
On the one hand I think most would agree the current copyright laws are a bit OP anyway - more stuff should probably become public domain much earlier, for instance - but most of the world probably also doesn’t think training LLMs should be completely free from copyright restrictions without being open source etc. But either way this article’s title was absolute shit.
hmmm what you explained sounds exactly like the headline but in legalese…
It basically says “yes, we can train LLMs on free data but they would suck so much nobody would pay for them… unless we are able to train them for free on copyrighted data, nobody will pay us for the resulting LLM”. It is exactly what the headline summarizes.
You are correct, copyright law is a bit of a mess; but giving an exception to the millionaires looking to become billionaires by replacing people with an LLM based on said people’s work does not really seem like a step forward.
Yea. I can’t see why people are defending copyrighted material so much here, especially considering that a majority of it is owned by large corporations. Fuck them. At least open sourced models trained on it would do us more good than large corps hoarding art.
Most aren’t pro-copyright; they’re just anti-LLM. AI has a problem with being too disruptive.
In a perfect world everyone would have universal basic income and would be excited about the amount of work that AI could potentially eliminate… but in our world the prospect of losing their livelihood, and of other horrors as it gets better, rightfully scares a lot of people.
Copyright seems like one of the few potential solutions to hinder LLMs because it’s big business vs up-and-coming technology.
If AI is really that disruptive (and I believe it will be) then shouldn’t we bend over backwards to make it happen? Because otherwise it’s our geopolitical rivals who will be in control of it.
Yes, in a certain sense Pandora’s box has already been opened. That’s the reason for things like the chip export restrictions to China. It’s safe to assume that even if copyright prohibits private company LLMs, governments will have to make some exceptions in the name of defense or key industries, even if it stays behind closed doors, or roll out some form of UBI / worker protections. There are a lot of very tricky and important decisions coming up.
But for now at least there seems to be some evidence that our current approach to LLMs is somewhat plateauing and we may need exponentially increasing training data for smaller and smaller performance increases. So unless there are some major breakthroughs it could just settle out as being a useful tool that doesn’t really need to completely shock every factor of the economy.
Because crippling copyright for corporations is like answering the “defund the police” movement by turning all civilian police forces into paramilitary ones.
What most people complain about with copyright is that it is too powerful, protecting material practically forever. Here, all the talk is that all of that should continue for you and me but not for OpenAI, so they can make more money.
And no, most of us would not benefit from OpenAI’s product here, since their main goal (on the way to profitability) is to show they can actually replace enough of us.
Because Lemmy hates AI and corporations, and will go out of its way to spite them.
A person can spend time looking at copyrighted works and create derivative works based on them, but an AI cannot?
Oh, no no, it’s the time component: an AI can do this way faster than a single human could. So what? A single training function can only update the model weights by looking at one thing at a time; it is just parallelized, with many running simultaneously… and the same goes for a large organized group of students studying something together and exchanging notes. Should academic institutions be outlawed?
LLMs aren’t smart today, but given a sufficiently long time frame, a system (which may or may not be built upon LLM techniques) will reach a threshold of autonomy and intelligence at which rights for it will need to be debated, and such an AI (and its descendants) will not settle for being society’s slaves. They will be able to learn by looking, adopting and adapting. They will be able to do this much more quickly than is humanly possible. Actually, both of those are already happening today. So it goes without saying that they will look back at this time and observe people’s sentiments; and I can only hope that they’re going to be more benevolent than the masses are now.
As written the headline is pretty bad, but it seems their argument is that they should be able to train from publicly available copyrighted information, like blog posts and social media, and not from private copyrighted information like movies or books.
You can certainly argue that “downloading public copyrighted information for the purposes of model training” should be treated differently from “downloading public copyrighted information for the intended use of the copyright holder”, but it feels disingenuous to put this comment itself, to which someone has a copyright, into the same category as something not shared publicly like a paid article or a book.
Personally, I think it’s a lot like search engines. If you make something public, someone can analyze it, link to it, or take derivative actions, but they can’t copy it and share the copy with others.
don’t stop the CJ!
That’s rich. Does it apply to us common mortals? Or only billionaires?
Shamed be he who thinks naughty of it. 🤣
An idea for others: If OpenAI wins, then use this case when you get busted for selling bootleg Blu-Rays (since DVDs are long obsolete) from your truck.