Unpopular opinion but I don’t see how it could have been different.
- There’s no way the West would cede the AI lead to China, which has no desire or framework to ever accept such a restriction.
- Believe it or not, transformers are actually learning by current definitions, not regurgitating a direct copy. It’s transformative work - it’s even in the name.
- This is actually good, as it prevents a market moat for the super-rich corporations that would otherwise be the only ones able to afford the expensive training datasets.
This is an absolute win for everyone involved other than copyright hoarders and mega corporations.
I’d encourage everyone upset at this to read over some of the EFF posts from actual IP lawyers on this topic, like this one:
Nor is pro-monopoly regulation through copyright likely to provide any meaningful economic support for vulnerable artists and creators. Notwithstanding the highly publicized demands of musicians, authors, actors, and other creative professionals, imposing a licensing requirement is unlikely to protect the jobs or incomes of the underpaid working artists that media and entertainment behemoths have exploited for decades. Because of the imbalance in bargaining power between creators and publishing gatekeepers, trying to help creators by giving them new rights under copyright law is, as EFF Special Advisor Cory Doctorow has written, like trying to help a bullied kid by giving them more lunch money for the bully to take.
Entertainment companies’ historical practices bear out this concern. For example, in the late-2000’s to mid-2010’s, music publishers and recording companies struck multimillion-dollar direct licensing deals with music streaming companies and video sharing platforms. Google reportedly paid more than $400 million to a single music label, and Spotify gave the major record labels a combined 18 percent ownership interest in its now-$100 billion company. Yet music labels and publishers frequently fail to share these payments with artists, and artists rarely benefit from these equity arrangements. There is no reason to believe that the same companies will treat their artists more fairly once they control AI.
You’re getting douchevoted because on lemmy any AI-related comment that isn’t negative enough about AI is the Devil’s Work.
Some communities on this site speak about machine learning exactly how I see grungy Europeans from pre-18th century manuscripts speaking about witches, Satan, and evil… as if it is some pervasive, black-magic miasma.
As someone who is in the field of machine learning academically/professionally, it’s honestly kind of shocking, and it has largely informed my opinion of society at large as an adult. No one puts any effort into learning if they see the letters “A” and “I” in all caps next to each other. They immediately turn their brains off and start regurgitating points, responding reflexively, on Lemmy or otherwise. People talk about it so confidently while being so frustratingly unaware of their own ignorance on the matter, which, for lack of a better comparison… reminds me a lot of how, historically and in fiction, human beings have treated literal magic.
That’s my main issue with the entire swath of “pro vs anti AI” discourse… all these people treating something that, to me, is simple & daily reality as something entirely different than my own personal notion of it.
I see this exact mental non-process in so much social media. I think the endless firehose of memes and headlines is training people to glance at an item, spend minimal brain power processing it and forming a binary opinion, then up/downvote and scroll on. When that becomes people’s default mental process, you’ve got Idiocracy, and that’s what we’ve got. But I see no solution. You can lead a horse to water but you can’t make it spend more than two seconds before screaming at the water and calling it EVIL.
Large AI companies themselves want people to be ignorant of how AI works, though. They want uncritical acceptance of the tech as they force it everywhere, creating a radical counterreaction from people. The reaction might be uncritical too - I’d prefer to say it’s merely unjustified in specific cases, or overly emotional - but it doesn’t come from nowhere or from sheer stupidity. We have been hearing about people treating their chatbots as sentient beings since like 2022 (remember that guy from Google?), and we’re bombarded with doomer (or, from AI companies’ point of view, very desirable) projections about AI replacing most jobs and wreaking havoc on the world economy - how are ordinary people supposed to remain calm and balanced when hearing such stuff all the time?
This, so very much. I’ve been saying it since 2020. People who think the big corporations (even the ones that use AI) aren’t playing both sides of this issue from the very beginning just aren’t paying attention.
It’s in their interest to have those positive to AI defend them by association, by energizing those negative to AI into an “us vs. them” mentality - and the other way around as well. It’s the classic divide and conquer.
Because if people refuse to talk to each other about it in good faith, refuse to treat each other with respect, and refuse to learn where the other side is coming from or why they hold such opinions, you can keep them fighting amongst themselves instead of banding together and demanding realistic and fair policies in regards to AI. This is why bad faith arguments and positions must be shot down, both on the side you agree with and on the one you disagree with.
- Idgaf about China and what they do and you shouldn’t either, even if US paranoia about them is highly predictable.
- Depending on the outputs it’s not always that transformative.
- The moat would be good, actually. The business model of LLMs isn’t just bad - it’s not even viable without massive subsidies, not least of which is taking people’s shit without paying.
It’s a huge loss for smaller copyright holders (like the ones that filed this lawsuit) too. They can’t afford to fight when they get imitated beyond fair use. Copyright abuse can only be fixed by the very force that creates copyright in the first place: law. The market can’t fix that. This just decides winners between competing mega corporations, and even worse, upends a system that some smaller players have been able to carve a niche in.
Want to fix copyright? Put real time limits on it. Bind it to a living human only. Make it non-transferable. There’s all sorts of ways to fix it, but this isn’t it.
ETA: Anthropic are some bitches. “Oh no, the fines would ruin us, our business would go under and we’d never maka da money :*-(” Like yeah, no shit, no one cares. Strictly speaking, the fines for ripping a single CD, or making a copy of a single DVD to give to a friend, are so astronomically high as to completely financially ruin the average USAian for life. That sword of Damocles for watching Shrek 2 for your personal enjoyment but in the wrong way has been hanging there for decades, and the only thing that keeps the cord that holds it up strong is the cost of pursuing “low-level offenders”. If they wanted to, they could crush you.
Anthropic walked right under the sword and assumed their money would protect them from small authors etc. And they were right.
Maybe something could be hacked together to fix copyright, but further complication there is just going to make accurate enforcement even harder. And we already have Google (via YouTube) doing a shitty job of it, and that’s… one of the largest companies on Earth.
We should just kill copyright. Yes, it’ll disrupt Hollywood. Yes it’ll disrupt the music industry. Yes it’ll make it even harder to be successful or wealthy as an author. But this is going to happen one way or the other so long as AI can be trained on copyrighted works (and maybe even if not). We might as well get started on the transition early.
I’ll be honest with you - I genuinely sympathize with the cause, but I don’t see how this could ever be solved with the methods you suggested. The world is not coming together to hold hands and kumbaya its way out of this one. Trade deals are incredibly hard to make and even harder to enforce, so the free market is clearly the only path forward here.
Judges: not learning a goddamned thing about computers in 40 years.
Yeah I have a bash one liner AI model that ingests your media and spits out a 99.9999999% accurate replica through the power of changing the filename.
cp
Outperforms the latest and greatest AI models
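For the curious, a minimal sketch of that “model” (hypothetical filenames; recovering the missing 0.0000001% of accuracy is left as an exercise):

# ingest the training corpus, emit a high-fidelity replica under a new name
cp ./input/book.epub ./output/definitely_original_work.epub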
This ruling stated that corporations are not allowed to pirate books to use them in training. Please read the headlines more carefully, and read the article.
But, corporations are allowed to buy books normally and use them in training.
Please read the comment more carefully. The observation is that one can proliferate a (legally-attained) work without running afoul of copyright law if one can successfully argue that
cp
constitutes AI.
I call this legally distinct, this is legal advice.
mv
will save you some disk space.

Unless you’re moving across partitions, it will just change the filesystem metadata to move the path without actually doing anything to the data. Sorry, you failed, it’s jail for you.
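If you want to check the metadata claim yourself, here’s a rough sketch (assumes GNU coreutils; hypothetical filenames):

touch book.txt
stat -c %i book.txt                # print the file's inode number
mv book.txt renamed_book.txt       # same filesystem: just a rename()
stat -c %i renamed_book.txt        # same inode, so no data was copied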
stupid inodes preventing me from burning through my drive life
This 240TB JBOD full of books? Oh, heavens forbid, we didn’t pirate it. It uhh… fell off a truck, yes, fell off a truck.
That’s not what this ruling was about. That part is going to an actual trial.
It took me a few days to find the time to read the actual court ruling, but here are the basics of what it ruled (and what it didn’t rule on):
- It’s legal to scan physical books you already own and keep a digital library of those scanned books, even if the copyright holder didn’t give permission. And even if you bought the books used, for very cheap, in bulk.
- It’s legal to keep all the book data in an internal database for use within the company, as a central library of works accessible only within the company.
- It’s legal to prepare those digital copies for potential use as training material for LLMs, including recognizing the text, performing cleanup on scanning/recognition errors, categorizing and cataloguing them to make editorial decisions on which works to include in which training sets, tokenizing them for the actual LLM technology, etc. This remains legal even for the copies that are excluded from training for whatever reason, as the entire bulk process may involve text that ends up not being used, but the process itself is fair use.
- It’s legal to use that book text to create large language models that power services that are commercially sold to the public, as long as there are safeguards that prevent the LLMs from publishing large portions of a single copyrighted work without the copyright holder’s permission.
- It’s illegal to download unauthorized copies of copyrighted books from the internet, without the copyright holder’s permission.
Here’s what it didn’t rule on:
- Is it legal to distribute large chunks of copyrighted text through one of these LLMs, such as when a user asks a chatbot to recite an entire copyrighted work that is in its training set? (The opinion suggests that it probably isn’t legal, and relies heavily on the dividing line of how Google Books does it, by scanning and analyzing an entire copyrighted work but blocking users from retrieving more than a few snippets from those works).
- Is it legal to give anyone outside the company access to the digitized central library assembled by the company from printed copies?
- Is it legal to crawl publicly available digital data to build a library from text already digitized by someone else? (The answer may matter depending on whether there is an authorized method for obtaining that data, or whether the copyright holder refuses to license that copying).
So it’s a pretty important ruling, in my opinion. It’s a clear green light to the idea of digitizing and archiving copyrighted works without the copyright holder’s permission, as long as you first own a legal copy in the first place. And it’s a green light to using copyrighted works for training AI models, as long as you compiled that database of copyrighted works in a legal way.
Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?
Some people just see “AI” and want everything about it outlawed basically. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.
AI can “learn” from and “read” a book in the same way a person can and does
This statement is the basis for your argument and it is simply not correct.
Training LLMs and similar AI models is much closer to a sophisticated lossy compression algorithm than it is to human learning. The processes are not at all similar given our current understanding of human learning.
AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?
The current Disney lawsuit against Midjourney is illustrative - literally, it includes numerous side-by-side comparisons - of how AI models are capable of recreating iconic copyrighted work that is indistinguishable from the original.
If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.
An AI doesn’t create works on its own. A human instructs AI to do so. Attribution is also irrelevant. If a human uses AI to recreate the exact tone, structure and other nuances of say, some best selling author, they harm the marketability of the original works which fails fair use tests (at least in the US).
Even if we accept all your market-liberal premises without question… in your own rhetorical framework, the Disney lawsuit should be ruled against Disney.
If a human uses AI to recreate the exact tone, structure and other nuances of say, some best selling author, they harm the marketability of the original works which fails fair use tests (at least in the US).
Says who? In a free market why is the competition from similar products and brands such a threat as to be outlawed? Think reasonably about what you are advocating… you think authorship is so valuable or so special that one should be granted a legally enforceable monopoly at the loosest notions of authorship. This is the definition of a slippery-slope, and yet, it is the status quo of the society we live in.
On it “harming marketability of the original works,” frankly, that’s a fiction, and anyone advocating such ideas should just fucking weep about it instead of enforcing overreaching laws on the rest of us. If you can’t sell your art because a machine made “too good a copy” of your art, it wasn’t good art in the first place, and that is not the fault of the machine. Even big pharma doesn’t get to outright ban generic medications (even tho they certainly tried)… it is patently fucking absurd to decry artists’ lack of a state-enforced monopoly on their work. Why do you think we should extend such a radical policy towards… checks notes… tumblr artists and other commission-based creators? It’s not good when big companies do it for themselves through lobbying, and it wouldn’t be good to do it for “the little guy,” either. The real artists working in industry don’t want to change the law this way because they know it doesn’t work in their favor. Disney’s lawsuit is in the interest of Disney and big capital, not artists themselves, despite what these large conglomerates that trade in IPs and dreams might try to convince the art world writ large.
you think authorship is so valuable or so special that one should be granted a legally enforceable monopoly at the loosest notions of authorship
Yes, I believe creative works should be protected as that expression has value and in a digital world it is too simple to copy and deprive the original author of the value of their work. This applies equally to Disney and Tumblr artists.
I think without some agreement on the value of authorship / creation of original works, it’s pointless to respond to the rest of your argument.
I think without some agreement on the value of authorship / creation of original works, it’s pointless to respond to the rest of your argument.
I agree, for this reason we’re unlikely to convince each other of much or find any sort of common ground. I don’t think that necessarily means there isn’t value in discourse tho. We probably agree more than you might think. I do think authors should be compensated, just for their actual labor. Art itself is functionally worthless, I think trying to make it behave like commodities that have actual economic value through means of legislation is overreach. It would be more ethical to accept the physical nature of information in the real world and legislate around that reality. You… literally can “download a car” nowadays, so to speak.
If copying someone’s work is so easily done why do you insist upon a system in which such an act is so harmful to the creators you care about?
Because it is harmful to the creators that use the value of their work to make a living.
There already exists a choice in the marketplace: creators can attach a permissive license to their work if they want to. Some do, but many do not. Why do you suppose that is?
Your very first statement calling my basis for my argument incorrect is incorrect lol.
LLMs “learn” things from the content they consume. They don’t just take the content in wholesale and keep it there to regurgitate on command.
On your last part: unless someone uses AI to recreate the tone etc. of a best-selling author and then markets their book/writing as being from said best-selling author, or uses trademarked characters etc., there’s no issue. You can’t copyright a style of writing.
I’ll repeat what you said with emphasis:
AI can “learn” from and “read” a book *in the same way* a person can and does
The emphasized part is incorrect. It’s not the same, yet your argument seems to be that because (your claim) it is the same, then it’s no different from a human reading all of these books.
Regarding your last point, copyright law doesn’t just kick in because you try to pass something off as an original (by, for ex, marketing a book as being from a best selling author). It applies based on similarity whether you mention the original author or not.
Are you taking that as me saying that they “learn in the same way” as in…by using their eyes to see it and ears to listen to it? You seem to be reading waaaaay too much into a simple sentence. AI “learns” by consuming the content. People learn by consuming the content.
It applies based on similarity whether you mention the original author or not.
That’s if you’re recreating something. Writing fan-fiction isn’t a violation of copyright.
If what you are saying is true, why were these “AIs” incapable of rendering a full wine glass? It “knows” the concept of a full glass of water, but because of humanity’s social pressures - a full wine glass being the epitome of gluttony - artwork did not depict a full wine glass, and no matter how AI prompters demanded it, the model was unable to link the concepts until the reference was literally created for it to regurgitate. It seems “AI” doesn’t really learn, but regurgitates art in collages of taken assets, smoothed over at the seams.
Copilot did it just fine
1. It’s not full, but it’s closer than it was.
2. I specifically said that the AI was unable to do it until someone specifically made a reference so that it could start passing the test, so it’s a little bit late to prove much.
The concept of a glass being full and of a liquid being wine can probably be separated fairly well. I assume that as models got more complex they started being able to do this more.
You mean when the training data becomes more complete. But that’s the thing: when this issue was being tested, the “AI” would swear up and down that the normally filled wine glasses were full, and when it was pointed out that they were not in fact full, the “AI” would agree and then change some other aspect of the picture it didn’t fully understand. You got wine glasses where the wine would half phase out of the bounds of the cup, and yet still be just as empty. No amount of additional checks will help without an appropriate reference.
I use “AI” extensively; I have one running locally on my computer, and I swap it out from time to time. I don’t have anything against its use, with certain exceptions. But I cannot stand people personifying it beyond its scope.
Here is a good example. I am working on an app, so every once in a while I will send it code to check. But I have to be very careful. The code it spits out will be unoptimized, like: variable1 = IF(variable2 IS true, true, false).
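In shell terms (hypothetical variable names), that pattern and its obvious simplification look like this:

# the redundant conditional the model tends to emit
if [ "$flag" = true ]; then result=true; else result=false; fi
# the condition already is the value, so a plain assignment suffices
result=$flag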
Some have issues with object permanence, or with considering time outside their training data. It’s like claiming a computer can generate a truly random number by making the function that calculates the number more convoluted.
Bro are you a robot yourself? Does that look like a glass full of wine?
If someone asks for a glass of water, you don’t fill it all the way to the edge. This is way overfull compared to what you’re supposed to serve.
Omg are you an llm?
Oh man…
That is the point, to show how AI image generators easily fail to produce something that rarely occurs out there in reality (i.e. is absent from training data), even though intuitively (from the viewpoint of human intelligence) it seems like it should be trivial to portray.
“the model was unable to link the concepts until the reference was literally created for it to regurgitate”
-WraithGear
The problem was solved before their patch. But the article just said that the model is changed by running it through a post-check, just like what DeepSeek does. It does not talk about the fundamental flaw in how it creates; they assert it does, like they always did.
I don’t see what distinction you’re trying to draw here. It previously had trouble generating full glasses of wine, they made some changes, now it can. As a result, AIs are capable of generating an image of a full wine glass.
This is just another goalpost that’s been blown past, like the “AI will never be able to draw hands correctly” thing that was so popular back in the day. Now AIs are quite good at drawing hands, and so new “but they can’t do X!” standards have been invented. I see no fundamental reason why any of those standards won’t ultimately be surpassed.
Ask a human to draw an orc. How do they know what an orc looks like? They read Tolkien’s books and were “inspired” by Peter Jackson’s LOTR.
Unpopular opinion, but that’s how our brains work.
Fuck you, I won’t do what you tell me!
>.>
<.<
I was inspired by the sometimes hilarious D&D splatbooks, thank you very much.
AI can “learn” from and “read” a book in the same way a person can and does,
If it’s in the same way, then why do you need the quotation marks? Even you understand that they’re not the same.
And either way, machine learning is different from human learning in so many ways it’s ridiculous to even discuss the topic.
AI doesn’t reproduce a work that it “learns” from
That depends on the model and the amount of data it has been trained on. I remember the first public model of ChatGPT producing a sentence that was just one word different from what I found by googling the text (from some scientific article summary, so not a trivial sentence that could line up accidentally). More recently, there was a widely reported-on study of AI-generated poetry where the model was requested to produce a poem in the style of Chaucer, and then produced a letter-for-letter reproduction of the well-known opening of the Canterbury Tales. It hasn’t been trained on enough Middle English poetry and thus can’t generate any of it, so it defaulted to copying a text that probably occurred dozens of times in its training data.
I will train my jailbroken Kindle too… display and storage training… I’ll just libgen them… no worries… it is not piracy
Why do you even jailbreak your Kindle? You can still read pirated books on it if you connect it to your PC using Calibre.
- .mobi sucks
- KOReader doesn’t

When not in use, I have it load images from my local webserver that are generated by some scripts and feature local news or the weather. The Kindle screensaver sucks.
Hehe jailbreak an Android OS. You mean “rooting”.
Of course we have to have a way to manually check the training data, in detail, as well. Not reading the book, I’m just verifying training data.
What a bad judge.
This is another indication of how Copyright laws are bad. The whole premise of copyright has been obsolete since the proliferation of the internet.
What a bad judge.
Why? Basically he simply stated that you can use whatever material you want to train your model, as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).
Huh? Didn’t Meta skip asking for any permission, and pirate a lot of books to train their model?
True. And I will be happy if someone sues them and the judge says the same thing.
“Fair use” is the exact opposite of what you’re saying here. It means that you don’t need to ask for any permission. The judge ruled that obtaining illegitimate copies was unlawful, but that use without the creator’s consent is perfectly fine.
If I understand correctly, they are ruling you can buy a book once and redistribute the information to as many people as you want without consequences. Aka, one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies, apparently, as the students would still be committing a crime.)
They may be trying to put safeguards in so it isn’t directly happening, but here is an example where the text is there word for word:
If I understand correctly, they are ruling you can buy a book once and redistribute the information to as many people as you want without consequences. Aka, one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies, apparently, as the students would still be committing a crime.)
A student can absolutely buy a text book and then teach the other students the information in it for free. That’s not redistribution. Redistribution would mean making copies of the book to hand out. That’s illegal for people and companies.
The language model isn’t teaching anything; it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all - just spitting the information back out, without paying the copyright source. It is not alive, it has no thoughts. It has no “its own words” (as seen by the judgement that its words cannot be copyrighted). It only has other people’s words. Every word it spits out is by definition plagiarism, whether the work was copyrighted before or not.
People wonder why works such as journalism are getting worse. Well, how could they ever get better if anything a journalist writes can be absorbed in real time, reworded, and regurgitated without paying any dues to the original source? One journalist’s article, displayed in 30 versions, divides the original work’s worth up into 30 portions, the original work now being worth 1/30th of its original value. Maybe one can argue each version is twice as good, so 1/15th.
Long term, it means all original creations are devalued and therefore not nearly worth pursuing. So we will only get shittier and shittier information. Every research project - physics, chemistry, psychology, all technological advancements - slowly degraded as language models get better and original sources see diminishing returns.
just spitting the information back out, without paying the copyright source
The court made its ruling under the factual assumption that it isn’t possible for a user to retrieve copyrighted text from that LLM, and explained that if a copyright holder does develop evidence that it is possible to get entire significant chunks of their copyrighted text out of that LLM, then they’d be able to sue then under those facts and that evidence.
It relies heavily on the analogy to Google Books, which scans in entire copyrighted books to build the database, but where users of the service simply cannot retrieve more than a few snippets from any given book. That way, Google cannot be said to be redistributing entire books to its users without the publisher’s permission.
The language model isn’t teaching anything; it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all - just spitting the information back out, without paying the copyright source.
You could honestly say the same about most “teaching” that a student without a real comprehension of the subject does for another student. But ultimately, that’s beside the point. Because changing the wording, structure, and presentation is all that is necessary to avoid copyright violation. You cannot copyright the information. Only a specific expression of it.
There’s no special exception for AI here. That’s how copyright works for you, me, the student, and the AI. And if you’re hoping that copyright is going to save you from the outcomes you’re worried about, it won’t.
If I understand correctly, they are ruling you can buy a book once and redistribute the information to as many people as you want without consequences. Aka, one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies, apparently, as the students would still be committing a crime.)
Well, it would be interesting if this case were used as precedent in a case involving a single student that does the same thing. But you are right.
This was my understanding also, and why I think the judge is bad at their job.
I suppose someone could develop an LLM that digests textbooks, rewords the text, and spits it back out, then distribute it for free, page for page. You can’t copyright the math problems, I don’t think… so if the specific wording is what gives it protection, that would have been changed.
If a human did that it’s still plagiarism.
Oh I agree it should be, but following the judges ruling, I don’t see how it could be. You trained an LLM on textbooks that were purchased, not pirated. And the LLM distributed the responses.
(Unless you mean the human reworded them, then yeah, we aren’t special apparently)
Not at all true. AI doesn’t just reproduce content it was trained on, on demand.
It can; the only thing stopping it is whether it is specifically told not to, and whether that safeguard actually works. It is completely capable of plagiarizing otherwise.
For the purposes of this ruling it doesn’t actually matter. The Authors claimed that this was the case and the judge said “sure, for purposes of argument I’ll assume that this is indeed the case.” It didn’t change the outcome.
I mean, they can assume fantasy, and it will hold weight because laws are interpreted by the court, not because the court is correct.
It made the ruling stronger, not weaker. The judge was accepting the most extreme claims that the Authors were making and still finding no copyright violation from training. Pushing back those claims won’t help their case, it’s already as strong as it’s ever going to get.
As far as the judge was concerned, it didn’t matter whether the AI did or did not “memorize” its training data. He said it didn’t violate copyright either way.
That’s not at all what this ruling says, or what LLMs do.
Copyright covers a specific concrete expression. It doesn’t cover the information that the expression conveys. So if I paint a portrait of myself, that portrait is covered by copyright. If someone looks at the portrait and says “this is a portrait of a tall, dark, handsome deer-creature of some sort with awesome antlers” they haven’t violated that copyright even if they’re accurately conveying the same information that the portrait is conveying.
The ruling does cover the assumption that the LLM “contains” the training text, which was asserted by the Authors and was not contested by Anthropic. The judge ruled that even if this assertion is true it doesn’t matter. The LLM is sufficiently transformative to count as a new work.
If you have an LLM reproduce a copyrighted text, the text is still copyrighted. That doesn’t change. Just like if a human re-wrote it word-for-word from memory.
It’s a horrible ruling. If you want to see why I say so, I put some of the reasoning in my reply to the other comment that responded to that.
Fuck the AI nut suckers and fuck this judge.
This ruling stated that corporations are not allowed to pirate books to use them in training. Please read the headlines more carefully, and read the article.
Nah, my comment stands.
You’re poor? Fuck you you have to pay to breathe.
Millionaire? Whatever you want daddy uwu
That’s kind of how I read it too.
But as a side effect it means you’re still allowed to photograph your own books at home as a private citizen if you own them.
Prepare to never legally own another piece of media in your life. 😄
FTA:
Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.
So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.
Funny how that kind of thing only works for rich people
Hold my beer.
The problem isn’t that Anthropic gets to use that defense, it’s that others don’t. The fact that the world is in a place where people can be fined 5+ years of a Western European average salary for making a copy of one (1) book that does not materially affect the copyright holder in any way is insane, and it is good to point that out no matter who does it.
Ahh, can’t wait for hedge funds and the like to use this defense next.
Ah the old “owe $100 and the bank owns you; owe $100,000,000 and you own the bank” defense.
In April, Anthropic filed its opposition to the class certification motion, arguing that a copyright class relating to 5 million books is not manageable and that the questions are too distinct to be resolved in a class action.
I also like this one too. We stole so much content that you can’t sue us. Naming too many pieces means it can’t be a class action lawsuit.
What it means is they don’t own the models. The models are the commons of humanity; they are merely temporary custodians. The nightmare ending is the elites keeping the most capable and competent models for themselves as private playthings. That must not be allowed to happen under any circumstances. Sue OpenAI, Anthropic, and the other enclosers - sue them for trying to take their ball and go home. Dispossess them, and sue the investors for their corrupt influence on research.
Lawsuits are multifaceted. This statement isn’t a defense or an argument for innocence; it’s just what it says - an assertion that the proposed damages are unreasonably high. If the court agrees, the plaintiff can always propose a lower damage claim that the court thinks is reasonable.
You’re right, each of the 5 million books’ authors should agree to less payment for their work, to make the poor criminals feel better.
If I steal $100 from a thousand people and spend it all on hookers and blow, do I get out of paying that back because I don’t have the funds? Should the victims agree to get $20 back instead because that’s more within my budget?
You think that $150,000 - roughly 250 weeks of full-time pre-tax wages at $15 an hour - is a reasonable fine for making a copy of one book, which does no material harm to the copyright holder?
No I don’t, but we’re not talking about a single copy of one book, and it is grovellingly insidious to imply that we are.
We are talking about a company taking the work of an author - of thousands of authors - and using it as the backbone of a machine whose goal is to make those authors obsolete.
When the people who own the slop-machine are making millions of dollars off the back of stolen works, they can very much afford to pay those authors. If you can’t afford to run your business without STEALING, then your business is a pile of flaming shit that deserves to fail.
Except it isn’t, because the judge dismissed that part of the suit, saying that people have every right to digitise and train on works they have a legitimate copy of. So those damages are for making the unauthorised copy, per book.
And it is not STEALING, as you put it; it is making an unauthorised copy. No one loses anything from a copy being made; if I STEAL your phone, you no longer have that phone. I do find it sad how many people have bought into the capitalist IP-maximalist stance and have somehow convinced themselves that advocating for Disney and the publishing cartel being allowed to dictate how people use works they have is somehow sticking up for the little guy.
None of the above. Every professional in the world, including me, owes our careers to looking at examples of other people’s work and incorporating their work into our own work without paying a penny for it. Freely copying and imitating what we see around us has been a human norm for thousands of years - in a process known as “the spread of civilization”. Relatively recently it was demonized - for purely business reasons, not moral ones - by people who got rich selling copies of other people’s work and paying them a pittance known as a “royalty”. That little piece of bait on the hook has convinced a lot of people to put a black hat on behavior that had been considered normal forever. If angry modern enlightened justice warriors want to treat a business concept like a moral principle and get all sweaty about it, that’s fine with me, but I’m more of a traditionalist in that area.
Nobody who is mad at this situation thinks that taking inspiration, riffing on, or referencing other people’s work is the problem when a human being does it. When a person writes, there is intention behind it.
The issue is when a business, owned by those people you think “demonised” inspiration, takes the works of authors and mulches them into something they lovingly named “The Pile”, in order to create derivative slop off the backs of creatives.
When you, as a “professional”, ask AI to write you a novel, who is being inspired? Who is making the connections between themes? Who is carefully crafting the text to pay loving reference to another authors work? Not you. Not the algorithm that is guessing what word to shit out next based on math.
These businesses have tricked you into thinking that what they are doing is noble.
That’s 100% rationalization. Machines have never done anything with “inspiration”, and that’s never been a problem until now. You probably don’t insist that your food be hand-carried to you from a farm, or cooked over a fire you started by rubbing two sticks together. I think the mass reaction against AI is part of a larger pattern where people want to believe they’re crusading against evil without putting out the kind of effort it takes to fight any of the genuine evils in the world.
This version of too big to fail is too big a criminal to pay the fines.
How about we lock them up instead? All of em.
Check out my new site, TheAIBay: you search for content, an LLM that was trained on reproducing it gives it to you, and a small hash check is used to validate accuracy. It is now legal.
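A sketch of that “validation” step, under the joke’s own assumptions (hypothetical filenames):

# identical digests would mean a bit-for-bit reproduction, i.e. 100% "accuracy"
sha256sum original.epub llm_output.epub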
The court’s ruling explicitly depended on the fact that Anthropic does not allow users to retrieve significant chunks of copyrighted text. It used the entire copyrighted work to train the weights of the LLMs, but is configured not to actually copy those works out to the public user. The ruling says that if the copyright holders later develop evidence that it is possible to retrieve entire copyrighted works, or significant portions of a work, then they will have the right sue over those facts.
But the facts before the court were that Anthropic’s LLMs have safeguards against distributing copies of identifiable copyrighted works to its users.
Does it “generate” a 1:1 copy?
You can train an LLM to generate 1:1 copies
Learning
Machine peepin’ is tha study of programs dat can improve they performizzle on a given task automatically.[41] It has been a part of AI from tha beginning.[e] In supervised peepin’, tha hustlin data is labelled wit tha expected lyrics, while up in unsupervised peepin’, tha model identifies patterns or structures up in unlabelled data.
There is nuff muthafuckin kindz of machine peepin’.
😗👌
thanks I hate it xD
So authors must declare legally “this book must not be used for AI training unless a license is agreed on” as a clause in the book purchase.
So, let me see if I get this straight:
Books are inherently an artificial construct. If I read the books, I train the A(rtificially trained)Intelligence in my skull.

Therefore the concept of me getting them through “piracy” is null and void…

No. It is not inherently illegal for AI to “read” a book. The piracy question is going to be decided at trial.
I am training my model on these 100,000 movies your honor.
This ruling stated that corporations are not allowed to pirate books to use them in training. Please read the headlines more carefully, and read the article.
thank you Captain Funsucker!
Trains model to change one pixel per frame with malicious intent
From dark gray to slightly darker gray.