So when's the ruling against OpenAI and the like using the same copyrighted material to train their models?
Hot on the heels of this one, I'd imagine.
Aaaaaany minute now.
Fat chance. Line must go up.
So, let's say we create an LLM that will be fed with all the copyrighted data, and we design it so that it recalls the originals when asked?! Does that count as piracy, or as the kind of legal shenanigans OpenAI is doing?
But OpenAI not being allowed to use the content for free means they are being prevented from making a profit, whereas the Internet Archive is giving away the stuff for free and taking away the right of the authors to profit. /s
Disclaimer: this is the argument that OpenAI is using currently, not my opinion.
Ah, I see you got that all wrong.
OpenAI uses that content to generate billions in profit on the backs of The People. The Internet Archive just does it for the good of The People.
We can't have that. "Good for The People" is not how the economy works, pal. We need profit and exploitation for the world to work...
There are two different things happening. One is redistribution, which isn't allowed, and the other is fair use, which is allowed. You can't ban someone from writing a detailed synopsis of your book. That's all an LLM is doing. It's no different than a human reading the material and then using that to write something similar.
Fuck Copyright.
A system for distributing information and rewarding its creators should not be based on scarcity, given that it costs nothing to copy and distribute information.
It was fine when the limited duration was a reasonable number of years. Anything over 30 years max before being in the public domain is too long.
Yeah. In a better world where the US court system doesn't get weaponized and rulings aren't delayed for years or decades, I would argue 8 to 15 years is the reasonable number, depending on the type of information being copyrighted.
Not a surprise, but still somehow crushing. It's a loss for us all.
Really unfortunate. I wonder why nobody foresaw this when they started the stupid NEL (National Emergency Library) thing.
“We are reviewing the court’s opinion and will continue to defend the rights of libraries to own, lend, and preserve books.”
Unpopular opinion: They stepped out of their fucking lane. There are already laws that protect actual libraries, in fact most nations have laws to ensure libraries have access to all locally published works.
One good thing to come of this is I've now joined my national and local libraries.
Agreed. While a noble cause, it was honestly predictable.
I don't understand why they did that. Their status was already quite shaky. They really shot themselves and their users in the foot.
The Internet Archive is a library.
Not only are they a member of the Boston Library Consortium, but their entire operation is based around preserving not just webpages, but books, and other forms of media.
They even offer loans of various materials to and from other libraries, and digitize & archive works from the Library of Congress, the Smithsonian, the New York Public Library, and more.
To say the Internet Archive isn't an "actual library," and has "stepped out of their fucking lane" is ridiculous.
This ruling doesn't just affect the Internet Archive, it affects every single other library out there that wants to lend ebooks, and digitize their existing physical copies of books for digital lending.
Other libraries have licenses. And follow them.
Internet archive digitized actual books and lent out copies (which was already 100% not legal under current law), then thought it was a good idea to just say "fuck it" and remove the thin veil of legitimacy that kept publishers from caring too much by removing the "one copy at a time per book" policy and daring the publishers to do something about it.
How's them boots taste? Can you taste your corporate overlords' butthole?
How about, instead of throwing a tantrum about the courts doing the only thing they had any authority to do, you spend your efforts lobbying to fix IP law?
Yeah because that has ever changed anything. I'll just keep voting harder while I'm at it.
They removed the one copy rule temporarily, during the pandemic, it's now in place again. But the publishers have made any digitized lending illegal, not just more than one copy, any digitized lending. It is now illegal for them to scan and distribute even one single copy of any book.
It was never a problem with the single-copy restriction, and the publishers didn't bring up that restriction at all as the purpose of the suit, instead attacking scanning & lending in their entirety, even when done through Controlled Digital Lending (CDL) systems like the ones the Internet Archive and other libraries use.
Even regardless of that, the first-sale doctrine enables all existing secondary markets for copyrighted material. It's how you can lend a book to a friend, sell a used book after you've finished it, or swap copies of a video game on disc with somebody.
The Internet Archive is included in this. Changing the method of distribution (lending a digital copy vs a physical copy) has no functional distinction, and the publishers in the lawsuit were not able to demonstrate material harm, instead just stating that it wasn't "fair use," and should thus be illegal, regardless of the fact that they weren't harmed by the supposedly non-fair use.
And on top of that, fuck the law if it's unjust. I don't care if it's supposedly (even if not true) "100% not legal under current law" to do, it should be, and this ruling is unjust.
Any digitized lending was always illegal.
The law was abundantly clear. You cannot distribute wholesale copies of someone else's work. Publishers didn't bother because the scale was small and they didn't want to take the PR hit for a scale that didn't matter.
The first sale doctrine, necessarily, can only possibly apply to a physical object. There is no such thing as a "single copy" of a digital object. Every time that "single copy" moves is a new copy. There is no legal framework in the US that even acknowledges the premise of a digital copy. It's always a license.
You need new laws to apply to the digital world. There is absolutely zero room for ambiguity that what the Internet archive did never in any way was protected. This ruling was a literal guarantee the minute the Internet Archive removed their (unambiguously not in any way legal) pretense of a "single copy". There isn't a court in the country that would even consider ruling any other way, because the law is well beyond clear. This ruling happened because the Internet Archive forced it to happen. If they had left open mass scale piracy to pirate sites they would have been fine.
If their lawyers advised them that there was even a possibility that this argument could work, they should be disbarred. They would be better off spending their money on lobbying for better laws than pursuing a case less likely than winning the Powerball jackpot 5 draws in a row.
Any digitized lending was always illegal.
the law is well beyond clear.
I think Section 108 of Title 17 of the U.S. Code (17 U.S.C. § 108) would beg to differ. Digitized lending was always allowed, especially for libraries and archives. The only ambiguous part was the number of copies of any individual work allowed to be digitized (many of the books the Internet Archive digitized only had one copy digitized and lent at any given time), so most of what the Internet Archive engaged in was fully legal under this code, and only a fraction of the 500,000 titles that are now illegal to lend would have been affected, even though all 500,000 can no longer be legally lent due to this ruling.
You need new laws to apply to the digital world.
True, we can agree on that. We need new laws. Until that point, no change will happen if the boundaries are not pushed.
I guarantee you there hasn't been anywhere near the current level of momentum for the rights of libraries to lend digitized books any time prior to this court case. If the Internet Archive hadn't done it in the first place, we would be in the same situation we're in after this ruling.
Them doing so pushes the issue forward.
This ruling was a literal guarantee the minute the Internet Archive removed their (unambiguously not in any way legal) pretense of a “single copy”
As I said before, this was not the premise under which the publishers won this case. They won the case on the premise that any digitized lending was not transformative, and thus not "fair use," even though it's legal under other statutes. The number of copies had no bearing on the ruling.
Literally every digital "loan" is multiple separate, unrecoverable copies. That law is not about digital lending and cannot be applied to digital lending.
All digital lending of copyrighted material without an explicit license to do so is copyright infringement, and it was always a guarantee that the ruling would happen.
The removal of the "single copy" lie isn't relevant to the legal status. It's relevant because it forced the hands of the publishers to take action. There was never any possibility of any ruling but the obvious blanket "you can't do that" that the law dictates, once IA forced them to take it to court.
That law is not about digital lending and cannot be applied to digital lending.
That's provably incorrect.
"it is not an infringement of copyright for a library or archives [...] to reproduce no more than one copy or phonorecord of a work"
17 U.S.C. § 101 defines a copy as "...material objects, other than phonorecords, in which a work is fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device..."
Digital replication falls under the legal definition of copying in the US Code, and is directly cited in the section of the code I quoted in my last reply.
The Internet Archive's loans also use DRM, a standard kind of software used by every other library out there to restrict further replication of copies. The same technology is used by libraries that have contracts with publishers to download and distribute digital copies of ebooks directly; not using DRM would violate those contracts. The Internet Archive, without any express contract from publishers, still implements the strongest protections that the publishers themselves would require, whether the content was licensed directly from them or scanned in from a physical copy.
It’s relevant because it forced the hands of the publishers to take action.
Nothing forced them to do anything. These publishers voluntarily decided to file a lawsuit because of mounting pressure from libraries as a collective to stop charging insanely high prices on ebook rentals from publishers, which they saw as being undermined by the fact that the Internet Archive was able to still pay for the books in question, but lend them out in the same manner that physical books are already lent, just through a screen.
As I mentioned before, if the Internet Archive had never done this in the first place, public outcry would be practically nonexistent, and the Internet Archive wouldn't be lending out those books at all, just like they're not legally able to now. Whether or not they had done this makes no difference, other than the fact that the issue is now more visible in the public sphere and faces active legal challenges, instead of being quietly subverted by the regulations and practices publishers have continued to push against all libraries to redefine what it means to own a copyrighted work.
It is literally impossible to send a file over the internet with no more than one copy. Every additional "loan" is multiple additional copies. Even if we ignore that, you're very conveniently ignoring the "material objects" part of that definition, which again, completely and unconditionally disqualifies a loan over the internet.
DRM is entirely irrelevant. It has no bearing on anything.
They filed a lawsuit because IA flagrantly and egregiously violated their rights. They openly fucking dared them to. And now they don't get ignored on their limited copy illegal lending and can't get away with any copies.
you’re very conveniently ignoring the “material objects” part of that definition
I'm not, it's just that the wording of the definition could lead to you interpreting it as such. It does not mean what you think it means.
In essence, it's saying that if a work is "fixed" in a material object (under copyright law, that tends to mean captured in a medium that allows it to be perceived, reproduced, or otherwise communicated), it is considered a copy. Copyright law generally considers things like written texts (i.e. transcribing a book onto other sheets of paper) to be copies, but it also includes things like recordings, which are very much intangible (although still stored on tangible hardware). Also, note the "either directly or with the aid of a machine or device" section of that definition.
DRM is entirely irrelevant. It has no bearing on anything.
The fact you consider DRM to be irrelevant in a conversation about managing legal access to digitally distributed content shows a lot about your understanding of this topic, to say the least.
DRM is highly relevant. If it were not, then all libraries would already be illegally publishing copies under the agreements they sign with publishers when they distribute books through DRM-protected applications like Overdrive or Libby. Legal consequence also does not extend past the original publisher if the intent was clearly not to deliberately allow for further copying. (i.e. if the Internet Archive stated they lent books so users could copy them and later share them with friends, that would be a violation. Instead, they have loan terms, limits, and DRM)
If anything done by a user after the lending of any material, outside of reasonable safeguards (like DRM) was to be considered illegal, then any store would be liable if someone used a kitchen knife to kill someone, and any chemical distributor would be liable if someone in a lab mixed the wrong chemicals together and made an explosion. Liability has an end point, and DRM helps signify that by placing technical restrictions on redistribution of material, while also carrying heavy legal penalties for breaking it, which would not be present if it wasn't applied in the first place.
Publishers should not be able to sue libraries for lending their books, digital, physical, or otherwise. Especially when the publishers could not demonstrate any material harm.
You are actively defending multi-billion dollar publishing companies suing a library for lending content they legally acquired, using faulty interpretations of the law, and deference to lawsuits as a means of judging the morality of actions. You haven't made a single point that wasn't either verifiably untrue, or misinformed.
I would advise you to reevaluate your position.
DRM is relevant to the legal redistribution because that is part of the terms of their license agreement and for no other reason. Their entire lending practice of digital copies is legal because, and only because, they have contracts that specifically determine how they may do so.
It does not in any way alter the nature of blatant illegal copies. It does not make every loan not multiple distinct illegal copies.
I'm actively opposing people telling insane, completely unhinged lies that aren't even loosely connected to reality to validate a position that every single person with a shred of common sense knew was going to get laughed out of court the day they did it and did get laughed out of court. If you tried this case a million times, Internet Archive wouldn't have a chance in any of them.
Petition for changes to the law. Don't lie and pretend the law says what you want it to.
DRM is relevant to the legal redistribution because that is part of the terms of their license agreement and for no other reason.
This is simply not true. If someone takes means to prevent illegal action, in a situation where they can choose to either do so, or not do so, taking those means shows they are attempting to prevent any negative legal outcomes.
The Internet Archive was explicitly, voluntarily enacting similar policy to libraries that directly license books from publishers, because they knew that it would show they were making an effort to lend responsibly. To me, it seems they carried on this set of ethics to when they opened up more copies than they originally had on hand, because that was during a time when library branches were becoming physically inaccessible, and physical resources were becoming increasingly hard to access, thus, responsible lending would include effectively making the inaccessible physical copies in other libraries accessible. That part might not be considered legal, but again, who cares? These publishers saw a substantial increase in profits during the time they were supposedly hurt by the Internet Archive, and continue to squeeze traditional libraries for every penny they can get under exploitative lending agreements. What the Internet Archive did was for the objective moral good of society.
If anything's illegal, it's compelling libraries to only license your content directly from you for a higher rate, while trying to discourage them from using the physical copies they can buy once like any other sane person.
Petition for changes to the law. Don’t lie and pretend the law says what you want it to.
I have not misrepresented the law by pretending it says something else. I have given you citations and quotes straight from the letter of the law, directly backing up my claims, while proving your blanket statements that all digitized lending was illegal as patently false.
Petitioning to change the law is not the only way to change it. For instance, I believe piracy from, say, streaming services is ethical if those same streaming services are jacking up rates, adding ads, and enshittifying their core product for the sake of making a quick buck. How else are you supposed to change things?
I'm sure you've seen the immense public backlash and legislative attempts to fix the rapidly enshittifying entertainment industry. They haven't worked.
Look, even regardless of all my arguments for how I believe the vast majority of what the Internet Archive did was legal, I don't care if it was. Because, in the end, If you own a book, you should be allowed to let other people read it. If people are losing access to literature, you should be able to make it available to as many people as possible. If companies are rapidly exploiting the public library system and looting it for everything it has, you should be able to offer an alternative.
These publishers do not deserve my, nor your sympathy.
Direct link to the court document: https://storage.courtlistener.com/recap/gov.uscourts.ca2.60988/gov.uscourts.ca2.60988.306.1.pdf
Side note: CourtListener's RECAP is often quite disliked by the legal system. They do not like it when people put documents from fee-waived PACER sources on there, like Aaron Swartz did. https://en.m.wikipedia.org/wiki/Free_Law_Project
Woah, I wish I had known about this sooner. Thanks!
If OpenAI can get away with going through copyrighted material, then the answer to piracy is simple: round up a bunch of talented devs from the internet who are writing and training AI models, and let's make a fantastic model trained on what the Internet Archive has. Tell you what, let Mistral's engineers lead that charge, and put an AGPL license on the project so that companies can't fuck us over.
I refuse to believe that nobody has thought of this yet
What do you think Mistral trains its models on? Public domain stuff?
An AI trained on old Internet material would be like a synthetic Grandpa Simpson:
"In my day we said 'all your base' and laughed all day long, because it took all day to download the video."
I wonder who’ll end up buying the archive.org domain and what they’ll use it for
The archive isn't completely dead with that yet. There is still a lot of public domain stuff and private uploads on there. A lot of public records too.
And I think you can't just randomly buy a .org domain, can you? You have to be officially a nonprofit.
I remember, for example, Couchsurfing had to change from a .org to a .com when their tax-exempt status was rejected by the IRS and they went for-profit.
You definitely can just buy a .org, I own multiple.
The archive isn't completely dead with that yet.
They've just been sued into almost certain bankruptcy.
And I think you can't just randomly buy a .org domain, can you? You have to be officially a nonprofit.
lol, no.
.org just means "organization". There are literally no rules on who can own one.
4chan is a .org domain...
What does Warrior do? The Git README seems to just be setup instructions.
If only the readme clearly said what it was with a link you could click…
Yeah I'm wondering as well. It seems to save webpages, whereas the issue is with scanned books which may be removed from IA...
I had the same question. Here's the answer:
The Archive Team Warrior is a virtual archiving appliance. You can run it to help with the Archive Team archiving efforts. It will download sites and upload them to our archive—and it’s really easy to do!
The warrior is a container running inside a virtual machine, so there is almost no security risk to your computer. ("Almost", because in practice nothing is 100% secure.) The warrior will only use your bandwidth and some of your disk space, as well as some of your CPU and memory. It will get tasks from and report progress to the Tracker.
Libgen.rs
o7