this post was submitted on 25 Aug 2023
Technology
It's baffling to me seeing comments like this, as if the 'AI' is some natural intelligence just hanging out, going around reading books it's interested in for the hell of it. No. These are software companies illegally using artists' works (which we require licensing for commercial use) to develop a commercial, profit-generating product. Whatever the potential outputs of the AI are is irrelevant when the sources used to train it were obtained illegally.
There is nothing illegal about what they're doing. You may want it to be illegal, but it's not illegal until laws are actually passed to make it illegal. Things are not illegal by default.
Copyright only prevents copying works. Not analyzing them. The results of the analysis are not the same as the original work.
It is illegal. As an artist, if another individual or company wants to use my work for their own commercial purposes in any way, even if just to 'analyze' (since the analysis is part of their private commercial product), they still need to pay for a license to do so. Otherwise it's an unauthorized use and theft. Copyright doesn't even play into it at that point, and would be a separate issue.
I think you need to review the relevant laws, that's not true.
For example, your comment that I'm responding to is copyrighted and you own the copyright. I just quoted part of it in my response without your permission, and that's an entirely legal fair use. I also pasted your comment into Notepad++ and did a word count, there are 64 words in it. That didn't break any laws either.
A lot of people have very expansive and incorrect ideas about how intellectual property works.
First of all, a random online comment is not protected by copyright law afaik.
Secondly, if you did take something protected by copyright and then used it for commercial purposes (to make money off it), like these LLMs do, then you would be breaking the law.
In short, I'd say you are using a flawed analogy from the start.
Also, copyright is not just about copying but about distributing as well. Playing (radio) songs in your coffee shop for clients is treated differently than you listening to them at home. You generally can't just profit off someone else's work without them allowing it.
You got a fundamental aspect of copyright law wrong right in the first line.
Your comments are indeed protected by copyright.
That's wrong too. Whether or not someone's making money off of a copyright violation will affect the damages you can sue them for, but it's a copyright violation either way.
Technically true, but what does it have to do with these circumstances?
Generally speaking, sure you can. Why couldn't you? People do work that other people profit off of all the time. If a carpenter builds a desk and then I go sit at it while doing my job and earning millions of dollars, I don't need to ask the carpenter's permission.
Copyright has a few extra limitations, but those limitations are on copying stuff without permission.
Yet what these companies are doing does not constitute 'fair use', period, no matter how much you want to argue otherwise.
Simply repeating "no it isn't" isn't an argument.
theft removes the original. copying is not theft.
I keep rereading this comment and as someone in R&D... I'm so astonished that people think that companies just spontaneously come up with everything they produce without looking around. Companies start off almost every venture by analyzing any work in the field that's been done and reverse engineering it. It's how basically anyone you've heard of works. It goes double for art. Inspiration is key for art. Composers will break down the sheet music of great compositions, graphic designers will have walls full of competitors' designs, cinematographers will study movies frame by frame.
It's illegal to create a derivative work which is what the output of an LLM is.
No, it's not. Something that is merely in the style of something else is not a derivative work. If that were the case there'd be lawsuits everywhere.
LLMs regurgitate their training set. This has been proven many times. In fact from what I've seen LLMs are either regurgitating or hallucinating.
With great respect, I believe that to be a gross simplification of what an LLM does. There is no training set stored in the LLM, only statistics about what word set is likely to follow what word set. There is no regurgitation of the data. If that were the case, the temperature parameter wouldn't matter, when it very much does.
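To make the temperature point concrete, here's a minimal sketch of how temperature is commonly applied when picking the next word: the model's raw scores are divided by the temperature before being turned into probabilities. This is a toy illustration, not any particular model's implementation. The three candidate words, their scores, and the function names are all made up for the example.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw next-word scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(words, logits, temperature, rng=random):
    """Draw one word at random according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Hypothetical scores for the word following "the cat sat on the"
words = ["mat", "sofa", "roof"]
logits = [2.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.2)  # low temperature: nearly always "mat"
hot = softmax_with_temperature(logits, 2.0)   # high temperature: much flatter spread
```

With a low temperature the top-scoring word takes almost all of the probability, and with a high temperature the choices spread out, so the same prompt can produce different continuations from one run to the next.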
A slightly compressed JPG of an oil painting is still, at least for purposes of intellectual property rights, not distinct from the original work on canvas. Sufficiently complex and advanced statistics on a work are not substantially different from the work itself. It's just a different way of storing a meaningful representation.
These LLMs are all more or less black boxes. We really cannot conclusively say one way or another whether they are storing and using the full original work in some form or another. We do know that they can be coaxed into spitting out the original work, though, which sure implies it is in there.
And if the work of a human that needs to be fed is being used by one of these bots -- which is pretty much by definition a commercial purpose given that all the relevant bots are operated as such -- then that human should be getting paid.
Only very rarely, under extreme cases of overfitting. Overfitting is a failure state that LLM trainers want to avoid anyway, for reasons unrelated to copyright.
There simply isn't enough space in an LLM's neural network to be storing actual copies of the training data. It's impossible, from a data compression perspective, to fit it in there.
There are lawsuits everywhere
It's not just style. From what I understand (I've never used any of the generative AI tools), they supposedly can and do output chunks of text verbatim from copyrighted works.
Yeah, and even if it WERE truly intelligent -- which these SALAMIs are almost certainly not -- it doesn't even matter.
A human and a robot are not the same. They have different needs and must be afforded different moral protections. Someone can buy a book, read it, learn from it, and incorporate things they learned from that experience into their own future work. They may transform it creatively, or they may plagiarize it, or the result may rest in some grey area in between where it isn't 100% clear if it was novel or plagiarized. All this is also true for an LLM "AI". But whether or not this process is fundamentally the same isn't even a relevant question.
Copyright law isn't something that exists because it is a pure moral good to protect the creative output of a person from theft. It would be far more ethical to say that all the outputs of human intellect should be shared freely and widely for all people to use, unencumbered by such things. But if creativity is rewarded with only starvation, creativity will go away, so copyright exists as a compromise to try and ensure there is food in the bellies of artists. And with it, we have an understanding that there is a LOT of unclear border space where one artist may feed on the output of another to hopefully grow the pot for everyone.
The only way to fit generative bots into the philosophical framework of copyright is to demand that the generative bots keep food in the bellies of the artists. Currently, they threaten it. It's just that simple. People act like it's somehow an important question whether they "learn" the same way people do, but the question doesn't matter at all. Robots don't get the same leeway and protection afforded to humans because robots do not need to eat.
Well said.
no. it exists to stop 18th century london print shops from breaking each others' knees, and subsequently has been expanded to continue to serve the interests of industry.
I think it's a pretty important question whether we're reaching the end of the distinction between human and machine. People will begin to use machine minds more and more as part of their work. Tying strings now to the works of machines is screwing the creators of tomorrow. The line between what a person creates and what a machine creates WILL evaporate. It's not a matter of if, but when.
Imagine we put a ton of regulations on people who use power tools to do carpentry. I'm sure the carpenters around the time power tools were created figured "That's not true craftsmanship. They shouldn't be able to make a living off that!" But the carpenters of today would be screwed by these regulations because of course they have to use the latest technology to stay competitive.
As for the argument that we're taking the food out of creatives' mouths: I don't think anyone is not buying Stephen King novels now because they can just ask for a Stephen King style novel from ChatGPT. You can pirate Stephen King already. People aren't fascinated by LLMs because of how well they plagiarize. They're fascinated by them because they're capable of transformative works, not unlike humans. Nobody is typing "Write a Stephen King novel." They're typing, "Harold and Kumar go to White Castle but it's Snoop Dogg and Betty White in the style of Stephen King." As much as I'm sure King would love to suck up all royalties for these stories, there's no universe where it makes sense that he should. You don't own what you inspire.
but you're wrong about the philosophical framework of copyright.