OpenAI accuses New York Times of paying someone to "hack" ChatGPT

midian182

WTF?! The New York Times' copyright lawsuit against OpenAI has taken an unexpected twist after the tech company accused the newspaper of hiring someone to "hack" ChatGPT and other products to generate misleading evidence supporting its claim. OpenAI's use of the term "hack" may be a stretch, though.

The NYT sued OpenAI and Microsoft in December for using millions of its articles to train their systems without permission or compensation. The suit states that millions of the Times' copyrighted news pieces, in-depth investigations, opinion features, reviews, how-to guides, and more were used to train the chatbots, which now compete with the news outlet as a source of information.

In a filing in Manhattan federal court on Monday, OpenAI alleged that the Times "paid someone to hack" its products to generate 100 examples of copyright infringement.

OpenAI claims that it took the Times tens of thousands of attempts to generate "highly anomalous results" and that it achieved this using "deceptive prompts that blatantly violate OpenAI's terms of use."

"They were able to do so only by targeting and exploiting a bug (which OpenAI has committed to addressing) by using deceptive prompts that blatantly violate OpenAI's terms of use," OpenAI's lawyer wrote. "And even then, they had to feed the tool portions of the very articles they sought to elicit verbatim passages of, virtually all of which already appear on multiple public websites."

"Normal people do not use OpenAI's products in this way [...] In the ordinary course, one cannot use ChatGPT to serve up Times articles at will," OpenAI continued.

OpenAI does not name the "hired gun" it claims the Times paid to manipulate ChatGPT's output, nor does it accuse the paper of actual hacking. What it describes sounds more like standard prompt engineering, and the Times agrees.

"What OpenAI bizarrely mischaracterizes as 'hacking' is simply using OpenAI's products to look for evidence that they stole and reproduced The Times' copyrighted works. And that is exactly what we found. In fact, the scale of OpenAI's copying is much larger than the 100-plus examples set forth in the complaint," said Ian Crosby, Susman Godfrey partner and lead counsel for the publication. "In this filing, OpenAI doesn't dispute – nor can they – that they copied millions of The Times' works to build and power its commercial products without our permission."

The use of copyrighted work in the training of generative AIs has led to numerous lawsuits from authors, artists, and creators. OpenAI said in the filing it believes AI companies will win cases like these based on fair use. It notes that The Times "cannot prevent AI models from acquiring knowledge about facts."

It was reported back in August that the Times had been in "tense negotiations" with OpenAI and Microsoft over a licensing deal that would allow OpenAI to legally train its GPT model on material published by the Times, something the newspaper had previously decided to prohibit. But the talks broke down, leading to the current lawsuit. OpenAI already has agreements in place with Reuters and Axel Springer to use their content for training purposes, and is said to be in talks with CNN, Fox Corp., and Time to secure licensing deals.

 
" OpenAI claims that it took the Times tens of thousands of attempts to generate "highly anomalous results" and that it achieved this using "deceptive prompts that blatantly violate OpenAI's terms of use." "

Well, they've always argued that you can't blindly trust ChatGPT's output... so why is OpenAI worried now?

 
It's an interesting point: if OpenAI can show that the claimed copyright violations were made in bad faith, it would greatly weaken the argument against them. At a certain point the DMCA could potentially be invoked to call that illegal. But whether or not a judge/jury will agree with OpenAI's claims is hard to tell.
 
Frivolous lawsuit, specifically tailored to cover the fact that they are actually using copyrighted material under an obscure technicality, which by the way is also their own fault.

I wonder if they will use the same excuse when AI is used to enslave us: “it is irrelevant because they exploited a bug in our software”
 
Frivolous lawsuit, specifically tailored to cover the fact that they are actually using copyrighted material under an obscure technicality,
LOL, what? The fair use exclusion is neither obscure nor "a technicality". Your argument is like claiming that, if you read a NYT article, you're not allowed to discuss the factual details of the story with anyone.

I'll refrain from commenting for now on the irony of those who most support pirating copyrighted material cheering for the NYT here.
 
LOL, what? The fair use exclusion is neither obscure nor "a technicality". Your argument is like claiming that, if you read a NYT article, you're not allowed to discuss the factual details of the story with anyone.

I'll refrain from commenting for now on the irony of those who most support pirating copyrighted material cheering for the NYT here.
Fair use is not for making money on material you don’t have a licence for. As simple as that.
 
Fair use is not for making money on material you don’t have a licence for. As simple as that.
You cannot copyright information or ideas; only a particular expression of them. Training a model on copyrighted material is no different than training a writer, artist, or professional on that same material. It's not what goes in that counts-- it's what comes out.

Here, apparently, the NYT was able to spoof the system by feeding it their own copyrighted material, to get it to regurgitate it back at them. If that is indeed what happened, then there is no violation. Case closed. As simple as that.
 
You cannot copyright information or ideas; only a particular expression of them. Training a model on copyrighted material is no different than training a writer, artist, or professional on that same material. It's not what goes in that counts-- it's what comes out.

Here, apparently, the NYT was able to spoof the system by feeding it their own copyrighted material, to get it to regurgitate it back at them. If that is indeed what happened, then there is no violation. Case closed. As simple as that.
If you want training on copyrighted material you need to pay for it, one way or another. And if you use said material in a work of your own you need to have proper citations in place, otherwise it is plagiarism.

If you want access to NYT articles you need to pay a subscription. And even if you do, using any material in there in an “original” “generative” piece, still requires proper citations.

As simple as that.
 
If you want training on copyrighted material you need to pay for it, one way or another
Err, no. 99% of the stories you've read on this particular website -- as many others -- are the result of the author reading someone else's copyrighted story, and then paraphrasing it appropriately. Perfectly legal.

And if you use said material in a work of your own you need to have proper citations in place, otherwise it is plagiarism.
Heh, you're using terms you don't understand. Plagiarism relates to ethics in an academic setting; it has no legal meaning. The term you mean is infringement. Only individuals can commit plagiarism; they can plagiarize copyrighted or non-copyrighted material; and all the citations in the world won't add to or stop an infringement case. Organizations -- such as the NYT or OpenAI -- commit infringement.

Or in this case -- they don't.
 
Err, no. 99% of the stories you've read on this particular website -- as many others -- are the result of the author reading someone else's copyrighted story, and then paraphrasing it appropriately. Perfectly legal.


Heh, you're using terms you don't understand. Plagiarism relates to ethics in an academic setting; it has no legal meaning. The term you mean is infringement. Only individuals can commit plagiarism; they can plagiarize copyrighted or non-copyrighted material; and all the citations in the world won't add to or stop an infringement case. Organizations -- such as the NYT or OpenAI -- commit infringement.

Or in this case -- they don't.
Perfectly legal indeed for THIS site. OpenAI, however, is using copyrighted materials behind a paywall, including opinion pieces, how-to guides, news analysis, and other stuff. NYT opted out, their opt-out was ignored, and OpenAI refuses to pay for licensing. NYT is entitled to sue and will most likely win. As simple as that.

As for my plagiarism reference, you're simply playing semantics, like a lawyer does. You knew the intent and ignored it.
 
Perfectly legal indeed for THIS site. OpenAI, however, is using copyrighted materials behind a paywall...
Heh, no. Copyrighted material "behind a paywall" has exactly the same status as any other copyrighted material. A NYT paywalled feature vs. a TechSpot free-to-read story: both receive equivalent legal protection. There really isn't any room for debate on this; are you next going to attempt to argue that water isn't wet?
 