NYT Sues Perplexity Over Use of Copyrighted Work

1,740 Views | 25 Replies | Last: 1 day ago by Over_ed
infinity ag
This attacks the basis of the entire AI industry. These fancy AI models collect data from the internet and other sources and process them in complex ways to a point where if you send it a human language question, it can pull from this knowledge base and give you an intelligent (mostly) sounding answer. It is pretty remarkable.

However, you cannot get away from the fact that the data isn't theirs. They get it from the internet, where it is owned by others like NYT. What is right and what is wrong here?

Technically AI cos use other people's data for free and run their businesses and make money. Should they be allowed? Or should they pay for it?

If NYT and others decide to be obstinate, refuse money, and not share their IP, then AI companies cannot run effectively.

What is the way out here?

New York Times Sues A.I. Start-Up Perplexity Over Use of Copyrighted Work
https://www.nytimes.com/2025/12/05/technology/new-york-times-perplexity-ai-lawsuit.html
Quote:

By Cade Metz and Michael M. Grynbaum
Cade Metz reported from San Francisco, and Michael M. Grynbaum from New York.
Dec. 5, 2025 Updated 10:11 a.m. ET


The New York Times claimed in a lawsuit on Friday that its copyrights were repeatedly violated by Perplexity, an artificial intelligence start-up that has built a cutting-edge internet search engine.

The Times said in its lawsuit that it had contacted Perplexity several times over the past 18 months, demanding that the start-up stop using the publication's content until the two companies negotiated an agreement. But Perplexity continued to use The Times's material.


The suit, filed in federal court in New York, is the latest in a growing legal battle between copyright holders and A.I. companies that includes more than 40 cases around the country. On Thursday, The Chicago Tribune filed a suit against Perplexity, accusing it of copyright infringement. And last year, Dow Jones, owner of The Wall Street Journal, The New York Post and other publications, made similar claims in a lawsuit against the start-up.

The Times's suit is the second it has filed against A.I. companies. In 2023, The Times sued OpenAI and its partner Microsoft, arguing that the companies trained their A.I. systems using millions of Times articles without offering compensation. Microsoft and OpenAI, the maker of the chatbot ChatGPT, have disputed the claims.


Perplexity, a San Francisco company founded in 2022 by a former OpenAI engineer and other entrepreneurs, operates a search engine powered by the same type of A.I. technology that underpins ChatGPT.

The suit accuses Perplexity of violating The Times's copyrights in several ways, most notably when the start-up's search engine retrieves information from a website or database and uses that information to generate a piece of text and to respond to queries from internet users. That would not be a fair use of that material, the suit claimed, because Perplexity grabbed large chunks of the publication's content (in some cases, entire articles) and provided information that directly competed with what The Times offered its readers.


"Perplexity provides commercial products to its own users that substitute for The Times, without permission or remuneration," the suit said.


Quote:


The Times also accused Perplexity of damaging its brand. In some cases, the suit said, Perplexity's search engine made up information (what A.I. researchers call "hallucination") and falsely attributed that information to The Times.

Bird Poo
infinity ag said:

This attacks the basis of the entire AI industry. These fancy AI models collect data from the internet and other sources and process them in complex ways to a point where if you send it a human language question, it can pull from this knowledge base and give you an intelligent (mostly) sounding answer. It is pretty remarkable.

However, you cannot get away from the fact that the data isn't theirs. They get it from the internet, where it is owned by others like NYT. What is right and what is wrong here?

Technically AI cos use other people's data for free and run their businesses and make money. Should they be allowed? Or should they pay for it?

If NYT and others decide to be obstinate, refuse money, and not share their IP, then AI companies cannot run effectively.

What is the way out here?


I think you are confusing "data" with information. If AI plagiarized the NYT then they would have a case. Are they going to go after every student in America that cites their articles for research projects?
aggiehawg
Dumb question since I don't use any AI apps but don't they give any attribution to the author/source of the info?
TexAgs91
> NYT Sues Perplexity Over Use of Copyrighted Work

I think this highlights another issue. AI is using highly sketchy sources for its training
No, I don't care what CNN or Miss NOW said this time
Ad Lunam
Ryan the Temp
It will be interesting to see how this plays out, because AI tools in the academic arena, going back to the first lawsuit against Turnitin, have prevailed primarily on the basis that the revenue generated from the use of any one individual source is de minimis. There is probably an argument to be made that the compositional output constitutes a form of paraphrasing sources, but that doesn't necessarily square with attributing blatantly false information to a source.

Perplexity is an AI tool that is geared toward academia and scholarly sources and it serves this purpose extremely well. Anecdotally speaking, everyone I know who uses it, myself included, uses it to identify scholarly source material, not for wholesale composition of written prose (but it will do that if you ask it to).
infinity ag
Bird Poo said:

infinity ag said:

This attacks the basis of the entire AI industry. These fancy AI models collect data from the internet and other sources and process them in complex ways to a point where if you send it a human language question, it can pull from this knowledge base and give you an intelligent (mostly) sounding answer. It is pretty remarkable.

However, you cannot get away from the fact that the data isn't theirs. They get it from the internet, where it is owned by others like NYT. What is right and what is wrong here?

Technically AI cos use other people's data for free and run their businesses and make money. Should they be allowed? Or should they pay for it?

If NYT and others decide to be obstinate, refuse money, and not share their IP, then AI companies cannot run effectively.

What is the way out here?


I think you are confusing "data" with information. If AI plagiarized the NYT then they would have a case. Are they going to go after every student in America that cites their articles for research projects?


On the flip side, if NYT didn't have something, Perplexity couldn't use it. So Perp is using something useful from NYT and not paying for it.

Sure, they could go after college kids, but they make a judgment call: doing so would hurt their reputation. So they let it slide.
infinity ag
aggiehawg said:

Dumb question since I don't use any AI apps but don't they give any attribution to the author/source of the info?


Not always. There are some situations where it gives a reply and has links to wikipedia and other sources. And there is no way to police this.
Philip J Fry
I guess I plagiarize every time I read an article and maintain a memory of it?

ChatGPT often gives source links too. Not sure about Perplexity.
torrid
Does AI go behind a paywall, or does it utilize what is available from a standard internet search? Because to me it appears what everyone is calling "AI" is just a fancy Google search.
Ryan the Temp
Quote:

On the flip side, if NYT didn't have something, Perplexity couldn't use it. So Perp is using something useful from NYT and not paying for it.

Years ago, a group of students sued claiming Turnitin was using their intellectual property to generate revenue. Turnitin stores and maintains every single paper students upload to it as part of plagiarism prevention programs at roughly 16,000 colleges and universities. Courts ruled their claim was invalid because their individual work had a negligible effect on Turnitin's overall revenue.

Given the volume of sources Perplexity has access to, the NYT is probably as big a fish as that group of students who sued Turnitin was.
Quote:

Sure, they could go after college kids, but they make a judgment call: doing so would hurt their reputation. So they let it slide.

Students don't have deep pockets.
infinity ag
torrid said:

Does AI go behind a paywall, or does it utilize what is available from a standard internet search? Because to me it appears what everyone is calling "AI" is just a fancy Google search.


What "AI" does is create a giant complex mathematical equation using a process called "training". The training process requires data (which they get from the internet including NYT). All of NYT data will now be encoded in their AI model once training is done. So when you send it a question in human language, it converts that to numbers and runs it through this equation and gets the response back.

A Google search does something similar, but it returns links; ChatGPT gives you the information from many links in a form that answers your question. Google just says here is where you can find what you need, go there. ChatGPT gives you the info. Of course Google used to cache the internet for fast retrieval, and they probably still do.
torrid
I'm guessing Turnitin has a lengthy terms and conditions to which you must agree as part of being a student. No real way to opt-out other than withdraw from school.
Ryan the Temp
torrid said:

I'm guessing Turnitin has a lengthy terms and conditions to which you must agree as part of being a student. No real way to opt-out other than withdraw from school.

Opting out is at the discretion of the professor or school, but in order to have your work deleted from Turnitin's database, it requires a request from the school's Turnitin administrator - the student cannot make the request, which effectively eliminates control over their own intellectual property, and courts are okay with this.
eric76
Ryan the Temp said:

Quote:

On the flip side, if NYT didn't have something, Perplexity couldn't use it. So Perp is using something useful from NYT and not paying for it.

Years ago, a group of students sued claiming Turnitin was using their intellectual property to generate revenue. Turnitin stores and maintains every single paper students upload to it as part of plagiarism prevention programs at roughly 16,000 colleges and universities. Courts ruled their claim was invalid because their individual work had a negligible effect on Turnitin's overall revenue.

Given the volume of sources Perplexity has access to, the NYT is probably as big a fish as that group of students who sued Turnitin was.
Quote:

Sure, they could go after college kids, but they make a judgment call: doing so would hurt their reputation. So they let it slide.

Students don't have deep pockets.

That's puzzling.

My understanding is that in copyright cases there is something called statutory damages, at least $750 per work, that can be claimed without having to prove actual damages. So each individual should be able to collect at least $750 per copyrighted work.

I remember one case a number of years ago when the music groups filed suit against one student who downloaded incredible amounts of music for approximately $1,000,000,000 because of statutory damages. I never did hear the result of that.

So why were no statutory damages considered in the case?
captkirk
TexAgs91 said:

> NYT Sues Perplexity Over Use of Copyrighted Work

I think this highlights another issue. AI is using highly sketchy sources for its training

ChatGPT looks at Reddit, unless you specifically tell it not to.
javajaws
infinity ag said:

torrid said:

Does AI go behind a paywall, or does it utilize what is available from a standard internet search? Because to me it appears what everyone is calling "AI" is just a fancy Google search.


What "AI" does is create a giant complex mathematical equation using a process called "training". The training process requires data (which they get from the internet including NYT). All of NYT data will now be encoded in their AI model once training is done. So when you send it a question in human language, it converts that to numbers and runs it through this equation and gets the response back.


An AI model is basically one or more AI algorithms that have been trained against a set of data (like the data in question in this thread). The training can take one of several forms (supervised/unsupervised/etc), but generally results in providing data-specific tuning (weights) for use by that algorithm (aka "fitting" of the algorithm to the data) such that the model gives useful results on that data it was trained on and which then will behave similarly against similar types of data/requests.

While technically everything in a computer gets boiled down to 1s and 0s, your explanation is...nonsensical.

I'm sure one of the actual AI experts would laugh at my explanation as well though so don't feel bad.
MouthBQ98
I think the issue is that the publisher has one set of advertisers wanting to get views.

The various AI interfaces may or may not be gobbling up that source data to build their own responses to inquiries, but with a different set of advertisers or for-fee services.

They all fundamentally work this way: summarize or conflate and condense data from huge amounts of sources and effectively paraphrase or build responses off the frequency of associations in the source data based on the query. It is a big problem that requires a definitive legal resolution.
infinity ag
javajaws said:

infinity ag said:

torrid said:

Does AI go behind a paywall, or does it utilize what is available from a standard internet search? Because to me it appears what everyone is calling "AI" is just a fancy Google search.


What "AI" does is create a giant complex mathematical equation using a process called "training". The training process requires data (which they get from the internet including NYT). All of NYT data will now be encoded in their AI model once training is done. So when you send it a question in human language, it converts that to numbers and runs it through this equation and gets the response back.


An AI model is basically one or more AI algorithms that have been trained against a set of data (like the data in question in this thread). The training can take one of several forms (supervised/unsupervised/etc), but generally results in providing data-specific tuning (weights) for use by that algorithm (aka "fitting" of the algorithm to the data) such that the model gives useful results on that data it was trained on and which then will behave similarly against similar types of data/requests.

While technically everything in a computer gets boiled down to 1s and 0s, your explanation is...nonsensical.

I'm sure one of the actual AI experts would laugh at my explanation as well though so don't feel bad.


What exactly is nonsensical about it? I know about supervised/unsupervised and fitting and all that but I didn't want to get technical as showing off is not the point of this thread. An algorithm and a model are different. An algorithm is a set of steps or instructions to solve a problem. A model is the final result of running it on data.

A very simple machine learning model can be z = 3x + 4y. Yes, as simple as that. A very complex one has many many variables and many weights.
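
To make the z = 3x + 4y example concrete, here is a minimal sketch of what "training" could look like for a model that small. It assumes plain gradient descent on squared error, which is only one of many possible training procedures; the function names, the grid of inputs, the step count, and the learning rate are all made up for illustration.

```python
# Toy "training": recover the weights of z = 3x + 4y from examples.
# Every name and number here is illustrative, not taken from any real system.

def make_data():
    # Inputs on a small grid; targets come from the "true" equation.
    points = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
    return [(x, y, 3 * x + 4 * y) for x, y in points]

def train(data, steps=500, lr=0.05):
    w1, w2 = 0.0, 0.0  # the model starts out knowing nothing
    n = len(data)
    for _ in range(steps):
        g1 = g2 = 0.0
        for x, y, z in data:
            err = (w1 * x + w2 * y) - z  # prediction minus target
            g1 += 2 * err * x / n        # gradient of mean squared error w.r.t. w1
            g2 += 2 * err * y / n        # gradient of mean squared error w.r.t. w2
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2

w1, w2 = train(make_data())
print(round(w1, 3), round(w2, 3))
```

With these settings the two weights should end up very close to 3 and 4. The "model" is just those two numbers; the training data itself is not stored anywhere in the result.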

It is reasonable to say that an AI model is a mathematical equation - just a very large and complex one. You give it some inputs and you get the output of the equation. GenAI uses probabilities in a clever way which is why you see different results and sometimes hallucination.
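
The "probabilities" part can be sketched the same way. The snippet below is an assumption-laden toy, not how any particular product works: it takes made-up scores for three candidate next words, turns them into probabilities with a softmax, and samples one. The word list, the scores, and the `temperature` parameter are hypothetical.

```python
import math
import random

def softmax(scores, temperature=1.0):
    # Convert raw scores into probabilities that sum to 1.
    scaled = [s / temperature for s in scores]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(words, scores, rng, temperature=1.0):
    # Pick one word at random, weighted by its probability.
    probs = softmax(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]

words = ["lawsuit", "article", "banana"]
scores = [2.0, 1.5, -1.0]                # made-up numbers for illustration
rng = random.Random(0)                   # fixed seed so runs are repeatable
picks = [sample_next(words, scores, rng) for _ in range(10)]
print(picks)
```

Because the choice is random, the same question can produce different answers from run to run, and an unlikely word (here "banana") can occasionally be picked. Lowering `temperature` makes the top-scoring word win almost every time.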

What is nonsensical about this? You basically said what I said and added some technical jargon to it. Okay.
BMX Bandit
Philip J Fry said:

I guess I plagiarize every time I read an article and maintain a memory of it?





Of course not, and that is not what the lawsuit claimed is happening.

I have only read the story on this Perplexity case, not the actual suit, but in the ChatGPT case it was literally taking New York Times articles (some behind paywalls) and regurgitating them word for word with no credit to the source.

So if you did that on Texags without linking the story, that would be clear copyright infringement

Loaded
This is a copyright infringement case. NY Times wants compensation (rightly so) for use of their brand in search/AI results. Just like A&M signs contracts for compensation to protect their 12thMan brand. Either enforce your copyright, or lose it. The NY Times doesn't want the AI companies to stop using their material (losing eyeballs is self-defeating), they just want to protect their brand... at least until people finally stop caring what the "Times" has to say.
Over_ed
aggiehawg said:

Dumb question since I don't use any AI apps but don't they give any attribution to the author/source of the info?

No. Their answer is an amalgamation of data tokens that are joined together by proximity and probability. If you ask the AI to give sources, it generally will be able to do that. BUT ONLY because it can do a new search on the web, not because it knows what "tokens" went into its original answer.

BTW, not a dumb question, but you already know that. :-)
infinity ag
BMX Bandit said:

Philip J Fry said:

I guess I plagiarize every time I read an article and maintain a memory of it?





Of course not, and that is not what the lawsuit claimed is happening.

I have only read the story on this Perplexity case, not the actual suit, but in the ChatGPT case it was literally taking New York Times articles (some behind paywalls) and regurgitating them word for word with no credit to the source.

So if you did that on Texags without linking the story, that would be clear copyright infringement




I am confused. ChatGPT will not regurgitate word for word. That is what makes it "Generative" AI. It takes all the "knowledge" from NYT and other places and generates brand new sentences which mean the same. So it is not word for word.

How does one track this? How does one trace anything back to NYT?
BMX Bandit
infinity ag said:



I am confused. ChatGPT will not regurgitate word for word. That is what makes it "Generative" AI. It takes all the "knowledge" from NYT and other places and generates brand new sentences which mean the same. So it is not word for word.

How does one track this? How does one trace anything back to NYT?



JJxvi
If a human read all the same NYT articles and produced the exact same response, would the NYT have a case?
richardag
Bird Poo said:

infinity ag said:

This attacks the basis of the entire AI industry. These fancy AI models collect data from the internet and other sources and process them in complex ways to a point where if you send it a human language question, it can pull from this knowledge base and give you an intelligent (mostly) sounding answer. It is pretty remarkable.

However, you cannot get away from the fact that the data isn't theirs. They get it from the internet, where it is owned by others like NYT. What is right and what is wrong here?

Technically AI cos use other people's data for free and run their businesses and make money. Should they be allowed? Or should they pay for it?

If NYT and others decide to be obstinate, refuse money, and not share their IP, then AI companies cannot run effectively.

What is the way out here?

I think you are confusing "data" with information. If AI plagiarized the NYT then they would have a case. Are they going to go after every student in America that cites their articles for research projects?

IANAL
Seems the difference is the company using AI is making money and the College student is not making money.
"Among the latter, under pretence of governing they have divided their nations into two classes, wolves and sheep."
Thomas Jefferson, Letter to Edward Carrington, January 16, 1787
Over_ed
infinity ag said:

BMX Bandit said:

Philip J Fry said:

I guess I plagiarize every time I read an article and maintain a memory of it?





Of course not, and that is not what the lawsuit claimed is happening.

I have only read the story on this Perplexity case, not the actual suit, but in the ChatGPT case it was literally taking New York Times articles (some behind paywalls) and regurgitating them word for word with no credit to the source.

So if you did that on Texags without linking the story, that would be clear copyright infringement




I am confused. ChatGPT will not regurgitate word for word. That is what makes it "Generative" AI. It takes all the "knowledge" from NYT and other places and generates brand new sentences which mean the same. So it is not word for word.

How does one track this? How does one trace anything back to NYT?



Conceptually -- Imagine there are 50 voices talking about New Year's Eve in NYC, 1947. The AI model breaks down the words into tokens, and joins them together by how closely related they are and the probabilities of their occurring. Keep in mind it is more complicated than this, for one thing because the AI builds on what it knows already.

In any case the AI weaves its reply from joining all these stories together.

Now suppose we ask a question about what happened uniquely to Joe Smith in NYC on Christmas Eve, 1947.
Unfortunately, there is only one account about this, from his granddaughter. The model may put together her story with other events it knows about NYC. When it does this it is hallucinating, because it may not jibe with what really happened to Joe. Hallucinations are pretty common, btw.

Or it might rely solely on the granddaughter's account. Then you can end up quoting the story, exactly. All depends on the training set, the weights assigned to tokens, and other instructions given to the model during training.
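
Both outcomes described above (blending accounts versus quoting one exactly) can be reproduced with a drastically simplified stand-in for a language model. The sketch below assumes a word-level bigram table and always picks the most frequent follower; real systems use neural networks over subword tokens, so treat this purely as an illustration. The sentences and function names are invented.

```python
from collections import Counter, defaultdict

def train_bigrams(texts):
    # Count, for every word, which words follow it and how often.
    counts = defaultdict(Counter)
    for text in texts:
        tokens = text.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return counts

def generate(counts, start, max_tokens=20):
    # Greedy generation: always append the most frequent follower.
    out = [start]
    while len(out) < max_tokens:
        followers = counts.get(out[-1])
        if not followers:
            break                        # no known continuation
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

# Only one account exists: the model can do nothing but repeat it.
single = "Joe lost his hat near the ferry on a cold night"
print(generate(train_bigrams([single]), "Joe"))

# Several overlapping accounts: the output mixes them.
crowd = [
    "Joe watched the ball drop at midnight",
    "Ann watched the ball drop from a rooftop",
    "Sam saw the ball drop from a rooftop",
]
blended = generate(train_bigrams(crowd), "Joe")
print(blended)
```

The first call reproduces the single account word for word. The second produces a sentence that appears in none of the three sources: it starts with Joe's account and finishes with Ann's and Sam's, which is a small-scale version of the "hallucination" described above.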