The growing use of data scraping for AI has landed Reddit and Perplexity AI in court. Reddit accuses Perplexity and several web-scraping partners of illegally collecting massive amounts of user-generated content. The lawsuit highlights the mounting tension between social platforms protecting their data and AI companies racing to train their models.
Background of the case
Reddit filed the lawsuit in a New York federal court, naming Perplexity AI, Oxylabs UAB, AWMProxy, and SerpApi as defendants. The company claims these firms scraped billions of Reddit pages by disguising their traffic and bypassing security protections. According to Reddit, this large-scale activity occurred without any licensing deal or consent.
Reddit’s allegations
Reddit argues that it has formal licensing agreements with some companies, such as OpenAI and Google, but not with Perplexity. Reddit says it sent Perplexity a cease-and-desist letter in May 2024, yet the scraping continued and even increased fortyfold in the following months. The platform accuses the defendants of exploiting its community data for profit and breaching federal computer laws.
Perplexity and partners respond
Perplexity AI denies the allegations and insists it acts within legal boundaries. A company spokesperson said Perplexity “will continue to fight for fair and open access to public information.”
SerpApi also plans to defend itself, claiming it operates in full compliance with U.S. regulations. Oxylabs stated that Reddit never contacted it directly before the lawsuit was filed.
Broader implications
The case reignites debate over who owns online data in the age of data scraping for AI. Platforms like Reddit view their content as a proprietary asset, while AI developers see public web data as essential for model training. The lawsuit could shape how AI companies acquire data and what legal obligations they must meet.
Possible outcomes
If the court sides with Reddit, AI companies may face tighter restrictions and higher costs when sourcing training data. Many could shift toward paid licensing deals to avoid future litigation. The ruling might also push regulators to create clearer frameworks governing how online content can be used in AI development.
Conclusion
Reddit’s lawsuit against Perplexity AI and its alleged partners underscores the growing conflict over data scraping for AI. The case could redefine boundaries between open access and proprietary rights in the AI era. As legal scrutiny increases, both platforms and AI firms must adapt to stricter standards for data use and transparency.

