Amazon’s cloud division is investigating Perplexity AI. The inquiry centers on whether the AI search startup is violating Amazon Web Services rules, according to information obtained by WIRED.

An AWS representative, speaking anonymously, confirmed to WIRED that the company is scrutinizing Perplexity. An earlier WIRED investigation found that the startup, which is backed by the Jeff Bezos family fund and Nvidia and was recently valued at $3 billion, appeared to rely on content scraped from websites that had explicitly prohibited such access through the Robots Exclusion Protocol, a widely recognized web standard. Although the Robots Exclusion Protocol is not legally binding, terms of service typically are.

The Robots Exclusion Protocol is a long-established web standard in which a plain-text file (like wired.com/robots.txt) is placed on a domain to specify which pages automated bots and crawlers should not access. Companies that run scrapers can choose to ignore the protocol, but most have traditionally respected it. The AWS spokesperson told WIRED that AWS customers are required to comply with the robots.txt standard when crawling websites.
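To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult a robots.txt file before fetching a page, using Python’s standard `urllib.robotparser` module. The rules, the bot names `ExampleBot` and `OtherBot`, and the `example.com` URLs are hypothetical and chosen only for illustration; they are not taken from any real site’s robots.txt or from the reporting above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one named bot is blocked from the whole
# site, and every other bot is blocked only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The named bot is disallowed everywhere.
blocked_everywhere = is_allowed(parser, "ExampleBot", "https://example.com/article")

# Any other bot may read public pages but not the /private/ section.
public_ok = is_allowed(parser, "OtherBot", "https://example.com/article")
private_ok = is_allowed(parser, "OtherBot", "https://example.com/private/report")

print(blocked_everywhere, public_ok, private_ok)
```

Note that the check is entirely voluntary: the file only states the site’s wishes, and nothing stops a crawler from skipping the call to `is_allowed` and requesting the page anyway.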

In a statement, the spokesperson said, “AWS’s terms of service proscribe customers from engaging in any illicit activities, and compliance with our terms and all applicable laws rests with our customers.”

Scrutiny of Perplexity’s practices follows a June 11 report by Forbes that accused the startup of stealing content in at least one instance. WIRED’s investigations confirmed this claim and found additional instances of scraping abuse and plagiarism involving systems associated with Perplexity’s AI-powered search chatbot. Engineers at Condé Nast, the parent company of WIRED, block Perplexity’s crawler across all of the company’s websites using a robots.txt file. Nonetheless, WIRED discovered that Perplexity had access to a server using an undisclosed IP address, 44.221.181.252, that had visited Condé Nast sites multiple times over the past three months, apparently to scrape them.

The machine linked to Perplexity appears to be crawling extensively across news sites that forbid bots from accessing their content. Spokespeople for The Guardian, Forbes, and The New York Times also say they detected multiple visits to their servers from the same IP address.

WIRED traced the IP address to a virtual machine, known as an Elastic Compute Cloud (EC2) instance, hosted on AWS. AWS began its investigation after WIRED asked whether using AWS infrastructure to scrape sites that explicitly prohibit it violated the platform’s terms of service.

Responding to WIRED’s investigation, Perplexity CEO Aravind Srinivas initially said that the questions WIRED posed “reflect a profound and elemental misinterpretation of Perplexity and the Internet’s dynamics.” Srinivas later told Fast Company that the undisclosed IP address WIRED observed scraping Condé Nast sites and a test site set up by WIRED was operated by a third party specializing in web crawling and indexing services. He declined to name the company, citing a non-disclosure agreement. When asked whether he would instruct the third party to stop crawling WIRED, Srinivas responded, “It’s a complex situation.”