CyberMaterial

Reddit Blocks Internet Archive Over AI Scraping

August 13, 2025

Reddit has announced plans to severely restrict the Internet Archive’s Wayback Machine from indexing its platform, citing concerns that AI companies have used the archival service to circumvent Reddit’s data protection policies. The decision marks a significant escalation in Reddit’s effort to control access to its user-generated content amid the boom in demand for AI training data.

Reddit says it will begin “ramping up” the restrictions immediately, blocking the Wayback Machine from post detail pages, comment threads, and user profiles. The Internet Archive will be able to index only Reddit’s homepage, which limits the historical record to snapshots of trending headlines and popular posts on specific dates.

A Reddit spokesperson said that while the Internet Archive provides a valuable service to the open web, the company has identified instances in which AI firms violated its policies by scraping Reddit data from the Wayback Machine. Because archived copies are served from the Internet Archive rather than from Reddit, these companies could reportedly reach content that Reddit’s own robots.txt rules, API rate limiting, and crawler blocking would otherwise have restricted.

Reddit’s technical implementation will likely involve updating its robots.txt file to target the Internet Archive’s crawlers by their User-Agent strings. The company may also add server-side blocking based on the IP ranges associated with the Wayback Machine’s infrastructure. This approach mirrors Reddit’s recent strategy of blocking search engine crawlers unless their operators enter into paid licensing agreements.

The move is part of Reddit’s broader effort to monetize its data in the era of artificial intelligence. The platform has already signed data access deals with Google and OpenAI, and it is pursuing legal action against other companies, such as Anthropic, for allegedly continuing to scrape content without permission.
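
To make the two blocking mechanisms mentioned above more concrete, here is a minimal Python sketch. It is illustrative only: the robots.txt rules are hypothetical and are not Reddit’s actual file, the User-Agent token archive.org_bot is an assumption about how the crawler identifies itself, and the IP ranges are reserved documentation addresses standing in for real infrastructure. Python’s standard robotparser also uses simple prefix matching, so the sketch blocks the /r/ and /user/ path prefixes rather than expressing a strict homepage-only rule.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration; NOT Reddit's actual file.
# "archive.org_bot" is assumed here as the crawler's User-Agent token.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /r/
Disallow: /user/

User-agent: *
Disallow:
"""

# Placeholder ranges from the reserved documentation blocks (RFC 5737),
# standing in for whatever ranges an operator would actually block.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawler_may_fetch(rules: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit this user agent to fetch the URL."""
    return rules.can_fetch(user_agent, url)


def ip_is_blocked(client_ip: str) -> bool:
    """Return True if the client address falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


RULES = load_rules(HYPOTHETICAL_ROBOTS_TXT)

if __name__ == "__main__":
    for path in ("/", "/r/python/comments/abc123/example/", "/user/example"):
        url = "https://www.reddit.com" + path
        print(path, crawler_may_fetch(RULES, "archive.org_bot", url))
    print("203.0.113.7 blocked:", ip_is_blocked("203.0.113.7"))
```

The two checks differ in kind. A robots.txt rule is advisory and only works if the crawler chooses to honor it, whereas an IP-range check is enforced by the server regardless of what the client claims to be.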

The company’s 2023 API pricing changes, which led to the closure of many popular third-party applications, were justified on similar grounds: preventing unauthorized AI training. To maintain control over data access, Reddit has implemented technical measures across its infrastructure, including rate limiting, authentication requirements, and usage monitoring. The company says these measures are necessary to protect user privacy and to ensure that content deletion requests are honored, a goal that archived copies held by third parties can complicate.
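
Rate limiting, one of the measures named above, is commonly implemented with a token bucket. The following Python sketch shows the general technique only; the class name, parameters, and limits are hypothetical, and nothing here describes how Reddit actually enforces its limits. Timestamps are passed in explicitly so the behavior is easy to follow and test.

```python
class TokenBucket:
    """Generic token-bucket limiter: each request spends one token,
    and tokens refill at a fixed rate up to a maximum capacity."""

    def __init__(self, capacity: int, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start with a full bucket
        self.last_seen = now

    def allow(self, now: float) -> bool:
        """Return True and spend a token if the request at time `now` is allowed."""
        elapsed = max(0.0, now - self.last_seen)
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    # Hypothetical limit: bursts of 2 requests, refilling 1 token per second.
    bucket = TokenBucket(capacity=2, refill_per_second=1.0)
    for timestamp in (0.0, 0.0, 0.0, 1.0, 1.0):
        print(timestamp, bucket.allow(timestamp))
```

In practice a service would keep one bucket per client, keyed by something like an API credential, which is where the authentication requirements mentioned above come in.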

Mark Graham, the director of the Wayback Machine, has acknowledged ongoing discussions with Reddit about the issue and suggested that technical solutions might be explored. However, Reddit’s stance appears firm: access will remain severely limited until the Internet Archive can guarantee that it will comply with platform policies on user privacy and the handling of content deletion requests. That position underscores how difficult it is to reconcile open web archiving with the commercial desire to control and monetize data.

The dispute reflects a growing tension between the principles of an open web and the commercial imperatives of data control in the AI training landscape. As companies like Reddit seek to protect their user-generated content from being used without compensation, they increasingly come into conflict with services like the Internet Archive that are dedicated to preserving a historical record of the web. The outcome of the conflict between Reddit and the Wayback Machine will likely have broader implications for how companies and archival services interact, particularly as AI continues to drive demand for vast quantities of data.

Reference:

  • Reddit To Block Internet Archive After AI Firms Scrape Data From Wayback Machine