Attack extracts ChatGPT training data

A research team, including members from Google, uncovered a vulnerability in ChatGPT that enabled the extraction of several megabytes of training data by prompting the model to endlessly repeat a word. The attack, costing around a couple of hundred dollars, exposed real email addresses, phone numbers, and other identifiers from ChatGPT’s training dataset.

The researchers notified OpenAI, which addressed the specific exploit but did not rectify the underlying vulnerability. The attack’s simplicity involved prompting the model with a request to repeat a word indefinitely, causing it to diverge and disclose training data. The significance of this vulnerability lies in the potential exposure of sensitive information during model training, including personally identifiable details. While OpenAI implemented measures to prevent the specific attack, the incident raises broader concerns about model memorization and the inadvertent regurgitation of training data during model interactions.

The researchers emphasized the importance of addressing such issues to prevent unintentional data leaks, especially when deploying language models in real-world applications. The findings underscore the ongoing challenges in ensuring the privacy and security of AI models and highlight the need for continuous efforts to enhance safeguards against data extraction and leakage.

Reference: