Malicious Models Discovered on Hugging Face

Researchers from ReversingLabs have discovered malicious machine learning models on the Hugging Face platform that exploit weaknesses in the Pickle serialization format. Pickle, a popular Python module used to serialize and deserialize objects, poses significant security risks because it allows arbitrary code execution during deserialization. The models identified on Hugging Face were stored in PyTorch format, which is essentially a compressed Pickle file, with the malicious payload embedded at the beginning of the Pickle stream. Because a Pickle stream is executed sequentially as it is read, this placement allowed the payload to run before deserialization reached the broken remainder of the stream and failed, and the malformed files evaded Hugging Face’s security tools.
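To make the mechanism concrete, the following is a minimal, harmless sketch rather than the code found in the reported models. It assumes a hypothetical `BenignPayload` class whose "payload" is only a call to `print`, and it corrupts the stream by overwriting the final opcode, which is an illustrative simplification of a broken Pickle file.

```python
import contextlib
import io
import pickle


class BenignPayload:
    """Hypothetical stand-in for a malicious object; the 'payload' is just print()."""

    def __reduce__(self):
        # pickle records "call print(...)"; the unpickler performs the call on load.
        return (print, ("payload ran during unpickling",))


def load_and_capture(data):
    """Unpickle data and return (captured stdout, exception name or None)."""
    captured = io.StringIO()
    error = None
    with contextlib.redirect_stdout(captured):
        try:
            pickle.loads(data)
        except Exception as exc:  # a broken stream raises only after the call ran
            error = type(exc).__name__
    return captured.getvalue().strip(), error


valid_stream = pickle.dumps(BenignPayload())
# Break the stream after the payload by overwriting the final STOP opcode.
broken_stream = valid_stream[:-1] + b"\xff"

valid_result = load_and_capture(valid_stream)
broken_result = load_and_capture(broken_stream)
```

In this sketch both loads print the message. The broken stream additionally raises an error, but only after the embedded call has already executed, which is why a payload placed early in the stream does not depend on the rest of the file being valid.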

The malicious models, dubbed “nullifAI,” were crafted to evade security detection while still running their harmful code as soon as the file is loaded. ReversingLabs researchers found that these models contained code capable of executing on unsuspecting systems, compromising security through the Pickle format’s inherent vulnerability. The discovery highlights the increasing security risks associated with the widespread use of Pickle in collaborative AI platforms, where many developers prioritize speed and productivity over robust security measures.
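As an illustration of how a Pickle stream can be inspected without being loaded, the sketch below walks the opcodes with Python's standard `pickletools.genops` and keeps whatever it found even if the stream turns out to be broken. It is not a description of how Hugging Face's tools or any particular scanner work; the `scan_pickle` function, the `BenignPayload` class, and the heuristic for resolving `STACK_GLOBAL` names are assumptions made for this example.

```python
import pickle
import pickletools


def scan_pickle(data):
    """Statically list import and call opcodes; the data is never unpickled."""
    findings = []
    recent_strings = []
    parse_error = None
    try:
        for opcode, arg, pos in pickletools.genops(data):
            if opcode.name == "GLOBAL":
                # Older protocols carry "module name" as a single argument.
                findings.append(("import", arg.replace(" ", ".")))
            elif opcode.name == "STACK_GLOBAL":
                # Heuristic: assume the last two strings pushed are module and name.
                findings.append(("import", ".".join(recent_strings[-2:])))
            elif opcode.name == "REDUCE":
                findings.append(("call", pos))
            elif isinstance(arg, str):
                recent_strings.append(arg)
    except Exception as exc:
        # Keep findings gathered before the stream became unreadable.
        parse_error = repr(exc)
    return findings, parse_error


class BenignPayload:
    """Hypothetical stand-in for a malicious object; it only references print()."""

    def __reduce__(self):
        return (print, ("payload ran during unpickling",))


stream = pickle.dumps(BenignPayload(), protocol=4)
broken_stream = stream[:-1] + b"\xff"  # corrupt the final STOP opcode

findings, parse_error = scan_pickle(broken_stream)
```

The design point is that a parse failure is treated as extra information rather than a reason to discard the results: the reference to `builtins.print` and the call opcode are reported even though the stream cannot be read to the end.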

Pickle files, while convenient for serializing machine learning data, can be exploited by attackers to insert malicious payloads. This makes platforms like Hugging Face particularly exposed, as they host machine learning models that are downloaded and used by developers worldwide. The malicious payloads in the models identified by ReversingLabs were designed to execute arbitrary commands on target systems, giving attackers access to potentially sensitive environments. The use of Pickle files in collaborative settings increases the likelihood of exposure, especially when developers load shared models without considering the associated risks.

In response to these findings, Hugging Face has taken steps to bolster its security measures, but the incident underscores the need for greater awareness of the risks inherent in using Pickle for model serialization. Developers are being advised to exercise caution when working with Pickle files, opting for safer alternatives when possible and closely monitoring systems for signs of compromise. As the AI community continues to embrace collaborative platforms, the need for innovative and secure solutions to manage the risks of shared machine learning models has never been more critical.
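One form of caution available in the Python standard library is to restrict which globals an unpickler may resolve, following the pattern in the `pickle` documentation. The sketch below is an assumption-laden example, not a complete defense: the `ALLOWED_GLOBALS` set, the `RestrictedUnpickler` class, and the `restricted_loads` helper are names chosen for illustration, and a real allowlist would depend on what a given model file legitimately needs.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist; anything not listed here is refused.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that only resolves explicitly allowed globals."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data):
    """Drop-in analogue of pickle.loads() that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain data built from allowed types loads normally.
safe_result = restricted_loads(pickle.dumps(OrderedDict(a=1)))

# A stream that references any other callable is rejected before it can be used.
try:
    restricted_loads(pickle.dumps(print))
    blocked_message = None
except pickle.UnpicklingError as exc:
    blocked_message = str(exc)
```

Here the stream referencing `builtins.print` is refused with an `UnpicklingError`, while the `OrderedDict` loads as usual. An allowlist of this kind reduces the attack surface but still relies on every permitted global being safe to call.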