Introduction

As the use of AI has exploded over the past year, organizations have increasingly built their own models and turned to public repositories to download models for customization. In February 2024, the data science community was shocked by the discovery of more than 100 models on Hugging Face (huggingface.co) that had been infected with specialized "back doors." The infected models contained specially crafted embedded code that allowed hackers to compromise the systems of model users.

Even though this set of tainted models represented less than one-tenth of one percent of the hundreds of thousands of models hosted on Hugging Face, the severity of the problem and the lack of mitigating measures were a wake-up call for the AI community. This article explores the risks inherent in AI model files and discusses measures to identify and remediate those risks.

The information presented below has been validated in the AI Security Enclave, part of WWT's AI Proving Ground. The Security Enclave is a development-focused environment where WWT engineers can quickly deploy and test new AI security technologies in an informal setting. The primary purpose of the Enclave is to provide an environment where WWT, customers and partners can work together to better understand the challenges of AI security; it also allows WWT to plan and scope more formal efforts around proofs-of-concept, product bake-offs and designing lab-as-a-service engagements.

Model data and file formats

First things first: What exactly constitutes an AI model? The "models" downloaded from Hugging Face typically consist of several gigabytes of data; this data is often distributed among multiple files, although increasingly we see all of it combined into a single file for convenience. AI model files use a variety of file formats. In the case of Large Language Models (LLMs), the connection between model file formats and model software is especially strong: many of the file formats require a specific application framework, and vice versa.

The model data includes everything required for the AI inference software to apply the model to user input. In the case of LLMs, this spans a variety of data types: tokenization information and parameters, matrices of floating-point vectors used for embedding tokenized input, and typically massive floating-point multidimensional arrays called tensors, used as weights in the neural net core. Each of these data categories has its own format; the underlying data types include floating-point numbers of varying precision, small integers, large integers, string data, and data treated simply as "raw bytes."
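
As a toy illustration of the scale involved (the dimensions below are arbitrary, not taken from any real model, and the example assumes NumPy is available), a single weight tensor is just a large multi-dimensional array of floating-point numbers:

```python
# Toy illustration only: the dimensions are arbitrary, not from a real model.
import numpy as np

# One weight tensor: a multi-dimensional array of floating-point numbers.
layer_weights = np.random.rand(4096, 4096).astype(np.float16)

print(layer_weights.shape, layer_weights.dtype, layer_weights.nbytes)
# (4096, 4096) float16 33554432  -> roughly 32 MB for this single tensor.
# An LLM holds hundreds of such tensors, which is where the gigabytes come from.
```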

Model files can be delivered in a variety of modes: compressed archives like zip, text files encoded as JSON or XML, or single binary files with a mix of data types. Each format has its pros and cons regarding efficient loading, storage requirements and security. File format becomes especially critical as the model crosses the most important watershed in the AI model lifecycle: deployment from the development environment to production.

There is a wide range of model formats. The choice of format determines the library needed to read the data from disk, so the format generally corresponds to the programming framework used to train the model (a point illustrated by the sketch after this list). Some of the more popular formats are:

  1. Pickle, the ubiquitous format used in Python to serialize classes and data arrays to disk.
  2. Protocol Buffers (commonly called protobuf), an open-source format designed for efficiently serializing structured data.
  3. HDF5 (Hierarchical Data Format version 5), a binary format used by Keras to manage large datasets.
  4. ONNX, a highly complex format designed to support a higher level of abstraction with regard to the data used by neural nets.
  5. GGUF, which combines all of the model data into one file.
  6. Safetensors, a common format on the Hugging Face platform.

Both GGUF and Safetensors are seeing increased usage in the user community.
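
The sketch below makes that coupling concrete. The file names are hypothetical placeholders, each loader shown is just one common choice for its format (and assumes the named libraries are installed), and GGUF is omitted because it is typically consumed by llama.cpp-based tooling rather than a general-purpose Python loader.

```python
# Illustrative only: each format requires its own loader library.
import pickle
import h5py
import onnx
from safetensors.torch import load_file

with open("model.pkl", "rb") as f:          # Pickle: executes any embedded code!
    weights = pickle.load(f)

h5_file = h5py.File("model.h5", "r")        # HDF5: hierarchical datasets (e.g., Keras)

onnx_graph = onnx.load("model.onnx")        # ONNX: protobuf-based graph plus weights

tensors = load_file("model.safetensors")    # Safetensors: dict of name -> tensor
```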

Risks and attacks 

Serialization is the process of converting object states and data structures into a standard format that can be efficiently transmitted, stored and reconstructed. Traditionally, developers have used serialization as a convenient, standard way to save program data to disk so that work can continue seamlessly at the start of the next development session. In Python, "serialization" has usually meant "pickle." After the training phase of an AI or machine learning model, it is essential to save the model weights and other necessary data to disk for future use. Not surprisingly, since the majority of top-level code for data science is written in Python, pickle became the default method for saving model data to disk.
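
As a minimal sketch of that workflow (the file name and the toy "weights" below are made up for illustration; frameworks such as PyTorch wrap the same mechanism in helpers like torch.save):

```python
import pickle

# Hypothetical training output: layer names mapped to weight values.
model_state = {
    "layer1.weight": [[0.12, -0.08], [0.33, 0.91]],
    "layer1.bias": [0.01, -0.02],
}

# Save the trained state at the end of a session...
with open("model_state.pkl", "wb") as f:
    pickle.dump(model_state, f)

# ...and restore it at the start of the next one.
with open("model_state.pkl", "rb") as f:
    restored = pickle.load(f)
```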

The good news about using pickle for serialization is that pickle allows executable instructions to be embedded with the data, enhancing the efficiency and efficacy of the deserialization process. 

The bad news about using pickle for serialization is that it allows executable instructions to be embedded with the data, allowing an attacker to modify the data with malicious instructions that get executed upon deserialization, leading to potentially disastrous results.
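
The mechanism is easy to demonstrate. The sketch below is a deliberately harmless example of the same class of technique reportedly used in the infected models mentioned above: pickle's __reduce__ hook lets an object specify a callable that runs during deserialization.

```python
import os
import pickle

class MaliciousPayload:
    # __reduce__ tells pickle how to rebuild an object on load. Returning
    # a callable plus its arguments means pickle will CALL it during
    # deserialization; here a harmless echo stands in for real malware.
    def __reduce__(self):
        return (os.system, ("echo arbitrary code executed on load",))

# The attacker hides the payload alongside legitimate-looking model data.
tainted = pickle.dumps({"weights": [0.1, 0.2], "extra": MaliciousPayload()})

# The victim merely loads the "model"; the command runs immediately.
pickle.loads(tainted)
```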

First and foremost, these attacks begin with unauthorized access: the attacker needs write access to the model file in order to embed the malicious code. If the serialization format does not allow embedded code, the outlook is brighter but still far from ideal, because an attacker with that same access can still modify the model data, leading to suboptimal or simply incorrect output from the model. Although "just" corrupting a model is significantly less serious than compromising a host through embedded malware, the losses can still be devastating when the original cost of training the model is considered (often in the tens of millions of dollars).

Defending model data

Preventing attacks against model files requires three concurrent approaches:

  1. Limit access to sensitive AI models and associated data throughout the AI model lifecycle, both in development and production environments. Best practice: a strong IAM process using Role Based Access Control (RBAC).
  2. Implement tamper detection on AI models using digital signatures (a minimal signing-and-verification sketch follows this list). Note: to be truly effective, this requires the security provided by a strong PKI or equivalent.
  3. Test the models for dangerous embedded code and other potential malware using a trusted open source or commercial solution.
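
As a sketch of measure (2), the following signs a model file and verifies the signature before loading. It assumes the third-party cryptography package and a hypothetical file name, and it deliberately leaves out the hard part, key management and distribution, which is where the PKI comes in.

```python
# Minimal sketch: sign a model file once, verify it before every load.
from pathlib import Path
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

model_bytes = Path("model.safetensors").read_bytes()   # hypothetical file

# Done once, in a controlled environment (e.g., the training pipeline):
private_key = ed25519.Ed25519PrivateKey.generate()
signature = private_key.sign(model_bytes)
public_key = private_key.public_key()

# Done every time the model is deployed or loaded:
try:
    public_key.verify(signature, model_bytes)
    print("Signature valid: model file has not been tampered with.")
except InvalidSignature:
    print("Signature check FAILED: do not load this model.")
```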

Also, note that these measures must be part of the organization's security policies. Furthermore, measures (2) and (3) will be new for most organizations and will therefore require evangelization and additional budget.

Access control and digital signatures are widely used in a variety of settings. The danger of including code in serialized files has been known for some time in the development world (largely due to a multitude of attacks leveraging pickle shortcomings), but the growth in the use of AI models brings that risk into production environments. Fortunately, there is already a commercial solution called Guardian available from the company Protect AI; even more fortunately, there is a related open-source project called modelscan supported by Protect AI. Guardian is an enterprise-class solution with strong authentication, a sophisticated UI and integration with public cloud platforms, while modelscan is a CLI program run against local model files; both aim for the same result of identifying model risk in the form of dangerous embedded code.
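
As a sketch of how such a scan might be wired into a pipeline, the following invokes the modelscan CLI from Python. The path is a placeholder, and the "-p" option reflects the modelscan documentation at the time of writing; check the project for current usage.

```python
# Minimal sketch: run modelscan against a downloaded model before loading it.
import subprocess

result = subprocess.run(
    ["modelscan", "-p", "downloaded/model.pkl"],   # hypothetical path
    capture_output=True,
    text=True,
)

print(result.stdout)
if result.returncode != 0:
    # A nonzero exit status may indicate findings or a scan error; either
    # way, review the printed report before trusting the model.
    raise SystemExit("modelscan reported issues: do not load this model.")
```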

Two safe formats: Safetensors and GGUF

Another viable approach to safe model files is simply to have no embedded code from the beginning; that is, model files should contain data only. This approach appears to be gaining popularity in the data science community, judging by the model file formats now preferred by popular platforms and tools such as Hugging Face, LM Studio and Ollama. Currently, the two most popular data-only formats are Safetensors and GGUF.

"Safetensor" model files basically just store tensors (tensors are mathematical objects that for AI models are just multi-dimensional arrays of floating point numbers, usually referred to as the "model weights"), with the other data types stored separately. The safetensors format is used heavily by large AI enterprises such as Huggingface, StabilityAI and more. The format itself is very simple, as seen in the following diagram:

Description of safetensors format for LLM (from https://huggingface.co/docs/safetensors/index)

The file contains the tensors themselves, preceded by a small header with the minimal information (tensor names, data types, shapes and byte offsets) required to locate and parse them. This simplicity results in very efficient loading of the model data.
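
Reading a Safetensors file is correspondingly simple. The sketch below uses the safetensors Python library with a placeholder file name; safe_open parses only the header up front and returns tensors on demand, with no embedded code ever executed.

```python
# Minimal sketch: enumerate the tensors in a Safetensors file.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():                  # tensor names from the header
        tensor = f.get_tensor(name)        # loaded on demand
        print(name, tuple(tensor.shape), tensor.dtype)
```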

GGUF (commonly expanded as GPT-Generated Unified Format) is somewhat more complicated than Safetensors, largely because all of the required data is included in a single file. GGUF is the successor to the popular GGML format; its goal is to provide significant improvements across the board in security, stability, versatility, extensibility and efficiency. GGUF supports memory mapping, allowing for faster model loading and processing. It is also designed for ease of use and deployment, and it allows metadata to be added without breaking compatibility with older models. The GGUF file format is represented in the following diagram:


Format for GGUF model files (from https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)
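
To make the single-file layout concrete, the sketch below memory-maps a GGUF file and reads the fixed fields at the start of the header. The layout follows the spec linked above as we understand it (field widths could differ across GGUF versions), and the file name is a placeholder.

```python
# Minimal sketch: memory-map a GGUF file and read its fixed header fields.
import mmap
import struct

with open("model.gguf", "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        magic = mm[0:4]                              # b"GGUF" identifies the format
        version, = struct.unpack_from("<I", mm, 4)   # format version
        tensor_count, kv_count = struct.unpack_from("<QQ", mm, 8)

print(magic, version, tensor_count, kv_count)
```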


Conclusion

This article has examined the risks inherent in model files, serialization and model formats, and some important options for securing AI model usage. Good security hygiene and data governance are important to the entire AI development lifecycle. Specialized scanning tools such as the Protect AI Guardian platform prevent dangerous models from being loaded, a necessary step that will soon be seen as "table stakes" for AI model security. Creating and enforcing a clear policy for the data science teams about which model file formats are permitted is another important step for securing both development and production environments.
