Will Traditional Model Theft Techniques Work on LLMs like ChatGPT?

Laveena Bachani
5 min read · Jan 27, 2024


Photo by Mojahid Mottakin on Unsplash

Entities ranging from governments worldwide to the companies at the forefront of model development are actively assessing the risks associated with generative models and the potential consequences of this technology falling into the wrong hands. In late October, the Biden Administration released an executive order aimed at securing and controlling the use of AI. Among its stipulations for high-risk AI are a requirement for red teaming to identify security vulnerabilities and the implementation of both physical and cybersecurity measures to safeguard model weights.

Jason Clinton, Anthropic's chief information security officer, shared with VentureBeat that a significant portion of the company's resources is dedicated to protecting a single file containing the weights of its generative AI models. OpenAI has also introduced a bug bounty program, urging researchers to uncover and report vulnerabilities in its systems, with a specific focus on protecting the weights of GPT models.

ByteDance, a Chinese Internet technology company, has already been implicated in attempting to utilize outputs from the ChatGPT model to train its own AI chatbot.

Why are measures being taken to protect model weights and architecture?

The weights and architecture of a model constitute valuable intellectual property, embodying a unique state forged through extensive computation hours, meticulous selection of data, sophisticated algorithms, and conscientious experimentation by skilled researchers. When wielded by malicious actors, this intellectual property can pose significant dangers.

A malicious actor can:

  1. Acquire the model at a fraction of the cost it took to train it, avoiding the fees for the target model's API or even hosting a copy of the target model for profit.
  2. Use the acquired model to generate counterfeit IDs and false information, enabling scams and the deliberate misleading of vulnerable populations.
  3. Misuse the model to aid the development of biological weapons, a far more severe risk that underscores the critical need for stringent safeguards.
  4. Extract sensitive proprietary information on which the target model was trained.

Below, I walk through the ways a generative AI model could be attacked to steal its weights and architecture.

Guide to Practical Model Stealing

Traditionally, model theft can be carried out in the following ways, as documented in the survey literature:

1. API Exploitation:

If the model is accessible through APIs, malicious actors may use the model's outputs (labels and probabilities) to train their own substitute model.

2. Side-Channel Attacks:
Sophisticated attackers could explore side-channel attacks, such as analyzing power consumption or electromagnetic emissions during model inference, to infer details about the model.

3. Physical Attacks:
In scenarios where the model is deployed on physical hardware, attackers might physically breach the facility to gain access to the servers hosting the model. This could involve theft or tampering with hardware components.

It is hard to meaningfully extract a full LLM through the above methods, due to the heavy security around cloud companies' infrastructure and the sheer size of LLMs. Still, researchers have identified ways the vulnerabilities of LLMs could be exploited to create a copy model.

One commonly employed technique for pilfering a machine learning model by exploiting its inherent properties is the Substitute Model attack. This method operates on a query-based premise, assuming that the attacker possesses black box access to the model through an API or application. In this scenario, the attacker sends queries to the target model, collects the corresponding outputs, and leverages this information to train a substitute model. The ultimate objective of the attacker is to develop a substitute model that closely mirrors the behavior of the target model.
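To make the idea concrete, below is a minimal sketch of a substitute model attack in Python with scikit-learn. A locally trained random forest stands in for the black-box target; the synthetic data, the model choices, and the query_target helper are illustrative assumptions, not real attack tooling.

```python
# Minimal sketch of a substitute-model (extraction) attack against a black-box
# classifier. The "target" here is a locally trained stand-in; in a real attack
# it would be a remote prediction API returning labels or probabilities.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Stand-in for the victim: a model the attacker can only query, not inspect.
X_private, y_private = make_classification(n_samples=2000, n_features=20, random_state=0)
target_model = RandomForestClassifier(random_state=0).fit(X_private, y_private)

def query_target(inputs):
    """Black-box access: the attacker only sees predicted probabilities."""
    return target_model.predict_proba(inputs)

# 1. The attacker builds a query set from public/synthetic data (no access to X_private).
queries = rng.normal(size=(5000, 20))

# 2. Collects the target's outputs and turns them into pseudo-labels.
pseudo_labels = query_target(queries).argmax(axis=1)

# 3. Trains a substitute that imitates the observed input -> output behaviour.
substitute = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300, random_state=0)
substitute.fit(queries, pseudo_labels)

# Agreement between substitute and target on fresh inputs approximates fidelity.
test = rng.normal(size=(1000, 20))
fidelity = (substitute.predict(test) == target_model.predict(test)).mean()
print(f"Substitute agrees with target on {fidelity:.1%} of fresh queries")
```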

The popularity of Substitute Model attacks can be attributed to several compelling reasons:

  1. No Need for Original Training Data:
    Unlike other methods, Substitute Model attacks eliminate the requirement for access to the original training dataset that the target model was initially trained on. Instead, the substitute model can be trained using a public dataset or an artificial dataset generated by generative AI. In the paper "Thieves on Sesame Street!", the authors demonstrated extraction of a BERT-based model even with randomly sampled sequences of words.
  2. Cost-Effectiveness:
    These attacks do not require extensive resources. Various optimization methods, including active learning, GANs (Generative Adversarial Networks), and sampling techniques, have been proposed to minimize the number of queries to the original model, making the process even more economical (see the sketch after this list).
  3. Reduced Dependency on Original Model Architecture:
    Substitute Model attacks do not require knowledge of the original model’s architecture. While understanding the architecture can be advantageous, studies indicate that even in the absence of this information, it is possible to create a copy model without compromising performance.
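To illustrate the query-budget trick from point 2, here is a hedged sketch of uncertainty-based active learning. It assumes a scikit-learn-style substitute with a predict_proba method, like the one trained in the earlier example; the function and its loop are illustrative, not a prescribed attack recipe.

```python
# Sketch of one query-budget trick: uncertainty-based active learning.
# Instead of labelling every candidate input via the paid API, the attacker
# queries only the points its current substitute model is least sure about.
import numpy as np

def select_queries(substitute, candidate_pool: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` candidates with the highest predictive entropy."""
    probs = substitute.predict_proba(candidate_pool)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    most_uncertain = np.argsort(entropy)[-budget:]
    return candidate_pool[most_uncertain]

# Typical loop: train the substitute on a small seed set, then repeatedly
#   1. score a large unlabelled pool with select_queries,
#   2. send only the selected points to the target API,
#   3. retrain the substitute on the enlarged labelled set.
```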

But how would this method be used on LLMs? Can you steal a large model with a substitute model attack?

LLMs are a prime target because of the cost and time it takes to train them, but creating a substitute model with the same general capability as the target is still expensive. Recent research, however, has shown that a generic LLM can be used to fine-tune a smaller model for a specific task, such as code translation.

In a comparative analysis, the generic model LaMDA (137B) scored only 4.3% in compilation correctness, while the coding-specific model Codex (12B) excelled at 81.1%, showcasing the power of task-oriented fine-tuning.

The quality of data generated by LLMs surpasses publicly available data for numerous NLP and programming tasks. For instance, fine-tuning a CodeT5 model on LLM-generated data doubled its performance compared to training it on public data. Given the quality of data available through these APIs, adversaries might find it tempting to use a model's API to gather data and train smaller, cost-effective models. Notably, GitHub charges $10 a month for individual use of its coding copilot, giving adversaries an opportunity to deploy their own models for potential profit.
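As a rough sketch of what such distillation could look like, the snippet below generates (input, output) pairs from a hypothetical LLM wrapper (ask_llm is a stand-in, not a real API client) and fine-tunes CodeT5-small on them with Hugging Face transformers. It is an illustration of the technique under those assumptions, not a definitive implementation.

```python
# Hedged sketch: distilling an LLM's code-translation ability into a small
# seq2seq student model using synthetic (input, output) pairs.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to a large commercial LLM's completion API.
    A real attacker would swap in an actual client and collect many outputs."""
    return "def add(a, b):\n    return a + b"  # canned response so the sketch runs

# 1. Use the big model to label a corpus the attacker already has.
java_snippets = ["public int add(int a, int b) { return a + b; }"]  # ... many more
pairs = [(src, ask_llm(f"Translate this Java to Python:\n{src}")) for src in java_snippets]

# 2. Fine-tune a small student on the synthetic pairs.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-small")
student = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-small")
student.train()
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

for src, tgt in pairs:
    inputs = tokenizer(src, return_tensors="pt", truncation=True)
    labels = tokenizer(tgt, return_tensors="pt", truncation=True).input_ids
    loss = student(**inputs, labels=labels).loss  # seq2seq cross-entropy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```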

As AI continues its rapid advancement, sophisticated attacks on models will inevitably emerge alongside it. With the unprecedented cost of training these models, the potential rewards for successful theft are higher than ever. What adds a layer of complexity is that machine learning models are, at times, exploited by machine learning techniques themselves. The high stakes leave us with a question: how will AI be shielded against its own kind?

Thank you for stopping by.
