
08.10.2025: dlt Blog Series 2

Running dlt at Scale: Pipelines & Secrets in MWAA


Christian Shackleton on October 8, 2025

This post about running dlt at scale is part of our dlt blog post series. It assumes familiarity with Airflow, AWS, and Snowflake. If you’re new to dlt, start with our first post where we cover why we adopted it and how we integrated it into our data stack.


The Challenge: Running dlt at Scale

dlt works exceptionally well when you have a single environment or when pipelines share the same sources and destinations. In these scenarios, using a single .toml config or environment variables containing credentials, schema definitions, and destination settings is enough, and local runs are smooth and predictable.

But that’s often not reality:

  • We run dozens of concurrent dlt pipelines.
  • We integrate with dozens of marketing sources, many with multiple accounts each using unique credentials.
  • We maintain multiple Snowflake destinations.
  • Pipelines are orchestrated via MWAA (Amazon Managed Workflows for Apache Airflow), with code stored in GitHub and deployed through S3.

In this setup, dlt’s recommended static .toml configuration and suggested use of environment variables quickly became a scaling bottleneck. The one-size-fits-all approach could not handle the diversity of sources, destinations, and concurrent pipeline runs in our cloud orchestration environment.

Where Things Broke Down

Once we tried to scale dlt beyond a single environment, the limitations of a static .toml configuration became apparent:

1. Parameter Store Limits

AWS caps the size of stored parameters. Packing dozens of explicitly named credentials into one .toml file pushed us over those limits. Our temporary workaround was to share credentials across pipelines to reduce duplication. That decision triggered the next issue.
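One way around the size cap is to store credentials as one small parameter per source and account, and fetch each blob individually at runtime instead of loading a monolithic .toml. A minimal sketch, assuming a JSON SecureString per account; the parameter path convention and the `fetch_pipeline_secrets` helper are our illustration, not a dlt or AWS API:

```python
import json


def ssm_parameter_name(source: str, account: str) -> str:
    """Build a per-source, per-account parameter path (naming convention is illustrative)."""
    return f"/dlt/sources/{source}/{account}/credentials"


def fetch_pipeline_secrets(source: str, account: str) -> dict:
    """Fetch one small JSON credential blob instead of a monolithic .toml."""
    import boto3  # imported lazily so the module also loads where AWS libs are absent

    ssm = boto3.client("ssm")
    response = ssm.get_parameter(
        Name=ssm_parameter_name(source, account),
        WithDecryption=True,  # SecureString values are decrypted server-side
    )
    return json.loads(response["Parameter"]["Value"])
```

Because each parameter holds only one account's credentials, no single value comes close to the SSM size limit, and pipelines no longer need to share secrets.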

2. Destination Switching

We needed to switch between different Snowflake environments, but managing them inside one static config required tricks like reusing pipeline names, which conflicted with dlt’s state management and introduced complexity.

3. Pipeline Name Conflicts

dlt links pipeline state to the pipeline name. To run pipelines concurrently, we needed unique pipeline names, but the static config didn’t allow secrets to resolve properly when names didn’t match.
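One fix is to derive the pipeline name deterministically from the source, account, and destination environment, so concurrent runs never share state. A short sketch; the naming scheme is our own convention:

```python
import re


def unique_pipeline_name(source: str, account: str, destination_env: str) -> str:
    """Derive a collision-free dlt pipeline name; dlt keys local state by this name."""
    raw = f"{source}_{account}_{destination_env}"
    # dlt pipeline names should be snake_case identifiers, so normalize the parts
    return re.sub(r"[^a-z0-9_]", "_", raw.lower())
```

The resulting name goes straight into `dlt.pipeline(pipeline_name=...)`, and each name gets its own working directory and state.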

4. Concurrent Writes

Multiple pipelines sharing the same pipeline names caused race conditions. Mid-run, configs would overwrite each other, and Pipeline A’s secrets sometimes bled into Pipeline B’s run, leading to unpredictable behavior and difficult-to-debug issues.

Rethinking How We Handle Secrets

It became clear that static .toml files were not going to work in our environment. With dozens of pipelines running concurrently, each needing unique credentials and targeting different destinations, we needed a dynamic, per-pipeline approach.

Our solution: generate and inject secrets at runtime, directly into each dlt pipeline, rather than relying on global environment variables or pre-defined config files.

Key benefits of this solution

  • Per-pipeline isolation: Each DAG/task fetches only the credentials it needs from AWS Parameter Store and assigns them to dlt.secrets.
  • Safe concurrent execution: Since dlt.secrets lives in memory per Python process, there is no risk of one pipeline overwriting another’s credentials.
  • Dynamic configuration: Pipelines can target different buckets, Snowflake environments, or API credentials without interfering with each other.
  • No global state or disk writes: Unlike environment variables, you explicitly control secrets at runtime, which is safer in a multi-tenant execution environment, and the secrets disappear once the Airflow task completes.

Note: dlt currently supports Google Cloud Secrets Manager natively, but not AWS Parameter Store. This may change in future releases, so please always check the latest dlt documentation if you plan to rely on other secret backends.

Even though environment variables are standard practice for containerized workflows, in concurrent multi-DAG MWAA setups where values differ per DAG, our approach is actually more secure and reliable. Temporary secrets in memory never touch disk, and AWS SSM Parameter Store remains the source of truth.

Also, since this is running inside an Airflow task, none of the credentials persist beyond the task’s execution. Once the task finishes, everything is cleared from memory.

Our Approach for Running dlt at Scale

  • Store all credentials securely in AWS Parameter Store.
  • Use Airflow connections to manage Snowflake credentials.
  • Fetch temporary AWS session tokens at runtime.
  • Dynamically inject secrets directly into dlt using dlt.secrets.
  • Generate configs on the fly per pipeline, no static .toml files needed.
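The last step — generating configs on the fly — can be as simple as building a flat dict of dlt secret paths per pipeline and assigning it just before the pipeline is created. The key names follow dlt's `destination.filesystem.*` layout; the two helpers themselves are our illustration:

```python
def build_runtime_secrets(bucket_name: str, region: str, creds: dict) -> dict:
    """Assemble per-pipeline dlt secret paths instead of a shared static .toml."""
    prefix = "destination.filesystem"
    return {
        f"{prefix}.bucket_url": f"s3://{bucket_name}",
        f"{prefix}.credentials.aws_access_key_id": creds["access_key"],
        f"{prefix}.credentials.aws_secret_access_key": creds["secret_key"],
        f"{prefix}.credentials.aws_session_token": creds["token"],
        f"{prefix}.credentials.aws_default_region": region,
    }


def apply_runtime_secrets(secrets: dict) -> None:
    """Write each path into dlt.secrets in-process, right before creating the pipeline."""
    import dlt  # imported lazily so the pure builder above stays testable without dlt

    for path, value in secrets.items():
        dlt.secrets[path] = value
```

Because the dict is built fresh inside each Airflow task, two concurrent pipelines never read or write the same configuration.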

Here’s a simplified version of our secrets injection function, which is run just before creating the dlt pipeline:

def set_dlt_secrets(snowflake_conn_id: str, bucket_name: str) -> None:
    """
    Loads AWS and Snowflake credentials into dlt secrets from Airflow connection metadata.

    dlt's filesystem destination uses boto3's default credential chain, but manually
    setting dlt.secrets["destination.filesystem.credentials.*"] overrides that chain.
    If any credential is defined, the rest become mandatory, so we set them all explicitly.
    """
    import json
    import logging

    import boto3
    import dlt
    from airflow.hooks.base import BaseHook

    logger = logging.getLogger(__name__)

    # Fetch the Snowflake connection from Airflow
    snowflake_conn = BaseHook.get_connection(snowflake_conn_id)
    snowflake_extra = json.loads(snowflake_conn.extra) if snowflake_conn.extra else {}

    # Retrieve temporary AWS credentials from the current session
    session = boto3.Session()
    credentials = session.get_credentials().get_frozen_credentials()
    logger.info("🔑 Successfully retrieved AWS temporary credentials")

    # Set dlt secrets for S3
    dlt.secrets["destination.filesystem.bucket_url"] = f"s3://{bucket_name}"
    dlt.secrets["destination.filesystem.credentials.aws_access_key_id"] = credentials.access_key
    dlt.secrets["destination.filesystem.credentials.aws_secret_access_key"] = credentials.secret_key
    dlt.secrets["destination.filesystem.credentials.aws_session_token"] = credentials.token
    dlt.secrets["destination.filesystem.credentials.aws_default_region"] = "eu-central-1"

    # Set dlt secrets for the Snowflake destination and its external stage.
    # The extra-field names below are illustrative; match them to how your
    # Airflow Snowflake connection stores account, warehouse, database, and role.
    dlt.secrets["destination.snowflake.credentials.username"] = snowflake_conn.login
    dlt.secrets["destination.snowflake.credentials.password"] = snowflake_conn.password
    dlt.secrets["destination.snowflake.credentials.host"] = snowflake_extra.get("account")
    dlt.secrets["destination.snowflake.credentials.database"] = snowflake_extra.get("database")
    dlt.secrets["destination.snowflake.credentials.warehouse"] = snowflake_extra.get("warehouse")
    dlt.secrets["destination.snowflake.credentials.role"] = snowflake_extra.get("role")

Outcome of this approach

This approach unlocked full pipeline isolation and eliminated configuration conflicts. Every Airflow DAG working with dlt now gets its own fresh, ephemeral config, containing only the secrets it needs for that run.

This allowed us to:

  • Give each pipeline a unique name.
  • Keep secrets isolated per pipeline.
  • Scale to as many concurrent pipelines as we wanted without collisions, letting Amazon Managed Workflows for Apache Airflow (MWAA) simply orchestrate independent tasks.

What were our Learnings for Running dlt at Scale?

Running dlt at scale forced us to rethink how we manage secrets and configs:

  • dlt’s defaults favor simplicity, not scale: .toml files work for local setups but break under concurrency.
  • Dynamic configuration is key: Per-pipeline secrets and configs eliminate collisions.
  • AWS Parameter Store works well when you avoid bundling everything into one large file.
  • Isolation is the foundation: Every pipeline should manage its own secrets and schema independently.
  • MWAA introduces constraints that make static configs impractical, but dynamic generation solves them elegantly.

What’s Next

In our next post, we’ll cover another challenge we faced: schema persistence and evolution. Marketing data demands change constantly, and sometimes dlt’s helpful tricks become a hindrance when handling shifting schemas.

If you need help scaling your data projects in the meantime, we are of course always happy to assist you.