
15.10.2025: Airflow Blog Series Part 4

Workflows in Airflow: Clarity and Scalability


Giorgio Frisenda on October 15, 2025

In the last post of our Airflow blog series, we covered how to make Airflow DAGs more resilient with logging, retries, and alerting. But resilience alone won’t save you if your DAGs are a tangled mess.

Imagine you’ve just built your first Airflow pipeline. It fetches campaign data from LinkedIn, loads it into Snowflake, and you’re done. At first, things run smoothly. But as the pipeline grows, with more endpoints, more clients, and more moving pieces, it starts to feel less like a neat assembly line and more like a messy spider web. That’s where organization comes in. The way you design workflows in Airflow isn’t just a matter of style: it directly impacts how readable, fast, and reliable your pipelines are.

In this post, we’ll explore some of the most common patterns for structuring Airflow workflows, especially in marketing pipelines, and how they can save you from future headaches.

Linear Flows in Airflow: Simple but Limited

The simplest design is a straight line.

This is perfect for small pipelines or steps that truly depend on each other. For example:

  • Get a LinkedIn API token
  • Fetch campaign data
  • Load campaign data into Snowflake

This approach is clean and easy to follow. If your tasks really depend on one another, it’s perfect. But the problem shows up when your pipeline grows. Imagine fetching dozens of endpoints one by one. Suddenly your “simple” flow takes hours to finish. Linear flows in Airflow are great training wheels, but they don’t scale well.

[Figure: a linear workflow in Airflow]

Fan-Out: Parallelism at Scale

For bigger jobs, you can speed things up with fan-out / fan-in patterns. In our use-case:

  • After authentication, you fan-out into multiple “fetch” tasks (one per LinkedIn endpoint).
  • Once they all complete, a single fan-in load task aggregates the results and writes everything to Snowflake.

This design keeps pipelines fast while maintaining clear dependencies.

[Figure: a fan-out / fan-in workflow in Airflow]

TaskGroups: Keep DAGs Readable

As DAGs grow, hundreds of tasks can clutter your Airflow UI. Enter TaskGroups — a way to bundle related tasks into logical “folders”:

  • A fetch group with one task per endpoint.
  • A load group with staging, transformation, and warehouse load steps.

TaskGroups don’t change execution but make DAGs far more readable.

[Figure: a DAG organized with TaskGroups]

When Not to Parallelize Workflows in Airflow

Parallelism is powerful, but more isn’t always better. Imagine sending a hundred API requests to LinkedIn at the same time. You’ll likely hit rate limits and get errors back.

Sometimes, the safer choice is to fetch data sequentially, even if it takes longer. The art here is balance: run as much in parallel as you can, but respect the limits of APIs, warehouses, and infrastructure.

In our use-case, we decided to run the extraction_linkedin_account_details tasks sequentially to avoid API throttling, while parallelizing the load_linkedin_account_details tasks.

[Figure: sequential extraction tasks feeding parallel load tasks]

Are Fan-Out and Fan-In the Only Options?

Not at all. Even if your tasks run in parallel, Airflow lets you control how many DAG (Directed Acyclic Graph) runs and tasks execute at the same time. At the DAG level, you can configure these limits globally in airflow.cfg via max_active_runs_per_dag and max_active_tasks_per_dag:
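As a sketch, the relevant airflow.cfg entries look like this (the values are illustrative, not recommendations):

```ini
[core]
# Maximum number of DAG runs that may be active at once, per DAG.
max_active_runs_per_dag = 1

# Maximum number of task instances allowed to run concurrently
# within a single DAG.
max_active_tasks_per_dag = 8
```

The same limits can also be overridden per DAG via the `max_active_runs` and `max_active_tasks` arguments when instantiating the DAG.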

[Figure: max_active_runs_per_dag and max_active_tasks_per_dag settings]

For finer-grained control, you can set limits locally for individual tasks using task_concurrency (renamed max_active_tis_per_dag in Airflow 2.2+) or assign tasks to a pool. For example, if the number of parallel tasks in the extraction_linkedin_account_details TaskGroup increases, you can limit the total number of tasks running at the same time to avoid hitting API rate limits. Using these local controls alongside the global DAG settings allows you to fan out tasks safely while preventing overload on APIs.

[Figure: task-level concurrency limits]

What’s Next: Idempotent Workflows in Airflow

With modular tasks, resilience, and scalable workflow patterns, you’re well on your way to a production-grade DAG. But there’s one more principle every marketing pipeline needs: idempotency.

In the next post, we’ll cover how to design idempotent DAGs that are safe to rerun without duplicating or corrupting data, ensuring retries and backfills never compromise your Marketing Data Stack.