
09.10.2025: Airflow Blog Series Part 2

LinkedIn DAG in Practice


Giorgio Frisenda on October 9, 2025

Modular Tasks & Dynamic Generation

In the first post of our Airflow blog series, we explored the modern marketing data landscape and why orchestration is at the heart of a scalable data stack. Now, let’s get practical.

In this post, we’ll walk through a production-grade Airflow DAG (Directed Acyclic Graph) for ingesting LinkedIn data into Snowflake. Along the way, we’ll cover how to design tasks that are modular, reusable, and dynamically generated.


Why DAG Design Matters

Airflow lets us define workflows as Directed Acyclic Graphs (DAGs). But not all DAGs are created equal:

  • Poorly designed DAGs become hard to maintain.
  • Overly complex tasks make debugging painful.
  • Sequential pipelines can cause slow runtimes and frustrated stakeholders.

Good DAG design, on the other hand, makes pipelines:

  • Scalable (able to handle more endpoints, clients, or campaigns).
  • Maintainable (easy to debug and extend).
  • Resilient (fail gracefully, retry automatically).
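The “retry automatically” part of resilience is usually declared once for the whole DAG rather than per task. A minimal sketch of such defaults (the values and owner name are illustrative, not taken from our actual DAG; in Airflow, this dict would be passed as `default_args` to the DAG constructor):

```python
from datetime import timedelta

# Illustrative retry defaults. Each task in the DAG inherits these unless
# it overrides them, so resilience is configured in one place.
DEFAULT_ARGS = {
    "owner": "data-team",
    "retries": 3,                         # re-run a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),  # wait between attempts
    "retry_exponential_backoff": True,    # lengthen the wait on repeated failures
}

print(DEFAULT_ARGS["retries"])
```

With defaults like these, a transient API hiccup resolves itself on retry instead of paging a human.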

Tip: Keep Tasks Small and Purposeful

Think of an Airflow task like a single worker on an assembly line: it should do one thing, and do it well.

For example, in our LinkedIn DAG, we split responsibilities across simple tasks:

  • Get access token: one task dedicated to authentication.
  • Fetch data: one task per LinkedIn endpoint, writing results to S3.
  • Load data: a TaskGroup containing an S3 sensor that checks whether the data has landed in S3 and a Snowflake operator that loads it into Snowflake.

This modularity pays off when something breaks. Instead of rerunning an entire DAG or sifting through a massive “do-everything” task, you can retry a single step.
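The split above can be sketched in plain Python (function names and return values are hypothetical stand-ins; in the real DAG each function would be an Airflow task, e.g. decorated with `@task`, calling the LinkedIn API and S3/Snowflake for real):

```python
def get_access_token(client_id: str, client_secret: str) -> str:
    """One task: authenticate against the LinkedIn API, nothing else."""
    # In the real DAG this would call the OAuth token endpoint.
    return f"token-for-{client_id}"

def fetch_endpoint(token: str, endpoint: str) -> str:
    """One task per endpoint: pull data and return the S3 key it was written to."""
    # In the real DAG this would call the API and upload the payload to S3.
    return f"s3://linkedin-raw/{endpoint}/latest.json"

def load_to_snowflake(s3_key: str) -> str:
    """One task: load the staged file from S3 into Snowflake."""
    # In the real DAG an S3 sensor would first confirm the file exists.
    return f"COPY INTO raw.linkedin FROM '{s3_key}'"

# Wiring: each step consumes only the previous step's output, so any
# single step can be retried on its own.
token = get_access_token("my-client-id", "secret")
s3_key = fetch_endpoint(token, "campaigns")
print(load_to_snowflake(s3_key))
```

Because each function has one input and one output, a failed load can be retried without re-fetching, and a failed fetch without re-authenticating everything else.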

Dynamic Task Generation with Configs

Marketing APIs like LinkedIn often have multiple endpoints (ads, campaigns, creatives, spend, etc.). Hardcoding each one into your DAG is a recipe for duplication and maintenance headaches.

Instead, you can define endpoints in a YAML config file and let Airflow dynamically generate tasks for each one.
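A hypothetical config file for this might look like the following (endpoint names, API paths, and table names are illustrative, not our production values):

```yaml
# endpoints.yml -- one entry per LinkedIn endpoint (illustrative values)
endpoints:
  campaigns:
    path: /rest/adCampaigns
    target_table: RAW_LINKEDIN_CAMPAIGNS
  creatives:
    path: /rest/creatives
    target_table: RAW_LINKEDIN_CREATIVES
  ad_analytics:
    path: /rest/adAnalytics
    target_table: RAW_LINKEDIN_AD_ANALYTICS
```

Adding a fourth endpoint then means adding three lines of YAML, not copy-pasting another operator.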


The benefits of this are:

  • DRY code (no copy-pasting Python operators for every endpoint).
  • Reusability (add new endpoints by updating the config, not the DAG code).
  • Flexibility (re-run for one endpoint without touching others).

This pattern lets you scale gracefully as APIs evolve — a reality in the fast-changing world of marketing platforms.
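The generation loop itself can be sketched like this (a plain-Python stand-in: the config is inlined as the dict `yaml.safe_load` would produce from the file, and each “spec” dict stands in for the operator instance Airflow would create; names are illustrative):

```python
# Parsed form of a hypothetical endpoints.yml config (inlined so the sketch
# is self-contained; a real DAG would read it with yaml.safe_load).
ENDPOINTS = {
    "campaigns":    {"path": "/rest/adCampaigns", "target_table": "RAW_LINKEDIN_CAMPAIGNS"},
    "creatives":    {"path": "/rest/creatives",   "target_table": "RAW_LINKEDIN_CREATIVES"},
    "ad_analytics": {"path": "/rest/adAnalytics", "target_table": "RAW_LINKEDIN_AD_ANALYTICS"},
}

def generate_tasks(endpoints: dict) -> dict:
    """Build one fetch -> load task pair per configured endpoint.

    Returns a mapping of task_id -> spec. In Airflow, each spec would
    instead be an operator instance, and the "upstream" field would be
    expressed as a dependency: fetch >> load.
    """
    tasks = {}
    for name, cfg in endpoints.items():
        tasks[f"fetch_{name}"] = {"api_path": cfg["path"]}
        tasks[f"load_{name}"] = {
            "table": cfg["target_table"],
            "upstream": f"fetch_{name}",
        }
    return tasks

tasks = generate_tasks(ENDPOINTS)
print(sorted(tasks))
```

Each endpoint gets its own independent fetch/load pair, so one endpoint can be re-run (or fail and retry) without touching the others, and new endpoints appear in the DAG as soon as they appear in the config.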


What’s Next

By keeping tasks modular and generating them dynamically, you’ve laid the foundation for a scalable LinkedIn ingestion pipeline. But production data pipelines need more than clean design — they need resilience.

In the next post, we’ll explore how to make DAGs robust with logging, alerting, and error handling, so that failures are caught early and Marketing and Sales teams aren’t left wondering why their dashboards are late.