LinkedIn DAG in Practice
Modular Tasks & Dynamic Generation
In the first post of our Airflow blog series, we explored the modern marketing data landscape and why orchestration is at the heart of a scalable data stack. Now, let’s get practical.
In this post, we’ll walk through a production-grade Airflow DAG (Directed Acyclic Graph) for ingesting LinkedIn data into Snowflake. Along the way, we’ll cover how to design tasks that are modular, reusable, and dynamically generated.
Why DAG Design Matters
Airflow lets us define workflows as Directed Acyclic Graphs (DAGs). But not all DAGs are created equal:
- Poorly designed DAGs become hard to maintain.
- Overly complex tasks make debugging painful.
- Sequential pipelines can cause slow runtimes and frustrated stakeholders.
Good DAG design, on the other hand, makes pipelines:
- Scalable (able to handle more endpoints, clients, or campaigns).
- Maintainable (easy to debug and extend).
- Resilient (fail gracefully, retry automatically).
Tip: Keep Tasks Small and Purposeful
Think of an Airflow task like a single worker on an assembly line: it should do one thing, and do it well.
For example, in our LinkedIn DAG, we split responsibilities across simple tasks:
- Get access token: one task dedicated to authentication.
- Fetch data: one task per LinkedIn endpoint, writing results to S3.
- Load data: a TaskGroup containing an S3 sensor that checks whether the data has landed in S3 and a Snowflake operator that loads it into Snowflake.
This modularity pays off when something breaks. Instead of rerunning an entire DAG or sifting through a massive “do-everything” task, you can retry a single step.
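Sketched as plain Python, the split looks like this (function names and payloads are illustrative stubs, not LinkedIn’s real API; in the DAG each function would back one task or TaskGroup):

```python
from typing import Any, Dict

def get_access_token(client_id: str, client_secret: str) -> str:
    # One task, one job: authenticate. In the real DAG this would call
    # LinkedIn's OAuth endpoint; here it returns a placeholder token.
    return f"token-for-{client_id}"

def fetch_endpoint(token: str, endpoint: str) -> Dict[str, Any]:
    # One task per endpoint: fetch raw data and hand it off (to S3 in
    # the real pipeline). Stubbed as an in-memory payload.
    return {"endpoint": endpoint, "authenticated": bool(token), "rows": []}

def load_to_snowflake(payload: Dict[str, Any]) -> int:
    # In the DAG this is a TaskGroup: an S3 sensor waits for the file,
    # then a Snowflake operator copies it in. Stubbed as a row count.
    return len(payload["rows"])
```

Because each step is its own function, each step can be its own Airflow task with its own retries, and a failure in loading never forces a re-fetch.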
Dynamic Task Generation with Configs
Marketing APIs like LinkedIn often have multiple endpoints (ads, campaigns, creatives, spend, etc.). Hardcoding each one into your DAG is a recipe for duplication and maintenance headaches.
Instead, you can define endpoints in a YAML config file and let Airflow dynamically generate tasks for each one.
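Such a config might look like this (the endpoint names, paths, and prefixes below are purely illustrative, not LinkedIn’s actual API surface):

```yaml
# linkedin_endpoints.yml — hypothetical structure
endpoints:
  ad_analytics:
    path: /rest/adAnalytics
    s3_prefix: linkedin/ad_analytics
  campaigns:
    path: /rest/adCampaigns
    s3_prefix: linkedin/campaigns
  creatives:
    path: /rest/creatives
    s3_prefix: linkedin/creatives
```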
The benefits of this are:
- DRY code (no copy-pasting Python operators for every endpoint).
- Reusability (add new endpoints by updating the config, not the DAG code).
- Flexibility (re-run for one endpoint without touching others).
This pattern lets you scale gracefully as APIs evolve — a reality in the fast-changing world of marketing platforms.
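Stripped of Airflow operators so the pattern itself is visible, dynamic generation is just a loop over the config (endpoint names, paths, and the key layout are hypothetical; in production the dict would come from the YAML file via `yaml.safe_load`, and each closure would become the `python_callable` of a `PythonOperator` or an `@task`-decorated function):

```python
from typing import Any, Callable, Dict

# Hypothetical endpoint config; in production, loaded from YAML.
ENDPOINTS = {
    "ad_analytics": "/rest/adAnalytics",
    "campaigns": "/rest/adCampaigns",
    "creatives": "/rest/creatives",
}

def make_fetch_task(name: str, path: str) -> Callable[[], Dict[str, Any]]:
    """Build the callable for one endpoint's fetch task."""
    def fetch() -> Dict[str, Any]:
        # Reports the API path it would call and the S3 key it would
        # write, so the generation pattern stays testable without Airflow.
        return {"path": path, "s3_key": f"linkedin/{name}/data.json"}
    return fetch

# Dynamic generation: one fetch task per configured endpoint. Adding an
# endpoint to the config adds a task here without touching DAG code.
fetch_tasks: Dict[str, Callable[[], Dict[str, Any]]] = {
    f"fetch_{name}": make_fetch_task(name, path)
    for name, path in ENDPOINTS.items()
}
```

The same loop would also attach the downstream load TaskGroup per endpoint, so a new API endpoint costs one config entry, not a copy-pasted block of operators.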
What’s Next
By keeping tasks modular and generating them dynamically, you’ve laid the foundation for a scalable LinkedIn ingestion pipeline. But production data pipelines need more than clean design — they need resilience.
In the next post, we’ll explore how to make DAGs robust with logging, alerting, and error handling, so that failures are caught early and Marketing and Sales teams are no longer left wondering why their dashboards are late.