Apache Airflow

Managing Data Engineering Pipeline Workflows with Ease

Duration

2 Days

Level

Basic Level

Design and Tailor this course

As per your team's needs

It is hard to get production-grade data pipelines right on the first attempt. With so many moving parts, each update adds to the pipeline's complexity. Proper orchestration of pipelines is essential to the success of any data-driven organization.

This course helps you productionize data pipelines with Apache Airflow. First, we’ll explore what Airflow is, its syntax, how to build DAGs, and how to scale data pipelines.

Then, we’ll discover how to make your pipelines more resilient and predictable. Finally, we’ll learn how to distribute tasks with the Celery and Kubernetes executors.

Who Should Attend

  • Data Engineers
  • Data Analysts
  • Data Scientists

Course Outline

  • Why Airflow?
  • What is Airflow?
  • How Airflow works
  • Secrets of the webserver and scheduler
  • Exploring the working environment
  • Installing Airflow 2.0
  • Quick tour of Airflow UI and CLI
  • How do we represent a pipeline in Airflow?
  • What is a DAG?
  • Our First DAG
  • Dissecting DAGs: Tasks & Operators
  • Creating first Pipeline
  • Demystifying the start_date and schedule_interval parameters
  • Backfill and Catchup
  • Time Zones in Airflow
  • How to make your tasks dependent
  • What is a DAG?
  • Define your DAG
  • Organizing your DAG Folders
  • How to deal with failures in DAGs
  • Testing DAGs
  • Minimizing repetitive patterns with SubDAGs
  • Grouping your tasks with SubDAGs and dealing with deadlocks
  • Trigger rules for your Tasks
  • Using Macros and Airflow Templates
  • Advanced DAG flow with branching
  • First conditional task
  • Extending functionality with custom operators
  • Sharing components with Airflow plugins
  • Why are my tasks sequential?
  • Sequential, Local, and Celery Executors
  • Understanding concurrency and parallelism with the Local Executor
  • Setting up Celery
  • Distributing tasks with the Celery Executor
  • Airflow on Kubernetes
  • Setting up a 3-node Kubernetes cluster with Vagrant and Rancher
  • Installing Airflow with Rancher and the Kubernetes Executor
  • Running DAGs with the Kubernetes Executor
  • Best Practices for Airflow
  • Logging in Airflow
  • Introduction to Airflow metrics
  • Monitoring Airflow with the TIG stack (Telegraf, InfluxDB, Grafana)
  • Triggering alerts for Airflow with Grafana
  • Triggering maintenance DAGs

Prerequisites

  • Basics of Python
  • Understanding of data engineering processes