(Image: Apache Airflow logo)

A High-Level Overview

Table of Contents

  • What is Apache Airflow?

  • Airflow Principles

    • Scalable

      • Scaling Strategies

    • Dynamic

    • Extensible

    • Elegant

  • Airflow: Core Components

  • Airflow Features

  • Airflow Integrations

  • Airflow Provider Packages

  • Airflow Docker stack

  • Further Sources

What is Apache Airflow?

  • Airflow is a platform to programmatically author, schedule and monitor workflows.

  • Initially developed at Airbnb in 2014 and later donated to the Apache Software Foundation, Airflow has become the de facto standard for workflow orchestration in the data engineering ecosystem.

Airflow Principles

(Image: Airflow's four guiding principles — Scalable, Dynamic, Extensible, Elegant)

Airflow Principle: Scalable

Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers, so it can scale with your workloads.

Scaling Strategies

Airflow scales through its choice of executor: the LocalExecutor parallelizes tasks on a single machine, while the CeleryExecutor and KubernetesExecutor distribute tasks across a fleet of workers.
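Which strategy you use is largely a matter of configuration; switching executors is a single setting. A sketch of an `airflow.cfg` fragment:

```ini
# airflow.cfg (illustrative fragment): pick the executor that matches your scale.
[core]
# LocalExecutor for a single machine; CeleryExecutor or KubernetesExecutor
# to distribute tasks across many workers.
executor = CeleryExecutor
```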

Airflow Principle: Dynamic

Airflow pipelines are defined in Python, which allows dynamic pipeline generation: DAGs and tasks can be instantiated programmatically from code.
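A minimal sketch of dynamic generation, assuming an Airflow 2.4+ installation (the dataset names and task logic are hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical list of datasets -- one DAG is generated per entry.
for dataset in ["orders", "customers", "invoices"]:
    with DAG(
        dag_id=f"ingest_{dataset}",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id=f"load_{dataset}",
            python_callable=lambda ds=dataset: print(f"loading {ds}"),
        )
    # Expose each DAG at module level so the scheduler discovers it.
    globals()[f"ingest_{dataset}"] = dag
```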

Airflow Principle: Extensible

  • Airflow allows you to create:

    • custom operators

    • custom sensors

    • hooks

    • plugins

This extends Airflow's functionality, letting you integrate with virtually any system, define new abstractions, and tailor workflows to your environment. 🚀
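For example, a custom operator is just a Python class extending `BaseOperator`; a minimal sketch (the class name and greeting logic are illustrative):

```python
from airflow.models.baseoperator import BaseOperator


class GreetOperator(BaseOperator):
    """Illustrative custom operator: logs a greeting when executed."""

    def __init__(self, name: str, **kwargs):
        super().__init__(**kwargs)
        self.name = name

    def execute(self, context):
        # `execute` is the one method a custom operator must implement;
        # Airflow calls it when the task instance runs.
        self.log.info("Hello, %s!", self.name)
        return self.name
```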

Airflow Principle: Elegant

Airflow pipelines are lean and explicit; parameterizing your scripts is built into the core of Airflow using the powerful Jinja templating engine.
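A sketch of templating in practice: Airflow renders Jinja macros such as `{{ ds }}` (the run's logical date) inside templated fields at runtime (the DAG id and command below are illustrative):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="templated_example",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    # {{ ds }} is a built-in Jinja macro rendered per run, so the same
    # lean task definition processes a different partition each day.
    BashOperator(
        task_id="copy_partition",
        bash_command="echo processing partition dt={{ ds }}",
    )
```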

Airflow: Core Components

Airflow's core components are the scheduler (which triggers workflows), the executor and its workers (which run tasks), the webserver (the UI), a folder of DAG files, and the metadata database that stores state.

Airflow Features

Apache Airflow provides the following features:

  • Pure Python

  • Useful UI

  • Robust Integrations

  • Easy to Use

  • Open Source

Airflow Feature: Pure Python

No more command-line or XML black-magic! Everything is Python:

  • workflows are created in Python code

  • extensions are written in Python

  • the full ecosystem of Python libraries is available

  • the scheduler, executor, and workers all run Python
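A complete workflow really is just a Python file; a minimal sketch using the TaskFlow API (task names and payloads are illustrative):

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(start_date=datetime(2024, 1, 1), schedule="@daily", catchup=False)
def etl_example():
    @task
    def extract():
        return [1, 2, 3]  # illustrative payload

    @task
    def load(rows):
        print(f"loaded {len(rows)} rows")

    # Plain function composition defines the task dependency.
    load(extract())


etl_example()
```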

Airflow Feature: Useful UI - DAG Filtering

The DAG list can be filtered in several ways:

  • all DAGs

  • active DAGs

  • paused DAGs

  • running DAGs

  • filter by tag

  • filter by name

Airflow Feature: Useful UI - Cluster Activity

The Cluster Activity view gives insight into the health of the deployment:

  • Worker Status

  • Task Distribution

  • Queue Health

  • Executor Performance

  • Debugging

Airflow Feature: Robust Integrations

Airflow offers robust integrations with:

  • Cloud Platforms

  • Databases

  • Big Data Frameworks

Airflow Feature: Easy to Use

Anyone with Python knowledge can deploy a workflow. Apache Airflow does not limit the scope of your pipelines; you can use it to build ML models, transfer data, manage your infrastructure, and more.

Airflow Feature: Open Source

(Image: the apache/airflow repository on GitHub)

  • Last commit: 1 hour ago

  • Total commits: 28k+

Whenever you want to share an improvement, you can open a PR. It’s as simple as that: no barriers, no prolonged procedures. Airflow has many active users who willingly share their experiences. Have any questions? Check out our buzzing Slack.

Airflow Provider Packages

Airflow has 80+ provider packages that add integrations with third-party services. They are released independently of the Apache Airflow core. The current integrations are shown below.

(Image: Airflow provider integrations)
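Each provider ships as its own pip package; installing one makes its hooks and operators importable. The Amazon provider below is just one illustrative choice:

```shell
# Install a provider package alongside the core (illustrative choice of provider).
pip install apache-airflow apache-airflow-providers-amazon
```

Once installed, the provider's classes — e.g. `from airflow.providers.amazon.aws.hooks.s3 import S3Hook` — become available in your DAG files.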

Airflow Docker stack

Airflow has an official Dockerfile and Docker image published on Docker Hub as a convenience package for installation. You can extend and customize the image according to your requirements and use it in your own deployments.
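Extending the official image is typically a small Dockerfile that layers your dependencies on top (the tag and requirements file are illustrative):

```dockerfile
# Base on the official image from Docker Hub (tag is illustrative).
FROM apache/airflow:2.9.2

# Layer project-specific Python dependencies on top of the base image.
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
```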

Further Sources

Refer to the official Apache Airflow documentation here: