Flink Training for Real-Time Data Engineering

Driving Transformation, Efficiency, and Strategy Across the Software Lifecycle

Duration

5 Days

Level

Advanced Level

Design and Tailor this course

As per your team's needs

Overview

This training program is designed to empower the data engineering team to transition from batch ETL workflows to a state-of-the-art, real-time data processing architecture using Apache Flink on AWS. The course moves from foundational Flink concepts to practical, hands-on implementation within the existing cloud ecosystem. Key focus areas include leveraging Flink’s DataStream API, deploying and managing Flink jobs as a managed service on AWS, integrating with Kubernetes (EKS), establishing real-time CDC pipelines, utilizing advanced table formats like Apache Iceberg, and integrating with Kafka Connect. The ultimate goal is to equip the team with the engineering best practices needed to build and scale a robust, real-time data lake infrastructure for advanced AI initiatives.

Audience

This course is specifically designed for the team, including:

  • Experienced Data Engineers (15-20 primary participants).
  • Architects and technical leads involved in the BI and AI initiatives.
  • Team members with a background in data processing who need to upskill for real-time systems.

Prerequisites

Participants should have:

  • Strong experience with Scala programming and data structures.
  • Practical familiarity with AWS cloud services.
  • Basic knowledge of Kubernetes (mandatory).
  • Prior exposure to a distributed data processing framework (e.g., Apache Spark) is helpful but not mandatory.
  • No prior experience with Apache Flink is required.

Curriculum

  • Introduction to Apache Flink: What it is and why it excels at stateful stream processing.
  • Overview of Flink’s architecture: Jobs, Tasks, Operators, and the Cluster model.
  • Deep dive into the DataStream API, the core API for building Flink applications.
  • Sources: Reading data from streams (e.g., Kafka, Kinesis).
  • Transformations: map, filter, keyBy, window, and other essential operations.
  • Sinks: Writing data to destinations.
  • Lab: Writing your first Flink application.
  • Table API vs. SQL API vs. DataStream API
  • Defining tables over Kafka, filesystem, CDC sources
  • Stream-table duality: Dynamic tables & temporal joins
  • Lab: Create a real-time SQL query over Kafka and emit to PostgreSQL
  • Time-based windows: Tumbling, Sliding, Session
  • Count windows and global windows
  • Triggers, evictors, and late data handling
  • Real-world use cases: User activity aggregation, fraud detection
  • Lab: Implement user session aggregation using event-time session windows
  • Join types: Inner, Outer, Interval, and Temporal
  • Keyed vs. Broadcast joins
  • Handling skewed joins and out-of-order events
  • Use case: Joining user profile data with real-time events
  • Lab: Implement interval and temporal joins
  • The role of connectors in a Flink ecosystem.
  • Change Data Capture (CDC): Implementing real-time pipelines from OLTP databases.
    • Connecting to PostgreSQL for real-time ingestion.
  • Exploring the landscape of managed vs. open-source connectors.
  • Kafka Connect:
    • Source & sink connectors: Understanding various types and use cases.
    • Distributed vs. standalone modes: When to use each.
    • Schema Registry integration: Ensuring schema compatibility.
    • Error handling & dead-letter queues: Strategies for robust pipelines.
    • Scaling & monitoring Kafka Connect deployments.
    • Lab: Setting up a Kafka Connect source/sink for a Flink application.
  • Table formats in data lakes: Challenges with traditional approaches.
  • Introduction to Apache Iceberg: Key features and benefits (ACID, schema evolution, partition evolution, hidden partitioning).
  • Iceberg architecture: Table layout, manifest lists, manifest files, and data files.
  • Schema evolution: Adding, deleting, renaming columns.
  • Partition evolution: Updating partition schemes without data rewrite.
  • Time-travel queries: Accessing historical table states.
  • Compaction strategies: Optimizing file sizes for query performance.
  • Integrations: Using Flink with Iceberg for real-time data ingestion and transformation.
  • Integrations: Introduction to Iceberg with Spark for batch processing (brief overview).
  • Best practices for production deployments and optimizations with Flink & Iceberg.
  • Lab: Creating an Iceberg table, writing Flink data to it, and performing time-travel queries.
  • Why “state” is critical in real-time applications (e.g., aggregations, fraud detection).
  • Understanding Flink’s state backends (e.g., RocksDB).
  • Ensuring reliability with Checkpoints and Savepoints for no-data-loss guarantees.
  • Lab: Building a stateful Flink application that can recover from failure.
  • Overview of the Flink Kubernetes Operator.
  • Best practices for configuring Flink clusters on EKS.
  • Managing Flink job lifecycles (deployment, updates, scaling) via Kubernetes.
  • Lab: Deploying a Flink job to an EKS cluster

  • Operations
    • The Flink Runtime
    • Managing Flink Jobs
    • Fault Tolerance & Exactly-once
    • State Backends
    • High Availability
    • Metrics, Monitoring & Alerting
    • Failure Recovery
    • Application Evolution
    • Capacity Planning
    • Security
  • Troubleshooting & Performance Tuning
    • Checkpoint Failures
    • Managing Skew
    • Serialization
      • State Migration
      • Tuning Serialization
    • Network
      • Network Design
      • Backpressure
      • Tuning Latency
    • RocksDB
      • RocksDB Overview
      • Tuning RocksDB
    • Case Study
  • Exercises
    • Troubleshooting a Failing Job
    • Troubleshooting a Stuck Job
    • Troubleshooting Checkpoints
    • Tuning for Latency
    • Tuning for Throughput
    • Tuning RocksDB
    • Object Reuse
  • Monitoring Flink jobs: Identifying backpressure and bottlenecks.
  • Resource management: Allocating CPU, memory, and parallelism effectively.
  • Data Serialization and Schema Management (e.g., using Avro).
  • Structuring Flink projects for maintainability and testing.
  • Production readiness insights for Flink: Checklist and considerations.
  • Performance optimization best practices for Flink: Advanced techniques.
