Getting Started with Apache Iceberg

Analyze huge datasets with high performance and reliability

Duration

1 Day

Level

Basic Level

In the ever-evolving landscape of data management, innovation is key to extracting maximum value from your data assets. “Getting Started with Apache Iceberg” is a one-day immersive course that equips you with the essential knowledge to use Apache Iceberg, an open table format for large analytic datasets that is reshaping how data is stored, managed, and queried in data lakes.

Course Highlights

  • Discovering the Advantages of Apache Iceberg
  • Diving Deep into Apache Iceberg Architecture
  • Writing Efficient Data Queries
  • Understanding Apache Iceberg Internals
  • Managing, Monitoring, and Optimizing Apache Iceberg
  • Hands-on Implementation

Target Audience

  • Big data analysts who want to use Apache Iceberg for performance gains
  • Data engineers planning to build data lakehouses on Apache Iceberg

Course Outline

  • Evolution of Data Platforms
  • Understanding Data Lakes and the Available Technologies
  • Challenges with Data Lakes
  • Introduction to Apache Iceberg
  • Benefits of Apache Iceberg
  • Apache Iceberg vs. Delta Lake vs. Apache Hudi
  • When to choose Apache Iceberg over other formats for data lake storage?
  • Overview of Apache Iceberg architecture
  • Various Apache Iceberg Components
  • How does Apache Iceberg handle metadata and data versioning?
  • Integration of Apache Iceberg with key data processing engines such as Spark and Starburst
  • Installation and setup of Apache Iceberg 
  • Configuring metadata storage for Apache Iceberg tables
  • Apache Iceberg table structure
  • Step-by-step guide to creating Apache Iceberg tables using:
    • Apache Spark
    • Presto/Trino on Starburst
    • Hive
  • Inserting data into Apache Iceberg tables
    • Batch inserts
    • Streaming inserts
    • Upserts
  • Efficiently querying Apache Iceberg tables
  • Demonstrating how Apache Iceberg’s data layout optimization enhances query performance
  • The Iceberg Catalog
  • The Metadata Layer
    • Metadata File
    • Manifest List
    • Manifest File
  • The Data Layer
  • A look under the covers during CRUD operations
  • Managing schema evolution
  • Partitioning Iceberg Tables
    • Hidden Partitioning
    • Partition Evolution
  • Understanding Time Travel
  • Version Rollback
  • Data Compaction
  • Metrics and Alerts
  • Monitoring Iceberg Tables

Prerequisites

  • Familiarity with basic SQL concepts
  • Basic Python programming skills
  • Knowledge of Hadoop and Apache Spark is helpful