Getting Started with Apache Iceberg
Analyze large datasets with high performance and reliability
Duration
1 Day
Level
Basic Level
In the ever-evolving landscape of data management, how you store and organize your data determines how much value you can extract from it. The new “Getting Started with Apache Iceberg” course is a one-day, immersive program that equips you with the essential knowledge to use Apache Iceberg, an open table format for large analytic datasets that is reshaping how data is stored, managed, and queried in data lakes.
Course Highlights
- Discovering the Advantages of Apache Iceberg
- Deep Dive into Apache Iceberg Architecture
- Writing Efficient Data Queries
- Understanding Apache Iceberg Internals
- Managing, Monitoring, and Optimizing Apache Iceberg
- Hands-on Implementation
Who Should Attend
- Big data analysts who want to use Apache Iceberg to improve query performance
- Data engineers who plan to build data lakehouses with Apache Iceberg
Course Outline
- Evolution of Data Platforms
- Understanding Data Lakes and Available Technologies
- Challenges with Data Lakes
- Introduction to Apache Iceberg
- Benefits of Apache Iceberg
- Apache Iceberg vs. Delta Lake vs. Apache Hudi
- When to choose Apache Iceberg over other formats for data lake storage?
- Overview of Apache Iceberg architecture
- Apache Iceberg Components
- How does Apache Iceberg handle metadata and data versioning?
- Integrating Apache Iceberg with key data processing engines such as Apache Spark and Starburst
- Installation and setup of Apache Iceberg
- Configuring metadata storage for Apache Iceberg tables
- Apache Iceberg table structure
- Step-by-step guide to creating Apache Iceberg tables using:
  - Apache Spark
  - Presto/Trino on Starburst
  - Apache Hive
- Inserting data into Apache Iceberg tables
  - Batch inserts
  - Streaming inserts
  - Upserts
- Efficiently querying Apache Iceberg tables
- Demonstrating how Apache Iceberg’s data layout optimization enhances query performance
- The Iceberg Catalog
- The Metadata Layer
  - Metadata File
  - Manifest List
  - Manifest File
- The Data Layer
- A look under the covers during CRUD operations
- Managing schema evolution
- Partitioning Iceberg Tables
  - Hidden Partitioning
  - Partition Evolution
- Understanding Time Travel
- Version Rollback
- Data Compaction
- Metrics and Alerts
- Monitoring Iceberg Tables
Prerequisites
- Familiarity with basic SQL concepts
- Basic Python programming skills
- Knowledge of Apache Hadoop and Apache Spark is helpful