Getting Started with Apache Iceberg

Analyze your large datasets with high performance and reliability

Duration

1 Day

Level

Basic Level

Getting real value from a data lake depends on how its data is stored, managed, and queried. The “Getting Started with Apache Iceberg” course is a one-day, hands-on introduction that equips you with the essential knowledge to use Apache Iceberg, an open table format for large analytic datasets in data lakes.

Course Highlights

Discovering the Apache Iceberg Advantages
Deep dive into Apache Iceberg Architecture
Writing Efficient Data Queries
Understanding Apache Iceberg Internals
Manage, Monitor and Optimize Apache Iceberg
Hands-on Implementation

Understanding Apache Iceberg

Evolution of Data Platforms
Understanding Data Lakes and Available Technologies
Challenges with Data Lakes
Introduction to Apache Iceberg
Benefits of Apache Iceberg
Apache Iceberg vs Delta Lake vs Hudi
When to choose Apache Iceberg over other formats for data lake storage?

Apache Iceberg Architecture

Overview of Apache Iceberg architecture
Various Apache Iceberg Components
How does Apache Iceberg handle metadata and data versioning?
Integration of Apache Iceberg with key data processing engines such as Starburst and Spark

Setting Up Apache Iceberg

Installation and setup of Apache Iceberg
Configuring metadata storage for Apache Iceberg tables

Creating Apache Iceberg Tables

Apache Iceberg table structure
Step-by-step guide to creating Apache Iceberg tables using:
- Apache Spark
- Presto/Trino on Starburst
- Hive

Writing and Reading Data

Inserting data into Apache Iceberg tables
- Batch inserts
- Streaming inserts
- Upserts
Efficiently querying Apache Iceberg tables
Demonstrating how Apache Iceberg’s data layout optimization enhances query performance

Internals of Apache Iceberg

The Iceberg Catalog
The Metadata Layer
- Metadata File
- Manifest List
- Manifest File
The Data Layer
A look under the hood at create, read, update, and delete (CRUD) operations

Management, Monitoring and Optimization

Managing schema evolution
Enabling partitions in Iceberg Tables
- Hidden Partitioning
- Partition Layout Evolution
Understanding Time Travel
Version Rollback
Data Compaction
Metrics and Alerts
Monitoring Iceberg Tables
