Optimizing Data Lakehouses With Starburst
Best Practices for High-Performance Data Lakehouse Architectures
Duration
2 Days
Level
Intermediate to Advanced Level
Design and Tailor this course
As per your team needs
Overview
This 2-day course comprises instructor-led discussions, demonstrations, and hands-on exercises designed to build a working knowledge of the Starburst query engine. Participants will gain a deeper understanding of Starburst architecture, with a focus on best practices for data-lake-based schemas, including table formats and partitioning, file formats and sizes, and other optimization techniques.
Audience
This course is designed for:
- Data engineers
- Data architects
- Experienced data analysts and data scientists
Prerequisites
Intermediate experience with SQL is assumed.
Curriculum
- Overview & architecture
- Web UI
- Connectors & catalogs
- Client tools integrations
- Separation of storage & compute
- Schema on read
- Limiting data exchanges
- File format options
- Small files problem
- Partitioning & bucketing
- Moving beyond Hive
- Compare/contrast alternatives
- Table format architecture
- Creating tables
- Insert, update & delete
- CDC with merge
- Schema & partition evolution
- Snapshots & compaction
- Benefits of statistics
- Query plan analysis