Optimizing Data Lakehouses with Starburst
Leverage your knowledge of the Starburst query engine, with a focus on best practices for optimizing data access
3 Days (8 hrs each day)
This course comprises instructor-led discussions, demonstrations, and hands-on exercises designed to build a working knowledge of the Starburst query engine. Participants will gain a deeper understanding of Starburst architecture, focusing on best practices for data lake-based schemas, including table formats and partitioning, file formats and sizes, and other optimization techniques.
Upon completion of this course, you will be able to:
● Use Starburst as a single point of access for multiple data sources and federate queries across them
● Evaluate and describe how queries are executed within a Starburst cluster
● Use Hive and Iceberg table formats; construct, populate, query, and modify partitioned tables
● Employ file size, format, and hierarchy strategies to improve query performance
● Understand the role of the cost-based optimizer, and read query plans to confirm that optimizations occur as expected and to identify possible issues
● Create role-based access control policies for table operations
● Build a data engineering pipeline with Starburst Galaxy
This course is designed for data engineers, data architects, and experienced data analysts and data scientists.
Intermediate experience with SQL is assumed.
Apache, Apache Kafka, Apache Spark, Apache Trino, Apache Iceberg, Apache Hive, Kafka, Spark, Trino, Iceberg, Hive, and other associated open-source project names are trademarks of the Apache Software Foundation. Starburst, Starburst Data, Starburst Enterprise, and Starburst Galaxy are registered trademarks of Starburst Data, Inc. All rights reserved. DataCouch is not affiliated with, endorsed by, or otherwise associated with the Apache Software Foundation (ASF) or any of its projects.