Delving Deeper into Trino
Mastering SQL, Optimization, and Data Integration for High-Performance Data Queries
Duration
4 Days (8 hours per day)
Level
Intermediate to Advanced
Course Description
This course provides participants with a comprehensive understanding of Trino (formerly PrestoSQL), a powerful distributed SQL query engine. Participants will learn to develop and optimize queries, integrate Trino with various data sources, and apply performance tuning and best practices. A hands-on approach ensures that learners gain practical experience with real-world scenarios.
Target Audience
- Data Analysts
- Data Engineers
- Database Administrators
- Data Scientists
- IT Professionals interested in data querying and integration
Course Content
- Overview of Trino
- Why do companies use Trino?
- Common Trino deployments (Ad-hoc querying, ETL workloads, real-time analytics)
- Case studies of Trino in the industry (Netflix, LinkedIn, etc.)
- Federated Queries Overview
- Modern Data Lake and how Trino fits
- Trino Setup on Kubernetes/Helm (Basic Understanding)
- Trino Catalog Overview (Hive, Iceberg, BigQuery, etc.)
- Architecture and Components
- Separation of storage & compute
- Schema on read
- Lab: Running Federated Queries
Participants will execute federated queries against pre-configured Amazon S3 and PostgreSQL data sources, joining data across both to experience the benefits of data federation
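A minimal sketch of the kind of federated query run in this lab, assuming a Hive catalog backed by Amazon S3 and a PostgreSQL catalog; all catalog, schema, and table names are illustrative:

```sql
-- Join S3-backed Hive data with an operational PostgreSQL table in one query
-- (catalog, schema, and table names are hypothetical).
SELECT
    o.order_id,
    o.order_total,
    c.customer_name
FROM hive.sales.orders AS o              -- Parquet files on Amazon S3
JOIN postgresql.public.customers AS c    -- row-oriented PostgreSQL table
    ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2024-01-01';
```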
- Understanding Parquet and ORC Formats
- Why are Parquet and ORC better for analytics than JSON or CSV?
- Compression and predicate pushdown in Parquet/ORC
- JSON Handling
- Lab: Using JSON dataset and Applying Functions
Participants will upload and analyze JSON files to solve a business scenario. They will also query the same dataset stored in different file formats (Parquet, ORC, and JSON) and compare query execution times for each
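A short example of the JSON handling covered in this module, assuming a raw table with a VARCHAR column holding JSON documents; table, column, and path names are illustrative:

```sql
-- Extract scalar values from JSON payloads and cast them to typed columns
-- (table, column, and JSON path names are hypothetical).
SELECT
    json_extract_scalar(payload, '$.user.id')                AS user_id,
    CAST(json_extract_scalar(payload, '$.amount') AS DOUBLE) AS amount
FROM hive.raw.events
WHERE json_extract_scalar(payload, '$.event_type') = 'purchase';
```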
- Add Conditional Logic to a Query
- Pivoting in Trino
- Aggregate over multiple combinations of columns more efficiently with GROUPING SETS, CUBE, and ROLLUP
- Explore Window Functions
- Define the rows included in a window function using RANGE or ROWS
- Lab: Multiple Scenario-Based Exercises
Participants will work through business scenarios, applying SQL functions for aggregation and windowing to produce the desired results.
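A sketch of the aggregation and windowing techniques exercised in these scenarios (table and column names are illustrative):

```sql
-- One pass over the data, aggregated at several granularities.
SELECT region, product, SUM(sales) AS total_sales
FROM hive.sales.orders
GROUP BY GROUPING SETS ((region, product), (region), ());

-- Window functions with an explicit ROWS frame.
SELECT
    order_id,
    region,
    sales,
    rank() OVER (PARTITION BY region ORDER BY sales DESC) AS sales_rank,
    SUM(sales) OVER (
        PARTITION BY region
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_sales
FROM hive.sales.orders;
```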
- Efficient Query Designs for Joins (JOIN ON vs. JOIN USING)
- Limiting Data Transfer (LIMIT, FETCH, APPROX Functions)
- Using Indexes Effectively
- Filtering Early / Dynamic Filtering
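A short sketch of limiting data transfer with LIMIT and approximate functions, as listed above (table and column names are illustrative):

```sql
-- Approximate distinct count instead of an exact COUNT(DISTINCT ...):
-- trades a small error bound for far less memory and network traffic.
SELECT approx_distinct(user_id) AS approx_users
FROM hive.web.clickstream
WHERE event_date = DATE '2024-06-01';

-- Cap the rows returned to the client.
SELECT *
FROM hive.web.clickstream
ORDER BY event_time DESC
LIMIT 100;
```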
- Understanding Iceberg as a Table Format
- Iceberg Architecture
- Metadata layer: Snapshots, manifests, partitioning
- Configuring Iceberg in Trino
- Working with Data: Insert, Update, and Delete
- Lab 1: Configuring and Querying Iceberg Tables in Trino
Participants will set up the Iceberg catalog in Trino, create Iceberg tables, explore metadata layers (snapshots, manifests, partitions), and run queries to understand Iceberg’s architecture
- Lab 2: Creating and Managing Iceberg Tables
Participants will create Iceberg tables, insert records, and experiment with update & delete operations
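A minimal sketch of creating an Iceberg table and running DML against it, assuming an Iceberg catalog named iceberg is already configured; table properties follow the Trino Iceberg connector, and all names are illustrative:

```sql
-- Create a partitioned Iceberg table.
CREATE TABLE iceberg.analytics.orders (
    order_id     BIGINT,
    customer_id  BIGINT,
    order_total  DECIMAL(10, 2),
    order_date   DATE
)
WITH (
    format = 'PARQUET',
    partitioning = ARRAY['month(order_date)']
);

-- Row-level DML supported by the Iceberg connector.
INSERT INTO iceberg.analytics.orders VALUES (1, 100, 25.50, DATE '2024-06-01');
UPDATE iceberg.analytics.orders SET order_total = 30.00 WHERE order_id = 1;
DELETE FROM iceberg.analytics.orders WHERE order_id = 1;
```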
- Change Data Capture (CDC) with Merge
- Using MERGE for CDC in Iceberg
- Schema Evolution in Iceberg
- Adding, removing, and modifying columns without rewriting data
- Partition Evolution
- How Iceberg avoids the limitations of Hive-style static partitioning
- Changing partition strategies dynamically
- Lab 1: Implementing CDC in Iceberg
Participants will use MERGE INTO to simulate a CDC workflow
- Lab 2: Schema and Partition Evolution
Participants will modify schemas and partitions dynamically and analyze query performance
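A sketch of the CDC and evolution operations used in these labs, assuming the iceberg catalog from earlier and a staging table of change records; names are illustrative and exact property syntax may vary by Trino version:

```sql
-- CDC-style upsert: apply inserts, updates, and deletes from a change feed.
MERGE INTO iceberg.analytics.customers AS t
USING iceberg.staging.customer_changes AS s
    ON t.customer_id = s.customer_id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET name = s.name, email = s.email
WHEN NOT MATCHED THEN INSERT (customer_id, name, email)
    VALUES (s.customer_id, s.name, s.email);

-- Schema evolution: a metadata-only change, no data rewrite.
ALTER TABLE iceberg.analytics.customers ADD COLUMN loyalty_tier VARCHAR;

-- Partition evolution: change the partition spec for newly written data.
ALTER TABLE iceberg.analytics.customers
    SET PROPERTIES partitioning = ARRAY['bucket(customer_id, 16)'];
```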
- Iceberg Snapshots and Time-Travel Queries
- Querying previous snapshots
- Rolling back to an earlier state
- Compaction Strategies
- Reducing small files and optimizing query performance
- Using OPTIMIZE in Trino to compact Iceberg tables
- Lab: Working with Snapshots and Compaction
Participants will run time-travel queries, rollback changes, and perform compaction for performance improvement
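Time travel, rollback, and compaction as exercised in this lab might look like the following sketch; the snapshot id and names are illustrative, and procedure signatures can differ between Trino versions:

```sql
-- Query the table as of an earlier snapshot or point in time.
SELECT * FROM iceberg.analytics.orders FOR VERSION AS OF 4159324520354546540;
SELECT * FROM iceberg.analytics.orders
    FOR TIMESTAMP AS OF TIMESTAMP '2024-06-01 00:00:00 UTC';

-- Roll the table back to a previous snapshot.
CALL iceberg.system.rollback_to_snapshot('analytics', 'orders', 4159324520354546540);

-- Compact small data files into larger ones.
ALTER TABLE iceberg.analytics.orders EXECUTE optimize(file_size_threshold => '128MB');
```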
- Partitioning Overview
- Range and Hash Partitioners
- Implementing Partitioning
- Bucketing Overview
- Implementing Bucketing
- Lab: Partitioning & Bucketing
Participants will create partitioned and bucketed tables, then run performance comparisons against standard tables
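A sketch of a partitioned and bucketed table using Hive connector table properties; names are illustrative, and note that partition columns must appear last in the column list:

```sql
CREATE TABLE hive.sales.orders_bucketed (
    order_id     BIGINT,
    customer_id  BIGINT,
    order_total  DOUBLE,
    order_date   DATE                      -- partition column goes last
)
WITH (
    format         = 'PARQUET',
    partitioned_by = ARRAY['order_date'],
    bucketed_by    = ARRAY['customer_id'],
    bucket_count   = 16
);
```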
- Using Resource Groups for Query Prioritization
- Configuring Query Limits (max memory, concurrency limits)
- Isolating workloads using Query Queues & Execution Pools
- Lab: Implement resource groups for different workloads
- Filtering Early / Dynamic Filtering
- Understanding Predicate/Projection/Aggregation Pushdown
- Join Reordering
- Use Columnar File Formats (Parquet, ORC)
- Avoiding the small file problem
- Query Caching
- Using Materialized Views
- Lab 1: Using Materialized Views
Participants will create and refresh materialized views based on business requirements.
- Lab 2: Exploring Different File Formats
Participants will create tables using different file formats, analyze performance metrics, and address small file problems
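A minimal materialized view sketch for Lab 1, assuming the Iceberg catalog supports materialized views; names are illustrative:

```sql
-- Precompute a daily aggregate and refresh it on demand.
CREATE MATERIALIZED VIEW iceberg.analytics.daily_sales AS
SELECT order_date, SUM(order_total) AS total_sales
FROM iceberg.analytics.orders
GROUP BY order_date;

REFRESH MATERIALIZED VIEW iceberg.analytics.daily_sales;
```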
- Understanding Query Execution Plans and Cost-Based Optimization
- Filtering Early / Dynamic Filtering
- Understanding Predicate Pushdown
- Joins
- Aggregations
- Lab: Understanding Execution Plan for different queries
Participants will explore execution plans, stages, tasks, and predicate pushdown
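The lab centers on EXPLAIN output; a quick sketch of the two variants (the query and names are illustrative):

```sql
-- Logical and distributed plan without running the query.
EXPLAIN
SELECT region, count(*)
FROM hive.sales.orders
WHERE order_date >= DATE '2024-01-01'
GROUP BY region;

-- Actual execution with per-stage runtime statistics.
EXPLAIN ANALYZE
SELECT region, count(*)
FROM hive.sales.orders
WHERE order_date >= DATE '2024-01-01'
GROUP BY region;
```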
- Configuring Connectors for PostgreSQL, BigQuery, Hive, and Iceberg
- Lab: Connector Configuration
Participants will configure a PostgreSQL connector and run analytical queries
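Once a catalog is configured, participants can verify it directly from SQL; a short sketch assuming the catalog is named postgresql (schema and table names are illustrative):

```sql
SHOW CATALOGS;
SHOW SCHEMAS FROM postgresql;
SHOW TABLES FROM postgresql.public;

-- Analytical query, pushed down to PostgreSQL where possible.
SELECT count(*) FROM postgresql.public.customers;
```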
- Monitoring Trino with Trino UI
- Identifying Bottlenecks (Memory, CPU, I/O)
- Log Analysis for Coordinator and Worker Nodes
- Common Troubleshooting Scenarios (e.g., Query Failures)
- Lab: Monitoring & Troubleshooting
Participants will explore the Trino UI and troubleshoot common performance and failure scenarios
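Besides the Trino UI, cluster and query state can also be inspected from SQL through the system catalog; a small sketch (column availability can vary slightly across Trino versions):

```sql
-- Worker and coordinator status.
SELECT node_id, http_uri, coordinator, state
FROM system.runtime.nodes;

-- Currently running queries.
SELECT query_id, state, query
FROM system.runtime.queries
WHERE state = 'RUNNING';
```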
Prerequisites
- Basic knowledge of SQL
- Prior experience with database management systems (DBMS) is beneficial