Delving Deeper into Trino
Mastering SQL, Optimization, and Data Integration for High-Performance Data Queries
Duration
4 Days (8 hours per day)
Level
Intermediate to Advanced
Course Description
This course provides participants with a comprehensive understanding of Trino (formerly PrestoSQL), a powerful distributed SQL query engine. Participants will learn to develop and optimize queries, integrate Trino with various data sources, and apply performance tuning and best practices. A hands-on approach ensures that learners gain practical experience with real-world scenarios.
Target Audience
- Data Analysts
- Data Engineers
- Database Administrators
- Data Scientists
- IT Professionals interested in data querying and integration
Course Content
- Overview of Trino
- Why do companies use Trino?
- Common Trino deployments (Ad-hoc querying, ETL workloads, real-time analytics)
- Case studies of Trino in the industry (Netflix, LinkedIn, etc.)
- Federated Queries Overview
- Modern Data Lake and how Trino fits
- Trino Setup on Kubernetes/Helm (Basic Understanding)
- Trino Catalog Overview (Hive, Iceberg, BigQuery, etc.)
- Architecture and Components
- Separation of storage & compute
- Schema on read
- Lab: Running Federated Queries
Participants will execute federated queries against pre-configured Amazon S3 and PostgreSQL data sources, joining data across both to experience the benefits of data federation
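A minimal sketch of the kind of federated query run in this lab, assuming a Hive catalog backed by Amazon S3 and a PostgreSQL catalog; all catalog, schema, and table names are illustrative:

```sql
-- Join S3-backed Hive data with an operational PostgreSQL table in one query
-- (catalog, schema, and table names are hypothetical).
SELECT
    o.order_id,
    o.order_total,
    c.customer_name
FROM hive.sales.orders AS o              -- Parquet files on Amazon S3
JOIN postgresql.public.customers AS c    -- row-oriented PostgreSQL table
    ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2024-01-01';
```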
- Understanding Parquet and ORC Formats
- Why are Parquet and ORC better for analytics than JSON or CSV?
- Compression and predicate pushdown in Parquet/ORC
- JSON Handling
- Lab: Using JSON dataset and Applying Functions
Participants will upload and analyze JSON files to solve a business scenario. They will also query the same dataset stored in different file formats (Parquet, ORC, and JSON) and compare query execution times for each
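A short example of the JSON handling covered in this module, assuming a raw table with a VARCHAR column holding JSON documents; table, column, and path names are illustrative:

```sql
-- Extract scalar values from JSON payloads and cast them to typed columns
-- (table, column, and JSON path names are hypothetical).
SELECT
    json_extract_scalar(payload, '$.user.id')                AS user_id,
    CAST(json_extract_scalar(payload, '$.amount') AS DOUBLE) AS amount
FROM hive.raw.events
WHERE json_extract_scalar(payload, '$.event_type') = 'purchase';
```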
- Add Conditional Logic to a Query
- Pivoting in Trino
- Aggregate over multiple combinations of columns more efficiently with GROUPING SETS, CUBE, and ROLLUP
- Explore Window Functions
- Define the rows included in a window function using RANGE or ROWS
- Lab: Multiple Scenario-Based Exercises
Participants will work through business scenarios, applying SQL functions for aggregation and windowing to produce the desired results.
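A sketch of the aggregation and windowing techniques exercised in these scenarios (table and column names are illustrative):

```sql
-- One pass over the data, aggregated at several granularities.
SELECT region, product, SUM(sales) AS total_sales
FROM hive.sales.orders
GROUP BY GROUPING SETS ((region, product), (region), ());

-- Window functions with an explicit ROWS frame.
SELECT
    order_id,
    region,
    sales,
    rank() OVER (PARTITION BY region ORDER BY sales DESC) AS sales_rank,
    SUM(sales) OVER (
        PARTITION BY region
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_sales
FROM hive.sales.orders;
```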
- Efficient Query Designs for Joins (JOIN ON vs. JOIN USING)
- Limiting Data Transfer (LIMIT, FETCH, APPROX Functions)
- Using Indexes Effectively
- Filtering Early / Dynamic Filtering
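A short sketch of limiting data transfer with LIMIT and approximate functions, as listed above (table and column names are illustrative):

```sql
-- Approximate distinct count instead of an exact COUNT(DISTINCT ...):
-- trades a small error bound for far less memory and network traffic.
SELECT approx_distinct(user_id) AS approx_users
FROM hive.web.clickstream
WHERE event_date = DATE '2024-06-01';

-- Cap the rows returned to the client.
SELECT *
FROM hive.web.clickstream
ORDER BY event_time DESC
LIMIT 100;
```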
- Understanding Iceberg as a Table Format
- Iceberg Architecture
- Metadata layer: Snapshots, manifests, partitioning
- Configuring Iceberg in Trino
- Working with Data: Insert, Update, and Delete
- Lab 1: Configuring and Querying Iceberg Tables in Trino
Participants will set up the Iceberg catalog in Trino, create Iceberg tables, explore metadata layers (snapshots, manifests, partitions), and run queries to understand Iceberg’s architecture
- Lab 2: Creating and Managing Iceberg Tables
Participants will create Iceberg tables, insert records, and experiment with update & delete operations
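A minimal sketch of creating an Iceberg table and running DML against it, assuming an Iceberg catalog named iceberg is already configured; table properties follow the Trino Iceberg connector, and all names are illustrative:

```sql
-- Create a partitioned Iceberg table.
CREATE TABLE iceberg.analytics.orders (
    order_id     BIGINT,
    customer_id  BIGINT,
    order_total  DECIMAL(10, 2),
    order_date   DATE
)
WITH (
    format = 'PARQUET',
    partitioning = ARRAY['month(order_date)']
);

-- Row-level DML supported by the Iceberg connector.
INSERT INTO iceberg.analytics.orders VALUES (1, 100, 25.50, DATE '2024-06-01');
UPDATE iceberg.analytics.orders SET order_total = 30.00 WHERE order_id = 1;
DELETE FROM iceberg.analytics.orders WHERE order_id = 1;
```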
- Change Data Capture (CDC) with Merge
- Using MERGE for CDC in Iceberg
- Schema Evolution in Iceberg
- Adding, removing, and modifying columns without rewriting data
- Partition Evolution
- How Iceberg avoids the limitations of Hive-style static partitioning
- Changing partition strategies dynamically
- Lab 1: Implementing CDC in Iceberg
Participants will use MERGE INTO to simulate a CDC workflow
- Lab 2: Schema and Partition Evolution
Participants will modify schemas and partitions dynamically and analyze query performance
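A sketch of the CDC and evolution operations used in these labs, assuming the iceberg catalog from earlier and a staging table of change records; names are illustrative and exact property syntax may vary by Trino version:

```sql
-- CDC-style upsert: apply inserts, updates, and deletes from a change feed.
MERGE INTO iceberg.analytics.customers AS t
USING iceberg.staging.customer_changes AS s
    ON t.customer_id = s.customer_id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET name = s.name, email = s.email
WHEN NOT MATCHED THEN INSERT (customer_id, name, email)
    VALUES (s.customer_id, s.name, s.email);

-- Schema evolution: a metadata-only change, no data rewrite.
ALTER TABLE iceberg.analytics.customers ADD COLUMN loyalty_tier VARCHAR;

-- Partition evolution: change the partition spec for newly written data.
ALTER TABLE iceberg.analytics.customers
    SET PROPERTIES partitioning = ARRAY['bucket(customer_id, 16)'];
```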
- Iceberg Snapshots and Time-Travel Queries
- Querying previous snapshots
- Rolling back to an earlier state
- Compaction Strategies
- Reducing small files and optimizing query performance
- Using OPTIMIZE in Trino to compact Iceberg tables
- Lab: Working with Snapshots and Compaction
Participants will run time-travel queries, rollback changes, and perform compaction for performance improvement
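Time travel, rollback, and compaction as exercised in this lab might look like the following sketch; the snapshot id and names are illustrative, and procedure signatures can differ between Trino versions:

```sql
-- Query the table as of an earlier snapshot or point in time.
SELECT * FROM iceberg.analytics.orders FOR VERSION AS OF 4159324520354546540;
SELECT * FROM iceberg.analytics.orders
    FOR TIMESTAMP AS OF TIMESTAMP '2024-06-01 00:00:00 UTC';

-- Roll the table back to a previous snapshot.
CALL iceberg.system.rollback_to_snapshot('analytics', 'orders', 4159324520354546540);

-- Compact small data files into larger ones.
ALTER TABLE iceberg.analytics.orders EXECUTE optimize(file_size_threshold => '128MB');
```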
- Partitioning Overview
- Range and Hash Partitioners
- Implementing Partitioning
- Bucketing Overview
- Implementing Bucketing
- Lab: Partitioning & Bucketing
Participants will create partitioned and bucketed tables, then run performance comparisons against standard tables
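A sketch of a partitioned and bucketed table using Hive connector table properties; names are illustrative, and note that partition columns must appear last in the column list:

```sql
CREATE TABLE hive.sales.orders_bucketed (
    order_id     BIGINT,
    customer_id  BIGINT,
    order_total  DOUBLE,
    order_date   DATE                      -- partition column goes last
)
WITH (
    format         = 'PARQUET',
    partitioned_by = ARRAY['order_date'],
    bucketed_by    = ARRAY['customer_id'],
    bucket_count   = 16
);
```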
- Using Resource Groups for Query Prioritization
- Configuring Query Limits (max memory, concurrency limits)
- Isolating workloads using Query Queues & Execution Pools
- Lab: Implement resource groups for different workloads
- Filtering Early / Dynamic Filtering
- Understanding Predicate/Projection/Aggregation Pushdown
- Join Reordering
- Use Columnar File Formats (Parquet, ORC)
- Avoiding the small file problem
- Query Caching
- Using Materialized Views
- Lab 1: Using Materialized Views
Participants will create and refresh materialized views based on business requirements.
- Lab 2: Exploring Different File Formats
Participants will create tables using different file formats, analyze performance metrics, and address small file problems
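A minimal materialized view sketch for Lab 1, assuming the Iceberg catalog supports materialized views; names are illustrative:

```sql
-- Precompute a daily aggregate and refresh it on demand.
CREATE MATERIALIZED VIEW iceberg.analytics.daily_sales AS
SELECT order_date, SUM(order_total) AS total_sales
FROM iceberg.analytics.orders
GROUP BY order_date;

REFRESH MATERIALIZED VIEW iceberg.analytics.daily_sales;
```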
- Understanding Query Execution Plans and Cost-Based Optimization
- Filtering Early / Dynamic Filtering
- Understanding Predicate Pushdown
- Joins
- Aggregations
- Lab: Understanding Execution Plan for different queries
Participants will explore execution plans, stages, tasks, and predicate pushdown
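The lab centers on EXPLAIN output; a quick sketch of the two variants (the query and names are illustrative):

```sql
-- Logical and distributed plan without running the query.
EXPLAIN
SELECT region, count(*)
FROM hive.sales.orders
WHERE order_date >= DATE '2024-01-01'
GROUP BY region;

-- Actual execution with per-stage runtime statistics.
EXPLAIN ANALYZE
SELECT region, count(*)
FROM hive.sales.orders
WHERE order_date >= DATE '2024-01-01'
GROUP BY region;
```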
- Configuring Connectors for PostgreSQL, BigQuery, Hive, and Iceberg
- Lab: Connector Configuration
Participants will configure a PostgreSQL connector and run analytical queries
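Once a catalog is configured, participants can verify it directly from SQL; a short sketch assuming the catalog is named postgresql (schema and table names are illustrative):

```sql
SHOW CATALOGS;
SHOW SCHEMAS FROM postgresql;
SHOW TABLES FROM postgresql.public;

-- Analytical query, pushed down to PostgreSQL where possible.
SELECT count(*) FROM postgresql.public.customers;
```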
- Monitoring Trino with Trino UI
- Identifying Bottlenecks (Memory, CPU, I/O)
- Log Analysis for Coordinator and Worker Nodes
- Common Troubleshooting Scenarios (e.g., Query Failures)
- Lab: Monitoring & Troubleshooting
Participants will explore the Trino UI and troubleshoot common performance and failure scenarios
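Besides the Trino UI, cluster and query state can also be inspected from SQL through the system catalog; a small sketch (column availability can vary slightly across Trino versions):

```sql
-- Worker and coordinator status.
SELECT node_id, http_uri, coordinator, state
FROM system.runtime.nodes;

-- Currently running queries.
SELECT query_id, state, query
FROM system.runtime.queries
WHERE state = 'RUNNING';
```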
Prerequisites
- Basic knowledge of SQL
- Prior experience with database management systems (DBMS) is beneficial