Data Engineering with GCP
Google Cloud Platform (GCP) provides a secure, fully managed cloud infrastructure with a rich set of products and features. This program focuses on Data Engineering on Google Cloud Platform and gives participants hands-on exposure to architecting data solutions on GCP. Participants will build data analytics pipelines on GCP and learn how to enable data-driven decision making in an organization by designing the end-to-end process: data ingestion, data storage, data processing, data analysis, data modeling, and data visualization. By the end of the course, participants will be able to design, develop, and manage cloud solutions that are robust, secure, scalable, highly available, and dynamic, and that fulfill the target business objectives.
The intended audience for this course:
- Data Engineers
Getting holistic view: GCP Data Products and Pipelines
- Why Google Cloud Platform (GCP)?
- Current Challenges with On-Premise Architectures
- Role of a Data Engineer
- How does Google enable higher productivity for Data Engineers?
- How do key Google products fit into an enterprise architecture?
- How to design modern Data Analytics Pipeline on GCP?
- Hands-on exercise: Getting familiar with Google Cloud Platform
Google Cloud Storage Technologies
- Considerations for Building a Data Lake on the Cloud
- Data Lakes vs Data Warehouses
- Options for Choosing Storage Technologies
- Use cases
- Which one to choose when?
- Best Practices
Google Cloud Storage
- HDFS vs Google Cloud Storage
- Concepts and Terms
- GCS Classes and Lifecycle management
- Introduction to Cloud Shell and gsutil
- Working with Google Cloud Storage
- Hands-on exercise(s) – Working with GCS
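As a taste of the lifecycle-management topic above, here is a minimal pure-Python sketch of the decision behind GCS storage classes. The thresholds follow Google's published guidance (Nearline for data accessed less than once a month, Coldline less than once a quarter, Archive less than once a year); the helper function itself is hypothetical, not part of any GCS API.

```python
# Hypothetical helper: pick a GCS storage class from expected access
# frequency, mirroring the documented class tiers. In practice this policy
# is expressed as a bucket lifecycle rule rather than application code.
def choose_storage_class(accesses_per_year: float) -> str:
    if accesses_per_year < 1:
        return "ARCHIVE"      # 365-day minimum storage duration
    if accesses_per_year < 4:
        return "COLDLINE"     # 90-day minimum storage duration
    if accesses_per_year < 12:
        return "NEARLINE"     # 30-day minimum storage duration
    return "STANDARD"         # no minimum, for frequently accessed data
```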
Google Cloud SQL
- Transactional workloads vs Analytics workloads
- About Cloud SQL
- Working with Cloud SQL
- Hands-on exercise: Bootstrapping Cloud SQL
- Hands-on exercise: Ingesting Data into Cloud SQL
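The transactional-vs-analytical distinction above can be illustrated locally. Cloud SQL runs MySQL, PostgreSQL, or SQL Server; the sketch below uses sqlite3 only so it runs without a cloud connection, and the table and values are illustrative.

```python
import sqlite3

# Local stand-in for the transactional (OLTP) workloads Cloud SQL serves:
# small writes wrapped in a transaction that commits atomically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

with conn:  # opens a transaction; commits on success, rolls back on error
    conn.execute("INSERT INTO orders (amount) VALUES (?)", (19.99,))
    conn.execute("INSERT INTO orders (amount) VALUES (?)", (5.00,))

total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```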
NoSQL GCP Databases
- Datastore Overview
- Datastore Use Cases
- Bigtable Overview
- Bigtable Use Cases
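A recurring theme in the Bigtable use cases above is row-key design. The sketch below is a hypothetical key builder showing two common patterns: leading with a high-cardinality field (a device id) so sequential writes spread across tablets instead of hotspotting, and using a reversed timestamp so the newest row for a device sorts first. The constant and format are illustrative.

```python
MAX_TS = 10**13  # illustrative upper bound, milliseconds since epoch

def row_key(device_id: str, ts_ms: int) -> str:
    # Reversed timestamp: larger (newer) ts_ms yields a lexicographically
    # smaller key, so the latest reading sorts first in a row scan.
    reversed_ts = MAX_TS - ts_ms
    return f"{device_id}#{reversed_ts:013d}"
```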
BigQuery: Serverless Analysis
- What is BigQuery?
- Logical Architecture
- Data Analysis
- Best Practices
- Supported File Formats
- Loading data through Cloud Storage
- Scheduling BigQuery
- Federated Data Sources
- Complex Data Type Support
- Performance Optimization Techniques
- Demo: Analyzing data using BigQuery
- Demo: Federated Queries with BigQuery
- Hands-on exercise: Loading Data into BigQuery
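To make the "Loading data through Cloud Storage" topic concrete, here is a sketch of a load-job configuration whose structure mirrors the BigQuery REST API's JobConfigurationLoad. The project, bucket, and dataset names are placeholders, and the dict is only assembled, not submitted.

```python
# Hypothetical builder for a BigQuery load-job configuration. Field names
# match the REST API's JobConfigurationLoad; all identifiers are placeholders.
def make_load_config(bucket: str, path: str, dataset: str, table: str) -> dict:
    return {
        "load": {
            "sourceUris": [f"gs://{bucket}/{path}"],
            "sourceFormat": "CSV",         # other supported formats include
                                           # AVRO, PARQUET, ORC, JSON
            "skipLeadingRows": 1,          # skip the CSV header row
            "autodetect": True,            # let BigQuery infer the schema
            "writeDisposition": "WRITE_APPEND",
            "destinationTable": {
                "projectId": "my-project",  # placeholder
                "datasetId": dataset,
                "tableId": table,
            },
        }
    }

cfg = make_load_config("my-bucket", "sales/2024/*.csv", "analytics", "sales")
```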
Dataproc: Run Hadoop/Spark on GCP
- Challenges with On-Prem Hadoop Clusters
- How does Google Cloud Dataproc solve these challenges?
- Provisioning and Managing clusters
- Preemptible VMs
- Advantages of GCS over HDFS for Dataproc
- Concept of Ephemeral Clusters
- Using Web Console
- Automating Cluster Creation Process
- Dataproc REST API
- Stackdriver Overview
- Hands-on exercise: Running Spark Application on Dataproc
- Hands-on exercise: Integrating Cloud SQL and Spark
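The ephemeral-cluster and preemptible-VM topics above come together in how a cluster is created. The sketch below assembles (but does not run) a `gcloud dataproc clusters create` command as an argv list; the flag names match the public gcloud reference, while the cluster name, region, and counts are placeholders.

```python
# Hypothetical command builder for an ephemeral Dataproc cluster with
# preemptible secondary workers. The list is only assembled here, not run.
def dataproc_create_cmd(cluster: str, region: str, workers: int, secondary: int):
    return [
        "gcloud", "dataproc", "clusters", "create", cluster,
        "--region", region,
        "--num-workers", str(workers),
        "--num-secondary-workers", str(secondary),  # preemptible by default
        "--max-idle", "30m",  # auto-delete when idle: the ephemeral pattern
    ]

cmd = dataproc_create_cmd("etl-cluster", "us-central1", 2, 4)
```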
Dataflow: Building Serverless Pipelines
- Introducing Google Cloud Dataflow
- Apache Beam API
- Building Dataflow Pipelines
- What is Streaming Analytics?
- Batch vs. Streaming Processing
- Windowing and Sliding Windows
- Events and Triggers
- Integrating with GCS, BigQuery and Pub/Sub
- Side Inputs in Dataflow
- Hands-on exercise: Python Based Dataflow Job
- Hands-on exercise: Dataflow Job using Template
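The sliding-window concept above can be sketched without Beam. The function below mirrors the assignment rule of Beam's `SlidingWindows(size, period)`: an event timestamp belongs to every window of length `size` whose start is a multiple of `period` and contains the timestamp. It is a pedagogical sketch, not Beam code.

```python
# Pure-Python sketch of sliding-window assignment for streaming events.
def assign_sliding_windows(ts: int, size: int, period: int):
    windows = []
    start = (ts // period) * period   # latest window start at or before ts
    while start > ts - size:          # every window whose span contains ts
        if start >= 0:
            windows.append((start, start + size))
        start -= period
    return sorted(windows)
```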
Cloud Pub/Sub: Ingesting Data at Scale
- Challenges with Streaming data ingestion
- Introduction to Cloud Pub/Sub
- Demo: Walkthrough of Data Studio (BI Tool)
- Hands-on exercise: Ingesting Data using Cloud Pub/Sub
- Hands-on exercise: Building Streaming Pipeline using Cloud Pub/Sub, Dataflow and Data Studio
Cloud Functions: Event-Driven Processing
- Role of Cloud Functions
- How to set up Cloud Functions
- Interacting with Cloud Pub/Sub
- Hands-on exercise: Processing Pub/Sub data using Cloud Functions
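The hands-on exercise above centers on the Pub/Sub-triggered function shape. In the 1st-gen Python runtime, a background function receives the message with its body base64-encoded in `event["data"]`; the function name and payload below are illustrative, and the fake event stands in for what Pub/Sub would deliver.

```python
import base64

# Sketch of a Pub/Sub-triggered Cloud Function (1st-gen Python runtime).
def process_message(event, context):
    payload = base64.b64decode(event["data"]).decode("utf-8")
    print(f"Received: {payload}")
    return payload

# Local smoke test with a fake event shaped like a Pub/Sub delivery.
fake_event = {"data": base64.b64encode(b"hello").decode("ascii")}
result = process_message(fake_event, None)
```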
Cloud Composer: Orchestrating Workflows
- Why Cloud Composer?
- Airflow Environment
- Building, Scheduling and Monitoring Workflows
- Hands-on exercise: Cloud Composer
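The scheduling idea behind an Airflow DAG in Cloud Composer, that a task runs only after all its upstream dependencies complete, can be sketched in plain Python. Real DAGs use `airflow.models.DAG` and operators; the task names and the topological-sort helper below are illustrative.

```python
# Pure-Python sketch of DAG dependency resolution, the concept Airflow
# schedules around. deps maps task -> set of upstream tasks.
def topological_order(deps: dict):
    remaining = {t: set(up) for t, up in deps.items()}
    order = []
    while remaining:
        # Tasks with no unmet upstream dependencies are ready to run.
        ready = sorted(t for t, up in remaining.items() if not up)
        if not ready:
            raise ValueError("cycle detected")
        for t in ready:
            order.append(t)
            del remaining[t]
        for up in remaining.values():
            up.difference_update(ready)
    return order

dag = {"extract": set(), "transform": {"extract"},
       "load": {"transform"}, "notify": {"load"}}
run_order = topological_order(dag)
```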
Data Fusion: Building Data Pipeline
- Why Google Cloud Data Fusion?
- Key Components
- Building a Pipeline
- Hands-on exercise: Cloud Data Fusion
Prerequisites
- GCP Account
- Basic knowledge of Big Data concepts: Hadoop, Spark, Kafka, HDFS, etc.