Google Cloud Data Engineer

Understand the major components of GCP, and why and when to use each GCP product for Big Data and Machine Learning

Duration

5 Days

Level

Intermediate Level

Design and Tailor this course

As per your team's needs

Google Cloud Platform (GCP) runs on a secure cloud infrastructure and offers a wide range of features. The Google Cloud Architect course gives students hands-on exposure to architecting on GCP. By the end of the course, students will be able to design, develop, and manage cloud solutions that are robust, secure, scalable, highly available, and dynamic, so that they meet the target business objectives.

The program focuses on performing Data Engineering on Google Cloud Platform (GCP). Participants will build data analytics pipelines on GCP and learn how to enable data-driven decision making in any organization by designing an end-to-end process that covers data ingestion, storage, processing, analysis, modeling, and visualization.

Productivity Objectives:   

Upon completion of this course, you should be able to:

  • Understand the major components of GCP, and why and when to use each GCP product for Big Data and Machine Learning
  • Learn which GCP product to use in which situation
  • Perform interactive data exploration using Datalab
  • Work with data processing jobs in Dataproc and Dataflow
  • Learn how to explore data using BigQuery
  • Perform real-time analytics

The intended audience for this course:

  • Data Engineers
  • Data Scientists
  • Integration Engineers
  • Architects

Course outline:

  • Why Google Cloud Platform (GCP)?
  • Current Challenges with On-Premise Architectures
  • Role of a Data Engineer
  • How Google Enables Higher Productivity for Data Engineers
  • How Key Google Products Fit into Enterprise Architecture
  • How to Design a Modern Data Analytics Pipeline on GCP
  • Hands-on exercise: Getting familiar with Google Cloud Platform
  • Considerations for Building a Data Lake on the Cloud
  • Data Lakes vs Data Warehouses
  • Options to Consider When Choosing Storage Technologies
  • Use cases 
  • Which one to choose when?
  • Best Practices 
  • Transactional workloads vs Analytics workloads
  • About Cloud SQL
  • Working with Cloud SQL
  • Hands-on exercise: Bootstrapping Cloud SQL 
  • Hands-on exercise: Ingesting Data into Cloud SQL
  • Datastore Overview
  • Datastore Use Cases
  • Bigtable Overview
  • Bigtable Use Cases
  • What is BigQuery?
  • Capabilities
  • Logical Architecture
  • Data Analysis 
  • Best Practices 
  • Supported File Formats
  • Loading data through Cloud Storage
  • Scheduling BigQuery
  • Federated Data Sources 
  • Complex Data Type Support
  • Performance Optimization Techniques
  • Demo: Analyzing data using BigQuery
  • Demo: Federated Queries with BigQuery
  • Hands-on exercise: Loading Data into BigQuery
  • Challenges with On-Prem Hadoop Clusters
  • How Google Cloud Dataproc Solves These Challenges
  • Provisioning and Managing clusters
  • Preemptible VMs
  • Advantages of GCS over HDFS for Dataproc
  • Concept of Ephemeral Clusters
  • Using Web Console
  • Automating Cluster Creation Process
  • Dataproc REST API
  • Stackdriver Overview
  • Hands-on exercise: Running Spark Application on Dataproc
  • Hands-on exercise: Integrating Cloud SQL and Spark 
  • Introducing Google Cloud Dataflow
  • Apache Beam API 
  • Building Dataflow Pipelines
  • What is Streaming Analytics?
  • Use-cases
  • Batch vs. Streaming Processing
  • Windowing and Sliding Windows
  • Aggregation
  • Events and Triggers
  • Integrating with GCS, BigQuery and Pub/Sub
  • Side Inputs in Dataflow
  • Hands-on exercise: Python Based Dataflow Job
  • Hands-on exercise: Dataflow Job using Template
  • Challenges with Streaming data ingestion
  • Introduction to Cloud Pub/Sub
  • Capabilities
  • Demo: Walkthrough of Data Studio (BI Tool)
  • Hands-on exercise: Ingesting Data using Cloud Pub/Sub
  • Hands-on exercise: Building a Streaming Pipeline using Cloud Pub/Sub, Dataflow and Data Studio
  • Role of Cloud Functions 
  • How to Set Up Cloud Functions
  • Integrating with Cloud Pub/Sub
  • Hands-on exercise: Processing Pub/Sub data using Cloud Functions
  • Why Cloud Composer?
  • Airflow Environment
  • Building, Scheduling and Monitoring Workflows 
  • Hands-on exercise: Cloud Composer
  • Why Google Cloud Data Fusion?
  • Key Components
  • Building a Pipeline
  • Hands-on exercise: Cloud Data Fusion

Participants should preferably have basic knowledge of SQL and Python. For best results, participants in a batch should have similar backgrounds.
