Google Cloud Data Engineer
Google Cloud Platform (GCP) offers a secure cloud infrastructure with a plethora of features and functionality. The Google Cloud Architect course gives students hands-on exposure to architecting on GCP. By the end of the course, students will be able to design, develop, and manage cloud solutions that are robust, secure, scalable, highly available, and dynamic, and that meet the target business objectives.
The program focuses on data engineering on Google Cloud Platform (GCP). Participants build data analytics pipelines on GCP and learn how to enable data-driven decision making in any organization by designing an end-to-end process that covers data ingestion, storage, processing, analysis, modeling, and visualization.
Upon completion of this course, you should be able to:
- Understand the major components of GCP, and why and when to use each GCP product for Big Data and Machine Learning
- Learn which GCP product to use in which situation
- Perform interactive data exploration using Datalab
- Work with data processing jobs in Dataproc and Dataflow
- Learn how to explore data using BigQuery
- Perform real-time analytics
The intended audience for this course:
- Data Engineers
- Data Scientists
- Integration Engineers
- Why Google Cloud Platform (GCP)?
- Current Challenges with On-Premise Architectures
- Role of a Data Engineer
- How Google enables higher productivity for Data Engineers
- How key Google products fit into an Enterprise Architecture
- How to design a modern Data Analytics Pipeline on GCP
- Hands-on exercise: Getting familiar with Google Cloud Platform
- Consideration for Building Data Lake on Cloud
- Data Lake vs Data Warehouses
- Options to consider when choosing storage technologies
- Use cases
- Which one to choose when?
- Best Practices
- What is BigQuery?
- Logical Architecture
- Data Analysis
- Best Practices
- Supported File Formats
- Loading data through Cloud Storage
- Scheduling BigQuery
- Federated Data Sources
- Complex Data Type Support
- Performance Optimization Techniques
- Demo: Analyzing data using BigQuery
- Demo: Federated Queries with BigQuery
- Hands-on exercise: Loading Data into BigQuery
- Challenges with On-Prem Hadoop Clusters
- How Google Cloud Dataproc solves these challenges
- Provisioning and Managing clusters
- Preemptible VMs
- Advantages of GCS over HDFS for Dataproc
- Concept of Ephemeral Clusters
- Using Web Console
- Automating Cluster Creation Process
- Dataproc REST API
- Stackdriver Overview
- Hands-on exercise: Running Spark Application on Dataproc
- Hands-on exercise: Integrating Cloud SQL and Spark
- Introducing Google Cloud Dataflow
- Apache Beam API
- Building Dataflow Pipelines
- What is Streaming Analytics?
- Batch vs. Streaming Processing
- Windowing and Sliding Windows
- Events and Triggers
- Integrating with GCS, BigQuery and Pub/Sub
- Side Inputs in Dataflow
- Hands-on exercise: Python Based Dataflow Job
- Hands-on exercise: Dataflow Job using Template
- Challenges with Streaming data ingestion
- Introduction to Cloud Pub/Sub
- Demo: Walkthrough of Data Studio (BI Tool)
- Hands-on exercise: Ingesting Data using Cloud Pub/Sub
- Hands-on exercise: Building a Streaming Pipeline using Cloud Pub/Sub, Dataflow and Data Studio
Participants should preferably have basic knowledge of SQL and Python. For best results, participants in a group should have similar backgrounds.