Data Engineering with GCP

Categories:
Cloud computing

Google Cloud Platform (GCP) offers a secure cloud infrastructure with a broad range of features and services. The course gives students hands-on exposure to architecting on GCP. By the end of the course, students will be able to design, develop, and manage cloud solutions that are robust, secure, scalable, highly available, and dynamic enough to meet the target business objectives. The program focuses on data engineering on Google Cloud Platform: participants build data analytics pipelines on GCP and learn how to enable data-driven decision making in any organization by designing an end-to-end process that covers data ingestion, data storage, data processing, data analysis, data modeling, and data visualization.

The intended audience for this course

  • Architects
  • Data Engineers
  • Analysts
Getting a Holistic View: GCP Data Products and Pipelines
  • Why Google Cloud Platform (GCP)?
  • Current Challenges with On-Premise Architectures
  • Role of a Data Engineer
  • How does Google enable higher productivity for Data Engineers?
  • How do key Google products fit into an Enterprise Architecture?
  • How to design a modern Data Analytics Pipeline on GCP
  • Hands-on exercise: Getting familiar with Google Cloud Platform
Google Cloud Storage Technologies
  • Considerations for Building a Data Lake on the Cloud
  • Data Lake vs Data Warehouses
  • Options to consider when choosing storage technologies
  • Use cases
  • Which one to choose, and when?
  • Best Practices

Google Cloud Storage
  • Overview
  • HDFS vs Google Cloud Storage
  • Concepts and Terms
  • GCS Classes and Lifecycle management
  • Introduction to Cloud Shell and gsutil
  • Working with Google Cloud Storage
  • Hands-on exercise(s) – Working with GCS

Google Cloud SQL
  • Transactional workloads vs Analytics workloads
  • About Cloud SQL
  • Working with Cloud SQL
  • Hands-on exercise: Bootstrapping Cloud SQL 
  • Hands-on exercise: Ingesting Data into Cloud SQL

NoSQL GCP Databases
  • Datastore Overview
  • Datastore Use Cases
  • Bigtable Overview
  • Bigtable Use Cases

BigQuery: Serverless Analysis
  • What is BigQuery?
  • Capabilities
  • Logical Architecture
  • Data Analysis 
  • Best Practices 
  • Supported File Formats
  • Loading data through Cloud Storage
  • Scheduling BigQuery
  • Federated Data Sources 
  • Complex Data Type Support
  • Performance Optimization Techniques
  • Demo: Analyzing data using BigQuery
  • Demo: Federated Queries with BigQuery
  • Hands-on exercise: Loading Data into BigQuery

Dataproc: Run Hadoop/Spark on GCP
  • Challenges with On-Prem Hadoop Clusters
  • How does Google Cloud Dataproc solve these challenges?
  • Provisioning and Managing clusters
  • Preemptible VMs
  • Advantages of GCS over HDFS for Dataproc
  • Concept of Ephemeral Clusters
  • Using Web Console
  • Automating Cluster Creation Process
  • Dataproc REST API
  • Stackdriver Overview
  • Hands-on exercise: Running Spark Application on Dataproc
  • Hands-on exercise: Integrating Cloud SQL and Spark

Dataflow: Building Serverless Pipelines
  • Introducing Google Cloud Dataflow
  • Apache Beam API 
  • Building Dataflow Pipelines
  • What is Streaming Analytics?
  • Use cases
  • Batch vs. Streaming Processing
  • Windowing and Sliding Windows
  • Aggregation
  • Events and Triggers
  • Integrating with GCS, BigQuery and Pub/Sub
  • Side Inputs in Dataflow
  • Hands-on exercise: Python Based Dataflow Job
  • Hands-on exercise: Dataflow Job using Template

Cloud Pub/Sub: Ingesting Data at Scale
  • Challenges with Streaming Data Ingestion
  • Introduction to Cloud Pub/Sub
  • Capabilities
  • Demo: Walkthrough of Data Studio (BI Tool)
  • Hands-on exercise: Ingesting Data using Cloud Pub/Sub
  • Hands-on exercise: Building Streaming Pipeline using Cloud Pub/Sub, Dataflow and Data Studio


Cloud Functions
  • Role of Cloud Functions 
  • How to set up Cloud Functions
  • Interacting with Cloud Pub/Sub
  • Hands-on exercise: Processing Pub/Sub data using Cloud Functions
Cloud Composer
  • Why Cloud Composer?
  • Airflow Environment
  • Building, Scheduling and Monitoring Workflows 
  • Hands-on exercise: Cloud Composer

Data Fusion: Building Data Pipeline
  • Why Google Cloud Data Fusion?
  • Key Components
  • Building a Pipeline
  • Hands-on exercise: Cloud Data Fusion

Prerequisites
  • GCP Account
  • Basic knowledge of Big Data, Hadoop, Spark, Kafka, HDFS, etc.

Course Information

Duration

5 Days

Mode of Delivery

Instructor-led / Virtual
