
Big Data Analytics Project Based Training


Categories:
Big Data

Project Background

“DataCouch Power Ltd.”, a fictitious energy utility company, has installed smart meters at customers’ homes and premises across London, UK. These meters emit energy-consumption readings at short intervals. The readings are sent to the utility, which bills each customer based on how much electrical energy they consume.

To optimize operating costs and provide a better customer experience, the utility wants to detect overuse of sanctioned load and perform proactive maintenance, such as identifying faulty meters and planning service outages, so that there is minimal impact on customers.

Beyond these use cases, a third-party energy-disaggregation service can analyze the smart-meter logs and provide deeper insight into the appliances and settings that consume the energy. Such disaggregated data can be cross-sold to third parties as information about customer lifestyle, customer profile, and the appliances in use and their usage patterns.

Project Objectives

  • Analyze batch data and provide insights into energy-usage patterns over the last couple of years.
  • Identify faulty meters that require attention; this data should be captured in streaming mode.
  • Compute the average energy consumption of each consumer in every 10-second time frame.
  • Segregate meters that consume energy over and above their sanctioned load.
  • Display a dashboard that lets the business perform cube analytics on the data.
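The third objective reduces to a group-by on (consumer, window) keys. Here is a minimal pure-Scala sketch of that computation (the `Reading` fields and sample values are illustrative, not from the project spec); the same `groupBy`-and-average pipeline translates directly to a `reduceByKey` or windowed aggregation in Spark:

```scala
// Hypothetical smart-meter reading: consumer id, epoch-second timestamp, kWh value.
case class Reading(consumerId: String, epochSec: Long, kwh: Double)

// Assign each reading to the start of the 10-second window it falls in.
def windowStart(epochSec: Long): Long = epochSec - (epochSec % 10)

// Average consumption per (consumer, window) pair.
def windowedAverages(readings: Seq[Reading]): Map[(String, Long), Double] =
  readings
    .groupBy(r => (r.consumerId, windowStart(r.epochSec)))
    .map { case (key, rs) => key -> rs.map(_.kwh).sum / rs.size }

val sample = Seq(
  Reading("C1", 100L, 2.0), Reading("C1", 105L, 4.0), // same window [100, 110)
  Reading("C1", 112L, 6.0),                           // next window
  Reading("C2", 103L, 1.0)
)
val avgs = windowedAverages(sample)
// avgs(("C1", 100L)) == 3.0, avgs(("C1", 110L)) == 6.0
```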

This project gives participants exposure to building an end-to-end Big Data analytics pipeline from scratch.

This intensive training course combines lectures with hands-on labs that help participants learn the theory and gain practical experience of Spark Core, Spark SQL, and Spark Streaming. Hands-on exercises let participants work with common data sources such as HDFS, MySQL, HBase, and Kafka. During the course, participants also get exposure to a variety of data formats, including CSV, JSON, XML, log files, Avro, Parquet, and ORC, using the Spark framework.

    • Learn the basics of Scala
    • Understand Spark Core, SQL, and Streaming architecture and APIs
    • Gain practical exposure to key projects such as Spark and Kafka
    • Develop distributed code using the Scala programming language
    • Optimize Spark jobs through partitioning, caching, and other techniques
    • Build, deploy, and run Spark code on Hadoop clusters
    • Transform structured data using Spark SQL and DataFrames
    • Process and analyze streaming data using Spark
    • Integrate Spark with Kafka, HBase, etc.
    • Work with key data formats such as Avro and Parquet

This program is designed for:

  • Developers
  • Analysts
  • Architects
  • Team Leads
  • Data Scientists
Introduction to Scala
  • History of Scala Language
  • What is Scala?
  • Design Goals of Scala
  • Advantages of Functional Programming
  • Scala vs Java
  • Scala and Java
  • Introduction to Eclipse IDE
  • Scala Shell Overview
  • Scala with Zeppelin Notebooks
Quick Recap of Hadoop for Spark
  • Recap of HDFS for Spark
  • Recap of YARN w.r.t. Spark
  • Recap of HBase
  • How to use YARN Commands?
  • Recap of MapReduce Logical Architecture
  • Hands-on Exercise
Scala Basics
  • Variables and Constants
  • Key Datatypes in Scala
  • Dealing with Numeric, Boolean and String types
  • Scala Shell Commands
  • Scala Key Built-in Functions
  • Scala Collections
  • Manipulating Tuples, Seq, Map, List etc.
  • Flow Control in Scala
  • Hands-on: Exercises
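The basics covered in this module fit in a few lines of Scala (the meter names and readings below are placeholder data):

```scala
// Variables vs constants
val fixed = 10          // immutable
var counter = 0         // mutable
counter += 1

// Tuples, Seq, and Map
val pair: (String, Int) = ("meter-1", 42)
val readings: Seq[Double] = Seq(1.5, 2.0, 3.5)
val byMeter: Map[String, Double] = Map("m1" -> 1.5, "m2" -> 2.0)

// Common collection manipulations
val doubled = readings.map(_ * 2)                        // Seq(3.0, 4.0, 7.0)
val total   = readings.sum                               // 7.0
val high    = byMeter.filter { case (_, v) => v > 1.8 }  // Map("m2" -> 2.0)

// Flow control: if/else is an expression; for comprehensions build collections
val label   = if (total > 5) "high" else "low"
val squares = for (r <- readings) yield r * r
```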
Modular Scala
  • User Defined Functions
  • Anonymous Functions
  • Classes and Objects
  • Packages
  • Traits
  • Ways to compile Scala Code
  • Compiling and Deploying Scala Code
  • Hands-on: Exercises
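A small sketch of how this module's pieces (traits, classes, objects, user-defined and anonymous functions) fit together; the `Meter`/`Billable` names and the 0.15 rate are invented for illustration:

```scala
// A trait defines behavior that a class can mix in.
trait Billable {
  def ratePerKwh: Double
  def bill(kwh: Double): Double = kwh * ratePerKwh
}

// A class with constructor parameters, mixing in the trait.
class Meter(val id: String, val ratePerKwh: Double) extends Billable

// An object holding factory/utility functions.
object Meter {
  def apply(id: String): Meter = new Meter(id, 0.15)
}

// A user-defined function taking an anonymous function as argument.
def applyDiscount(amount: Double, f: Double => Double): Double = f(amount)

val m          = Meter("m1")                        // uses Meter.apply
val cost       = m.bill(100.0)                      // 15.0
val discounted = applyDiscount(cost, a => a * 0.9)  // 13.5
```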
Dealing With Key Data Formats in Scala
  • Processing CSV data using Scala
  • Dealing with XML files in Scala
  • JSON processing using Scala
  • Regular expressions
  • Processing Semi-structured data
  • Extending Hive using UDFs
  • Hands-on: Exercises
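CSV parsing and regex-based log extraction, two staples of this module, can be sketched in plain Scala (the record layout and log format are invented examples):

```scala
// Parse simple "id,kwh" CSV records, silently dropping malformed lines.
val csvLines = Seq("m1,2.5", "m2,4.0", "bad line")

val csvRecords: Seq[(String, Double)] = csvLines.flatMap { line =>
  line.split(",") match {
    case Array(id, kwh) => scala.util.Try((id, kwh.toDouble)).toOption
    case _              => None
  }
}
// csvRecords == Seq(("m1", 2.5), ("m2", 4.0))

// Extract fields from a log line with a regular expression.
val logPattern = """(\w+)\s+(ERROR|INFO)\s+(.*)""".r
val level = "meterd ERROR checksum mismatch" match {
  case logPattern(_, lvl, _) => lvl
  case _                     => "UNKNOWN"
}
// level == "ERROR"
```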
Introduction to Spark
  • Spark Overview
  • Detailed discussion on “Why Spark”
  • Quick Recap of MapReduce
  • Spark vs MapReduce
  • Why Scala for Spark?
Spark Core Framework and API
  • High level Spark Architecture
  • Role of Executor, Driver, SparkContext etc.
  • Resilient Distributed Datasets
  • Basic operations in the Spark Core API, i.e. actions and transformations
  • Using the Spark REPL for performing interactive data analysis
  • Hands-on Exercises
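Spark's core RDD transformations (`map`, `filter`) and actions (`count`, `reduce`) share names and semantics with Scala's collection methods, so a pure-Scala pipeline is a faithful sketch of the RDD code you would write against `sc.parallelize(...)` (sample values invented):

```scala
// Pretend this Seq is an RDD created with sc.parallelize(readings).
val meterKwh = Seq(3.2, 0.0, 5.1, 4.8, 0.0)

// Transformations: lazy on an RDD, eager on a collection, but they
// compose identically.
val nonZero = meterKwh.filter(_ > 0.0)   // like rdd.filter
val inWatts = nonZero.map(_ * 1000)      // like rdd.map

// Actions return a value to the driver.
val count = inWatts.size                 // like rdd.count()
val total = inWatts.reduce(_ + _)        // like rdd.reduce
// count == 3, total == 13100.0
```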
Delving Deeper Into Spark API
  • Pair RDDs
  • Implementing MapReduce Algorithms using Spark
  • Ways to create Pair RDDs
  • JSON Processing
  • Code Example on JSON Processing
  • XML Processing
  • Joins
  • Playing with Regular Expressions
  • Log File Processing using Regular Expressions
  • Hands-on Exercises
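The classic MapReduce word count shows why pair RDDs exist. Written here against Scala collections for a self-contained sketch; on a pair RDD, the `groupBy`/sum step below collapses into a single `reduceByKey(_ + _)`:

```scala
val lines = Seq("spark streaming", "spark core")

val counts: Map[String, Int] =
  lines
    .flatMap(_.split("\\s+"))        // like rdd.flatMap: one word per record
    .map(word => (word, 1))          // key-value pairs, as in a pair RDD
    .groupBy(_._1)                   // the shuffle step
    .map { case (w, ps) => (w, ps.map(_._2).sum) }
// counts == Map("spark" -> 2, "streaming" -> 1, "core" -> 1)
```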
Executing a Spark Application
  • Writing Standalone Spark Application
  • Building Standalone Scala Spark Application using Maven
  • Various commands to execute and configure Spark Applications in various modes
  • Discussion on Application, Job, Stage, Executor, Tasks
  • Interpreting RDD Metadata/Lineage/DAG
  • Controlling degree of Parallelism in Spark Job
  • Physical execution of a Spark application
  • Discussion: how is Spark better than MapReduce?
  • Hands-on Exercises
Advanced Features Of Spark
  • Persistence
  • Location
  • Data Format of Persistence
  • Replication
  • Partitioned By
  • Coalesce
  • Accumulators
  • Broadcasting for optimizing performance of Spark jobs
  • Hands-on Exercises
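Broadcasting ships a small lookup table to every executor once instead of once per task. The map-side join it enables can be sketched in plain Scala (the tariff table and usage records are invented); in Spark you would wrap the table with `sc.broadcast(tariffs)` and read `.value` inside the closure:

```scala
// Small reference table -- in Spark this would be broadcast to executors.
val tariffs: Map[String, Double] = Map("domestic" -> 0.15, "industrial" -> 0.10)

// Large dataset of (meterId, category, kwh) records.
val usage = Seq(("m1", "domestic", 100.0), ("m2", "industrial", 200.0))

// Map-side join against the (conceptually broadcast) table: no shuffle needed.
val bills = usage.map { case (id, cat, kwh) =>
  (id, kwh * tariffs.getOrElse(cat, 0.0))
}
// bills == Seq(("m1", 15.0), ("m2", 20.0))
```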
Spark Streaming
  • Analyzing streaming data using Spark
  • Stateless Streaming
  • Stateful Streaming
  • Quick introduction to Kafka Architecture
  • Role of Zookeeper, Brokers etc.
  • Hands-on Exercises
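Stateless streaming treats each micro-batch independently, while stateful streaming carries per-key state across batches (Spark's `updateStateByKey`/`mapWithState`). The state update can be sketched as a fold over batches (meter ids and values invented):

```scala
// Each micro-batch delivers (meterId, kwh) readings.
val batches: Seq[Seq[(String, Double)]] = Seq(
  Seq(("m1", 2.0), ("m2", 1.0)),
  Seq(("m1", 3.0))
)

// Stateful update: a running total per meter, carried across batches.
def updateState(state: Map[String, Double],
                batch: Seq[(String, Double)]): Map[String, Double] =
  batch.foldLeft(state) { case (acc, (id, kwh)) =>
    acc.updated(id, acc.getOrElse(id, 0.0) + kwh)
  }

val finalState = batches.foldLeft(Map.empty[String, Double])(updateState)
// finalState == Map("m1" -> 5.0, "m2" -> 1.0)
```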
Spark SQL
  • Introduction
  • Dataframe API
  • Performing ad-hoc query analysis using Spark SQL
  • Working with Hive Partitioning
  • Hands-on Exercises
Iterative Processing Using Spark
  • Introduction to Iterative Processing
  • Checkpointing
  • Checkpointing vs Persist
  • Example of Iterative Processing
  • K Means Clustering
  • Hands-on Exercises
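K-means is the canonical iterative workload: each pass assigns points to their nearest centroid and recomputes the centroids, which is why Spark caches the input and checkpoints lineage between iterations. One-dimensional sketch with invented data:

```scala
val points = Seq(1.0, 2.0, 9.0, 10.0)

// One k-means iteration: assign each point to its nearest centroid,
// then move each centroid to the mean of its assigned points.
def step(centroids: Seq[Double]): Seq[Double] =
  points
    .groupBy(p => centroids.minBy(c => math.abs(p - c)))
    .values
    .map(ps => ps.sum / ps.size)
    .toSeq
    .sorted

// Iterate a fixed number of times for brevity; real code would
// stop when assignments stabilise.
val result = (1 to 5).foldLeft(Seq(0.0, 5.0))((cs, _) => step(cs))
// result == Seq(1.5, 9.5)
```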
Dataset API
  • Introduction to Datasets
  • Why Datasets?
  • Datasets vs Dataframes
  • Using Dataset API
  • Hands-on Exercises
Structured Streaming
  • Structured Streaming Overview
  • How is it better than classic Spark Streaming?
  • Structured Streaming API
  • Hands-on Exercises

Because this is a hands-on training, basic knowledge of Hadoop (e.g. Hive queries and HDFS commands) is advisable.


Course Information

Duration

5 Days

Mode of Delivery

Instructor-led / Virtual

Level

Intermediate
