Managing Big Data using Hadoop and Spark

Big Data

The program focuses on the ingestion, storage, processing and analysis of big data using the Hadoop and Spark ecosystem: HDFS, MapReduce, YARN, Sqoop, Flume, Hive, Spark Core, Pig, Impala, HBase and Kafka.

    • Holistic Overview of Hadoop and Spark Ecosystem
    • Distributed Storage and Processing Concepts
    • Which technology/tool to choose when?
    • Architecture and Internals of key projects
    • How to perform data processing/analysis using Spark, Pig and Hive?

The intended audience for this course:

  • Application Developers
  • DevOps Engineers
  • Architects
  • System Engineers
  • Technical Managers
Introduction to Hadoop and Spark Ecosystem
  • Big Data Overview
  • Key Roles in Big Data Project
  • Key Business Use cases
  • Hadoop and Spark Logical Architecture
  • Typical Big Data Project Pipeline
Basic Concepts of HDFS
  • HDFS Overview
  • Physical Architectures of HDFS
  • The Hadoop Distributed File System Hands-on
MapReduce v1/YARN Frameworks, Architectures and MapReduce API
  • Java Basics for understanding, developing, building and deploying MapReduce Programs
  • Logical Architecture of MapReduce
  • Physical Architecture of MRv1 and YARN
  • Compare MRv1 vs. MRv2 on YARN
  • MapReduce API
  • Hands-on Exercise
Ingesting Data with Sqoop
  • Sqoop Basics
  • Sqoop Internals
  • Sqoop 1 vs Sqoop 2
  • Key Sqoop Commands
  • Hands-on Exercise
Ingesting Data with Flume
  • Flume Overview
  • Physical Architectures of Flume
  • Source, Sink and Channel
  • Building Data Pipeline using Flume
  • Hands-on Exercise
Working with Hive
  • Hive Overview and Use cases
  • How Hive Differs from Relational Databases
  • Basic Syntax in Hive
  • External and Managed Tables
  • Key Built-In functions in Hive
  • Hive vs. HiveServer2
  • Hands-on Exercise
Delving Deeper in Hive
  • Partitioning – Static and Dynamic
  • Hive UDFs
  • Hands-on Exercises
Working with HBase
  • HBase Overview
  • Physical Architectures of HBase
  • HBase Table Fundamentals
  • Thinking About Table Design
  • HBase Shell
  • HBase Physical Architecture
  • HBase Schema Design
  • HBase API
  • Hive on HBase
  • Hands-on Exercises
Introduction to Pig
  • What Is Pig?
  • Pig’s Features
  • Pig Use Cases
  • Interacting with Pig
Basic Data Analysis with Pig
  • Pig Latin Syntax
  • Loading Data
  • Simple Data Types
  • Field Definitions
  • Data Output
  • Viewing the Schema
  • Filtering and Sorting Data
  • Commonly-Used Functions
  • Hands-on Exercise: Using Pig for ETL Processing
Introduction to Impala
  • What is Impala?
  • How Impala Differs from Hive and Pig
  • How Impala Differs from Relational Databases
  • Limitations and Future Directions
  • Using the Impala Shell
Analyzing Data with Impala
  • Basic Syntax
  • Data Types
  • Filtering, Sorting, and Limiting Results
  • Joining and Grouping Data
  • Improving Impala Performance
  • Hands-on Exercise: Interactive Analysis with Impala
Spark Overview
  • What is Spark?
  • Why Spark?
  • Data Abstraction – RDD
  • Logical Architecture of Spark
  • Programming Languages in Spark
  • Functional Programming with Spark
  • Hands-on Exercise
Interactive Data Exploration with Spark
  • Key RDD API Operations
  • Pair RDDs
  • MapReduce Operations
  • Joining Sqoop and Flume Data
  • Spark on YARN
  • YARN client vs YARN cluster modes
  • Hands-on Exercise
Data Ingestion using Kafka
  • Kafka Overview
  • Kafka Architecture
  • Kafka Producer and Consumer APIs
  • Flume vs Kafka
  • Hands-on Exercise

Participants should preferably have prior software development experience, along with basic knowledge of SQL and Unix commands. Knowledge of Python or Scala is a plus.


Course Information


Duration

4 Days / 5 Days

Mode of Delivery

Instructor-led / Virtual
