Big Data Development and Analysis

Big Data

“Big data for development” is a concept that refers to identifying sources of big data relevant to the policy and planning of development programs. Big data analytics is the often complex process of examining large and varied data sets to uncover information, including hidden patterns, unknown correlations, market trends and customer preferences, that can help organizations make informed business decisions.

The program focuses on the ingestion, storage, processing and analysis of big data using the Hadoop and Spark ecosystem, i.e. HDFS, MapReduce, YARN, Sqoop, Hive, Spark Core, Impala, HBase and Kafka.

    • Holistic Overview of Hadoop and Spark Ecosystem
    • Distributed Storage and Processing Concepts
    • Which technology/tool to choose when?
    • Architecture and Internals of key projects
    • How to perform data processing/analysis using Spark and Hive?

The intended audience for this course:

  • Application Developers
  • DevOps Engineers
  • Architects
  • System Engineers
  • Technical Managers
Introduction to Hadoop and Spark Ecosystem
  • Big Data Overview
  • Key Roles in Big Data Project
  • Key Business Use cases
  • Hadoop and Spark Logical Architecture
  • Typical Big Data Project Pipeline
Basic Concepts of HDFS
  • HDFS Overview
  • Physical Architectures of HDFS
  • The Hadoop Distributed File System Hands-on
MapReduce v1/YARN Frameworks, Architectures and MapReduce API
  • Java Basics for understanding, developing, building and deploying MapReduce Programs
  • Logical Architecture of MapReduce
  • Physical Architecture of MRv1 and YARN
  • Compare MRv1 vs. MRv2 on YARN
  • MapReduce API
  • Hands-on Exercise
Sqoop
  • Sqoop Basics
  • Sqoop Internals
  • Sqoop 1 vs Sqoop 2
  • Key Sqoop Commands
  • Hands-on Exercise
Working with Hive
  • Hive Overview and Use cases
  • How Hive Differs from Relational Databases
  • Basic Syntax in Hive
  • External and Managed Tables
  • Key Built-In functions in Hive
  • Hive vs. HiveServer2
  • Hands-on Exercise
Delving Deeper in Hive
  • Partitioning – Static and Dynamic
  • Hive UDFs
  • Hands-on Exercises
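
As a rough illustration of the static versus dynamic partitioning topic above, the following Python sketch mimics how rows could be routed to Hive-style partition directories of the form `column=value`. It is not Hive code: the `sales` table name, the `country` column and the helper functions are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def partition_path(table, column, value):
    """Build a Hive-style partition directory name, e.g. sales/country=IN."""
    return f"{table}/{column}={value}"

def static_insert(rows, table, column, value):
    """Static partitioning: the caller names the target partition up front."""
    return {partition_path(table, column, value): list(rows)}

def dynamic_insert(rows, table, column):
    """Dynamic partitioning: the partition is derived from each row's value."""
    partitions = defaultdict(list)
    for row in rows:
        # The partition column lives in the directory name, not in the row data.
        data = {k: v for k, v in row.items() if k != column}
        partitions[partition_path(table, column, row[column])].append(data)
    return dict(partitions)

# Hypothetical sample rows for illustration only.
rows = [
    {"id": 1, "amount": 250, "country": "IN"},
    {"id": 2, "amount": 100, "country": "US"},
    {"id": 3, "amount": 75, "country": "IN"},
]
layout = dynamic_insert(rows, "sales", "country")
for path in sorted(layout):
    print(path, layout[path])
```

With static partitioning the target partition is fixed by the caller, whereas with dynamic partitioning it is worked out row by row, which is the distinction the module title refers to.
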
Working with HBase
  • HBase Overview
  • Physical Architectures of HBase
  • HBase Table Fundamentals
  • Thinking About Table Design
  • HBase Shell
  • HBase Physical Architecture
  • HBase Schema Design
  • HBase API
  • Hive on HBase
  • Hands-on Exercises
Introduction to Impala
  • What is Impala?
  • How Impala Differs from Hive and Pig
  • How Impala Differs from Relational Databases
  • Limitations and Future Directions
  • Using the Impala Shell
Analyzing Data with Impala
  • Basic Syntax
  • Data Types
  • Filtering, Sorting, and Limiting Results
  • Joining and Grouping Data
  • Improving Impala Performance
  • Hands-on Exercise: Interactive Analysis with Impala
Spark Overview
  • What is Spark?
  • Why Spark?
  • Data Abstraction – RDD
  • Logical Architecture of Spark
  • Programming Languages in Spark
  • Functional Programming with Spark
  • Hands-on Exercise
Interactive Data Exploration with Spark
  • Key RDD API Operations
  • Pair RDDs
  • MapReduce Operations
  • Spark on YARN
  • YARN client vs YARN cluster modes
  • Hands-on Exercise
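
The pair RDD and MapReduce operations listed above follow a map, shuffle and reduce pattern. The sketch below imitates that pattern in plain Python on a small in-memory list, so it runs without a cluster. It does not use the Spark API; the function names `map_phase`, `shuffle_phase` and `reduce_by_key` are made up for this example and only loosely mirror what operations such as `reduceByKey` do.

```python
from collections import defaultdict
from functools import reduce

def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]

def shuffle_phase(pairs):
    """Group values by key, as the shuffle does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_by_key(pairs, func):
    """Combine all values that share a key using a two-argument function."""
    return {key: reduce(func, values)
            for key, values in shuffle_phase(pairs).items()}

# Hypothetical input lines for illustration only.
lines = ["big data with Spark", "Spark and Hive", "big data"]
counts = reduce_by_key(map_phase(lines), lambda a, b: a + b)
print(sorted(counts.items()))
```
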
Spark SQL
  • Introduction
  • DataFrame API
  • Performing ad-hoc query analysis using Spark SQL
  • Working with Hive Partitioning
  • Hands-on Exercises
Spark Streaming
  • Analyzing streaming data using Spark
  • Stateless Streaming
  • Stateful Streaming
  • Hands-on Exercises
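
To make the stateless and stateful streaming topics above more concrete, here is a small plain-Python sketch that treats a list of lists as a sequence of micro-batches. It is not Spark Streaming code; the log-level events and the function names are assumptions made for this example.

```python
from collections import Counter

def stateless_counts(batch):
    """Stateless: each micro-batch is processed on its own."""
    return Counter(batch)

def stateful_counts(batches):
    """Stateful: a running total is carried from one micro-batch to the next."""
    state = Counter()
    snapshots = []
    for batch in batches:
        state.update(batch)            # fold the new batch into the state
        snapshots.append(dict(state))  # record the state after this batch
    return snapshots

# Hypothetical micro-batches of log levels, for illustration only.
batches = [["error", "info"], ["error"], ["warn", "error"]]
per_batch = [dict(stateless_counts(b)) for b in batches]
running = stateful_counts(batches)
print(per_batch)
print(running)
```

The stateless result forgets earlier batches, while the stateful result accumulates across them.
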
Data Ingestion using Kafka
  • Kafka Overview
  • Kafka Architecture
  • Kafka Producer and Consumer APIs
  • Flume vs Kafka
  • Hands-on Exercise
Kafka Integrations
  • End to End Data Pipeline using Kafka
  • Kafka Connect
  • Integrate Kafka with Spark
  • Hands-on Exercises

Participants should preferably have prior software development experience along with basic knowledge of SQL and Unix commands. Knowledge of Python or Scala would be a plus.


Course Information


4 Days / 5 Days

Mode of Delivery

Instructor-led / Virtual
