Managing Big Data using Hadoop and Spark

Learn ingestion, storage, processing, and analysis of Big Data using the Hadoop and Spark ecosystem

Duration

4 Days

Level

Intermediate

Design and Tailor This Course

As per your team's needs

Overview

The program focuses on the ingestion, storage, processing, and analysis of Big Data using the Hadoop and Spark ecosystem: HDFS, MapReduce, YARN, Sqoop, Flume, Hive, Spark Core, Pig, Impala, HBase, and Kafka.

  • Holistic Overview of the Hadoop and Spark Ecosystem
  • Distributed Storage and Processing Concepts
  • Which Technology/Tool to Choose When
  • Architecture and Internals of Key Projects
  • How to Perform Data Processing and Analysis Using Spark, Pig, and Hive

Audience

The intended audience for this course:

  • Application Developers
  • DevOps Engineers
  • Architects
  • System Engineers
  • Technical Managers

Course Outline

Big Data Overview

  • Key Roles in a Big Data Project
  • Key Business Use Cases
  • Hadoop and Spark Logical Architecture
  • Typical Big Data Project Pipeline

HDFS

  • HDFS Overview
  • Physical Architecture of HDFS
  • Hands-on Exercise: The Hadoop Distributed File System (see the sketch below)
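
The hands-on revolves around the HDFS shell. Below is a minimal sketch, driven from Python via subprocess for consistency with the later examples; the paths and file names are hypothetical.

    import subprocess

    # Each call shells out to the `hdfs dfs` command-line client.
    def hdfs(*args):
        subprocess.run(["hdfs", "dfs", *args], check=True)

    hdfs("-mkdir", "-p", "/user/training/input")            # create a directory
    hdfs("-put", "local_data.txt", "/user/training/input")  # copy a local file into HDFS
    hdfs("-ls", "/user/training/input")                     # list directory contents
    hdfs("-cat", "/user/training/input/local_data.txt")     # print a file's contents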

MapReduce and YARN

  • Logical Architecture of MapReduce
  • Logical Architecture of YARN
  • High-Level Architecture of MRv1 and YARN
  • MRv1 vs. MRv2 on YARN
  • Hands-on Exercise (see the sketch below)
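
To make the MapReduce data flow concrete, here is a word-count sketch in the style of Hadoop Streaming, which lets map and reduce tasks run as ordinary Python scripts reading stdin and writing stdout. The file names and streaming jar path are assumptions that vary by distribution.

    # mapper.py: emit one "word<TAB>1" line per input token.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py: the framework delivers input sorted by key, so equal
    # words arrive adjacent and can be summed in a single pass.
    import sys

    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

    # Submitted to YARN via Hadoop Streaming, e.g.:
    # hadoop jar hadoop-streaming.jar \
    #   -input /user/training/input -output /user/training/wordcount \
    #   -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py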

Sqoop

  • Sqoop Basics
  • Sqoop Internals
  • Sqoop 1 vs. Sqoop 2
  • Key Sqoop Commands
  • Hands-on Exercise (see the sketch below)
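
A sketch of a typical Sqoop import, wrapped in Python for consistency with the other examples; the JDBC URL, credentials, and table are hypothetical.

    import subprocess

    # Import a MySQL table into HDFS with four parallel map tasks.
    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/retail",   # hypothetical source database
        "--username", "training", "--password", "training",
        "--table", "orders",                         # table to import
        "--target-dir", "/user/training/orders",     # HDFS destination
        "--num-mappers", "4",                        # degree of parallelism
    ], check=True)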

Flume

  • Flume Overview
  • Physical Architecture of Flume
  • Sources, Sinks, and Channels
  • Building a Data Pipeline Using Flume
  • Hands-on Exercise (see the sketch below)
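
A Flume pipeline is declared in a properties file that wires a source, a channel, and a sink together. Below is a minimal sketch, written out from Python; the agent and component names (a1, r1, c1, k1), the port, and the paths are illustrative.

    # netcat source -> in-memory channel -> HDFS sink
    conf = """
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    a1.channels.c1.type = memory

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /user/training/flume/events

    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    """
    with open("netcat-hdfs.conf", "w") as f:
        f.write(conf)
    # Start the agent with: flume-ng agent --name a1 --conf-file netcat-hdfs.conf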

Hive

  • Hive Overview and Use Cases
  • How Hive Differs from Relational Databases
  • Basic Hive Syntax
  • External and Managed Tables
  • Key Built-in Functions in Hive
  • Hive vs. HiveServer2
  • Hands-on Exercise
  • Partitioning: Static and Dynamic
  • Hive UDFs
  • Hands-on Exercises (see the sketch below)
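
To give a feel for the HiveQL covered here, a short sketch run through a Hive-enabled SparkSession (the course labs may use Beeline or the Hive shell instead); the table, schema, and paths are illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive-demo").enableHiveSupport().getOrCreate()

    # External table: dropping it removes only metadata, never the data files.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS orders (order_id INT, amount DOUBLE)
        PARTITIONED BY (order_date STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION '/user/training/orders'
    """)

    # Static partitioning: the partition value is spelled out explicitly.
    spark.sql("ALTER TABLE orders ADD IF NOT EXISTS PARTITION (order_date='2024-01-01')")

    # A built-in aggregate function over the partitioned table.
    spark.sql("SELECT order_date, SUM(amount) FROM orders GROUP BY order_date").show()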

HBase

  • HBase Overview
  • HBase Physical Architecture
  • HBase Table Fundamentals
  • Thinking About Table Design
  • The HBase Shell
  • HBase Schema Design
  • The HBase API
  • Hive on HBase
  • Hands-on Exercises (see the sketch below)
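
A sketch of basic HBase reads and writes using the happybase client, which talks to HBase through its Thrift gateway; the host, table, and column family are hypothetical.

    import happybase

    connection = happybase.Connection("hbase-host")   # Thrift gateway host
    table = connection.table("user_profiles")

    # HBase stores uninterpreted bytes, hence the b'' literals;
    # columns are addressed as b"family:qualifier".
    table.put(b"user1", {b"info:name": b"Asha", b"info:city": b"Pune"})
    print(table.row(b"user1"))

    # Scans walk a contiguous range of row keys.
    for key, data in table.scan(row_start=b"user0", row_stop=b"user9"):
        print(key, data)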

Pig

  • What Is Pig?
  • Pig's Features
  • Pig Use Cases
  • Interacting with Pig
  • Pig Latin Syntax
  • Loading Data
  • Simple Data Types
  • Field Definitions
  • Data Output
  • Viewing the Schema
  • Filtering and Sorting Data
  • Commonly Used Functions
  • Hands-on Exercise: Using Pig for ETL Processing (see the sketch below)
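
A sketch of the kind of Pig Latin ETL script this exercise works toward, written out and launched from Python; the input path, schema, and output path are illustrative.

    import subprocess

    # Load CSV data, filter it, sort it, and store the result back to HDFS.
    script = """
    orders = LOAD '/user/training/orders' USING PigStorage(',')
             AS (order_id:int, order_date:chararray, amount:double);
    paid = FILTER orders BY amount > 0.0;
    by_amount = ORDER paid BY amount DESC;
    STORE by_amount INTO '/user/training/orders_clean' USING PigStorage(',');
    """
    with open("etl.pig", "w") as f:
        f.write(script)
    subprocess.run(["pig", "-f", "etl.pig"], check=True)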

Impala

  • What Is Impala?
  • How Impala Differs from Hive and Pig
  • How Impala Differs from Relational Databases
  • Limitations and Future Directions
  • Using the Impala Shell
  • Basic Syntax
  • Data Types
  • Filtering, Sorting, and Limiting Results
  • Joining and Grouping Data
  • Improving Impala Performance
  • Hands-on Exercise: Interactive Analysis with Impala (see the sketch below)
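
A sketch of interactive analysis through impyla, a Python DB-API client for Impala (the labs may use impala-shell instead); the host, tables, and columns are hypothetical, and 21050 is Impala's default HiveServer2-protocol port.

    from impala.dbapi import connect

    conn = connect(host="impala-host", port=21050)
    cur = conn.cursor()

    # Filtering, joining, grouping, sorting, and limiting in one query.
    cur.execute("""
        SELECT c.city, COUNT(*) AS order_count, SUM(o.amount) AS total
        FROM orders o JOIN customers c ON o.customer_id = c.customer_id
        WHERE o.amount > 100
        GROUP BY c.city
        ORDER BY total DESC
        LIMIT 10
    """)
    for row in cur.fetchall():
        print(row)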

Spark Basics

  • What Is Spark?
  • Why Spark?
  • Data Abstraction: RDDs
  • Logical Architecture of Spark
  • Programming Languages in Spark
  • Functional Programming with Spark
  • Hands-on Exercise (see the sketch below)
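
A minimal PySpark sketch of the RDD abstraction and the functional style: transformations are lazy, and actions trigger execution. The local master and sample data are for illustration only.

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "rdd-demo")

    nums = sc.parallelize(range(1, 11))       # build an RDD from a collection

    # map/filter are lazy transformations; nothing runs yet.
    squares_of_evens = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

    # collect/reduce are actions; they trigger the computation.
    print(squares_of_evens.collect())         # [4, 16, 36, 64, 100]
    print(nums.reduce(lambda a, b: a + b))    # 55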

Working with RDDs and Spark on YARN

  • Key RDD API Operations
  • Pair RDDs
  • MapReduce-Style Operations
  • Joining the Sqoop and Flume Data
  • Spark on YARN
  • YARN Client vs. YARN Cluster Mode
  • Hands-on Exercise (see the sketch below)
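
A sketch of a pair-RDD aggregation and join. The two in-memory datasets stand in for the Sqoop-imported and Flume-collected data; a real lab would load them from HDFS and submit the script to YARN.

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "pair-rdd-demo")

    # (customer_id, amount) pairs, standing in for the Sqoop-imported table.
    orders = sc.parallelize([(1, 250.0), (1, 40.0), (2, 99.5)])
    # (customer_id, page) pairs, standing in for the Flume-collected events.
    clicks = sc.parallelize([(1, "home"), (2, "cart")])

    # reduceByKey: the classic MapReduce-style aggregation on a pair RDD.
    spend = orders.reduceByKey(lambda a, b: a + b)    # (1, 290.0), (2, 99.5)

    # join matches pairs by key across the two RDDs.
    print(spend.join(clicks).collect())   # e.g. [(1, (290.0, 'home')), (2, (99.5, 'cart'))]

    # On a cluster the same script would be submitted with, e.g.:
    # spark-submit --master yarn --deploy-mode cluster pair_rdd_demo.py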

Kafka

  • Kafka Overview
  • Kafka Architecture
  • Kafka Producer and Consumer APIs
  • Flume vs. Kafka
  • Hands-on Exercise (see the sketch below)
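
A producer/consumer round trip sketched with the kafka-python client; the broker address and topic name are illustrative.

    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("events", b"page_view,user1")   # messages are raw bytes
    producer.flush()                              # block until delivery

    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",    # read the topic from the beginning
        consumer_timeout_ms=5000,        # stop iterating after 5s of silence
    )
    for msg in consumer:
        print(msg.topic, msg.partition, msg.offset, msg.value)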

Prerequisites

Participants should preferably have prior software development experience, along with a basic knowledge of SQL and Unix commands. Knowledge of Python or Scala is a plus.
