
Hadoop and Spark Training for Analysts

Big Data

The program focuses on processing Big Data using HDFS, HBase, Impala, Hive, Kudu, and Spark SQL. The points below give a high-level overview of the course:

  • Learn how to work with HDFS
  • Understand the purpose of HBase and use HBase with other ecosystem projects
  • Create tables, insert, read and delete data from HBase
  • Get an all-round understanding of Kudu and its role in the Hadoop ecosystem
  • Perform Interactive analysis using Hive and Impala
  • Understand the role of Spark
  • Use Spark SQL for analyzing data at scale

This program is designed for the following roles:

  • Big Data Analysts
  • Big Data Engineers
  • Big Data Scientists
  • Big Data Developers
Introduction to Hadoop and Spark Ecosystem
  • Big Data Overview
  • Key Roles in Big Data Project
  • Key Business Use cases
  • Hadoop and Spark Logical Architecture
  • Typical Big Data Project Pipeline
Basic Concepts of HDFS
  • HDFS Overview
  • Why HDFS?
  • High-level architecture
  • HDFS Commands
  • Working with Hue
  • The Hadoop Distributed File System Hands-on
MapReduce v1/YARN Frameworks and Architectures
  • Logical Architecture of MapReduce
  • High-level Architecture of MRv1 and YARN
  • Comparing MRv1 and MRv2 on YARN
  • Hands-on Exercise
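The map/shuffle/reduce flow in the logical architecture above can be sketched in plain Python. This is only an illustration of the programming model, not the MapReduce framework itself; the function names and sample lines are invented.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit (word, 1) pairs for each word in the input line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle phase: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Reduce phase: aggregate all values emitted for one key.
    return key, sum(values)

lines = ["big data big ideas", "big data pipelines"]
pairs = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 3, 'data': 2, 'ideas': 1, 'pipelines': 1}
```

In the real framework, mappers and reducers run in parallel across the cluster and the shuffle moves data over the network; the per-phase logic is the same.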
HBase Introduction
  • What Is HBase?
  • Why Use HBase?
  • Strengths of HBase
  • HBase in Production
  • Weaknesses of HBase
  • Comparison of HBase with other products
  • HBase vs. RDBMS
HBase Tables
  • HBase Concepts
  • HBase Table Fundamentals
  • Thinking About Table Design
The HBase Shell
  • Creating Tables with the HBase Shell
  • Working with Tables
  • Working with Table Data
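HBase's data model — rows sorted by key, column families, and versioned cells — can be sketched as a toy Python class mirroring what the shell's `put`, `get`, and `scan` commands do. This is a conceptual sketch only; the class, table, and data are invented and nothing here talks to real HBase.

```python
import bisect
import time

class TinyHBaseTable:
    """Toy sketch of HBase's sorted, versioned key-value model (not real HBase)."""

    def __init__(self, *families):
        self.families = set(families)   # column families are fixed at creation
        self.rows = {}                  # row_key -> {"cf:qual": [(ts, value), ...]}
        self.row_keys = []              # kept sorted, like HBase's row-key order

    def put(self, row, column, value, ts=None):
        family = column.split(":")[0]
        assert family in self.families, "unknown column family"
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        cells = self.rows[row].setdefault(column, [])
        cells.append((ts if ts is not None else time.time(), value))

    def get(self, row, column):
        # Like the shell's `get`: return the newest version of the cell.
        cells = self.rows.get(row, {}).get(column, [])
        return max(cells)[1] if cells else None

    def scan(self):
        # Like `scan`: rows come back in sorted row-key order.
        return [(key, self.rows[key]) for key in self.row_keys]

t = TinyHBaseTable("info")
t.put("user2", "info:name", "Bea", ts=1)
t.put("user1", "info:name", "Ada", ts=1)
t.put("user1", "info:name", "Ada L.", ts=2)   # newer version wins on read
print(t.get("user1", "info:name"))            # Ada L.
print([k for k, _ in t.scan()])               # ['user1', 'user2']
```

The sorted row-key order is why row-key design matters so much in the "Thinking About Table Design" topic: it determines data locality for scans.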
Introduction to Impala and Kudu
  • What is Impala?
  • How Impala Differs from Hive and Pig
  • How Impala Differs from Relational Databases
  • Limitations and Future Directions
  • Using the Impala Shell
  • Why Kudu
  • Kudu Architecture
  • Kudu Use cases
  • Comparing Kudu with other frameworks
  • Impala with Kudu
Using Hive and Impala with HBase
  • Using Hive and Impala with HBase
Introduction to Hive
  • What Is Hive?
  • Hive Schema and Data Storage
  • Comparing Hive to Traditional Databases
  • Hive vs. Pig
  • Hive Use Cases
  • Interacting with Hive
Relational Data Analysis with Hive
  • Hive Databases and Tables
  • Basic HiveQL Syntax
  • Data Types
  • Joining Data Sets
  • Common Built-in Functions
  • Hands-on Exercise: Running Hive Queries on the Shell, Scripts, and Hue
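HiveQL's core SELECT/JOIN/GROUP BY syntax follows standard SQL, so the flavor of query written in this module can be previewed without a cluster. The snippet below uses Python's stdlib sqlite3 purely as a stand-in SQL engine; the schema and data are invented for illustration.

```python
import sqlite3

# sqlite3 stands in for Hive here: the join/aggregate syntax is the same
# standard SQL that HiveQL uses for queries like this.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders    (customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bea');
    INSERT INTO orders    VALUES (1, 20.0), (1, 5.0), (2, 7.5);
""")
rows = con.execute("""
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.total) AS spent
    FROM customers c JOIN orders o ON c.id = o.customer_id
    GROUP BY c.name ORDER BY spent DESC
""").fetchall()
print(rows)  # [('Ada', 2, 25.0), ('Bea', 1, 7.5)]
```

The difference in Hive is where the query runs, not how it reads: the same statement is compiled into distributed jobs over data in HDFS.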
Hive Data Management
  • Hive Data Formats
  • Creating Databases and Hive-Managed Tables
  • Loading Data into Hive
  • Altering Databases and Tables
  • Self-Managed Tables
  • Simplifying Queries with Views
  • Storing Query Results
  • Controlling Access to Data
  • Hands-on Exercise: Data Management with Hive
Hive Optimization
  • Understanding Query Performance
  • Controlling Job Execution Plan
  • Partitioning
  • Bucketing
  • Indexing Data
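The payoff of partitioning can be sketched without Hive at all. Below, a plain Python dict of lists plays the role of Hive's one-directory-per-value layout for a `PARTITIONED BY` column; the records and column names are invented. A filter on the partition column then touches only one group, which is the idea behind partition pruning.

```python
from collections import defaultdict

# Hypothetical event records; "dt" is the would-be partition column.
events = [
    {"dt": "2024-01-01", "user": "ada"},
    {"dt": "2024-01-01", "user": "bea"},
    {"dt": "2024-01-02", "user": "ada"},
]

# "Partitioning": physically group rows by the partition column, the way Hive
# writes one directory per value of a PARTITIONED BY column.
partitions = defaultdict(list)
for row in events:
    partitions[row["dt"]].append(row)

# A query filtering on the partition column reads only the matching partition
# ("partition pruning") instead of scanning every row.
scanned = partitions["2024-01-01"]
print(len(scanned), "of", len(events), "rows scanned")  # 2 of 3 rows scanned
```

Bucketing applies the same grouping idea within a partition, hashing rows on a column into a fixed number of files to speed up joins and sampling.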
Extending Hive
  • SerDes
  • Data Transformation with Custom Scripts
  • User-Defined Functions
  • Parameterized Queries
  • Hands-on Exercise: Data Transformation with Hive
Spark Overview
  • What is Spark?
  • Why Spark?
  • Data Abstraction – RDD
  • Logical Architecture of Spark
  • Programming Languages in Spark
  • Functional Programming with Spark
  • Hands-on Exercise
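The functional style that Spark's RDD API is built on exists in plain Python, so it can be previewed without a cluster. Here ordinary lists stand in for RDDs; only the commented chain at the end shows the (assumed) equivalent PySpark calls.

```python
from functools import reduce

# Plain Python map/filter/reduce with lambdas -- the same functional style
# PySpark's RDD API uses, run here on a list instead of a distributed RDD.
numbers = list(range(1, 11))

squares_of_evens = map(lambda x: x * x, filter(lambda x: x % 2 == 0, numbers))
total = reduce(lambda a, b: a + b, squares_of_evens)
print(total)  # 220

# The roughly equivalent RDD chain in PySpark would read:
#   sc.parallelize(numbers).filter(lambda x: x % 2 == 0) \
#     .map(lambda x: x * x).reduce(lambda a, b: a + b)
```

The key difference is that Spark's `map` and `filter` are lazy transformations over partitioned data, and only an action such as `reduce` triggers execution.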
Developing Spark Standalone Applications
  • Spark Applications vs. Spark Shell
  • Developing a Spark Application
  • Hands-on Exercise
Spark SQL
  • Introduction
  • Why Spark SQL?
  • Working with Zeppelin Notebook
  • Role of Catalyst Optimizer
  • Hive and Spark Integration
  • DataFrame API
  • Dataset API
  • Joins
  • Performing ad-hoc query analysis using Spark SQL
  • Hands-on: DataFrame API
  • Hands-on: Using Avro and Parquet with the DataFrame API
  • Hands-on: Integrating Hive with Spark SQL
  • Hands-on: Hive Partitioning Using Spark
  • Hands-on: Snappy Compression
  • Hands-on: SQL functions
Advanced Features of Spark
  • Persistence
  • Hands-on: Persistence
  • Coalesce
  • Accumulators
  • Broadcasting
  • Hands-on: Broadcasting
  • Other optimization techniques
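Broadcasting can also be illustrated without a cluster. In Spark, a broadcast variable ships one read-only copy of a small dataset to every executor so that a join against it needs no shuffle; below, a plain Python dict plays the broadcast lookup table. The names and data are invented.

```python
# Small "dimension" table -- the kind of data worth broadcasting.
country_names = {"IN": "India", "US": "United States"}

# Large "fact" data: (order_id, country_code) pairs.
orders = [("o1", "IN"), ("o2", "US"), ("o3", "IN")]

# Map-side join: each record resolves its code locally against the broadcast
# copy, instead of shuffling both datasets by join key across the network.
joined = [(order_id, country_names.get(code, "unknown"))
          for order_id, code in orders]
print(joined)  # [('o1', 'India'), ('o2', 'United States'), ('o3', 'India')]
```

Accumulators are the complementary primitive: write-only counters that tasks add to and only the driver reads, typically for counting records or errors during a job.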

This is a hands-on training, so basic familiarity with Hadoop, such as Hive queries and HDFS commands, is advisable.

Course Information


Duration

4 Days / 5 Days

Mode of Delivery

Instructor-led / Virtual


