Hadoop and Spark Training for Analysts

Categories: Big Data

The program is focused on processing Big Data using HDFS, HBase, Impala, Hive, Kudu and Spark SQL. The following points provide a high-level overview of the course:

  • Learn how to work with HDFS
  • Understand the purpose of HBase and use HBase with other ecosystem projects
  • Create tables, insert, read and delete data from HBase
  • Get an all-round understanding of Kudu and its role in the Hadoop ecosystem
  • Perform Interactive analysis using Hive and Impala
  • Understand the role of Spark
  • Use Spark SQL for analyzing data at scale

This program is designed for the following roles:

  • Big Data Analysts
  • Big Data Engineers
  • Big Data Scientists
  • Big Data Developers
Introduction to Hadoop and Spark Ecosystem
  • Big Data Overview
  • Key Roles in Big Data Project
  • Key Business Use cases
  • Hadoop and Spark Logical Architecture
  • Typical Big Data Project Pipeline
Basic Concepts of HDFS
  • HDFS Overview
  • Why HDFS?
  • High-level architecture
  • HDFS Commands
  • Working with HUE
  • Hands-on Exercise: The Hadoop Distributed File System
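
As a small taste of the HDFS Commands and hands-on topics in this module, the sketch below drives a few common hdfs dfs commands from Python. The directory and file names are placeholders, not the course's actual lab data.

```python
# Common HDFS shell commands, wrapped in a tiny Python helper for illustration.
# Paths such as /user/training/sales and sales.csv are assumed placeholders.
import subprocess

def hdfs(*args):
    """Run an 'hdfs dfs' command and fail loudly if it returns non-zero."""
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-mkdir", "-p", "/user/training/sales")        # create a directory in HDFS
hdfs("-put", "sales.csv", "/user/training/sales/")  # copy a local file into HDFS
hdfs("-ls", "/user/training/sales")                 # list the directory
hdfs("-cat", "/user/training/sales/sales.csv")      # print the file's contents
hdfs("-rm", "-r", "/user/training/sales")           # remove the directory recursively
```
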
MapReduce v1/YARN Frameworks and Architectures
  • Logical Architecture of MapReduce
  • High-level Architecture of MRv1 and YARN
  • Compare MRv1 vs. MRv2 on YARN
  • Hands-on Exercise
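
The hands-on exercise in this module typically involves running a simple MapReduce job. Below is a rough sketch of the classic word count written for Hadoop Streaming, which lets you supply the mapper and reducer as Python scripts; the file names and paths are assumptions, not the course's actual lab.

```python
# Word count for Hadoop Streaming. In practice these would be two separate
# files (mapper.py and reducer.py); they are shown together here for brevity.
import sys

def run_mapper():
    # mapper.py: emit "word<TAB>1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

def run_reducer():
    # reducer.py: sum the counts per word; input arrives sorted by key.
    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")
```

A job like this would normally be submitted with something like: hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <hdfs input> -output <hdfs output>; the exact streaming jar path depends on the distribution.
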
HBase Introduction
  • What Is HBase?
  • Why Use HBase?
  • Strengths of HBase
  • HBase in Production
  • Weaknesses of HBase
  • Comparison of HBase with other products
  • HBase vs. RDBMS
HBase Tables
  • HBase Concepts
  • HBase Table Fundamentals
  • Thinking About Table Design
The HBase Shell
  • Creating Tables with the HBase Shell
  • Working with Tables
  • Working with Table Data
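
The create/put/get/scan/delete operations in this module are normally typed into the HBase shell. The sketch below shows roughly equivalent calls from Python using the third-party happybase library, which talks to the HBase Thrift server; the table name, column family, and row keys are illustrative placeholders.

```python
# Roughly the HBase shell exercises, done from Python with the happybase
# library (requires the HBase Thrift server to be running). Table, column
# family, and row keys are assumed placeholders.
import happybase

connection = happybase.Connection("localhost")

# Shell equivalent: create 'customers', {NAME => 'info'}
connection.create_table("customers", {"info": dict()})
table = connection.table("customers")

# put 'customers', 'row1', 'info:name', 'Alice'
table.put(b"row1", {b"info:name": b"Alice", b"info:city": b"Pune"})

# get 'customers', 'row1'
print(table.row(b"row1"))

# scan 'customers'
for key, data in table.scan():
    print(key, data)

# delete 'customers', 'row1', 'info:city'
table.delete(b"row1", columns=[b"info:city"])

connection.close()
```
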
Introduction to Impala and Kudu
  • What is Impala?
  • How Impala Differs from Hive and Pig
  • How Impala Differs from Relational Databases
  • Limitations and Future Directions
  • Using the Impala Shell
  • Why Kudu
  • Kudu Architecture
  • Kudu Use cases
  • Comparing Kudu with other frameworks
  • Impala with Kudu
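
As a rough illustration of the Impala shell and Impala-with-Kudu topics above, the sketch below issues the same kind of SQL from Python using the impyla package. The host name, port, and table definition are assumptions; in class the statements would normally be typed into impala-shell.

```python
# Querying Impala from Python with impyla, and creating a Kudu-backed table
# through Impala SQL. Host, port, and table names are assumptions; 21050 is
# the usual Impala daemon HiveServer2 port.
from impala.dbapi import connect

conn = connect(host="impala-host", port=21050)
cur = conn.cursor()

# Create a table stored in Kudu (requires a Kudu-enabled Impala cluster).
cur.execute("""
    CREATE TABLE IF NOT EXISTS orders_kudu (
        order_id BIGINT,
        customer_id BIGINT,
        amount DOUBLE,
        PRIMARY KEY (order_id)
    )
    PARTITION BY HASH (order_id) PARTITIONS 4
    STORED AS KUDU
""")

# Interactive-style aggregation, as you would run it in the Impala shell.
cur.execute("""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM orders_kudu
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```
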
Using Hive and Impala with HBase
  • Using Hive and Impala with HBase
Introduction to Hive
  • What Is Hive?
  • Hive Schema and Data Storage
  • Comparing Hive to Traditional Databases
  • Hive vs. Pig
  • Hive Use Cases
  • Interacting with Hive
Relational Data Analysis with Hive
  • Hive Databases and Tables
  • Basic HiveQL Syntax
  • Data Types
  • Joining Data Sets
  • Common Built-in Functions
  • Hands-on Exercise: Running Hive Queries from the Shell, Scripts, and Hue
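
A minimal sketch of the kind of query covered in the hands-on exercise above: a join plus a few common built-in functions. The HiveQL is what you would type into the Hive shell or Hue; here it is run through PySpark with Hive support so the whole example stays in Python. Database, table, and column names are placeholders.

```python
# Running a HiveQL join with built-in functions via a Hive-enabled SparkSession.
# Table and column names are assumed placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-query-demo")
         .enableHiveSupport()
         .getOrCreate())

result = spark.sql("""
    SELECT c.customer_id,
           UPPER(c.name)           AS name,
           ROUND(SUM(o.amount), 2) AS total_spend,
           COUNT(*)                AS num_orders
    FROM customers c
    JOIN orders o
      ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY total_spend DESC
    LIMIT 10
""")
result.show()
```
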
Hive Data Management
  • Hive Data Formats
  • Creating Databases and Hive-Managed Tables
  • Loading Data into Hive
  • Altering Databases and Tables
  • Self-Managed Tables
  • Simplifying Queries with Views
  • Storing Query Results
  • Controlling Access to Data
  • Hands-on Exercise: Data Management with Hive
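
The data-management statements below (create a database, a Hive-managed table, load a file already in HDFS, and a view) are plain HiveQL, issued through a Hive-enabled SparkSession so the example stays in Python; in class they would normally be run in the Hive shell or Hue. The file path and table layout are assumptions.

```python
# Hive data management: database, managed table, LOAD DATA, and a view.
# Paths and names are assumed placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-data-management")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS sales_db")

# A managed (internal) table stored as delimited text so a CSV can be loaded directly.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_db.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE,
        order_date  STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
""")

# LOAD DATA moves the HDFS file into the table's warehouse directory.
spark.sql("LOAD DATA INPATH '/user/training/raw/orders.csv' INTO TABLE sales_db.orders")

# A view to simplify downstream queries.
spark.sql("""
    CREATE VIEW IF NOT EXISTS sales_db.big_orders AS
    SELECT * FROM sales_db.orders WHERE amount > 1000
""")
```
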
Hive Optimization
  • Understanding Query Performance
  • Controlling Job Execution Plan
  • Partitioning
  • Bucketing
  • Indexing Data
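
Partitioning and bucketing are easiest to see in DDL. The sketch below shows one way an orders table might be partitioned by year and a customers table bucketed by customer id; the statements are standard HiveQL issued through a Hive-enabled SparkSession, and all names follow the earlier placeholder examples.

```python
# Partitioning and bucketing in HiveQL, issued via a Hive-enabled SparkSession.
# Table and column names are assumed placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-optimization")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS sales_db")

# Partitioned table: queries filtering on order_year only read matching directories.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_db.orders_by_year (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (order_year INT)
    STORED AS PARQUET
""")

# Dynamic-partition insert: each row is routed to its partition automatically.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT INTO TABLE sales_db.orders_by_year PARTITION (order_year)
    SELECT order_id, customer_id, amount, YEAR(order_date) AS order_year
    FROM sales_db.orders
""")

# Bucketed table: rows are hashed on customer_id into a fixed number of files,
# which can speed up joins and sampling on that column.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_db.customers_bucketed (
        customer_id BIGINT,
        name        STRING
    )
    CLUSTERED BY (customer_id) INTO 8 BUCKETS
    STORED AS PARQUET
""")
```
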
Extending Hive
  • SerDes
  • Data Transformation with Custom Scripts
  • User-Defined Functions
  • Parameterized Queries
  • Hands-on Exercise: Data Transformation with Hive
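
For the Data Transformation with Custom Scripts topic, Hive can stream rows through an external script with TRANSFORM ... USING. The sketch below is a hypothetical cleaning script; the assumed input columns (customer_id, name) are placeholders, not the course's actual exercise data.

```python
# clean_names.py -- a hypothetical transform script for Hive's
# TRANSFORM ... USING clause. Hive streams each row to stdin as tab-separated
# fields and reads tab-separated rows back from stdout.
import sys

for line in sys.stdin:
    customer_id, name = line.rstrip("\n").split("\t")
    # Normalise the name: trim whitespace and use Title Case.
    print(f"{customer_id}\t{name.strip().title()}")
```

In Hive the script would be registered and invoked roughly like this: ADD FILE clean_names.py; then SELECT TRANSFORM(customer_id, name) USING 'python clean_names.py' AS (customer_id, clean_name) FROM sales_db.customers;
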
Spark Overview
  • What is Spark?
  • Why Spark?
  • Data Abstraction – RDD
  • Logical Architecture of Spark
  • Programming Languages in Spark
  • Functional Programming with Spark
  • Hands-on Exercise
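
The functional-programming style mentioned above is easiest to see on an RDD. Below is the classic word count using flatMap, filter, map, and reduceByKey in PySpark; the input path is an assumed placeholder.

```python
# Functional programming on an RDD: the classic word count.
# The input path is an assumed placeholder in HDFS.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-wordcount").getOrCreate()
sc = spark.sparkContext

counts = (sc.textFile("hdfs:///user/training/shakespeare.txt")
            .flatMap(lambda line: line.lower().split())  # one record per word
            .filter(lambda word: word != "")             # drop empty tokens
            .map(lambda word: (word, 1))                 # pair each word with a count of 1
            .reduceByKey(lambda a, b: a + b))            # sum the counts per word

# Print the ten most frequent words.
for word, count in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(word, count)
```
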
Developing Spark Standalone Applications
  • Spark Applications vs. Spark Shell
  • Developing a Spark Application
  • Hands-on Exercise
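
In contrast to the interactive shell, a standalone application wraps the same logic in a script that is submitted to the cluster. Below is a minimal sketch; the file name, input path, and column names are assumptions.

```python
# average_order.py -- a minimal standalone Spark application (as opposed to the
# interactive shell). File name, input path, and column names are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def main():
    spark = SparkSession.builder.appName("average-order").getOrCreate()

    orders = spark.read.csv("hdfs:///user/training/orders.csv",
                            header=True, inferSchema=True)

    (orders.groupBy("customer_id")
           .agg(F.avg("amount").alias("avg_order"))
           .write.mode("overwrite")
           .parquet("hdfs:///user/training/output/avg_order"))

    spark.stop()

if __name__ == "__main__":
    main()
```

It would be launched with spark-submit average_order.py, adding options such as --master yarn when running on a cluster.
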
Spark SQL
  • Introduction
  • Why Spark SQL?
  • Working with Zeppelin Notebook
  • Role of Catalyst Optimizer
  • Hive and Spark Integration
  • DataFrame API
  • Dataset API
  • Joins
  • Performing ad-hoc query analysis using Spark SQL
  • Hands-on: DataFrame API
  • Hands-on: Using Avro and Parquet with the DataFrame API
  • Hands-on: Integrating Hive with Spark SQL
  • Hands-on: Hive Partitioning Using Spark
  • Hands-on: Snappy Compression
  • Hands-on: SQL functions
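
A condensed sketch tying several of the hands-on items above together: reading Parquet into DataFrames, a join and aggregation with the DataFrame API, ad-hoc SQL over a temporary view, and writing Avro and Snappy-compressed Parquet. Paths and column names are placeholders, and the Avro write assumes the external spark-avro package is on the classpath.

```python
# DataFrame API sketch: read Parquet, join and aggregate, run ad-hoc SQL over a
# temporary view, then write Avro and Snappy-compressed Parquet. Paths and
# column names are assumptions; the Avro write needs the spark-avro package
# (e.g. added via --packages).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

customers = spark.read.parquet("hdfs:///user/training/customers.parquet")
orders    = spark.read.parquet("hdfs:///user/training/orders.parquet")

# DataFrame API: join on customer_id, then aggregate spend per customer.
spend = (orders.join(customers, "customer_id")
               .groupBy("customer_id", "name")
               .agg(F.sum("amount").alias("total_spend")))

# Ad-hoc SQL over the same data via a temporary view.
spend.createOrReplaceTempView("spend")
spark.sql("SELECT * FROM spend ORDER BY total_spend DESC LIMIT 10").show()

# Write the result as Avro and as Snappy-compressed Parquet.
spend.write.mode("overwrite").format("avro").save("hdfs:///user/training/out/spend_avro")
(spend.write.mode("overwrite")
      .option("compression", "snappy")
      .parquet("hdfs:///user/training/out/spend_parquet"))
```
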
Advanced Features of Spark
  • Persistence
  • Hands-on: Persistence
  • Coalesce
  • Accumulators
  • Broadcasting
  • Hands-on: Broadcasting
  • Other optimization techniques
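
The sketch below touches each of the features listed above: persisting a reused DataFrame, a broadcast lookup table, an accumulator that counts bad records, and coalesce to reduce the number of output files. All data, paths, column names, and thresholds are illustrative assumptions.

```python
# Persistence, broadcasting, an accumulator, and coalesce in one small sketch.
# Paths, column names, and thresholds are assumed placeholders.
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("advanced-features").getOrCreate()
sc = spark.sparkContext

orders = spark.read.parquet("hdfs:///user/training/orders.parquet")

# Persistence: keep a frequently reused DataFrame in memory, spilling to disk
# if it does not fit. The first action materialises the cache.
orders.persist(StorageLevel.MEMORY_AND_DISK)
orders.count()

# Broadcasting: ship a small lookup table to every executor once, rather than
# joining against it as a distributed dataset.
country_lookup = sc.broadcast({"IN": "India", "US": "United States"})
with_country = orders.rdd.map(
    lambda row: (row["order_id"],
                 country_lookup.value.get(row["country_code"], "Unknown")))
print(with_country.take(3))

# Accumulator: count bad records seen anywhere on the cluster.
bad_records = sc.accumulator(0)

def is_valid(row):
    if row["amount"] is None or row["amount"] < 0:
        bad_records.add(1)
        return False
    return True

valid_count = orders.rdd.filter(is_valid).count()
print("valid:", valid_count, "bad:", bad_records.value)

# Coalesce: shrink the number of partitions (and output files) without a shuffle.
orders.coalesce(4).write.mode("overwrite").parquet("hdfs:///user/training/out/orders_small")
```
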

This is a hands-on training, so it is advisable to have some basic knowledge of Hadoop, such as Hive queries and HDFS commands.

Course Information

  • Duration: 4 Days / 5 Days
  • Mode of Delivery: Instructor-led / Virtual
  • Level: Intermediate
