Join us for a FREE hands-on Meetup webinar on Mastering Prompt Engineering with Azure OpenAI Service (AI102) | Fri, Aug 09 · 5:00 PM IST / 7:30 AM EST

Data Analyst

Learn optimized analysis techniques using Hive and Impala on the Cloudera Big Data ecosystem.

Duration

4 Days

Level

Intermediate Level

Design and tailor this course

To your team's needs

This course is designed to give participants hands-on exposure to data analysis using Hive and Impala. It begins with a holistic overview of the Cloudera Big Data ecosystem, then delves into Hive and Impala architectures, query languages, and optimization techniques. By the end of the course, participants will understand how to prepare for the CCA-159 Cloudera certification.

This program is designed for those who aspire to a Data Analyst role or who would like to clear the Cloudera Data Analyst certification:

Business Analysts
Data Engineers
Data Scientists
Machine Learning Engineers
Data Integration Engineers
Data Architects

Introduction to Hadoop and Spark Ecosystem

Big Data Overview
Key Roles in a Big Data Project
Key Business Use Cases
Hadoop and Spark Logical Architecture
Typical Big Data Project Pipeline

Basic Concepts of HDFS

HDFS Overview
Why HDFS?
High-level architecture
HDFS Commands
Working with HUE
The Hadoop Distributed File System Hands-on

MapReduce v1/YARN Frameworks and Architectures

Introduction to Hive

What Is Hive?
Hive Schema and Data Storage
Comparing Hive to Traditional Databases
Hive vs. Pig
Hive Use Cases
Interacting with Hive

Relational Data Analysis with Hive

Hive Databases and Tables
Basic HiveQL Syntax
Data Types
Joining Data Sets
Windowing
Common Built-in Functions
Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue

Hive Data Management

Hive Data Formats
Creating Databases and Hive-Managed Tables
Loading Data into Hive
Altering Databases and Tables
Self-Managed Tables
Simplifying Queries with Views
Storing Query Results
Controlling Access to Data
Hands-On Exercise: Data Management with Hive

Text Processing with Hive

Overview of Text Processing
Important String Functions
Using Regular Expressions in Hive
Sentiment Analysis and N-Grams
Hands-On Exercise (Optional): Gaining Insight with Sentiment Analysis

Hive Optimization

Understanding Query Performance
Controlling the Job Execution Plan
Partitioning
Bucketing
Indexing Data

SerDes
Data Transformation with Custom Scripts
User-Defined Functions
Parameterized Queries
Hands-On Exercise: Data Transformation with Hive

Introduction to Impala

What Is Impala?
How Impala Differs from Hive and Pig
How Impala Differs from Relational Databases
Limitations and Future Directions
Using the Impala Shell

Analyzing Data with Impala

Basic Syntax
Data Types
Filtering, Sorting, and Limiting Results
Joining and Grouping Data
Improving Impala Performance
Hands-On Exercise: Interactive Analysis with Impala

Cloudera Data Analyst Certification

Objectives
Things to Take Care Of
Practice Questions

Participants should have basic knowledge of SQL and basic Unix/Linux commands.

Copyright 2024 © DataCouch