Data Analyst
Learn optimized data analysis techniques using Hive and Impala in the Cloudera Big Data ecosystem.
Duration
4 Days
Level
Intermediate Level
Customization
This course can be tailored to your team's needs
This course gives participants exposure to data analysis using Hive and Impala. It begins with a holistic overview of the Cloudera Big Data ecosystem, then delves into the Hive and Impala architectures, query languages, and optimization techniques. Toward the end of the course, participants learn how to prepare for the CCA-159 Cloudera certification.
This program is designed for those who aspire to a Data Analyst role or who want to pass the Cloudera Data Analyst certification:
- Business Analysts
- Data Engineers
- Data Scientists
- Machine Learning Engineers
- Data Integration Engineers
- Data Architects
The course covers the following topics:
- Big Data Overview
- Key Roles in a Big Data Project
- Key Business Use Cases
- Hadoop and Spark Logical Architecture
- Typical Big Data Project Pipeline
- HDFS Overview
- Why HDFS?
- High-level architecture
- HDFS Commands
- Working with HUE
- The Hadoop Distributed File System Hands-on
- What Is Hive?
- Hive Schema and Data Storage
- Comparing Hive to Traditional Databases
- Hive vs. Pig
- Hive Use Cases
- Interacting with Hive
- Hive Databases and Tables
- Basic HiveQL Syntax
- Data Types
- Joining Data Sets
- Windowing
- Common Built-in Functions
- Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue
- Hive Data Formats
- Creating Databases and Hive-Managed Tables
- Loading Data into Hive
- Altering Databases and Tables
- Self-Managed Tables
- Simplifying Queries with Views
- Storing Query Results
- Controlling Access to Data
- Hands-On Exercise: Data Management with Hive
- Overview of Text Processing
- Important String Functions
- Using Regular Expressions in Hive
- Sentiment Analysis and N-Grams
- Hands-On Exercise (Optional): Gaining Insight with Sentiment Analysis
- Understanding Query Performance
- Controlling the Job Execution Plan
- Partitioning
- Bucketing
- Indexing Data
- SerDes
- Data Transformation with Custom Scripts
- User-Defined Functions
- Parameterized Queries
- Hands-On Exercise: Data Transformation with Hive
- What is Impala?
- How Impala Differs from Hive and Pig
- How Impala Differs from Relational Databases
- Limitations and Future Directions
- Using the Impala Shell
- Basic Syntax
- Data Types
- Filtering, Sorting, and Limiting Results
- Joining and Grouping Data
- Improving Impala Performance
- Hands-On Exercise: Interactive Analysis with Impala
- Objectives
- Things to Take Care Of
- Practice Questions
Participants should have a basic knowledge of SQL and of basic Unix/Linux commands.