Big Data Administration using Cloudera
Learn how to install, configure, maintain, and monitor the versatile frameworks bundled with the Cloudera distribution
Duration
4 Days
Level
Intermediate Level
Design and Tailor This Course
As per your team's needs
This Big Data Administrator training course is based on the Cloudera distribution. Participants learn to install, configure, maintain, and monitor the versatile frameworks bundled with the Cloudera distribution, including HDFS, YARN, Sqoop, Flume, Pig, Hive, Spark, Kafka, and Impala.
The program is focused on Cloudera Hadoop cluster administration. The points below provide a high-level overview of the course:
- Introduction to Cloudera Hadoop administration using Cloudera Manager
- Understand how a Cloudera production deployment can be set up
- Install, configure, manage, secure, test, and troubleshoot a Cloudera Hadoop cluster
- Manage and secure a production-grade Cloudera Hadoop cluster using Kerberos and Sentry
The intended audience for this course:
- Big Data Administrators
- DevOps Engineers
- Big Data Architects
- Big Data Overview
- Key Roles in a Big Data Project
- Key Business Use Cases
- Hadoop and Spark Logical Architecture
- Typical Big Data Project Pipeline
- Roles in a Big Data Project
- Types of Administrators
- Responsibilities of an Administrator
- Why Hadoop and Spark?
- Core Hadoop Components
- Fundamental Concepts
- Logical Architecture of Hadoop and Spark
- Use Cases
- Ingesting Data from Relational Databases with Sqoop
- Ingesting Data with Flume
- Kafka Overview
- Kafka Ecosystem
- Kafka Connect API
- Integrating Kafka with HDFS
- REST Interfaces
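To give a concrete feel for the REST interfaces listed above, here is a minimal sketch that lists an HDFS directory over WebHDFS using Python's requests library. The host name, port (9870 on Hadoop 3, 50070 on older CDH releases), path, and user are placeholders, and WebHDFS is assumed to be enabled on the NameNode.

```python
# Minimal sketch: listing an HDFS directory over the WebHDFS REST interface.
# Host, port, path, and user below are placeholders for your cluster.
import requests

NAMENODE = "http://namenode.example.com:9870"   # hypothetical NameNode host
PATH = "/user/hive/warehouse"                   # any HDFS path you can read

resp = requests.get(
    f"{NAMENODE}/webhdfs/v1{PATH}",
    params={"op": "LISTSTATUS", "user.name": "hdfs"},
    timeout=10,
)
resp.raise_for_status()

# The response nests directory entries under FileStatuses -> FileStatus
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["type"], entry["pathSuffix"], entry["length"])
```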
- What Is MapReduce?
- Basic MapReduce Concepts
- YARN Overview
- YARN Cluster Architecture
- YARN Concepts
- Resource Allocation
- Failure Recovery
- Using the YARN Web UI
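Alongside the YARN Web UI, the ResourceManager exposes the same information over a REST API. The sketch below assumes a placeholder ResourceManager host and the default port 8088, and prints basic cluster metrics plus any running applications.

```python
# Minimal sketch: querying the YARN ResourceManager REST API for the data
# the YARN Web UI renders. Host and port are placeholders.
import requests

RM = "http://resourcemanager.example.com:8088"  # hypothetical RM host

metrics = requests.get(f"{RM}/ws/v1/cluster/metrics", timeout=10).json()["clusterMetrics"]
print("Active NodeManagers:", metrics["activeNodes"])
print("Applications running:", metrics["appsRunning"])

apps = requests.get(
    f"{RM}/ws/v1/cluster/apps", params={"states": "RUNNING"}, timeout=10
).json()["apps"]

# "apps" is null in the JSON when nothing is running
for app in (apps or {}).get("app", []):
    print(app["id"], app["name"], f'{app["progress"]:.0f}%')
```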
- Why Cloudera Manager?
- Cloudera Manager Features
- Cloudera Manager Installation
- Installing CDH Using Cloudera Manager
- Performing Basic Administration Tasks Using Cloudera Manager
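Basic administration tasks can also be scripted against the Cloudera Manager REST API. The sketch below is only illustrative: the host, credentials, and API version (v19) are assumptions that should be adjusted for your CM release (GET /api/version reports the highest supported version).

```python
# Minimal sketch: reading cluster and service health from the Cloudera Manager
# REST API. Host, credentials, and API version are assumptions.
import requests

CM = "http://cm.example.com:7180"        # hypothetical CM host, default HTTP port
API = f"{CM}/api/v19"                    # adjust to your CM's supported version
AUTH = ("admin", "admin")                # replace with real credentials

clusters = requests.get(f"{API}/clusters", auth=AUTH, timeout=10).json()["items"]

for cluster in clusters:
    services = requests.get(
        f"{API}/clusters/{cluster['name']}/services", auth=AUTH, timeout=10
    ).json()["items"]
    for svc in services:
        print(cluster["name"], svc["name"], svc.get("healthSummary"))
```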
- Things to Consider
- Choosing the Right Hardware
- Configuring Nodes
- Planning for Cluster Management
- Deployment Types
- Installing Hadoop
- Specifying the Hadoop Configuration
- Performing Initial HDFS Configuration
- Performing Initial YARN and MapReduce Configuration
- Hadoop Logging
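As a small illustration of how the Hadoop configuration is specified, the sketch below reads a single property from hdfs-site.xml. The /etc/hadoop/conf path is a typical CDH client location and is only an assumption for your environment.

```python
# Minimal sketch: reading one property from a Hadoop site configuration file.
# The path below is a typical CDH client location and only an assumption.
import xml.etree.ElementTree as ET

CONF_FILE = "/etc/hadoop/conf/hdfs-site.xml"

def get_property(conf_path, key):
    """Return the value of a <name>/<value> property, or None if unset."""
    root = ET.parse(conf_path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None

print("dfs.replication =", get_property(CONF_FILE, "dfs.replication"))
```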
- Hive
- Impala
- Pig
- What Is a Hadoop Client?
- Installing and Configuring Hadoop Clients
- Installing and Configuring Hue
- Hue Authentication and Authorization
- Advanced Configuration Parameters
- Configuring Hadoop Ports
- Explicitly Including and Excluding Hosts
- Configuring HDFS for Rack Awareness
- Configuring HDFS High Availability
- Managing Running Jobs
- Scheduling Hadoop Jobs
- Configuring the Fair Scheduler
- Impala Query Scheduling
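For a quick way to exercise Impala scheduling and admission control from a script, the sketch below submits a trivial query with the impyla package. The daemon host is a placeholder, 21050 is the default impalad HiveServer2 port, and the cluster is assumed to be unsecured.

```python
# Minimal sketch: running a query against Impala with the impyla package
# (pip install impyla). Host is a placeholder; 21050 is the default impalad
# HiveServer2 port; the cluster is assumed to be unsecured.
from impala.dbapi import connect

conn = connect(host="impalad.example.com", port=21050)   # hypothetical daemon
cur = conn.cursor()

# Any cheap statement works as a smoke test for scheduling / admission control.
cur.execute("SHOW DATABASES")
for row in cur.fetchall():
    print(row[0])

cur.close()
conn.close()
```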
- Checking HDFS Status
- Copying Data Between Clusters
- Adding and Removing Cluster Nodes
- Rebalancing the Cluster
- Cluster Upgrading
- General System Monitoring
- Monitoring Hadoop Clusters
- Troubleshooting Hadoop Clusters
- Common Misconfigurations
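Much of the day-to-day monitoring covered here can be automated against the NameNode's JMX servlet, which exposes the figures shown on its web UI. The sketch below is a minimal example; the host and port (9870 on Hadoop 3, 50070 on older CDH releases) are placeholders.

```python
# Minimal sketch: pulling HDFS health figures from the NameNode JMX servlet.
# Host and port are placeholders (9870 on Hadoop 3, 50070 on older releases).
import requests

NAMENODE = "http://namenode.example.com:9870"   # hypothetical NameNode host

beans = requests.get(
    f"{NAMENODE}/jmx",
    params={"qry": "Hadoop:service=NameNode,name=FSNamesystemState"},
    timeout=10,
).json()["beans"]

state = beans[0]
print("Live DataNodes:     ", state["NumLiveDataNodes"])
print("Dead DataNodes:     ", state["NumDeadDataNodes"])
print("Under-replicated:   ", state["UnderReplicatedBlocks"])
print("Capacity used (B):  ", state["CapacityUsed"])
```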
- Basics of Security
- Hadoop’s Security System Concepts
- What Is Kerberos?
- Securing a Hadoop Cluster with Kerberos
- How Does Kerberos Work?
- Installation
- Configuration
- Sentry Overview
- Hands-on
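Once a cluster is Kerberized, plain REST calls such as the earlier WebHDFS example must authenticate via SPNEGO. A minimal sketch, assuming a valid Kerberos ticket already exists in the credential cache (kinit has been run), the requests-kerberos package is installed, and the host and path are placeholders:

```python
# Minimal sketch: the earlier WebHDFS listing, now authenticating via SPNEGO
# against a Kerberized cluster. Assumes `kinit` has already been run and the
# requests-kerberos package is installed; host and path are placeholders.
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

NAMENODE = "http://namenode.example.com:9870"   # hypothetical NameNode host

resp = requests.get(
    f"{NAMENODE}/webhdfs/v1/user",
    params={"op": "LISTSTATUS"},
    auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
    timeout=10,
)
resp.raise_for_status()

for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"])
```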
Participants should preferably have prior software development experience along with basic knowledge of SQL and Unix commands. Knowledge of Python or Scala is a plus.