Big Data Administration using Cloudera

Learn how to install, configure, maintain and monitor the frameworks bundled with the Cloudera distribution

Duration

4 Days

Level

Intermediate Level

This Big Data Administrator training course is based on the Cloudera distribution. Participants learn to install, configure, maintain and monitor the frameworks bundled with it, including HDFS, YARN, Sqoop, Flume, Pig, Hive, Spark, Kafka and Impala.

The program focuses on Cloudera Hadoop cluster administration. The points below provide a high-level overview of the course:

Introduction to Cloudera Hadoop administration using Cloudera Manager
Understand how a Cloudera production deployment can be set up
Install, configure, manage, secure, test and troubleshoot a Cloudera Hadoop cluster
Manage and secure a production-grade Cloudera Hadoop cluster using Kerberos and Sentry

The intended audience for this course:

Big Data Administrators
DevOps Engineers
Big Data Architects

Introduction to Hadoop and Spark Ecosystem

Big Data Overview
Key Roles in Big Data Project
Key Business Use cases
Hadoop and Spark Logical Architecture
Typical Big Data Project Pipeline

Introduction

Roles in Big Data Project
Types of Administrators
Responsibilities of Administrator
Why Hadoop and Spark?
Core Hadoop Components
Fundamental Concepts
Logical Architecture of Hadoop and Spark
Use Cases

HDFS

Data Acquisition

Ingesting Data from Relational Databases with Sqoop
Flume

Kafka

Overview
Ecosystem
Connect API
Integrating HDFS
REST Interfaces

YARN and MapReduce

What Is MapReduce?
Basic MapReduce Concepts
YARN Overview
YARN Cluster Architecture
YARN Concepts
Resource Allocation
Failure Recovery
Using the YARN Web UI

Cloudera Manager

Why Cloudera Manager?
Cloudera Manager Features
Cloudera Manager Installation
Installing CDH Using Cloudera Manager
Performing Basic Administration Tasks Using Cloudera Manager

Capacity Planning

Things to Consider
Choosing the Right Hardware
Configuring Nodes
Planning for Cluster Management

Hadoop Installation and Initial Configuration

Deployment Types
Installing Hadoop
Specifying the Hadoop Configuration
Performing Initial HDFS Configuration
Performing Initial YARN and MapReduce Configuration
Hadoop Logging

Installing and Configuring Hive, Impala, and Pig

Hive
Impala
Pig

Hadoop Clients

What is a Hadoop Client?
Installing and Configuring Hadoop Clients
Installing and Configuring Hue
Hue Authentication and Authorization

Advanced Cluster Configuration

Advanced Configuration Parameters
Configuring Hadoop Ports
Explicitly Including and Excluding Hosts
Configuring HDFS for Rack Awareness
Configuring HDFS High Availability

Managing and Scheduling Jobs

Managing Running Jobs
Scheduling Hadoop Jobs
Configuring the Fair Scheduler
Impala Query Scheduling

Cluster Maintenance

Checking HDFS Status
Copying Data Between Clusters
Adding and Removing Cluster Nodes
Rebalancing the Cluster
Cluster Upgrading

Cluster Monitoring and Troubleshooting

General System Monitoring
Monitoring Hadoop Clusters
Troubleshooting Hadoop Clusters
Common Misconfigurations

Hadoop Security Overview

Basics of Security
Hadoop’s Security System Concepts
What Is Kerberos?
Securing a Hadoop Cluster with Kerberos
How Does Kerberos Work?
Installation
Configuration
Sentry Overview
Hands-on

Participants should preferably have prior software development experience, along with basic knowledge of SQL and Unix commands. Knowledge of Python or Scala is a plus.
