Big Data Ingestion and Processing
Ingest, Store, Process and Analyze data using Real-time Big Data Pipelines
Duration
4 Days
Level
Intermediate Level
Design and Tailor this course
As per your team needs
Big Data ingestion involves connecting to various data sources, extracting the data, and detecting changed data. In other words, data ingestion means taking data coming from multiple sources and putting it somewhere it can be accessed. In a Big Data processing system, the ingested data is then processed and classified as it flows through the rest of the pipeline.
- The program is focused on Data Ingestion and Processing using Sqoop, Flume, Kafka and Spark Streaming. This program covers:
- Flume and Sqoop Fundamentals
- Architectures of Flume, Sqoop
- Kafka Fundamentals, Architecture, API, Kafka Connect, Kafka Streams, Spark Micro-batch processing and Structured Streaming Processing
- Hands-on exercises related to the Kafka APIs will be in Java/Scala; Scala will be used for the Spark-related exercises
The intended audience for this course:
- Application Developers
- DevOps Engineers
- Architects
- System Engineers
- Technical Managers
- Data Ingestion Overview
- Key Ingestion Frameworks
- Key Business Use cases
- Typical Big Data Project Pipeline
- Sqoop Basics
- Sqoop Internals
- Sqoop 1 vs Sqoop 2
- Key Sqoop Commands
- Hands-on Exercise
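As a brief illustration of the Sqoop commands covered above, the sketch below imports a relational table into HDFS. The connection string, credentials, table name and target directory are placeholders for the lab environment.

```
# Hypothetical example: import the "orders" table from MySQL into HDFS
sqoop import \
  --connect jdbc:mysql://dbhost:3306/retail \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4
```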
- Flume Overview
- Physical Architectures of Flume
- Source, Sink and Channel
- Building Data Pipeline using Flume
- Hands-on Exercise
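The source, channel and sink topics above map directly onto a Flume agent definition. Below is a minimal configuration sketch of one such pipeline; the agent name, log path and HDFS directory are assumptions for illustration.

```
# agent1 tails an application log and lands the events in HDFS
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

agent1.sources.src1.type     = exec
agent1.sources.src1.command  = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type     = memory
agent1.channels.ch1.capacity = 1000

agent1.sinks.sink1.type      = hdfs
agent1.sinks.sink1.hdfs.path = /data/flume/events
agent1.sinks.sink1.channel   = ch1
```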
- Kafka Overview
- Salient Features of Kafka
- Kafka Use cases
- Comparing Kafka with other Key tools
- Logical Architecture of Kafka
- Physical Architecture of Kafka
- Partitions
- Topics
- Replicas
- Producers & Consumers
- Brokers
- Roles and Responsibilities of various components
- Replication mechanism
- Message Delivery Semantics
- Key Terminologies
- Key configuration settings of Brokers, Producers, Consumers etc.
- Hands-on exercises
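To make the topic, partition and replica concepts above concrete, here is a sketch using the standard kafka-topics.sh tool. The topic name, counts and broker address are assumptions; older Kafka releases use --zookeeper instead of --bootstrap-server.

```
# Create a topic with 3 partitions, each replicated on 2 brokers
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
  --topic orders --partitions 3 --replication-factor 2

# Show partition leaders and in-sync replicas (ISR)
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic orders
```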
- Role of Zookeeper
- Zookeeper Basic Operations
- Apache Kafka – Zookeeper Role
- End to End Data Pipeline using Kafka
- Kafka Connect
- Integrate Kafka with Spark
- Hands-on Exercises
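As a small illustration of Kafka Connect from the topics above, the sketch below is a standalone file-source connector that streams lines of a file into a topic. The connector name, file path and topic are assumptions for the exercise environment.

```
# my-file-source.properties (run with bin/connect-standalone.sh)
name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/tmp/ingest-demo.txt
topic=connect-demo
```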
- Overview
- Producer API
- Sync Producers
- Async Producers
- Message Acknowledgement
- Batching Messages
- Keyed and Non-Keyed Messages
- Compression
- Batching
- Consumer API
- Hands-on Exercises
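A minimal Scala sketch of the Producer API topics listed above (asynchronous send, acknowledgements, batching and compression). The topic name, broker address and payload are assumptions for the exercise environment.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object SimpleProducer extends App {
  val props = new Properties()
  props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
  props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
  props.put(ProducerConfig.ACKS_CONFIG, "all")                 // wait for all in-sync replicas
  props.put(ProducerConfig.LINGER_MS_CONFIG, "10")             // allow small batches to form
  props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy")  // compress each batch

  val producer = new KafkaProducer[String, String](props)
  // Keyed message: records with the same key always land on the same partition
  val record = new ProducerRecord[String, String]("orders", "order-42", """{"id":42,"amount":99.5}""")
  producer.send(record)   // asynchronous send; returns a Future with the record metadata
  producer.flush()
  producer.close()
}
```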
- What is Spark?
- Why Spark?
- Data Abstraction – RDD
- Logical Architecture of Spark
- Programming Languages in Spark
- Functional Programming with Spark
- Hands-on Exercise
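A short Scala sketch of the RDD abstraction and functional style referred to above; the input strings and the local master are assumptions for a standalone run.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountRDD extends App {
  val conf = new SparkConf().setAppName("rdd-demo").setMaster("local[*]")
  val sc = new SparkContext(conf)

  // Build an RDD from a local collection and apply functional transformations
  val lines = sc.parallelize(Seq("big data ingestion", "data processing with spark"))
  val counts = lines
    .flatMap(_.split("\\s+"))       // transformation: split lines into words
    .map(word => (word, 1))         // transformation: pair each word with 1
    .reduceByKey(_ + _)             // transformation: sum the counts per word

  counts.collect().foreach(println) // action: triggers the computation
  sc.stop()
}
```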
- Introduction
- Dataframe API
- Performing ad-hoc query analysis using Spark SQL
- Hands-on Exercises
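A small Scala sketch of the DataFrame API and ad-hoc SQL analysis mentioned above; the in-memory dataset stands in for ingested data and is purely illustrative.

```scala
import org.apache.spark.sql.SparkSession

object AdhocQuery extends App {
  val spark = SparkSession.builder().appName("sql-demo").master("local[*]").getOrCreate()
  import spark.implicits._

  // A tiny in-memory DataFrame standing in for an ingested dataset
  val orders = Seq((1, "books", 25.0), (2, "games", 60.0), (3, "books", 15.0))
    .toDF("id", "category", "amount")

  // DataFrame API
  orders.groupBy("category").sum("amount").show()

  // Equivalent ad-hoc SQL over a temporary view
  orders.createOrReplaceTempView("orders")
  spark.sql("SELECT category, SUM(amount) AS total FROM orders GROUP BY category").show()

  spark.stop()
}
```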
- Analyzing streaming data using Spark
- Stateless Streaming
- Hands-on: Stateless Streaming
- Stateful Streaming
- Hands-on: Stateful Streaming
- Structured Streaming
- Hands-on exercises
- Hands-on: Integrating Kafka with Spark Streaming
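A sketch of integrating Kafka with Spark Structured Streaming, as in the hands-on above. It assumes the spark-sql-kafka-0-10 package is on the classpath; the topic name and broker address are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToConsole extends App {
  val spark = SparkSession.builder()
    .appName("kafka-structured-streaming-demo")
    .master("local[*]")              // local run for the exercise environment
    .getOrCreate()
  import spark.implicits._

  // Read the "orders" topic as an unbounded DataFrame (key/value arrive as binary)
  val raw = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "latest")
    .load()

  // Cast the payload to strings and count records per key (stateful aggregation)
  val counts = raw.selectExpr("CAST(key AS STRING) AS k", "CAST(value AS STRING) AS v")
    .groupBy($"k")
    .count()

  val query = counts.writeStream
    .outputMode("complete")          // emit the full result table on each trigger
    .format("console")
    .start()

  query.awaitTermination()
}
```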
- Overview
- What is Kafka Streams
- Why Kafka Streams
- Kafka Streams Architecture
- Hands-on Exercise
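A minimal sketch of a Kafka Streams topology for the topics above, written in Scala against the Java Streams API; the application id and topic names are assumptions.

```scala
import java.util.Properties
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}
import org.apache.kafka.streams.kstream.{KStream, ValueMapper}

object UppercaseStream extends App {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo")   // also the consumer group id
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
  props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)

  val builder = new StreamsBuilder()
  val lines: KStream[String, String] = builder.stream[String, String]("input-events")

  // Transform each record value and write the result to an output topic
  lines.mapValues(new ValueMapper[String, String] {
    override def apply(value: String): String = value.toUpperCase
  }).to("output-events")

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
  sys.addShutdownHook(streams.close())
}
```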
Participants should preferably have prior software development experience along with basic knowledge of SQL and Unix commands. Knowledge of Python/Scala would be a plus.