
Big Data Ingestion and Processing


Categories: Big Data

Big Data ingestion involves connecting to various data sources, extracting the data, and detecting changed data; put simply, ingestion means taking data from multiple sources and landing it somewhere it can be accessed. In a big data processing system, this collected data is then passed to the processing layer, where it is processed and classified for downstream data flows.

The program focuses on Data Ingestion and Processing using Sqoop, Flume, Kafka, and Spark Streaming. This program covers:

    • Flume and Sqoop Fundamentals
    • Architectures of Flume, Sqoop
    • Kafka Fundamentals, Architecture, APIs, Kafka Connect, and Kafka Streams
    • Spark micro-batch processing and Structured Streaming
    • Hands-on exercises: Kafka API exercises are in Java/Scala; Spark exercises use Scala

The intended audience for this course:

  • Application Developers
  • DevOps Engineers
  • Architects
  • System Engineers
  • Technical Managers
Introduction to Ingestion
  • Data Ingestion Overview
  • Key Ingestion Frameworks
  • Key Business Use cases
  • Typical Big Data Project Pipeline
Sqoop
  • Sqoop Basics
  • Sqoop Internals
  • Sqoop 1 vs Sqoop 2
  • Key Sqoop Commands
  • Hands-on Exercise
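
As a taste of the key commands covered, here is a minimal Sqoop 1 import sketch; the JDBC URL, credentials, and table name are placeholders. The --incremental flags correspond to the changed-data detection idea described above.

    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user --password-file /user/etl/.dbpass \
      --table orders \
      --target-dir /data/raw/orders \
      --incremental append --check-column order_id --last-value 0
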
Ingesting Data with Flume
  • Flume Overview
  • Physical Architectures of Flume
  • Source, Sink and Channel
  • Building Data Pipeline using Flume
  • Hands-on Exercise
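
A minimal sketch of a Flume agent configuration wiring a source, channel, and sink into a pipeline; agent name, file paths, and capacities are illustrative.

    # agent1: tail a log file into HDFS through an in-memory channel
    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = sink1

    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/app/app.log
    agent1.sources.src1.channels = ch1

    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000

    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = /data/raw/applogs/%Y-%m-%d
    agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
    agent1.sinks.sink1.channel = ch1
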
Introduction to Apache Kafka
  • Kafka Overview
  • Salient Features of Kafka
  • Kafka Use cases
  • Comparing Kafka with other Key tools
Kafka Fundamentals & Internals
  • Logical Architecture of Kafka
  • Physical Architecture of Kafka
    • Partitions
    • Topics
    • Replicas
    • Producers & Consumers
    • Brokers
  • Roles and Responsibilities of various components
  • Replication mechanism
  • Message Delivery Semantics
  • Key Terminologies
  • Key configuration settings for Brokers, Producers, Consumers, etc.
  • Hands-on exercises
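
To make partitions and replicas concrete, a topic might be created as below (assuming a recent Kafka CLI that takes --bootstrap-server; older releases used --zookeeper). Topic name and counts are illustrative.

    # 6 partitions spread load across consumers; 3 replicas survive broker failures
    kafka-topics.sh --create \
      --bootstrap-server localhost:9092 \
      --topic orders \
      --partitions 6 \
      --replication-factor 3

    # inspect partition leaders and in-sync replicas (ISR)
    kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic orders
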
Zookeeper
  • Role of Zookeeper
  • Zookeeper Basic Operations
  • Apache Kafka – Zookeeper Role
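
A sketch of basic Zookeeper operations against the znodes Kafka maintains, assuming a classic Zookeeper-based Kafka deployment; the topic name is a placeholder.

    # open a shell against the Zookeeper ensemble
    zookeeper-shell.sh localhost:2181

    # list the ids of live brokers registered with Zookeeper
    ls /brokers/ids

    # show the partition assignment stored for a topic
    get /brokers/topics/orders
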
Kafka Integrations
  • End to End Data Pipeline using Kafka
  • Kafka Connect
  • Integrate Kafka with Spark
  • Hands-on Exercises
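
As one sketch of the Kafka Connect piece, a standalone file-source connector configuration; connector name, file path, and topic are illustrative.

    # connect-file-source.properties: stream lines of a file into a topic
    name=local-file-source
    connector.class=FileStreamSource
    tasks.max=1
    file=/var/log/app/app.log
    topic=app-logs

    # launched with the standalone worker:
    # connect-standalone.sh worker.properties connect-file-source.properties
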
Kafka API
  • Overview
  • Producer API
    • Sync Producers
    • Async Producers
    • Message Acknowledgement
    • Batching Messages
    • Keyed and Non-Keyed Messages
    • Compression
  • Consumer API
  • Hands-on Exercises
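
A minimal Scala sketch touching the Producer API topics above (sync vs. async send, acknowledgements, compression, keyed messages); the broker address, topic, key, and payload are placeholders.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    object ProducerSketch extends App {
      val props = new Properties()
      props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
      props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
      props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
      props.put(ProducerConfig.ACKS_CONFIG, "all")                 // wait for all in-sync replicas
      props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy")  // compress message batches

      val producer = new KafkaProducer[String, String](props)
      // keyed message: records with the same key land in the same partition
      val record = new ProducerRecord[String, String]("orders", "order-42", "{\"qty\": 3}")

      producer.send(record)        // async: returns a Future[RecordMetadata] immediately
      producer.send(record).get()  // sync: block until the broker acknowledges
      producer.close()
    }
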
Spark Overview
  • What is Spark?
  • Why Spark?
  • Data Abstraction – RDD
  • Logical Architecture of Spark
  • Programming Languages in Spark
  • Functional Programming with Spark
  • Hands-on Exercise
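
A minimal Scala sketch of the RDD abstraction and functional style: a local word count, assuming a plain local-mode Spark setup.

    import org.apache.spark.sql.SparkSession

    object RddSketch extends App {
      val spark = SparkSession.builder.appName("rdd-sketch").master("local[*]").getOrCreate()
      val sc = spark.sparkContext

      // RDD: an immutable, partitioned collection transformed with functions
      val lines  = sc.parallelize(Seq("big data ingestion", "big data processing"))
      val counts = lines
        .flatMap(_.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)

      counts.collect().foreach(println)  // e.g. (big,2), (data,2), ...
      spark.stop()
    }
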
Spark SQL
  • Introduction
  • Dataframe API
  • Performing ad-hoc query analysis using Spark SQL
  • Hands-on Exercises
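
A minimal DataFrame and ad-hoc SQL sketch in Scala; the sample rows are made up for illustration.

    import org.apache.spark.sql.SparkSession

    object SparkSqlSketch extends App {
      val spark = SparkSession.builder.appName("sql-sketch").master("local[*]").getOrCreate()
      import spark.implicits._

      val orders = Seq(("o1", "books", 12.50), ("o2", "games", 59.99))
        .toDF("id", "category", "amount")

      // DataFrame API
      orders.filter($"amount" > 20).show()

      // ad-hoc SQL over the same data
      orders.createOrReplaceTempView("orders")
      spark.sql("SELECT category, SUM(amount) AS total FROM orders GROUP BY category").show()

      spark.stop()
    }
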
Spark Streaming
  • Analyzing streaming data using Spark
  • Stateless Streaming
  • Hands-on: Stateless Streaming
  • Stateful Streaming
  • Hands-on: Stateful Streaming
  • Structured Streaming
  • Hands-on exercises
  • Hands-on: Integrating Kafka with Spark Streaming
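
A minimal Structured Streaming sketch reading from Kafka in Scala; the broker address, topic, and checkpoint path are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession

    object StreamingSketch extends App {
      val spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

      // each Kafka record arrives with key/value as binary columns
      val stream = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "orders")
        .load()

      val values = stream.selectExpr("CAST(value AS STRING) AS json")

      // write each micro-batch to the console; checkpointing tracks progress
      val query = values.writeStream
        .format("console")
        .option("checkpointLocation", "/tmp/ckpt/orders")
        .start()

      query.awaitTermination()
    }
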
Kafka Streams
  • Overview
  • What is Kafka Streams?
  • Why Kafka Streams?
  • Kafka Streams Architecture
  • Hands-on Exercise
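
A minimal Kafka Streams topology sketch in Scala using the Java API; the application id, topic names, and the per-record transformation are placeholders.

    import java.util.Properties
    import org.apache.kafka.common.serialization.Serdes
    import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}
    import org.apache.kafka.streams.kstream.{Consumed, Produced, ValueMapper}

    object StreamsSketch extends App {
      val props = new Properties()
      props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-enricher")  // hypothetical app id
      props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

      // topology: read "orders", transform each value, write to "orders-enriched"
      val builder = new StreamsBuilder()
      val toUpper: ValueMapper[String, String] = new ValueMapper[String, String] {
        override def apply(v: String): String = v.toUpperCase  // stand-in for real logic
      }
      builder
        .stream("orders", Consumed.`with`(Serdes.String(), Serdes.String()))
        .mapValues(toUpper)
        .to("orders-enriched", Produced.`with`(Serdes.String(), Serdes.String()))

      val streams = new KafkaStreams(builder.build(), props)
      streams.start()
      sys.addShutdownHook(streams.close())
    }
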

Prerequisites

Participants should preferably have prior software development experience, along with basic knowledge of SQL and Unix commands. Knowledge of Python/Scala is a plus.


Course Information

Duration: 4 Days

Mode of Delivery: Instructor-led / Virtual
