Big Data Ingestion and Processing

Ingest, store, process, and analyze data using real-time Big Data pipelines

Duration

4 Days

Level

Intermediate

Design and Tailor this course

As per your team's needs

Overview

Big Data ingestion involves connecting to various data sources, extracting the data, and detecting changed data; in other words, data ingestion means taking data from multiple sources and landing it where it can be accessed and processed. A Big Data processing system then processes the ingested data and organizes the data flow for the downstream layers.

The program is focused on data ingestion and processing using Sqoop, Flume, Kafka, and Spark Streaming. It covers:

  • Flume and Sqoop fundamentals
  • Architectures of Flume and Sqoop
  • Kafka fundamentals, architecture, APIs, Kafka Connect, and Kafka Streams
  • Spark micro-batch processing and Structured Streaming
  • Hands-on exercises: the Kafka API exercises are in Java/Scala, and Scala is used for the Spark-related exercises

The intended audience for this course:

  • Application Developers
  • DevOps Engineers
  • Architects
  • System Engineers
  • Technical Managers
Course Outline
  • Data Ingestion Overview
  • Key Ingestion Frameworks
  • Key Business Use cases
  • Typical Big Data Project Pipeline
  • Sqoop Basics
  • Sqoop Internals
  • Sqoop 1 vs Sqoop 2
  • Key Sqoop Commands
  • Hands-on Exercise
  • Flume Overview
  • Physical Architectures of Flume
  • Source, Sink and Channel
  • Building Data Pipeline using Flume
  • Hands-on Exercise
  • Kafka Overview
  • Salient Features of Kafka
  • Kafka Use cases
  • Comparing Kafka with other key tools
  • Logical Architecture of Kafka
  • Physical Architecture of Kafka
    • Partitions
    • Topics
    • Replicas
    • Producers & Consumers
    • Brokers
  • Roles and Responsibilities of various components
  • Replication mechanism
  • Message Delivery Semantics
  • Key Terminologies
  • Key configuration settings of Brokers, Producers, Consumers, etc.
  • Hands-on exercises
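To make the broker, topic, partition, and replica terminology above concrete, here is a minimal Scala sketch (an illustration, not part of the official courseware) that creates a topic through Kafka's AdminClient API. The broker address, topic name, and partition/replica counts are placeholder assumptions.

  import java.util.{Collections, Properties}
  import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}

  object CreateTopicSketch {
    def main(args: Array[String]): Unit = {
      val props = new Properties()
      props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")  // placeholder broker

      val admin = AdminClient.create(props)
      try {
        // 3 partitions, replication factor 2: each partition gets one leader and one follower replica
        val topic = new NewTopic("events", 3, 2.toShort)
        admin.createTopics(Collections.singleton(topic)).all().get()
        println("Topic created")
      } finally {
        admin.close()
      }
    }
  }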
  • Role of Zookeeper
  • Zookeeper Basic Operations
  • Apache Kafka – Zookeeper Role
  • End to End Data Pipeline using Kafka
  • Kafka Connect
  • Integrate Kafka with Spark
  • Hands-on Exercises
  • Overview
  • Producer API
    • Sync Producers
    • Async Producers
    • Message Acknowledgement
    • Batching Messages
    • Keyed and Non-Keyed Messages
    • Compression
    • Batching
  • Consumer API
  • Hands-on Exercises
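As an illustration of the Producer API topics above (synchronous sends, acknowledgements, keyed messages, compression), the sketch below uses the standard Kafka Java client from Scala. The broker address, topic, key, and value are placeholders.

  import java.util.Properties
  import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
  import org.apache.kafka.common.serialization.StringSerializer

  object SimpleProducerSketch {
    def main(args: Array[String]): Unit = {
      val props = new Properties()
      props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
      props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
      props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
      props.put(ProducerConfig.ACKS_CONFIG, "all")                 // wait for full acknowledgement
      props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip")    // compress batched messages

      val producer = new KafkaProducer[String, String](props)
      try {
        // Keyed message: records with the same key always go to the same partition
        val record = new ProducerRecord[String, String]("events", "user-42", "clicked checkout")
        val metadata = producer.send(record).get()                 // .get() makes the send synchronous
        println(s"Written to partition ${metadata.partition()} at offset ${metadata.offset()}")
      } finally {
        producer.close()
      }
    }
  }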
  • What is Spark?
  • Why Spark?
  • Data Abstraction – RDD
  • Logical Architecture of Spark
  • Programming Languages in Spark
  • Functional Programming with Spark
  • Hands-on Exercise
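The Spark fundamentals above can be sketched with a short Scala example of the RDD abstraction and functional style; the input path is a placeholder.

  import org.apache.spark.sql.SparkSession

  object RddBasicsSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("rdd-basics").getOrCreate()
      val sc = spark.sparkContext

      val lines = sc.textFile("hdfs:///data/events.txt")   // RDD[String], placeholder path
      val counts = lines
        .flatMap(_.split("\\s+"))                          // transformation: split lines into words
        .map(word => (word, 1))                            // transformation: pair each word with a count
        .reduceByKey(_ + _)                                // transformation: aggregate counts per word
      counts.take(10).foreach(println)                     // action: triggers the actual computation

      spark.stop()
    }
  }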
  • Introduction
  • Dataframe API
  • Performing ad-hoc query analysis using Spark SQL
  • Hands-on Exercises
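A minimal Scala sketch of the kind of ad-hoc query analysis covered above, first with the DataFrame API and then with an equivalent SQL statement; the input file and column names are assumptions for illustration.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions._

  object SparkSqlSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("spark-sql-demo").getOrCreate()

      val orders = spark.read.json("hdfs:///data/orders.json")   // schema inferred from JSON

      // DataFrame API: revenue per country, highest first
      orders.groupBy("country")
            .agg(sum("amount").as("revenue"))
            .orderBy(desc("revenue"))
            .show()

      // The same ad-hoc query expressed in SQL over a temporary view
      orders.createOrReplaceTempView("orders")
      spark.sql(
        "SELECT country, SUM(amount) AS revenue FROM orders GROUP BY country ORDER BY revenue DESC"
      ).show()

      spark.stop()
    }
  }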
  • Analyzing streaming data using Spark
  • Stateless Streaming
  • Hands-on: Stateless Streaming
  • Stateful Streaming
  • Hands-on: Stateful Streaming
  • Structured Streaming
  • Hands-on exercises
  • Hands-on: Integrating Kafka with Spark Streaming
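The Kafka-to-Spark integration above can be sketched with Structured Streaming as follows. The broker address and topic name are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions._

  object KafkaStructuredStreamingSketch {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("kafka-structured-streaming").getOrCreate()
      import spark.implicits._

      // Read records from a Kafka topic as an unbounded DataFrame
      val events = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "events")
        .load()
        .selectExpr("CAST(value AS STRING) AS value")

      // Stateful aggregation: running word counts over everything seen so far
      val counts = events
        .select(explode(split($"value", "\\s+")).as("word"))
        .groupBy("word")
        .count()

      val query = counts.writeStream
        .outputMode("complete")     // emit the full updated result on each trigger
        .format("console")
        .start()

      query.awaitTermination()
    }
  }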
  • Overview
  • What is Kafka Streams?
  • Why Kafka Streams?
  • Kafka Streams Architecture
  • Hands-on Exercise
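A minimal Kafka Streams sketch in Scala (written against the Java Streams API) showing a stateless topology that reads from one topic, transforms each value, and writes to another; the application id and topic names are placeholders.

  import java.util.Properties
  import org.apache.kafka.common.serialization.Serdes
  import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}
  import org.apache.kafka.streams.kstream.{KStream, ValueMapper}

  object UppercaseStreamSketch {
    def main(args: Array[String]): Unit = {
      val props = new Properties()
      props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo")
      props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
      props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
      props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)

      val builder = new StreamsBuilder()
      val events: KStream[String, String] = builder.stream[String, String]("events")

      // Stateless transformation: uppercase every record value and forward it to an output topic
      events
        .mapValues(new ValueMapper[String, String] {
          override def apply(value: String): String = value.toUpperCase
        })
        .to("events-uppercase")

      val streams = new KafkaStreams(builder.build(), props)
      streams.start()
      sys.addShutdownHook(streams.close())   // close cleanly when the JVM shuts down
    }
  }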
Prerequisites

Participants should preferably have prior software development experience, along with basic knowledge of SQL and Unix commands. Knowledge of Python/Scala would be a plus.
