Big Data for Architects
This program focuses on key architectures and pipelines in the Big Data ecosystem.
Duration
3 Days
Level
Intermediate
Design and tailor this course to your team's needs.
Objectives
- Understand the thought process behind choosing Big Data ingestion, storage, processing, and analysis technologies
- Focus on key architectures and pipelines in the Big Data ecosystem
- Decide which Big Data technology to choose, and when
- Cover the breadth of Big Data technologies
- Get hands-on with a Google Cloud Dataproc pseudo-distributed cluster
- Understand the Hortonworks security modules
Who Should Attend
- Engineers/Scientists who want to understand the role of various Big Data technologies
- Big Data Leads/Architects who want to deepen their Big Data knowledge
- Big Data Engineers planning to appear for professional-level certifications such as DE575, Google Certified Data Professional, etc.
Course Outline
- Evolution of Big Data Technologies
- Big Data Technologies Landscape in Hortonworks Stack
- Key Big Data Architectures
- Deployment Architecture of Data Lake
- Typical Big Data Batch Pipeline
- More Examples of Big Data Batch Pipeline
- Typical Big Data Streaming Pipeline
- More Examples of Streaming Pipeline
- Factors to consider while comparing Ingestion frameworks
- Loading data into Data Lake
- Sqoop Internals
- Loading data using Sqoop
- Sqoop vs Kafka Connect
- High-Level Introduction to NiFi
- Hands-on: Confluent Kafka Installation
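As a sketch of the "Loading data using Sqoop" step above, a typical import might look like the following. The host, database, credentials, table, and target directory are placeholders, not values from this course:

```shell
# Hypothetical Sqoop import: pulls an RDBMS table into HDFS as Avro files.
# Connection string, credentials, and paths are illustrative placeholders.
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4 \
  --as-avrodatafile
```

The `--num-mappers` flag controls the degree of parallelism: Sqoop splits the table by primary key into that many ranges, each imported by one map task.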
- Interoperability
- Text vs Binary
- Row oriented vs Column oriented
- Splittable
- Schema Evolution
- Avro Data Format
- ORC Data Format
- Comparing Data Formats – which one to choose when?
- Hands-on: Big Data Batch Pipeline using the Avro Format
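To make the schema-evolution bullet above concrete, here is a minimal sketch of Avro-style resolution simulated with plain dicts rather than the Avro library: a record written with an old schema is read under a newer schema, and fields absent from the record take the newer schema's defaults. The field names and defaults are illustrative only.

```python
# Avro-style schema evolution, simulated with plain dicts (not the avro lib):
# missing fields in an old record take the reader schema's defaults.

def resolve(record, reader_defaults):
    """Resolve a record against a reader schema given as {field: default}."""
    return {field: record.get(field, default)
            for field, default in reader_defaults.items()}

old_record = {"id": 1, "name": "widget"}                    # written with schema v1
reader_defaults = {"id": None, "name": None, "price": 0.0}  # v2 adds 'price'

print(resolve(old_record, reader_defaults))
# → {'id': 1, 'name': 'widget', 'price': 0.0}
```

Real Avro performs this resolution from the writer and reader schemas stored with the data, which is why both Avro and ORC can add columns without rewriting old files.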
- Factors to consider while comparing Processing frameworks
- Introduction to YARN
- YARN Architecture
- YARN Internals
- How to troubleshoot issues on the cluster
- Things to consider for performance tuning
- Spark vs Tez
- MR vs Spark
- MR vs Spark Logical Architecture Perspective
- MR vs Spark Performance Perspective
- Why Spark?
- Spark Physical Architecture
- Spark Internals
- Spark Optimizations
- Things to consider when implementing Spark on YARN
- Kafka Stream vs Spark Streaming
- Spark Core vs Spark SQL
- Spark Execution Modes: YARN Client vs YARN Cluster
- Spark 2.x Streaming vs Spark 1.x Streaming
- KStreams vs Spark
- Hands-on: Spark on YARN
- Hands-on: Kafka & Spark Streaming Integration
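The "YARN Client vs YARN Cluster" comparison above comes down to where the driver runs, which the `--deploy-mode` flag of `spark-submit` selects. The application script and resource sizes below are placeholders:

```shell
# Contrasting the two YARN deploy modes. my_pipeline.py and the
# executor sizing are illustrative placeholders.

# Client mode: the driver runs in this shell; convenient for
# interactive work and debugging, but ties the job to your session.
spark-submit --master yarn --deploy-mode client \
  --num-executors 4 --executor-memory 2g \
  my_pipeline.py

# Cluster mode: the driver runs inside a YARN container on the
# cluster; the usual choice for production jobs.
spark-submit --master yarn --deploy-mode cluster \
  --num-executors 4 --executor-memory 2g \
  my_pipeline.py
```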
- Factors to consider while comparing Storage frameworks
- Why Hive?
- Hive Architecture
- Hive LLAP Architecture
- Spark SQL vs Hive
- KSQL vs Hive
- Hands-on exercises for Spark SQL and Hive
- Hands-on: Kafka, NiFi, HBase & Hive Integration
- Implementing Change Data Capture using Kafka
- Building ETL pipeline
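One way to picture the change-data-capture step above: fold a stream of change events, as a Kafka consumer might deliver them, into a keyed table. The event shape and field names here are assumptions for illustration, not any specific connector's format.

```python
# Hypothetical CDC apply logic: a stream of insert/update/delete events
# is folded into a dict keyed by primary key. The event schema below is
# an assumption, not a specific connector's wire format.

def apply_cdc(table, events):
    """Apply change events to a keyed table; returns the updated table."""
    for event in events:
        key = event["key"]
        if event["op"] == "delete":
            table.pop(key, None)
        else:  # "insert" and "update" both upsert the latest row image
            table[key] = event["row"]
    return table

events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "status": "new"}},
    {"op": "update", "key": 1, "row": {"id": 1, "status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "status": "new"}},
    {"op": "delete", "key": 2, "row": None},
]
print(apply_cdc({}, events))  # only key 1 survives, with its latest row
```

Because inserts and updates are both upserts, replaying the event stream from the beginning reproduces the same final state, which is what makes Kafka a convenient CDC transport.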
- Introduction to Hortonworks Security
- Security Aspects: Key things to consider
- Securing the ecosystem
- Discussion of various tools/frameworks/technologies related to security in the following areas:
- Authentication – Kerberos
- Authorization – Ranger (also masking)
- Auditing
- Encryption – Data at Rest and Data in Motion
Prerequisites
Participants should preferably have basic knowledge of Unix/Linux administration. Basic knowledge of Hadoop, Spark, Kafka, etc. will also be helpful.