Data Engineering Bootcamp – End-to-End Modern Data Platform

Comprehensive 15-Day Intensive Covering Databases, Neo4j, Kafka, Spark, Databricks, Iceberg, Lakehouse Architecture, and Power BI

Duration

15 Days

Level

Advanced Level

Design and Tailor this course

As per your team needs

Overview

This 15-day Data Engineering Bootcamp covers the full data engineering track end to end, from core data fundamentals and architecture principles to distributed systems, streaming, lakehouse implementation, and business intelligence.

Participants will master relational and NoSQL databases, graph databases (Neo4j), event streaming with Kafka, distributed processing with Spark and Databricks, table formats with Iceberg, medallion/lakehouse architecture, governance, orchestration, and Power BI integration.

The program follows a progressive structure (Basic → Intermediate → Advanced) and includes a capstone project in which participants build a production-grade modern data platform.

Audience

Aspiring Data Engineers
Software Engineers transitioning to Data Engineering
Data Analysts moving to Engineering
Platform & Cloud Engineers
BI Developers
Analytics Engineers

Prerequisites

Basic programming knowledge (Python preferred)
Basic SQL understanding
Familiarity with data concepts (tables, files, schemas)
No formal prerequisites beyond the basics listed above

Curriculum

Foundations of Data & Data Engineering

What is Data Engineering?
- Role of Data Engineer
- Data Engineering vs Data Analytics vs Data Science
- Modern data ecosystem overview
- Batch vs streaming systems
Types of Data
- Structured vs semi-structured vs unstructured
- OLTP vs OLAP workloads
- Transactional vs analytical systems
Data Modeling Fundamentals
- ER modeling
- Normalization & denormalization
- Star & snowflake schemas
Hands-on
- Design ER diagram
- Create normalized schema

Relational Databases & SQL Deep Dive

RDBMS Architecture
- ACID properties
- Indexing strategies
- Query execution plans
Advanced SQL
- Joins & window functions
- CTEs & subqueries
- Performance tuning
Hands-on
- Build optimized schema
- Analyze execution plans
- Tune slow queries

NoSQL & Graph Databases (Neo4j)

NoSQL Landscape
- Key-value
- Document databases
- Column-family stores
Graph Databases – Neo4j
- Graph modeling principles
- Nodes & relationships
- Cypher query language
- Use cases (fraud detection, recommendation)
Hands-on
- Design graph model
- Write Cypher queries
- Build recommendation example

Data Architecture & System Design

Traditional vs Modern Architectures
- Monolithic vs distributed systems
- Lambda & Kappa architectures
- Event-driven architecture
Lakehouse Architecture
- Data lake vs warehouse
- Medallion architecture
- Separation of compute & storage
Hands-on
- Design enterprise data architecture
- Identify bottlenecks & trade-offs

Apache Kafka Fundamentals

Kafka Architecture
- Brokers, partitions, replication
- Producers & consumers
- Offsets & consumer groups
Event Streaming Concepts
- Exactly-once semantics
- Log compaction
- Event-driven systems
Hands-on
- Set up Kafka
- Create topics
- Produce & consume events

Advanced Kafka & Streaming Pipelines

Schema Registry & Serialization
- Avro / JSON / Protobuf
- Schema evolution
Kafka Connect
- Source connectors
- Sink connectors
- CDC pipelines
Hands-on
- Build CDC ingestion pipeline
- Monitor consumer lag

Apache Spark Fundamentals

Spark Architecture
- Driver & executors
- DAG execution
- Lazy evaluation
RDDs & DataFrames
- Transformations & actions
- Partitioning strategies
Hands-on
- Build distributed processing job
- Analyze Spark UI

Spark SQL & Advanced Transformations

Catalyst Optimizer
- Logical vs physical plans
- Predicate pushdown
Performance Optimization
- Shuffle tuning
- Broadcast joins
- Caching strategies
Hands-on
- Optimize large joins
- Benchmark performance

Spark Structured Streaming

Streaming Concepts
- Event time vs processing time
- Watermarking
- Stateful aggregations
Spark + Kafka Integration
- Stream ingestion
- Exactly-once guarantees
Hands-on
- Build real-time streaming pipeline
- Handle late-arriving data

Databricks & Lakehouse Implementation

Databricks Architecture
- Workspaces & clusters
- Jobs vs interactive clusters
- Autoscaling
Medallion Architecture
- Bronze layer
- Silver layer
- Gold layer
Hands-on
- Implement layered lakehouse
- Build incremental transformations

Apache Iceberg Deep Dive

Iceberg Internals
- Table metadata
- Snapshots & time travel
- Schema evolution
- Partition evolution
Iceberg + Spark Integration
- Incremental reads
- Streaming writes
- Compaction
Hands-on
- Create Iceberg tables
- Perform time travel queries
- Implement CDC ingestion

Data Pipelines & Orchestration

ETL vs ELT
- Pipeline design principles
- Idempotency
- Error handling
Workflow Orchestration
- DAG scheduling
- Dependency management
- Retry strategies
Hands-on
- Build orchestrated data engineering pipeline
- Add monitoring & logging

Data Governance, Security & Performance

Data Security
- Encryption at rest & in transit
- Role-based access control
Governance & Metadata
- Data lineage
- Catalog management
- Audit trails
Performance & Cost Optimization
- File sizing strategy
- Cluster tuning
- Storage optimization
Hands-on
- Implement access policies
- Tune pipeline performance

Power BI & Analytics Layer

Data Modeling for BI
- Star schema
- Aggregations
Connecting Lakehouse to Power BI
- DirectQuery vs Import
- Incremental refresh
Dashboard Design Best Practices
- KPI frameworks
- Visualization optimization
Hands-on
- Connect Iceberg/Spark data to Power BI
- Build executive dashboard

Capstone – End-to-End Modern Data Platform

Architecture Design
- Event ingestion via Kafka
- Processing via Spark
- Storage via Iceberg
- Lakehouse layering
- BI visualization
Production Considerations
- Scalability
- Fault tolerance
- Cost optimization
- Governance
Final Presentation
- Architecture walkthrough
- Trade-off discussions
- Production readiness checklist
