Data Engineering Bootcamp – End-to-End Modern Data Platform

Comprehensive 15-Day Intensive Covering Databases, Neo4j, Kafka, Spark, Databricks, Iceberg, Lakehouse Architecture, and Power BI

Duration

15 Days

Level

Advanced Level

Overview

This 15-day Data Engineering Bootcamp covers the full Data Engineering track, from core data fundamentals and architecture principles to distributed systems, streaming, lakehouse implementation, and business intelligence.

Participants will master relational and NoSQL databases, graph databases (Neo4j), event streaming with Kafka, distributed processing with Spark and Databricks, table formats with Iceberg, medallion/lakehouse architecture, governance, orchestration, and Power BI integration.

The program follows a progressive structure (Basic → Intermediate → Advanced) and includes a capstone building a production-grade modern data platform.

Audience

  • Aspiring Data Engineers
  • Software Engineers transitioning to Data Engineering
  • Data Analysts moving to Engineering
  • Platform & Cloud Engineers
  • BI Developers
  • Analytics Engineers

Prerequisites

  • Basic programming knowledge (Python preferred)
  • Basic SQL understanding
  • Familiarity with data concepts (tables, files, schemas)
  • No formal prerequisites beyond the basics listed above

Curriculum

  1. What is Data Engineering?

    • Role of Data Engineer
    • Data Engineer vs Data Analyst vs Data Scientist
    • Modern data ecosystem overview
    • Batch vs streaming systems
  2. Types of Data

    • Structured vs semi-structured vs unstructured
    • OLTP vs OLAP workloads
    • Transactional vs analytical systems
  3. Data Modeling Fundamentals

    • ER modeling
    • Normalization & denormalization
    • Star & snowflake schemas
  4. Hands-on

    • Design ER diagram
    • Create normalized schema
  1. RDBMS Architecture

    • ACID properties
    • Indexing strategies
    • Query execution plans
  2. Advanced SQL

    • Joins & window functions
    • CTEs & subqueries
    • Performance tuning
  3. Hands-on

    • Build optimized schema
    • Analyze execution plans
    • Tune slow queries
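
As an illustration of the execution-plan and window-function topics above, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the index name idx_orders_customer are hypothetical, chosen only for this example; the window function requires SQLite 3.25 or later, and other database engines expose query plans through different commands.

```python
import sqlite3

# In-memory database with a small, hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount      REAL NOT NULL
    );
    INSERT INTO orders (customer_id, amount) VALUES
        (1, 20.0), (1, 35.0), (2, 15.0), (2, 50.0), (2, 5.0);
""")


def query_plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


lookup = "SELECT amount FROM orders WHERE customer_id = 2"

# Without an index on customer_id the planner has to scan the table.
plan_before = query_plan(lookup)

# After adding an index, the same query can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(lookup)

# Window function: running total per customer, ordered by order_id.
running_totals = conn.execute("""
    SELECT customer_id,
           amount,
           SUM(amount) OVER (
               PARTITION BY customer_id ORDER BY order_id
           ) AS running_total
    FROM orders
    ORDER BY customer_id, order_id
""").fetchall()

print(plan_before)
print(plan_after)
print(running_totals)
```

Comparing plan_before with plan_after is the core of the tuning loop: change one thing (here, an index), then re-read the plan rather than guessing.
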
  1. NoSQL Landscape

    • Key-value
    • Document databases
    • Column-family stores
  2. Graph Databases – Neo4j

    • Graph modeling principles
    • Nodes & relationships
    • Cypher query language
    • Use cases (fraud detection, recommendation)
  3. Hands-on

    • Design graph model
    • Write Cypher queries
    • Build recommendation example
  1. Traditional vs Modern Architectures

    • Monolithic vs distributed systems
    • Lambda & Kappa architectures
    • Event-driven architecture
  2. Lakehouse Architecture

    • Data lake vs warehouse
    • Medallion architecture
    • Separation of compute & storage
  3. Hands-on

    • Design enterprise data architecture
    • Identify bottlenecks & trade-offs
  1. Kafka Architecture

    • Brokers, partitions, replication
    • Producers & consumers
    • Offsets & consumer groups
  2. Event Streaming Concepts

    • Exactly-once semantics
    • Log compaction
    • Event-driven systems
  3. Hands-on

    • Set up Kafka
    • Create topics
    • Produce & consume events
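
To make partitions, offsets, and consumer groups concrete, the following is a toy in-memory model written in plain Python. It is not a Kafka client and does not use any Kafka API: the ToyTopic class, its methods, and the crc32-based key hashing are assumptions made for illustration (Kafka's own default partitioner uses a different hash).

```python
from collections import defaultdict
from zlib import crc32


class ToyTopic:
    """In-memory stand-in for a partitioned topic (illustration only)."""

    def __init__(self, num_partitions):
        # Each partition is an append-only list; the list index is the offset.
        self.partitions = [[] for _ in range(num_partitions)]
        # Committed offsets: group -> partition -> next offset to read.
        self.committed = defaultdict(lambda: defaultdict(int))

    def partition_for(self, key):
        # The same key always maps to the same partition,
        # which preserves ordering per key.
        return crc32(key.encode()) % len(self.partitions)

    def produce(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def poll(self, group, partition, max_records=10):
        start = self.committed[group][partition]
        records = self.partitions[partition][start:start + max_records]
        # Commit the new position so this group resumes where it left off.
        self.committed[group][partition] = start + len(records)
        return records


topic = ToyTopic(num_partitions=2)
for i in range(4):
    topic.produce("user-a", f"click-{i}")

p = topic.partition_for("user-a")
first = topic.poll("analytics", p, max_records=3)   # first three events
rest = topic.poll("analytics", p, max_records=3)    # resumes at offset 3
billing = topic.poll("billing", p)                  # separate group, own offsets
```

Each consumer group tracks its own position, so the hypothetical "billing" group reads all four events even though "analytics" has already consumed them.
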
  1. Schema Registry & Serialization

    • Avro / JSON / Protobuf
    • Schema evolution
  2. Kafka Connect

    • Source connectors
    • Sink connectors
    • CDC pipelines
  3. Hands-on

    • Build CDC ingestion pipeline
    • Monitor consumer lag
  1. Spark Architecture

    • Driver & executors
    • DAG execution
    • Lazy evaluation
  2. RDDs & DataFrames

    • Transformations & actions
    • Partitioning strategies
  3. Hands-on

    • Build distributed processing job
    • Analyze Spark UI
  1. Catalyst Optimizer

    • Logical vs physical plans
    • Predicate pushdown
  2. Performance Optimization

    • Shuffle tuning
    • Broadcast joins
    • Caching strategies
  3. Hands-on

    • Optimize large joins
    • Benchmark performance
  1. Streaming Concepts

    • Event time vs processing time
    • Watermarking
    • Stateful aggregations
  2. Spark + Kafka Integration

    • Stream ingestion
    • Exactly-once guarantees
  3. Hands-on

    • Build real-time streaming pipeline
    • Handle late-arriving data
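
The relationship between event time, watermarks, and late-arriving data can be sketched in plain Python. This is a simplified model, not Spark's implementation: the watermark here is recomputed on every event (Spark updates it per micro-batch), and WINDOW, ALLOWED_LATENESS, and windowed_counts are hypothetical names and values chosen for the example.

```python
from collections import defaultdict

WINDOW = 10           # tumbling window size in seconds (illustrative value)
ALLOWED_LATENESS = 5  # how far the watermark trails the max event time


def windowed_counts(events):
    """Count events per (window_start, key).

    `events` is an iterable of (event_time_seconds, key) in ARRIVAL order,
    which may differ from event-time order.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = float("-inf")

    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS

        window_start = (event_time // WINDOW) * WINDOW
        window_end = window_start + WINDOW

        if window_end <= watermark:
            # The watermark has passed this window, so it is treated as
            # finalized and the event is too late to be counted.
            dropped.append((event_time, key))
            continue
        counts[(window_start, key)] += 1

    return dict(counts), dropped


arrivals = [(1, "a"), (4, "a"), (12, "a"), (8, "a"), (17, "a"), (3, "a")]
counts, dropped = windowed_counts(arrivals)
print(counts)   # {(0, 'a'): 3, (10, 'a'): 2}
print(dropped)  # [(3, 'a')]
```

The event at time 8 arrives out of order but is still counted because its window has not yet fallen behind the watermark; the event at time 3 arrives after the watermark has passed its window and is dropped.
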
  1. Databricks Architecture

    • Workspaces & clusters
    • Jobs vs interactive clusters
    • Autoscaling
  2. Medallion Architecture

    • Bronze layer
    • Silver layer
    • Gold layer
  3. Hands-on

    • Implement layered lakehouse
    • Build incremental transformations
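
The bronze, silver, and gold layers can be pictured with a small plain-Python sketch. The record fields and the to_silver and to_gold functions are hypothetical; on Databricks the same steps would typically be table-to-table transformations, and malformed records would usually be quarantined rather than silently skipped as they are here.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (strings, duplicates, bad values).
bronze = [
    {"order_id": "1", "country": "us", "amount": "20.5"},
    {"order_id": "1", "country": "us", "amount": "20.5"},  # duplicate delivery
    {"order_id": "2", "country": "DE", "amount": "oops"},  # malformed amount
    {"order_id": "3", "country": "de", "amount": "10"},
]


def to_silver(records):
    """Clean, type, and deduplicate bronze records."""
    seen, clean = set(), []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # skip rows that fail validation (sketch only)
        if r["order_id"] in seen:
            continue  # drop duplicates by business key
        seen.add(r["order_id"])
        clean.append({
            "order_id": int(r["order_id"]),
            "country": r["country"].upper(),
            "amount": amount,
        })
    return clean


def to_gold(records):
    """Aggregate silver records into a reporting-ready summary."""
    totals = defaultdict(float)
    for r in records:
        totals[r["country"]] += r["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'US': 20.5, 'DE': 10.0}
```
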
  1. Iceberg Internals

    • Table metadata
    • Snapshots & time travel
    • Schema evolution
    • Partition evolution
  2. Iceberg + Spark Integration

    • Incremental reads
    • Streaming writes
    • Compaction
  3. Hands-on

    • Create Iceberg tables
    • Perform time travel queries
    • Implement CDC ingestion
  1. ETL vs ELT

    • Pipeline design principles
    • Idempotency
    • Error handling
  2. Workflow Orchestration

    • DAG scheduling
    • Dependency management
    • Retry strategies
  3. Hands-on

    • Build orchestrated data engineering pipeline
    • Add monitoring & logging
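
Idempotency and retry strategies work together: a step that can be safely re-run is a step that can be safely retried. The sketch below shows one way to combine them using only the Python standard library. The daily_sales table, the with_retries helper, and the simulated failure are all hypothetical; a real orchestrator would normally supply its own retry settings.

```python
import sqlite3
import time


def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn(); on failure, back off exponentially and try again."""
    for n in range(attempts):
        try:
            return fn()
        except Exception:  # broad on purpose for the sketch
            if n == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** n)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT PRIMARY KEY, total REAL)")


def load_batch(rows):
    # Keyed overwrite: re-running the same batch replaces rows
    # instead of appending duplicates, which makes the load idempotent.
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO daily_sales (day, total) VALUES (?, ?)",
            rows,
        )


batch = [("2024-01-01", 100.0), ("2024-01-02", 250.0)]
calls = {"count": 0}


def flaky_load():
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("simulated transient failure")
    load_batch(batch)


with_retries(flaky_load)  # first call fails, second succeeds
with_retries(flaky_load)  # re-running the same batch changes nothing

rows = conn.execute(
    "SELECT day, total FROM daily_sales ORDER BY day"
).fetchall()
print(rows)
```

Because the load is keyed on day, the retry after the simulated failure and the deliberate second run both leave exactly two rows in the table.
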
  1. Data Security

    • Encryption at rest & in transit
    • Role-based access control
  2. Governance & Metadata

    • Data lineage
    • Catalog management
    • Audit trails
  3. Performance & Cost Optimization

    • File sizing strategy
    • Cluster tuning
    • Storage optimization
  4. Hands-on

    • Implement access policies
    • Tune pipeline performance
  1. Data Modeling for BI

    • Star schema
    • Aggregations
  2. Connecting Lakehouse to Power BI

    • DirectQuery vs Import mode
    • Incremental refresh
  3. Dashboard Design Best Practices

    • KPI frameworks
    • Visualization optimization
  4. Hands-on

    • Connect Iceberg/Spark data to Power BI
    • Build executive dashboard
  1. Architecture Design

    • Event ingestion via Kafka
    • Processing via Spark
    • Storage via Iceberg
    • Lakehouse layering
    • BI visualization
  2. Production Considerations

    • Scalability
    • Fault tolerance
    • Cost optimization
    • Governance
  3. Final Presentation

    • Architecture walkthrough
    • Trade-off discussions
    • Production readiness checklist
