Introduction to Spark

Exploring the processing engine for running data engineering, data science, and machine learning workloads on distributed clusters

Duration

3 Days

Level

Basic

Design and Tailor this course

As per your team's needs

Overview

The Introduction to Apache Spark training course teaches the skills needed to work with Apache Spark, an open-source distributed processing engine, commonly used within the Hadoop ecosystem, that is optimized for speed and advanced analytics.
The course begins by examining how to use Spark as an alternative to traditional MapReduce processing. Next, it explores how Spark supports stream processing and iterative algorithms. The course concludes with a lesson on how Spark enables jobs to run faster than traditional Hadoop MapReduce, for example by caching intermediate data in memory instead of writing it to HDFS between successive jobs.

After this course, you will be able to:
○ Describe how Apache Spark, YARN, and Hadoop fit together
○ Understand Spark internals and architecture
○ Work with DataFrames and Spark SQL
○ Implement an application using key Spark concepts
○ Write and run Spark applications on a cluster
○ Understand the basics of Spark Streaming

Audience

This course is designed for application developers, DevOps engineers, and architects.

Course Outline

  • What is Apache Spark?
  • Spark versus MapReduce
  • Using the Spark Shell
  • Why HDFS?
  • HDFS Architecture
  • Using HDFS
  • What is YARN?
  • How does Spark run on YARN?
  • Understand Transformations & Actions
  • Spark Partitions
  • Drivers & Executors
  • RDDs vs DataFrames/Datasets
  • Working with different file formats
  • Working with the DataFrame API
  • Introducing Spark SQL
  • SparkContext
  • Spark Properties
  • Building and Running a Spark Application
  • Logging
  • Running Spark on a Cluster
  • Spark Web UI walkthrough
  • What are jobs, stages & tasks?
  • Understanding the execution plan
  • Caching
  • Aggregations
  • Joins
  • Streaming Overview
  • Sliding Window Operations
  • Basic Spark Streaming Applications

Prerequisites

Participants should preferably have prior software development experience and basic knowledge of SQL and Unix commands. Familiarity with Python or Scala is a plus.
