Introduction to Spark

Exploring the Spark processing engine for running data engineering, data science, and machine learning workloads on distributed clusters

Duration

3 Days

Level

Basic Level

Design and tailor this course

To your team's needs

The Introduction to Apache Spark training course teaches the skills needed to work with Apache Spark, an open-source data processing engine in the Hadoop ecosystem that is optimized for speed and advanced analytics.
The course begins by examining how Spark can be used as an alternative to traditional MapReduce processing. It then explores how Spark supports stream processing and iterative algorithms, and it concludes by explaining why Spark jobs can run faster than equivalent Hadoop MapReduce jobs.

After this course, you will be able to:
○ Describe how Apache Spark, YARN, and Hadoop fit together
○ Understand Spark internals and architecture
○ Work with DataFrames and Spark SQL
○ Implement an application using key Spark concepts
○ Write and run Spark applications on a cluster
○ Understand the basics of Spark Streaming

This course is designed for application developers, DevOps engineers, and architects.

Spark Basics

What is Apache Spark?
Spark versus MapReduce
Using the Spark Shell
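The MapReduce model that Spark is usually compared against can be sketched as three plain-Python functions: a map phase emits (word, 1) pairs, a shuffle groups the pairs by key, and a reduce phase sums each group. This is a toy single-process illustration; the function names are hypothetical and no Hadoop or Spark API is used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Emit a (word, 1) pair for every word in every input line.
    return [(word, 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group values by key, as the shuffle does between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Sum the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["spark is fast", "spark is general"]))
```

In Hadoop MapReduce each job writes its final output to HDFS, so a multi-step pipeline pays for disk I/O between jobs. In Spark the same logic is usually written as chained transformations (for example `flatMap`, `map`, and `reduceByKey` on an RDD) within one application, and intermediate results can be cached in memory.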

The Hadoop Distributed File System (HDFS)

Why HDFS?
HDFS Architecture
Using HDFS
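Two numbers drive most HDFS capacity discussions: how many blocks a file is split into and how much raw disk its replicas consume. The sketch below assumes the common Hadoop 2.x/3.x defaults of a 128 MiB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); your cluster may be configured differently, and the function name is hypothetical.

```python
from typing import Tuple

# Assumed defaults (Hadoop 2.x/3.x): dfs.blocksize = 128 MiB, dfs.replication = 3.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024
DEFAULT_REPLICATION = 3


def hdfs_footprint(file_size: int,
                   block_size: int = DEFAULT_BLOCK_SIZE,
                   replication: int = DEFAULT_REPLICATION) -> Tuple[int, int]:
    """Return (number of blocks, raw bytes stored across all DataNodes)."""
    if file_size < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("file size must be >= 0; block size and replication > 0")
    # Files are split into fixed-size blocks; only the last block may be smaller.
    blocks = (file_size + block_size - 1) // block_size
    # Every block is stored `replication` times, and a partial block only uses
    # as much disk as it holds, so raw usage is simply size * replication.
    raw_bytes = file_size * replication
    return blocks, raw_bytes


if __name__ == "__main__":
    MiB = 1024 * 1024
    blocks, raw = hdfs_footprint(300 * MiB)
    print(blocks, "blocks,", raw // MiB, "MiB raw")  # 3 blocks, 900 MiB raw
```

For example, a 300 MiB file occupies three blocks (128 + 128 + 44 MiB). Because the last block only uses the space it needs, its three replicas consume 900 MiB of raw disk, not 1,152 MiB.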

YARN Cluster Manager

What is YARN?
How does Spark run on YARN?
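When Spark runs on YARN, each executor runs inside a YARN container whose memory request is the executor heap (`spark.executor.memory`) plus an overhead for off-heap use. Spark's documented default for `spark.executor.memoryOverhead` is 10% of executor memory with a 384 MiB minimum (the factor is configurable in recent versions via `spark.executor.memoryOverheadFactor`). The sketch below applies only that rule; it ignores cores, PySpark worker memory, off-heap settings, and any rounding YARN applies to container sizes, and its function names are hypothetical.

```python
# Assumed from Spark's configuration docs: spark.executor.memoryOverhead defaults
# to executor memory * 0.10, with a floor of 384 MiB.
MIN_OVERHEAD_MIB = 384
DEFAULT_OVERHEAD_FACTOR = 0.10


def executor_container_mib(executor_memory_mib: int,
                           overhead_factor: float = DEFAULT_OVERHEAD_FACTOR) -> int:
    """Approximate memory (MiB) requested from YARN for one executor container.

    Ignores PySpark worker memory, off-heap memory, and YARN's own rounding.
    """
    overhead = max(MIN_OVERHEAD_MIB, int(executor_memory_mib * overhead_factor))
    return executor_memory_mib + overhead


def executors_per_node(node_memory_mib: int, executor_memory_mib: int) -> int:
    """How many executor containers fit in the memory a NodeManager offers
    to containers (yarn.nodemanager.resource.memory-mb), counting memory only."""
    return node_memory_mib // executor_container_mib(executor_memory_mib)


if __name__ == "__main__":
    for heap in (2048, 4096, 8192):
        print(heap, "MiB heap ->", executor_container_mib(heap), "MiB container")
    print(executors_per_node(65536, 8192), "executors with an 8 GiB heap fit in 64 GiB")
```

The 384 MiB floor is why small executors carry proportionally more overhead: a 2 GiB heap requests 2,432 MiB, while an 8 GiB heap requests 9,011 MiB.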

Spark Internals

Understanding Transformations & Actions
Spark Partitions
Drivers & Executors
RDDs vs. DataFrames/Datasets
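Two of the internals listed above, lazy transformations versus actions and partitioning, can be illustrated with a toy single-process dataset. Transformations (`map`, `filter`) only record work; actions (`collect`, `count`) run the recorded steps, one "task" per partition. `ToyDataset` and `parallelize` here are hypothetical stand-ins written for illustration, not Spark's API; the contiguous slicing is modeled on how Spark's `parallelize` splits a local collection.

```python
from typing import Any, Callable, List, Optional

# A partition-level step: takes one partition's rows and returns new rows.
PartitionOp = Callable[[List[Any]], List[Any]]


class ToyDataset:
    """Hypothetical stand-in for a partitioned, lazily evaluated dataset."""

    def __init__(self, partitions: List[List[Any]],
                 ops: Optional[List[PartitionOp]] = None) -> None:
        self.partitions = partitions
        self.ops: List[PartitionOp] = ops or []

    # Transformations: return a new dataset that only records the step.
    def map(self, f: Callable[[Any], Any]) -> "ToyDataset":
        return ToyDataset(self.partitions,
                          self.ops + [lambda part: [f(x) for x in part]])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyDataset":
        return ToyDataset(self.partitions,
                          self.ops + [lambda part: [x for x in part if pred(x)]])

    def _run_task(self, part: List[Any]) -> List[Any]:
        # One "task": apply every recorded step to a single partition.
        for op in self.ops:
            part = op(part)
        return part

    # Actions: trigger execution of the recorded steps on every partition.
    def collect(self) -> List[Any]:
        return [row for part in self.partitions for row in self._run_task(part)]

    def count(self) -> int:
        return sum(len(self._run_task(part)) for part in self.partitions)


def parallelize(data: List[Any], num_slices: int) -> ToyDataset:
    # Contiguous slices: slice i covers indices [i*n//k, (i+1)*n//k).
    n = len(data)
    return ToyDataset([data[i * n // num_slices:(i + 1) * n // num_slices]
                       for i in range(num_slices)])


if __name__ == "__main__":
    evens = parallelize(list(range(10)), 3).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(evens.collect())  # [0, 4, 16, 36, 64]
```

Because nothing is cached, calling `count()` after `collect()` recomputes the whole pipeline; in Spark, `cache()` or `persist()` is the usual way to avoid that recomputation.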

DataFrames & Spark SQL

Working with different file formats
Working with the DataFrame API
Introducing Spark SQL

Writing Spark Applications with an IDE

SparkContext
Spark Properties
Building and Running a Spark Application
Logging
Running Spark on a Cluster
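Spark reads properties from several places, and its configuration documentation states the precedence: values set directly on a `SparkConf` win, then flags passed to `spark-submit` or `spark-shell`, then entries in `spark-defaults.conf`. The sketch below models that lookup order with Python's `ChainMap`; the function name and property values are made up for illustration.

```python
from collections import ChainMap
from typing import Dict


def effective_conf(spark_conf: Dict[str, str],
                   submit_flags: Dict[str, str],
                   defaults_file: Dict[str, str]) -> Dict[str, str]:
    """Resolve Spark properties by source precedence (highest first).

    spark_conf    -> values set in code, e.g. SparkConf().set(key, value)
    submit_flags  -> values passed to spark-submit, e.g. --conf key=value
    defaults_file -> entries in conf/spark-defaults.conf
    """
    # ChainMap searches its maps left to right, so earlier sources win.
    return dict(ChainMap(spark_conf, submit_flags, defaults_file))


if __name__ == "__main__":
    resolved = effective_conf(
        spark_conf={"spark.app.name": "nightly-etl"},
        submit_flags={"spark.app.name": "from-cli", "spark.executor.memory": "4g"},
        defaults_file={"spark.executor.memory": "2g", "spark.eventLog.enabled": "true"},
    )
    print(resolved)
```

Here the application name comes from code, executor memory from the command line, and event logging from the defaults file, because each key is taken from the highest-precedence source that defines it.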

The Spark Web UI & Execution Plan

Spark Web UI walkthrough
What are jobs, stages & tasks?
Understanding execution plan
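The Web UI groups work into jobs, stages, and tasks. As a rule of thumb, an action launches a job, the job is cut into stages at shuffle (wide) boundaries such as `groupBy` or `join`, and each stage runs one task per partition. The sketch below models that rule for a linear pipeline; the step list and partition counts are assumptions (200 mirrors the default of `spark.sql.shuffle.partitions`), and real plans can differ, for example when adaptive query execution coalesces shuffle partitions or an action launches extra jobs.

```python
from typing import List, Tuple

# (step name, is_wide, partitions produced by the step); values are illustrative.
Step = Tuple[str, bool, int]
Stage = Tuple[List[str], int]


def plan_stages(steps: List[Step]) -> List[Stage]:
    """Split a linear pipeline into stages, cutting before every wide (shuffle) step.

    Each stage is returned as (step names, task count), where the task count is
    the number of partitions the stage ends with: one task per partition.
    """
    stages: List[Stage] = []
    names: List[str] = []
    partitions = 0
    for name, is_wide, out_partitions in steps:
        if is_wide and names:
            # A shuffle boundary closes the current stage.
            stages.append((names, partitions))
            names = []
        names.append(name)
        partitions = out_partitions
    if names:
        stages.append((names, partitions))
    return stages


if __name__ == "__main__":
    pipeline = [
        ("read parquet", False, 4),    # 4 input partitions (assumed)
        ("filter", False, 4),          # narrow: same partitions, same stage
        ("groupBy + agg", True, 200),  # wide: shuffle into 200 partitions
        ("select", False, 200),        # narrow: stays in the post-shuffle stage
    ]
    for i, (stage_steps, tasks) in enumerate(plan_stages(pipeline)):
        print(f"stage {i}: {stage_steps} -> {tasks} tasks")
```

This matches what the Stages tab typically shows for such a query: a small stage sized by the input partitions, followed by a post-shuffle stage sized by the shuffle partition count.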

Advanced DataFrames

Caching
Aggregations
Joins
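Aggregations such as `groupBy(...).avg(...)` are typically executed in two phases: each partition first computes a partial result per key, and only those partial results are shuffled and merged. The sketch below models that for an average, which needs a (sum, count) pair because an average of per-partition averages is not correct in general. Function names are hypothetical; this is a conceptual model, not Spark code.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[str, float]
Partial = Dict[str, Tuple[float, int]]


def partial_aggregate(partition: Iterable[Row]) -> Partial:
    # Phase 1 (per partition, before the shuffle): running (sum, count) per key.
    acc: Partial = {}
    for key, value in partition:
        total, count = acc.get(key, (0.0, 0))
        acc[key] = (total + value, count + 1)
    return acc


def merge_partials(partials: Iterable[Partial]) -> Dict[str, float]:
    # Phase 2 (after the shuffle): combine partial pairs, then finish the average.
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            totals[key] = totals.get(key, 0.0) + total
            counts[key] = counts.get(key, 0) + count
    return {key: totals[key] / counts[key] for key in totals}


if __name__ == "__main__":
    partition_1 = [("a", 1.0), ("b", 4.0), ("a", 3.0)]
    partition_2 = [("a", 5.0), ("b", 6.0)]
    print(merge_partials([partial_aggregate(partition_1), partial_aggregate(partition_2)]))
    # {'a': 3.0, 'b': 5.0}
```

Averaging the per-partition averages instead would give 3.5 for key `a` ((2.0 + 5.0) / 2), which is why the partial result carries both the sum and the count.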

Spark Streaming

Streaming Overview
Sliding Window Operations
Basic Spark Streaming Applications
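Sliding windows are defined by a window length and a slide interval: in the DStream API these are the `windowDuration` and `slideDuration` arguments of operations such as `window` and `reduceByKeyAndWindow`, and in Structured Streaming they are the arguments of `window(timeColumn, windowDuration, slideDuration)`. The sketch below counts events per window in plain Python, assuming integer timestamps and windows aligned to multiples of the slide interval; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Window = Tuple[int, int]  # [start, end) in the same units as the timestamps


def windows_for(ts: int, length: int, slide: int) -> List[Window]:
    """All sliding windows [start, start + length) that contain timestamp ts.

    Window starts are assumed to be aligned to multiples of `slide`.
    """
    start = (ts // slide) * slide  # latest aligned start that is <= ts
    windows = []
    while start > ts - length:  # this window still covers ts
        windows.append((start, start + length))
        start -= slide
    return sorted(windows)


def sliding_counts(timestamps: Iterable[int], length: int, slide: int) -> Dict[Window, int]:
    # Each event increments the count of every window it falls into.
    counts: Counter = Counter()
    for ts in timestamps:
        counts.update(windows_for(ts, length, slide))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # 10-second windows sliding every 5 seconds (illustrative values).
    print(sliding_counts([12, 14, 17, 23], length=10, slide=5))
    # {(5, 15): 2, (10, 20): 3, (15, 25): 2, (20, 30): 1}
```

With a 10-second window sliding every 5 seconds, each event lands in two overlapping windows, so the window counts add up to twice the number of events.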

Participants should ideally have prior software development experience and basic knowledge of SQL and Unix commands. Familiarity with Python or Scala is a plus.