Duration
1 Day
Level
Advanced Level
This program focuses on applying machine learning at scale with the Apache Spark framework. You will work through the curriculum via theory lectures, live demonstrations, and lab exercises. The course is taught in Python and runs on Cloudera 7.x with Spark 3.x.
This course is designed for:
- Data Scientists, Application developers, DevOps engineers, Architects, QAs, Technical Managers
- Developers who are experienced with Spark
Course Outline
- Problems with Traditional Machine Learning Frameworks
- Machine Learning at Scale – Various options
- Why Spark?
- Why Spark performs well for iterative Machine Learning algorithms
- Data Acquisition from various data sources
- Data Cleansing/Processing at Scale
- Feature Engineering – Feature Extraction, Scaling, etc.
- Modeling the problem
- Evaluation Metrics
- Spark ML vs Spark MLlib
- Data types and key terms
- Feature Extraction
- Linear Regression using Spark MLlib
- Hands-on Exercises
- Spark ML Overview
- Transformers and Estimators
- Pipelines
- Implementing Decision Trees
- K-Means Clustering using Spark ML
- Hands-on Exercises
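
To give a feel for the Transformers, Estimators and Pipelines topics listed above, here is a minimal plain-Python sketch of the pattern. It is an illustration only and does not use the pyspark API: the class names (`Scale`, `MeanCenter`, `MeanCenterModel`, `ToyPipeline`, `ToyPipelineModel`) are hypothetical, and plain lists of numbers stand in for DataFrames. The idea it assumes is the one Spark ML follows: a Transformer maps data to data, an Estimator learns from data in `fit()` and returns a Transformer, and a Pipeline chains both kinds of stage.

```python
class Scale:
    """Toy Transformer: stateless, maps each value to value * factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, rows):
        return [x * self.factor for x in rows]


class MeanCenter:
    """Toy Estimator: fit() learns the mean and returns a Transformer."""

    def fit(self, rows):
        return MeanCenterModel(sum(rows) / len(rows))


class MeanCenterModel:
    """Toy Transformer produced by MeanCenter.fit(); subtracts the learned mean."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [x - self.mean for x in rows]


class ToyPipeline:
    """Chains stages; Estimators are fitted on the output of earlier stages."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # An Estimator has fit(); a plain Transformer is used as-is.
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)
            fitted.append(model)
        return ToyPipelineModel(fitted)


class ToyPipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


train = [1.0, 2.0, 3.0]
model = ToyPipeline([Scale(2.0), MeanCenter()]).fit(train)
centered = model.transform(train)   # training data, scaled then centered
unseen = model.transform([5.0])     # new data reuses the mean learned in fit()
print(centered, unseen)
```

The point of the split is that anything learned from the training data (here, the mean) is captured once in `fit()` and then reused unchanged on new data, which is the behaviour the course labs explore with real Spark ML stages.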
Prerequisites
- Basic knowledge of Python
- Basic knowledge of Apache Spark
- Preferably, basic knowledge of SQL, Scala or Java, and Unix commands