
Scalable Machine Learning and Deep Learning

Move up to Level 2 on the path to mastering Scalable Machine Learning


5 Days


Intermediate Level

Design and Tailor this course

As per your team needs


This is an intermediate-level course in the “Machine Learning and Artificial Intelligence” learning path. It is designed to give participants exposure to Scalable Machine Learning. The course covers Spark Core, Spark SQL, Spark Streaming and Spark ML in detail, and offers a gentle introduction to Deep Learning. The points below provide a high-level overview of the course:

  • Understand the role of Spark in Machine Learning
  • Gain hands-on experience in data acquisition, processing, analysis and modeling using the Cloudera distribution of Hadoop and Spark
  • Work with common types of data (e.g. CSV, XML, JSON and social media data) for pre-processing and/or building Machine Learning models using Spark
  • Learn how Deep Cognition helps in performing Deep Learning
  • Get exposure to Deep Learning using Deep Cognition Studio
  • Build Deep Learning models using Deep Cognition Studio, even without knowledge of Statistics

This program is designed for those who aspire to Data/ML/AI roles:

  • Data Engineers
  • Data Scientists
  • Machine Learning Engineers
  • Data Integration Engineers
  • Data Architects

The course covers the following topics:

  • Artificial Intelligence (AI) Overview
  • AI vs ML vs Data Science
  • The relationship between Deep Learning (DL) and Machine Learning
  • Practical Use cases
  • Concepts and Terms
  • Tools/Platforms for Scalable ML, DL, and AI
  • How Big Data and Cloud fit into the ecosystem
  • What is Scalable Machine Learning?
  • Why is it required?
  • Key platforms for performing Scalable Machine Learning
  • Scalable Machine Learning Project End to End Pipeline
  • Spark Introduction
  • Why Spark for Scalable Machine Learning?
  • Databricks Platform Demo
  • Approaches for scaling scikit-learn code
  • Hands-on Exercise(s): Experiencing the first notebook
  • Quick Recap/Introduction to Hadoop  
  • Logical View of Cloudera Distribution
  • Big Data Analytics Pipelines
  • Components in Cloudera Distribution for performing SML
  • Hands-on Exercise(s)
  • Acquiring Structured content from Relational Databases
  • Acquiring Semi-structured content from Log Files
  • Acquiring Unstructured content from other key sources like Web
  • Tools for Performing Data acquisition at Scale
  • Sqoop, Flume and Kafka Introduction, use cases and architectures
  • Hands-on Exercise(s)
  • Using the Spark Shell
  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark
  • RDD Operations
  • Key-Value Pair RDDs
  • MapReduce and Pair RDD Operations
  • Building and Running a Spark Application
  • Performing Data Validation
  • Data De-Duplication
  • Detecting Outliers
  • Hands-on Exercise(s)
  • Dealing with RDD Infinite Lineages
  • Caching Overview
  • Distributed Persistence
  • Checkpointing of an Iterative Machine Learning Algorithm
  • Hands-on Exercise(s)
  • Introduction
  • DataFrame API
  • Performing ad-hoc query analysis using Spark SQL
  • Hands-on Exercise(s)
  • Spark ML vs Spark MLlib
  • Data types and key terms
  • Feature Extraction
  • Linear Regression using Spark MLlib
  • Hands-on Exercise(s)
  • Spark ML Overview
  • Transformers and Estimators
  • Pipelines
  • Implementing Decision Trees
  • K-Means Clustering using Spark ML
  • Hands-on Exercise(s)
  • What is Natural Language Processing?
  • The NLTK package
  • Preparing text for analysis
  • Text summarisation
  • Sentiment analysis
  • Naïve Bayes technique
  • Text classification
  • Topic Modelling
  • Hands-on Exercise(s)
  • Model Evaluation
  • Optimizing a Model
  • Deploying Model
  • Best Practices
  • Types – Classification and Regression trees
  • Gini Index, Entropy and Information Gain
  • Building Decision Trees
  • Pruning the trees
  • Prediction using Trees
  • Ensemble Models
  • Bagging and Boosting
  • Advantages of using Random Forest
  • Working with Random Forest
  • Ensemble Learning
  • How ensemble learning works
  • Building models using Bagging
  • Random Forest algorithm
  • Random Forest model building
  • Fine tuning hyper-parameters
  • Hands-on Exercise(s)
  • Real-time data acquisition using Kafka
  • Salient Features of Kafka
  • Kafka Use cases
  • Comparing Kafka with other Key tools
  • End to End Data Pipeline using Kafka
  • Integrating Kafka with Spark Streaming
  • Hands-on Exercise(s)
  • What is Deep Learning?
  • Deep Learning Architecture
  • Deep Learning Frameworks
  • The relationship between Deep Learning and Machine Learning
  • Deep Learning Use cases
  • Concepts and Terms
  • How to implement Deep Learning?
  • Deep Cognition Introduction
  • Why Deep Cognition Studio?
  • Walkthrough of Deep Learning Studio
  • Multilayer Perceptron in Deep Cognition
  • How does a single artificial neuron work?
  • Computation Graph
  • Activation Functions
  • Importance of non-linear activation
  • Data encoding for deep neural networks
  • Hands-on Exercise(s)
  • Convolutional Neural Networks
  • Components of CNN
  • Data augmentation 
  • Transfer learning for using pre-trained networks
  • Hands-on Exercise(s)
  • Spark
    • Running an application on YARN
    • Interactive Data Exploration using Spark
    • Working with Pair RDDs
    • Dealing with XML files in Spark
    • Processing JSON data in Spark  
    • Processing Log file data in Spark  
    • Caching in Spark
    • Data Deduplication
    • Using Broadcast Variables 
    • Using Accumulators 
    • Working with the DataFrame API
    • Spark SQL – Multiple exercises
    • Spark Streaming: Part 1
    • Spark Streaming: Part 2
    • Integrating Kafka and Spark Streaming 
  • Spark ML
    • Vector
    • StringIndexer and OneHotEncoder
    • SQLTransformer
    • Pipeline
    • Imputer
    • PCA with Spark ML
    • Decision tree classification
    • VectorAssembler
    • K-Means to analyze hacking attacks
    • End-to-end Spark ML pipeline using a decision tree
    • Cross-validation
    • Naive Bayes classification
    • NLP and NLTK basics
    • Decision tree classification example
    • Sentiment analysis lab
    • Logistic regression
    • RFormula
    • Support vector machine
    • Linear regression
    • Predict customer churn
    • Random forest classification
  • Deep Cognition
    • Experiencing first Deep Neural Network 
    • Emotion Analysis 

Participants should have knowledge equivalent to that covered in the “Data and Machine Learning Fundamentals” course (an intermediate-level course in the “Machine Learning and Artificial Intelligence” learning path).

