Data and Machine Learning Fundamentals

A getting-started voyage into the realm of Machine Learning, with pragmatic hands-on experience

Duration

5 Days

Level

Beginner Level

Design and Tailor this course

As per your team's needs

This course is a stepping stone for the “Machine Learning and Artificial Intelligence” learning path. It is designed to build a foundation for the next level of courses in that path. The points below give a high-level overview of the course:

  • Understand the role of Data and Machine Learning
  • Explore use cases of Machine Learning
  • Get introduced to some of the key technologies in Data and Machine Learning
  • Gain hands-on experience in Data Acquisition, Processing, Analysis and Modeling using the Python programming language
  • Work with common types of data, e.g. CSV, web data and social media data, for pre-processing and/or building Machine Learning models
  • Perform exploratory data analysis while learning basic statistics

This program is designed for those who aspire to Data/ML/AI roles:

  • Data Engineers
  • Data Scientists
  • Machine Learning Engineers
  • Data Integration Engineers
  • Data Architects

The course covers the following topics:

  • Significance of Data
  • What is Machine Learning (ML)?
  • Practical Use cases
  • Concepts and Terms
  • Tools/Platforms for ML
  • Machine Learning End to End Pipeline
  • Roles and Responsibilities of Data Engineer and Data Scientist
  • Installing Anaconda
  • Setting up Jupyter Notebook
  • Experiencing Notebooks
  • Introduction to Google Colab
  • Hands-on Exercise(s)
  • Content Acquisition Approaches, Pros & Cons
  • Working with Beautiful Soup
  • Acquiring data using REST-based APIs
  • Connecting to External data sources 
  • Working with datasets
  • Manipulating the datasets 
  • Exporting the datasets into external files
  • Population and Sample
  • Data Types
  • Measures of Central tendency
  • Measures of dispersion
  • Percentiles & Quartiles
  • Box plots and outlier detection
  • Creating Graphs and Reporting
  • Probability Distributions 
  • Hypothesis testing 
  • Hands-on Exercise(s)
  • Dealing with One-dimensional Arrays
  • Dealing with Multi-dimensional Arrays
  • Working with NumPy Array
  • NumPy Arrays Compared to Python Lists
  • Manipulating Arrays
  • Hands-on Exercise(s)
  • Basic types – Series and DataFrames
  • Working with a Series
  • Element-wise Operations
  • Creating a DataFrame from various sources e.g. CSV
  • Data Manipulation using Pandas
  • Hands-on Exercise(s)
  • Overview
  • Key types of plots
  • Exploratory Analysis using Matplotlib
  • Hands-on Exercises
  • Introduction to Seaborn
  • Seaborn foundation
  • Key types of plots
  • Customizing Seaborn Plots
  • Hands-on Exercise(s)
  • Exploratory Data Analysis
  • Data Cleaning techniques
    • Deal with missing data
    • Add default values
    • Remove incomplete rows
    • Deal with error-prone columns
    • Fixing NaN values and string/float confusion
  • Data Preparation for ML
    • Normalize data types
    • Feature Scaling
    • Feature Standardization
    • Label Encoding
    • One-Hot Encoding
  • Hands-on Exercise(s)
  • What is Feature Engineering?
  • Why Feature Engineering?
  • How to apply Feature Engineering?
  • Discussions on various scenarios
  • Hands-on Exercise(s)
  • Types of Machine Learning
  • Key Algorithms in Machine Learning
  • Practical Applications of Machine Learning
  • Popular frameworks/libraries for ML
  • Concepts and Terms
  • Why scikit-learn?
  • Code Walkthrough
  • Hands-on Exercise(s)
  • Key Classification Algorithms
  • Conditional Probability 
  • Proof of Bayes' Theorem
  • Naïve Bayes Classifier
  • Confusion Matrix
  • Accuracy
  • Key Regression Algorithms
  • Linear, Logistic and Other Key Types of Regression
  • Decision Trees
  • Ensemble Learning: Random Forest
  • Gradient Descent
  • Loss function
  • Bias vs Variance Tradeoff
  • Confusion Matrix
  • Evaluating Models
  • Hyperparameter Tuning
  • Hands-on Exercise(s)
  • Key types of Unsupervised ML
  • Principal Component Analysis
  • Performing Clustering of data
  • Hands-on Exercise(s)
  • Basic Python
    • Regular expressions
    • Higher-order functions
    • Nested statements and scope
    • User-defined functions
    • Lambda expressions
    • Multiple exercises
  • NumPy
    • Understanding data types
    • NumPy indexing and selection
    • NumPy arrays
    • Sorting
    • NumPy operations
  • Pandas
    • Series
    • Operations
    • Merging, joining and concatenation
    • Missing data
    • DataFrames
    • Data input and output
    • Groupby
  • Descriptive Statistics
  • Social data analysis
  • Data Acquisition
  • Data preprocessing and feature exploration
  • Matplotlib
    • Matplotlib overview
    • Settings and stylesheets
    • Multiple subplots
    • Simple scatter plots
    • Histograms
    • Visualization with Seaborn
    • Simple line plots
    • Three-dimensional plotting
    • Customizing legends
  • Seaborn
    • Categorical plots
    • Regression plots
    • Style and color
    • Matrix plots
    • Distribution plots
    • Grids
  • scikit-learn
    • Linear regression
    • Logistic regression
    • K-means clustering
    • Principal component analysis
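
To give a flavour of the descriptive-statistics topics in the outline above (measures of central tendency, measures of dispersion, quartiles, and box-plot style outlier detection), here is a minimal sketch using only Python's standard library. The sample values are made up for illustration and are not course material. The 1.5 × IQR cutoff is the common box-plot convention, assumed here rather than taken from the course.

```python
import statistics

# Hypothetical sample data, invented purely for illustration
values = [12, 15, 14, 10, 13, 15, 11, 14, 48]

# Measures of central tendency
mean = statistics.mean(values)
median = statistics.median(values)

# Measure of dispersion: sample standard deviation
stdev = statistics.stdev(values)

# Quartiles: with n=4, quantiles() returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1  # interquartile range

# Box-plot convention: flag points more than 1.5 * IQR outside the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
print(f"Q1={q1} Q3={q3} IQR={iqr}")
print(f"outliers={outliers}")
```

With this data the single extreme value pulls the mean well above the median, which is the kind of effect the box plot and outlier detection topics address.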

Participants should preferably have some hands-on experience in a programming language. Knowledge of Python would be a plus.
