Data and Machine Learning Fundamentals

A getting-started journey into Machine Learning with practical, hands-on experience

Duration

5 Days

Level

Beginner Level

Design and Tailor this course

As per your team's needs

This course is a stepping stone for the “Machine Learning and Artificial Intelligence” learning path. It is designed to build a foundation for the more advanced courses in that path. The points below give a high-level overview of the course:

  • Understand the role of Data and Machine Learning
  • Explore use cases of Machine Learning
  • Get introduced to some of the key technologies in Data and Machine Learning
  • Gain hands-on experience in Data Acquisition, Processing, Analysis and Modeling using the Python programming language
  • Work with common types of data, e.g. CSV, web data and social media data, for pre-processing and/or building Machine Learning models
  • Perform exploratory data analysis and learn basic statistics
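
To give a flavour of the basic statistics and exploratory analysis mentioned above, here is a minimal sketch that summarizes a small sample and flags outliers with the interquartile-range rule used in box plots. It is an illustration only, not course material: the data values and the function names `summarize` and `iqr_outliers` are assumptions made for this example, and it uses only Python's standard `statistics` module so that it runs without any extra installation.

```python
import statistics

# Hypothetical sample data (not from the course): daily order counts,
# with one unusually large value.
daily_orders = [12, 15, 14, 10, 13, 16, 11, 14, 95, 13]


def summarize(values):
    """Return basic measures of central tendency and dispersion."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


summary = summarize(daily_orders)
outliers = iqr_outliers(daily_orders)
print(summary)
print("Outliers:", outliers)
```

In this sample the single large value pulls the mean well above the median, which is the kind of effect exploratory analysis is meant to surface before any modeling starts.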

This program is designed for those who aspire to Data/ML/AI roles:

  • Data Engineers
  • Data Scientists
  • Machine Learning Engineers
  • Data Integration Engineers
  • Data Architects

The course covers the following topics:

  • Significance of Data
  • What is Machine Learning (ML)?
  • Practical Use cases
  • Concepts and Terms
  • Tools/Platforms for ML
  • Machine Learning End to End Pipeline
  • Roles and Responsibilities of Data Engineers and Data Scientists
  • Installing Anaconda
  • Setting up Jupyter Notebook
  • Experiencing Notebooks
  • Introduction to Google Colab
  • Hands-on Exercise(s)
  • Content Acquisition Approaches, Pros & Cons
  • Working with Beautiful Soup
  • Acquiring data using REST-based APIs
  • Connecting to External data sources 
  • Working with datasets
  • Manipulating the datasets 
  • Exporting the datasets into external files
  • Population and Sample
  • Data Types
  • Measures of Central tendency
  • Measures of dispersion
  • Percentiles & Quartiles
  • Box plots and outlier detection
  • Creating Graphs and Reporting
  • Probability Distributions 
  • Hypothesis testing 
  • Hands-on Exercise(s)
  • Dealing with One-dimensional Arrays
  • Dealing with Multi-dimensional Arrays
  • Working with NumPy Arrays
  • NumPy Arrays Compared to Python Lists
  • Manipulating Arrays
  • Hands-on Exercise(s)
  • Basic types – Series and DataFrames
  • Working with a Series
  • Element-wise Operations
  • Creating a DataFrame from various sources, e.g. CSV
  • Data Manipulation using Pandas
  • Hands-on Exercise(s)
  • Overview
  • Key types of plots
  • Exploratory Analysis using Matplotlib
  • Hands-on Exercise(s)
  • Introduction to Seaborn
  • Seaborn foundation
  • Key types of plots
  • Customizing Seaborn Plots
  • Hands-on Exercise(s)
  • Exploratory Data Analysis
  • Data Cleaning techniques
    • Deal with missing data
    • Add default values
    • Remove incomplete rows
    • Deal with error-prone columns
    • Fix NaN values and string/float confusion
  • Data Preparation for ML
    • Normalize data types
    • Feature Scaling
    • Feature Standardization
    • Label Encoding
    • One-Hot Encoding
  • Hands-on Exercise(s)
  • What is Feature Engineering?
  • Why Feature Engineering?
  • How to apply Feature Engineering?
  • Discussions on various scenarios
  • Hands-on Exercise(s)
  • Types of Machine Learning
  • Key Algorithms in Machine Learning
  • Practical Applications of Machine Learning
  • Popular frameworks/libraries for ML
  • Concepts and Terms
  • Why scikit-learn?
  • Code Walkthrough
  • Hands-on Exercise(s)
  • Key Classification Algorithms
  • Conditional Probability 
  • Proof of Bayes' Theorem
  • Naïve Bayes Classifier
  • Confusion Matrix
  • Accuracy
  • Key Regression Algorithms
  • Linear, Logistic and other key types of Regression
  • Decision Trees
  • Ensemble Learning: Random Forest
  • Gradient Descent
  • Loss function
  • Bias vs Variance Tradeoff
  • Confusion Matrix
  • Evaluating Models
  • Hyperparameter Tuning
  • Hands-on Exercise(s)
  • Key types of Unsupervised ML
  • Principal Component Analysis
  • Performing Clustering of data
  • Hands-on Exercise(s)
  • Basic Python
    • Regular expressions
    • Higher-order functions
    • Nested statements and scope
    • User-defined functions
    • Lambda expressions
    • Multiple exercises
  • NumPy
    • Understanding data types
    • NumPy indexing and selection
    • NumPy arrays
    • Sorting
    • NumPy operations
  • Pandas
    • Series
    • Operations
    • Merging, joining and concatenation
    • Missing data
    • DataFrames
    • Data input and output
    • Groupby
  • Descriptive Statistics
  • Social data analysis
  • Data Acquisition
  • Data preprocessing and feature exploration
  • Matplotlib
    • Matplotlib overview
    • Settings and stylesheets
    • Multiple subplots
    • Simple scatter plots
    • Histograms
    • Visualization with Seaborn
    • Simple line plots
    • Three-dimensional plotting
    • Customizing legends
  • Seaborn
    • Categorical plots
    • Regression plots
    • Style and color
    • Matrix plots
    • Distribution plots
    • Grids
  • scikit-learn
    • Linear regression
    • Logistic regression
    • K-means clustering
    • Principal component analysis
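
As a small taste of the data preparation topics listed above (filling in missing data, feature scaling, feature standardization and one-hot encoding), the sketch below works through each step on a tiny table. It is an assumption-laden illustration, not an excerpt from the course: the records, column names and helper functions are hypothetical, and it is written in plain Python rather than with Pandas or scikit-learn so that each step stays visible.

```python
from statistics import mean, pstdev

# Hypothetical records (not from the course); None marks a missing value.
rows = [
    {"age": 25, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 35, "city": "Pune"},
    {"age": 45, "city": "Mumbai"},
]


def fill_missing(values, default):
    """Replace missing entries (None) with a default value."""
    return [default if v is None else v for v in values]


def min_max_scale(values):
    """Feature scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column, nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Feature standardization: mean 0 and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """One-hot encoding: one 0/1 column per distinct category, in sorted order."""
    categories = sorted(set(labels))
    encoded = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, encoded


# Use the mean of the known ages as the default for the missing one.
known_ages = [r["age"] for r in rows if r["age"] is not None]
ages = fill_missing([r["age"] for r in rows], default=mean(known_ages))

scaled = min_max_scale(ages)
standardized = standardize(ages)
categories, encoded = one_hot([r["city"] for r in rows])

print("ages:", ages)
print("scaled:", scaled)
print("standardized:", standardized)
print("categories:", categories)
print("encoded:", encoded)
```

Filling with the mean is only one possible default; removing incomplete rows, as the outline also mentions, is the other common choice and may suit small gaps in large datasets better.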

Participants should preferably have some hands-on experience in a programming language. Knowledge of Python would be a plus.
