## About the Statistics and Data Science Master's Program

This is a comprehensive, hands-on Statistics and Data Science specialization course that covers data science, data analytics, the project life cycle, data acquisition, analysis, statistical methods, data visualization, data wrangling and Machine Learning. You will gain the expertise to build and deploy recommender systems with scikit-learn, covering data analysis, data transformation, fitting and transforming data, and scoring and evaluating models to select the best fit for the features of a dataset.

**What You Will Learn:**

- Statistics Fundamentals and Principles for Regression Models
- Probability
- Data Science introduction and importance
- Data acquisition and Data Science lifecycle
- Experimentation, evaluation and project deployment tools
- Different algorithms used in Machine Learning
- Predictive analytics, segmentation using clustering
- Data Scientist roles and responsibilities
- Deploying recommender systems on real-world datasets
- Data mining, data structures and data manipulation

**Statistics**

**Fundamentals of descriptive statistics**

- Types of Data
- Levels of Measurement
- Categorical variable
- Visualization techniques for categorical variables
- Numerical variables and frequency distribution tables
- Histograms
- Cross tables and scatter plots

- Measures of central tendency: mean, median and mode
- Measuring skewness
- Measuring how data is spread out: calculating variance
- Variance Exercise
- Standard deviation and coefficient of variation
- Standard deviation Exercise
- Calculating and understanding covariance
- Covariance Exercise
- Correlation coefficient
- Correlation Exercise
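The spread and association measures listed above can be sketched in plain Python. This is a minimal illustration, assuming sample (n - 1) estimators throughout; the function names and the toy data are hypothetical, not part of the course materials.

```python
import math


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance: squared deviations from the mean divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_std(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


def coefficient_of_variation(xs):
    """Relative spread: standard deviation divided by the mean (mean must be non-zero)."""
    return sample_std(xs) / mean(xs)


def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


if __name__ == "__main__":
    hours = [1, 2, 3, 4, 5]    # hypothetical study hours
    scores = [2, 4, 6, 8, 10]  # hypothetical test scores
    print("variance:", sample_variance(hours))             # 2.5
    print("covariance:", sample_covariance(hours, scores))  # 5.0
    print("correlation:", correlation(hours, scores))
```

Because the toy scores are an exact linear function of the hours, the correlation comes out as 1 (up to floating-point rounding); covariance alone would not reveal that, since its size depends on the units of both variables.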

**Practical Exercise on descriptive stats**

- Practical Exercise

- Introduction to inferential statistics
- Normal distribution
- Central Limit Theorem
- Standard error
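The Central Limit Theorem and the standard error can be checked empirically with a short simulation. This sketch uses only Python's standard library: it draws repeated samples from a deliberately skewed, hypothetical population and compares the spread of the sample means with the theoretical standard error sigma / sqrt(n). The population, sample size and number of trials are arbitrary assumptions.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard error of the sample mean for population standard deviation sigma."""
    return sigma / math.sqrt(n)


def simulate_sample_means(population, n, trials, seed=0):
    """Draw `trials` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]


if __name__ == "__main__":
    # Hypothetical right-skewed population: mostly small values, a few large ones
    population = [1] * 50 + [2] * 25 + [10] * 25
    sigma = statistics.pstdev(population)
    means = simulate_sample_means(population, n=30, trials=5000)
    print("theoretical standard error:", standard_error(sigma, 30))
    print("observed std of sample means:", statistics.stdev(means))
```

Even though the population itself is far from normal, a histogram of `means` would look roughly bell-shaped, and its standard deviation should sit close to the theoretical standard error.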

- Working with estimators and estimates
- Confidence intervals
- Calculating confidence intervals for a population with known variance
- T-score Exercise
- Margin of error
- Calculating confidence intervals Exercise
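For a population with known variance, the confidence interval is the sample mean plus or minus a margin of error, z times sigma / sqrt(n). A minimal standard-library sketch follows; the sample values are hypothetical. When the variance is unknown, the z critical value is replaced by a t critical value with n - 1 degrees of freedom, which the standard library does not provide (a package such as SciPy offers it via `scipy.stats.t.ppf`).

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population standard deviation is known."""
    alpha = 1 - confidence
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin_of_error = z_critical * sigma / math.sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error


if __name__ == "__main__":
    # Hypothetical sample: mean 100 from n = 25 observations, known sigma = 15
    low, high = z_confidence_interval(sample_mean=100, sigma=15, n=25)
    print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (94.12, 105.88)
```

Raising the confidence level widens the interval, while a larger sample narrows it by a factor of sqrt(n).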

- Practical Exercise on inferential statistics

- The null and the alternative hypothesis
- Rejection region and significance level
- Type I error vs Type II error
- Test for the mean: population variance known
- p-value
- Practical example: hypothesis testing
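A test for the mean with known population variance can be sketched as a two-sided z-test. This is an illustration under assumed, hypothetical numbers: H0 says the population mean is 100, sigma is known to be 15, and a sample of 100 observations has mean 103.

```python
import math
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: mu = mu0 when the population sigma is known."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # p-value: probability, under H0, of a test statistic at least this extreme
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    alpha = 0.05  # significance level
    z, p = z_test_mean(sample_mean=103, mu0=100, sigma=15, n=100)
    # Rejection region for alpha = 0.05 (two-sided): |z| > 1.96
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"z = {z:.2f}, p = {p:.4f}: {decision}")  # z = 2.00, p = 0.0455: reject H0
```

Comparing the p-value with alpha and checking whether z falls in the rejection region are equivalent decisions; rejecting a true H0 here would be a Type I error.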

**Fundamentals of Regression Analysis**

- Introduction
- Correlation and causation
- Linear Regression Model
- Correlation vs Regression
- Geometric representation of regression model
- Exercise - Reinforced learning
- R-squared
- The ordinary least squares (OLS) setting and practical application
- Multiple Linear Regression Model
- The adjusted R-squared
- F-statistic
- OLS assumptions
- Normality and homoscedasticity
- No autocorrelation
- No multicollinearity
- Exercise Linear Regression
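The core regression quantities above (OLS estimates, R-squared, adjusted R-squared and the overall F-statistic) can be computed from scratch for a single predictor. This is a minimal sketch on hypothetical data; `p` is the number of predictors excluding the intercept, so the last two helpers also apply to multiple regression once R-squared is known. In practice a library such as statsmodels reports all of these in its OLS summary.

```python
def fit_simple_ols(xs, ys):
    """Closed-form OLS estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(ys, predictions):
    """Share of the variance in y explained by the model: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """R-squared penalised for the number of predictors p (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall F-test statistic: explained vs unexplained variance per degree of freedom."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]  # hypothetical predictor
    ys = [2, 4, 5, 4, 5]  # hypothetical response
    b0, b1 = fit_simple_ols(xs, ys)
    preds = [b0 + b1 * x for x in xs]
    r2 = r_squared(ys, preds)
    print(b0, b1, r2, adjusted_r_squared(r2, n=5, p=1), f_statistic(r2, n=5, p=1))
```

For this toy data the fit is y = 2.2 + 0.6x with R-squared = 0.6 (up to floating-point rounding); the adjusted value drops to about 0.47 because five observations leave few degrees of freedom.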

What is Data Science, and why does it matter? Data Science has a high impact in today's digitally driven world because many of today's complex problems contain underlying patterns in their data. Using Data Science and Machine Learning algorithms, you can analyze that data, predict outcomes and measure how accurate those predictions are. Applications of Data Science range from predicting stock prices in capital markets to predicting home prices in real estate.

**Hands-on Exercise – Basic:** Install the Anaconda distribution and the PyCharm IDE, then start by implementing simple mathematical operations and logic on pandas datasets.

- Control Functions
- File Handling
- Data Structure

**II. Data Exploration**

Introduction to data exploration, importing and exporting data from and to external sources, exploratory data analysis, dataframes and working with them, accessing individual elements, vectors and factors, operators, built-in functions, conditional and looping statements, user-defined functions, and matrices, lists and arrays.

**Hands-on Exercise –** Accessing individual elements of datasets in pandas, and modifying and extracting results from a dataset using user-defined functions in Python.

- Numpy
- Pandas
- Regular Expressions
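The exploration steps above (importing data, accessing an individual element, filtering with conditions and loops, applying a user-defined function, exporting) can be sketched without any third-party packages. The course itself uses pandas, where the rough equivalents are `pandas.read_csv`, `DataFrame.loc`, boolean indexing and `DataFrame.to_csv`; this standard-library version is only an illustrative stand-in, and the dataset is hypothetical.

```python
import csv
import io

# Hypothetical dataset; in the course this would be loaded with pandas from a file or URL
RAW_CSV = """name,city,salary
Asha,Pune,52000
Ben,Austin,61000
Chen,Pune,58000
"""


def load_rows(text):
    """Import: parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def salary_in_thousands(row):
    """User-defined function that derives a new value from one row."""
    return int(row["salary"]) / 1000


def rows_in_city(rows, city):
    """Conditional filtering expressed as a loop over rows."""
    return [row for row in rows if row["city"] == city]


def export_rows(rows, fieldnames):
    """Export: serialise rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = load_rows(RAW_CSV)
    print(rows[1]["name"])  # access a single element: row 1, column "name"
    for row in rows_in_city(rows, "Pune"):
        print(row["name"], salary_in_thousands(row))
    print(export_rows(rows_in_city(rows, "Pune"), ["name", "city", "salary"]))
```

Note that the csv module returns every field as a string, which is why `salary_in_thousands` converts explicitly; pandas infers column types on import instead.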

**III. Data Manipulation**

**Hands-on Exercise –** Using packages to perform common operations that abstract over how data is manipulated and stored.

- Numpy
- Pandas
- Regular Expressions
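Regular expressions are a common tool for cleaning text fields during data manipulation. The sketch below uses Python's `re` module on hypothetical strings; pandas exposes the same kind of patterns through `Series.str.replace` and `Series.str.extract`. The patterns are deliberately simple assumptions, not production-grade validators.

```python
import re

# Hypothetical pattern: a dollar sign, optional spaces, digits with optional
# thousands separators and an optional decimal part
PRICE_PATTERN = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)")


def normalize_whitespace(text):
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def extract_prices(text):
    """Return every dollar amount in the text as a float, dropping thousands separators."""
    return [float(amount.replace(",", "")) for amount in PRICE_PATTERN.findall(text)]


def mask_emails(text):
    """Replace e-mail-like tokens with a placeholder (a deliberately simple pattern)."""
    return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "<email>", text)


if __name__ == "__main__":
    print(normalize_whitespace("  messy \t input\n text "))
    print(extract_prices("Laptop $1,299.50, mouse $ 25 and cable $9.99."))
    print(mask_emails("Contact jane.doe@example.com today"))
```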

**IV. Data Visualization**

Introduction to visualization, different types of graphs, and an introduction to the grammar of graphics and the pyplot package. Seaborn is a popular data visualization library built on top of Matplotlib; with Seaborn you will learn how much easier it is to generate heat maps, time series plots and violin plots. ggplot is a Python visualization library based on the grammar of graphics; with it you will learn how to construct plots declaratively, without thinking about the implementation details. Plotly's strength lies in making interactive plots. Geoplotlib is a toolbox for plotting geographical data and creating maps; it will help you plot a variety of map types, such as heatmaps.

**Hands-on Exercise –** Creating data visualizations with pyplot, ggplot, Seaborn and Geoplotlib to understand the various types of charts, importing data and analyzing it in grids of plots. You will see the versatility of Matplotlib across visualization types: scatter plots, bar charts, line plots, pie charts, contour plots and spectrograms.

**V. Introduction to Statistics**

Why do we need statistics? Categories of statistics, statistical terminology, types of data, measures of central tendency, measures of spread, correlation and covariance, standardization and normalization, probability and types of probability, hypothesis testing, chi-square testing, ANOVA, the normal distribution and the binomial distribution.

**Hands-on Exercise –** Building a statistical analysis model that uses quantification, representation and experimental data to gather, review and analyze data and draw conclusions from it.
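Standardization, normalization and a binomial probability can each be written in a few lines. This sketch assumes the population standard deviation (ddof = 0) for z-scores, which matches the convention of scikit-learn's `StandardScaler`; the sample values are hypothetical.

```python
import math
import statistics


def standardize(xs):
    """z-scores: (x - mean) / standard deviation, using the population std (ddof = 0)."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def min_max_normalize(xs):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
    print(standardize(data))
    print(min_max_normalize(data))
    print(binomial_pmf(2, 4, 0.5))  # probability of exactly 2 heads in 4 fair coin flips
```

Standardized values have mean 0 and standard deviation 1 but are unbounded, whereas min-max normalization guarantees the [0, 1] range at the cost of being sensitive to outliers.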

**VI. Machine Learning**

Introduction to Machine Learning; introduction to linear regression and predictive modeling with it; simple and multiple linear regression: concepts and detailed formulas; the assumptions of linear regression and residual diagnostics; understanding the fit of the model; building a simple linear model, predicting results and finding p-values; interpreting the summary results with the null hypothesis, p-value and F-statistic; building linear models with multiple independent variables; introduction to logistic regression; comparing linear and logistic regression; bivariate and multivariate logistic regression; the confusion matrix and model accuracy.

**Hands-on Exercise –** Modeling relationships within the data using linear predictor functions. Implementing linear and logistic regression in Python by building a model with ‘tenure’ as the dependent variable and multiple independent variables.
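Logistic regression and the confusion matrix can be illustrated together with a one-feature model trained by batch gradient descent. This is a from-scratch sketch on hypothetical, slightly overlapping data, not the course's ‘tenure’ model; in practice scikit-learn's `LogisticRegression`, `sklearn.metrics.confusion_matrix` and `accuracy_score` cover the same ground.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_1d(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the log-loss with respect to the logit is (prediction - label)
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


def predict(w, b, xs, threshold=0.5):
    """Class labels: 1 when the predicted probability reaches the threshold."""
    return [1 if sigmoid(w * x + b) >= threshold else 0 for x in xs]


def confusion_matrix(y_true, y_pred):
    """Counts of true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["tn" if actual == 0 else "fn"] += 1
    return counts


def accuracy(counts):
    """Share of correct predictions: (TP + TN) / total."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


if __name__ == "__main__":
    xs = [-2.0, -1.5, -1.0, 0.5, -0.5, 1.0, 1.5, 2.0]  # hypothetical feature
    ys = [0, 0, 0, 0, 1, 1, 1, 1]                    # hypothetical labels (two overlap)
    w, b = fit_logistic_1d(xs, ys)
    counts = confusion_matrix(ys, predict(w, b, xs))
    print(counts, accuracy(counts))
```

Unlike linear regression, the model outputs a probability, and the confusion matrix shows where the threshold misclassifies: here the two overlapping points become one false positive and one false negative.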

- Linear Regression
- Logistic Regression
- Supervised vs Unsupervised Learning
- Naive Bayes Classifier
- Decision Tree, Random Forest
- K-Nearest Neighbours
- Ensemble Learning
- K-Means Clustering
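K-Means clustering alternates two steps until nothing changes: assign each point to its nearest centroid, then move each centroid to the mean of its points (Lloyd's algorithm). This sketch assumes the initial centroids are supplied by the caller; libraries use smarter seeding such as k-means++, which scikit-learn's `KMeans` uses by default. The points are hypothetical.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid (Euclidean distance)
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))
            for point in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(coords) / len(members) for coords in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid where it was
        centroids = updated
    return labels, centroids


if __name__ == "__main__":
    points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
    print(labels, centroids)
```

The result depends on the initial centroids, which is why seeding strategies and repeated runs matter on real data.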

### Project Detail

**Case Study 1 (Linear Regression Project)**

**Case Study 2 (Logistic Regression Project)**

**Case Study 3 (Decision Trees Project)**

**Case Study 4 (K-Means Clustering Project)**

**Case Study 5 (XGBoost)**

Project 1: Recommendation Algorithm for Movies

Topics: This is a real-world project that gives you hands-on experience with a movie recommender system. Based on the movies a particular user likes, you will be able to provide data-driven recommendations. The project involves understanding recommender systems, standardizing information across the different datasets, performing filtering, and building the training and test sets. You will build algorithms to predict the likelihood of a user liking a movie's genre, and perform cross-validation and bias-variance analysis. The main components of the project include the following:

- Feature Engineering
- Feature Scaling
- Feature Standardization
- Building Several Machine Learning Models
- Predicting Movie Recommendations
- Comparing the Machine Learning Models: K-Nearest Neighbors, Decision Tree, Random Forest
- Evaluating Model Performance
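Of the models compared in this project, K-Nearest Neighbors maps most directly onto a recommender: find the users most similar to the target user and average their ratings, weighted by similarity. This is a user-based collaborative-filtering sketch with hypothetical ratings and cosine similarity over co-rated movies; it does not cover the Decision Tree or Random Forest models, and the function names are illustrative.

```python
import math

# Hypothetical user -> {movie: rating on a 1-5 scale} data
RATINGS = {
    "alice": {"Heat": 5, "Alien": 4, "Up": 1},
    "bob": {"Heat": 4, "Alien": 5, "Up": 2, "Coco": 1},
    "cara": {"Heat": 1, "Alien": 2, "Up": 5, "Coco": 5},
    "dan": {"Heat": 5, "Alien": 5, "Coco": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity of two users over the movies both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, movie, k=2):
    """Similarity-weighted average rating from the k most similar users who rated the movie."""
    neighbours = [
        (cosine_similarity(ratings[user], ratings[other]), ratings[other][movie])
        for other in ratings
        if other != user and movie in ratings[other]
    ]
    top = sorted(neighbours, reverse=True)[:k]
    total_similarity = sum(sim for sim, _ in top)
    if total_similarity == 0:
        return None  # no informative neighbours
    return sum(sim * rating for sim, rating in top) / total_similarity


def recommend(ratings, user, k=2, n=3):
    """Rank movies the user has not rated by predicted rating, highest first."""
    unseen = {m for r in ratings.values() for m in r} - set(ratings[user])
    scored = [(predict_rating(ratings, user, m, k), m) for m in unseen]
    scored = [(score, m) for score, m in scored if score is not None]
    return [(m, score) for score, m in sorted(scored, reverse=True)[:n]]


if __name__ == "__main__":
    print(recommend(RATINGS, "alice"))
```

Alice's tastes match Dan's and Bob's, who both rated "Coco" low, so her predicted rating for it is about 1.5 even though Cara loved it. Cosine similarity on raw ratings ignores each user's rating bias; mean-centering the ratings first is a common refinement.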

Project 2: Email Classification as SPAM/HAM

Topics: This is a real-world project that gives you hands-on experience with several Machine Learning algorithms. The main components of the project include the following:

- Manipulating data to extract meaningful insights
- Tokenizing and vectorizing the dataset
- Visualizing data to find out patterns among different factors
- Implementing these algorithms: linear regression, decision tree, and Naïve Bayes
- Evaluating model performance
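Tokenizing, vectorizing and Naive Bayes fit together naturally for spam filtering: each message becomes a bag-of-words count vector, and the classifier compares log-probabilities per class. This is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing on hypothetical messages; it covers only the Naïve Bayes part of the project. scikit-learn's `CountVectorizer` and `MultinomialNB` provide tested implementations of the same idea.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vectorize(tokens):
    """Bag-of-words vector: a mapping from token to its count."""
    return Counter(tokens)


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for message, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(message))
        self.vocabulary = set().union(*self.word_counts.values())
        return self

    def _log_score(self, vector, label):
        # log P(label) + sum over tokens of count * log P(token | label)
        total_messages = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_messages)
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(self.vocabulary)
        for token, count in vector.items():
            if token in self.vocabulary:  # words never seen in training are ignored
                score += count * math.log((counts[token] + 1) / denominator)
        return score

    def predict(self, message):
        vector = vectorize(tokenize(message))
        return max(self.class_counts, key=lambda label: self._log_score(vector, label))


if __name__ == "__main__":
    # Hypothetical training messages
    messages = ["win a free prize now", "free money click now", "claim your free prize",
                "meeting at noon tomorrow", "can we move the meeting", "lunch tomorrow at noon"]
    labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
    model = NaiveBayesSpamFilter().fit(messages, labels)
    print(model.predict("free prize now"), model.predict("meeting tomorrow"))
```

Working in log-space avoids numerical underflow when many small probabilities are multiplied, and smoothing keeps a single unseen word from forcing a class probability to zero.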

## 752 Reviews

The best part of the Data Science training course was the way complex problems were chunked and explained in modules. Spark Academy has one of the best courses on the market. Even my friends in the USA have taken several courses from Spark Academy. 5 stars to Spark Academy for providing such a great course on Data Science.

The training was absolutely great; I really enjoyed learning the new algorithms. The trainer was kind enough to explain the concepts several times when asked, so learning was effortless. Thank you, Spark Academy!

I have 10 years of experience as a Data Analyst working in Capital Markets. I was looking to switch careers and, luckily, got a recommendation from my friend, a Sr. Data Scientist at Bell Labs, that Spark Academy offers the best Data Scientist courses online. I gave it a try, and 2 months after completing the course I landed a job as a Data Scientist. I want to thank Spark Academy for all the support and excellent training that made my career spark!

Everything I learned, from estimators to hypothesis testing, I was able to use in my work projects. This course really helped me sail through as a Data Scientist.

I loved the overall Course training and training method. Kudos to the Training team.