## About Data Science

This complete Data Science bootcamp specialization provides detailed training in data science, data analytics, the project life cycle, data acquisition, analysis, statistical methods, data visualization, data wrangling and Machine Learning. You will gain the expertise to deploy recommenders using Scikit-Learn, covering data analysis, data transformation, fit and transform steps, scoring, and evaluation of the best-fit model based on the features of the dataset.

**What you will Learn:**

- Data Science introduction and importance
- Data acquisition and Data Science lifecycle
- Experimentation, evaluation and project deployment tools
- Different algorithms used in Machine Learning
- Predictive analytics, segmentation using clustering
- Data Scientist roles and responsibilities
- Deploying recommender systems on real world data sets
- Data mining, data structures and data manipulation

What Data Science is and why it matters: Data Science has a high impact in today's digitally driven world, where many complex problems carry some sort of geometrical pattern in their data. Using Machine Learning algorithms, you can analyze the data, predict outcomes and measure the accuracy of those predictions. Applications of Data Science range from predicting stock prices in capital markets to predicting home prices in real estate.

**Hands-on Exercise – Basic:** Installation of the Anaconda distribution and the PyCharm IDE. Start by implementing simple mathematical operations and logic using Pandas datasets.

- Control Functions
- File Handling
- Data Structure
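
As a rough illustration of the "simple mathematical operations and logic" exercise above, the following sketch uses a small made-up table. The column names and the threshold are assumptions chosen for the example, not part of the course material.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
orders = pd.DataFrame(
    {
        "item": ["pen", "notebook", "bag"],
        "price": [2.0, 5.0, 30.0],
        "quantity": [10, 4, 1],
    }
)

# Element-wise arithmetic on whole columns.
orders["total"] = orders["price"] * orders["quantity"]

# Simple logic: flag rows whose total exceeds an (arbitrary) threshold.
orders["is_large"] = orders["total"] > 25

# Aggregate the computed column.
grand_total = orders["total"].sum()

print(orders)
print("Grand total:", grand_total)
```

Column arithmetic and boolean comparisons like these apply to every row at once, which is usually preferable to looping over rows by hand.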

**II. Data Exploration**

Introduction to data exploration and exploratory data analysis, importing and exporting data to and from external sources, dataframes and working with dataframes, accessing individual elements, vectors and factors, operators, built-in functions, conditional and looping statements, user-defined functions, and the matrix, list and array data structures.

**Hands-on Exercise –** Accessing individual elements of datasets in Pandas, then modifying the dataset and extracting results from it using user-defined functions in Python.

- NumPy
- Pandas
- Regular Expressions
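
The exercise above can be sketched as follows. This is a minimal example on invented data: the `students` table, the `grade` helper and the pass mark of 50 are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical dataset with a custom row index.
students = pd.DataFrame(
    {"name": ["Asha", "Ben", "Chen"], "score": [82, 47, 65]},
    index=["s1", "s2", "s3"],
)

# Access a single element by label (row "s2", column "score").
by_label = students.loc["s2", "score"]

# Access a single element by integer position (first row, second column).
by_position = students.iloc[0, 1]


def grade(score):
    """Map a numeric score to a pass/fail label (threshold is illustrative)."""
    return "pass" if score >= 50 else "fail"


# Modify the dataset by applying the user-defined function to a column.
students["grade"] = students["score"].apply(grade)

# Extract only the rows that satisfy a condition.
passed = students[students["grade"] == "pass"]
print(passed)
```

`loc` selects by label and `iloc` by position; mixing the two up is a common source of off-by-one mistakes when the index is not the default 0, 1, 2 sequence.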

**III. Data Manipulation**

**Hands-on Exercise –** Implementing packages to perform various operations for abstracting over how data is manipulated and stored.

- NumPy
- Pandas
- Regular Expressions
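
To make the three topics above concrete, here is a small sketch that combines them. The sample strings, the simplified e-mail pattern and the helper name `extract_email` are assumptions for this example; the pattern is deliberately loose and is not a full e-mail validator.

```python
import re

import numpy as np
import pandas as pd

# NumPy: reshape a flat range into a 2x3 matrix and sum each column.
matrix = np.arange(6).reshape(2, 3)
column_sums = matrix.sum(axis=0)

# Pandas + regular expressions: pull an e-mail address out of free text.
raw = pd.DataFrame(
    {"contact": ["Ann <ann@example.com>", "bob@example.org", "no email here"]}
)

# Simplified pattern, for illustration only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def extract_email(text):
    """Return the first e-mail-like substring, or None if there is none."""
    match = EMAIL_PATTERN.search(text)
    return match.group(0) if match else None


raw["email"] = raw["contact"].apply(extract_email)

# Keep only the rows where an address was found.
with_email = raw.dropna(subset=["email"])
print(column_sums)
print(with_email)
```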

**IV. Data Visualization**

Introduction to visualization, different types of graphs, and an introduction to the grammar of graphics and the pyplot package. Seaborn is a popular data visualization library built on top of Matplotlib; with Seaborn you will learn how much easier it is to generate heat maps, time series and violin plots. Ggplot is a Python visualization library that lets you construct plots using a grammar, without thinking about the implementation details. Plotly's strength lies in making interactive plots. Geoplotlib is a toolbox for plotting geographical data and creating maps; it will help you learn how to plot a variety of map types, such as heatmaps.

**Hands-on Exercise –** Creating data visualizations to understand the various types of charts using pyplot, Ggplot, Seaborn and Geoplotlib, and importing and analyzing data in grids. You will explore the versatility of Matplotlib by making these visualization types: scatter plots, bar charts, line plots, pie charts, contour plots and spectrograms.
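
A minimal sketch of three of the chart types named above, using Matplotlib's pyplot interface only. The data is synthetic, and the output file name and location are assumptions made so the script can run without a display.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no display required
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
rng = np.random.default_rng(seed=0)

# One figure with three side-by-side axes, one per chart type.
fig, (ax_line, ax_scatter, ax_bar) = plt.subplots(1, 3, figsize=(12, 3.5))

ax_line.plot(x, np.sin(x))
ax_line.set_title("Line plot")

ax_scatter.scatter(rng.normal(size=50), rng.normal(size=50))
ax_scatter.set_title("Scatter plot")

ax_bar.bar(["A", "B", "C"], [3, 7, 5])
ax_bar.set_title("Bar chart")

fig.tight_layout()

# Save to a temporary location (hypothetical file name).
out_path = os.path.join(tempfile.gettempdir(), "chart_types.png")
fig.savefig(out_path)
print("Saved figure to", out_path)
```

Seaborn, Plotly and the other libraries mentioned follow the same general idea of mapping data columns to visual elements, but each has its own API that is not shown here.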

**V. Introduction to Statistics**

Why we need statistics, categories of statistics, statistical terminology, types of data, measures of central tendency, measures of spread, correlation and covariance, standardization and normalization, probability and types of probability, hypothesis testing, Chi-Square testing, ANOVA, the normal distribution and the binomial distribution.

**Hands-on Exercise –** Building a statistical analysis model that uses quantifications, representations and experimental data for gathering, reviewing, analyzing and drawing conclusions from data.
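
Several of the concepts in this module (central tendency, spread, standardization, normalization, correlation and covariance) can be sketched with NumPy alone. The numbers below are invented, and `weights` is constructed as an exact linear function of `heights` so that the relationship is easy to verify by hand.

```python
import numpy as np

# Invented sample; weights = 0.8 * heights - 70 by construction.
heights = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weights = np.array([50.0, 58.0, 66.0, 74.0, 82.0])

# Measures of central tendency and spread.
mean_height = heights.mean()
median_height = np.median(heights)
std_height = heights.std(ddof=1)  # sample standard deviation

# Standardization (z-scores): zero mean, unit sample standard deviation.
standardized = (heights - mean_height) / std_height

# Min-max normalization: rescale to the range [0, 1].
normalized = (heights - heights.min()) / (heights.max() - heights.min())

# Correlation and (sample) covariance between the two variables.
correlation = np.corrcoef(heights, weights)[0, 1]
covariance = np.cov(heights, weights)[0, 1]

print("mean:", mean_height, "median:", median_height, "std:", std_height)
print("correlation:", correlation, "covariance:", covariance)
```

Hypothesis tests such as Chi-Square and ANOVA are typically run with a statistics package such as SciPy; they are left out of this sketch to keep it dependency-light.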

**VI. Machine Learning**

Introduction to Machine Learning, introduction to Linear Regression, predictive modeling with Linear Regression, simple and multiple Linear Regression, concepts and detailed formulas, assumptions and residual diagnostics in Linear Regression, understanding the fit of the model, building a simple linear model, predicting results and finding the p-value, understanding the summary results with the null hypothesis, p-value and F-statistic, building linear models with multiple independent variables, introduction to logistic regression, comparing linear regression and logistic regression, bivariate and multivariate logistic regression, confusion matrix and accuracy of the model.

**Hands-on Exercise –** Modeling the relationship within the data using linear predictor functions. Implementing Linear and Logistic Regression in Python by building a model with ‘tenure’ as the dependent variable and multiple independent variables.

- Linear Regression
- Logistic Regression
- Supervised vs. Unsupervised Learning
- Naive Bayes Classifier
- Decision Tree, Random Forest
- K-Nearest Neighbours
- Ensemble Learning
- K-Means Clustering
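
The linear regression part of the hands-on exercise might look roughly like the sketch below, using scikit-learn. The course dataset is not given here, so the data is synthetic: `monthly_charges` and `support_calls` are hypothetical independent variables, and `tenure` is generated from known coefficients plus noise so the fitted model can be sanity-checked.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=42)
n_samples = 200

# Hypothetical independent variables (not from the course dataset).
monthly_charges = rng.uniform(20, 120, size=n_samples)
support_calls = rng.integers(0, 10, size=n_samples)

# Synthetic dependent variable: known coefficients plus Gaussian noise.
tenure = (
    20.0
    + 0.3 * monthly_charges
    - 1.5 * support_calls
    + rng.normal(0, 2.0, size=n_samples)
)

X = np.column_stack([monthly_charges, support_calls])
X_train, X_test, y_train, y_test = train_test_split(
    X, tenure, test_size=0.25, random_state=0
)

# Fit a multiple linear regression and evaluate it on held-out data.
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("R^2 on test set:", r2)
```

scikit-learn's `LinearRegression` does not report p-values or an F-statistic; a statistics-oriented package such as statsmodels is the usual choice when those summary results are needed.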

### Project Detail

**Case Study 1 (Linear Regression Project)**

**Case Study 2 (Logistic Regression Project)**

**Case Study 3 (Decision Trees Project)**

**Case Study 4 (K-Means Clustering Project)**

**Case Study 5 (XGBoost)**

Project 1: Recommendation Algorithm for Movies

Topics: This is a real-world project that gives you hands-on experience in working with a movie recommender system. Based on the movies a particular user likes, you will be able to provide data-driven recommendations. The project involves understanding recommender systems, standardizing information from the different datasets, performing filtering, and building the training and test sets. You will build algorithms to predict the likelihood of a user liking the genre of a movie, perform cross-validation, and examine bias and variance. The main components of the project include the following:

- Feature Engineering
- Feature Scaling
- Feature Standardization
- Building several Machine Learning models
- Predicting the Recommendation for movies
- Comparing the Machine Learning models: K-Nearest Neighbor, Decision Tree, Random Forest
- Evaluating the model performances
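
The model comparison step listed above could be sketched as follows with scikit-learn. No movie data is included here, so `make_classification` generates a synthetic stand-in for "user likes this genre" labels; the hyperparameters are arbitrary choices for illustration, not recommendations from the course.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for engineered user/movie features and like/dislike labels.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

models = {
    # KNN is distance-based, so feature scaling is applied inside the pipeline.
    "K-Nearest Neighbor": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# 5-fold cross-validation; the default score for classifiers is accuracy.
mean_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    mean_accuracy[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Putting the scaler inside the pipeline means it is refitted on each training fold, which avoids leaking information from the validation fold into the preprocessing step.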

Project 2: Email Classification as SPAM/HAM

Topics: This is a real-world project that gives you hands-on experience in working with most of the Machine Learning algorithms. The main components of the project include the following:

- Manipulating data to extract meaningful insights
- Tokenizing and vectorizing the dataset
- Visualizing data to find out patterns among different factors
- Implementing these algorithms: linear regression, decision tree, and Naïve Bayes
- Evaluating the model performances
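
The tokenize, vectorize and classify steps above can be sketched with scikit-learn as shown below. The six messages are invented for illustration and are far too few for a real model; the example uses only the Naive Bayes classifier from the list of algorithms.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus; a real project would use a labeled e-mail dataset.
messages = [
    "Win a free prize now",
    "Free cash offer, claim now",
    "Claim your free reward today",
    "Meeting moved to 3pm tomorrow",
    "Can you review my report today",
    "Lunch at noon with the team",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer tokenizes the text and builds word-count vectors;
# MultinomialNB then learns per-class word frequencies from those counts.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(messages, labels)

new_messages = ["free prize offer", "team meeting tomorrow"]
predictions = classifier.predict(new_messages)

for text, label in zip(new_messages, predictions):
    print(f"{label}: {text}")
```

On a realistic dataset the model would be evaluated on a held-out test set, for example with a confusion matrix, rather than on a couple of hand-written messages.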

## 2759 Reviews

Course Material is comprehensive and well structured. I found that course extends beyond Normal Machine Learning curriculum. Trainer is highly approachable for clearing doubts. On my Request trainer even provided Overview and Case Study of the Automated Machine Learning Models. I thoroughly enjoyed the course and I also want to Thank Support Staff for Notifications to join the Meetings and Updates on the training sessions. Great Professional team at Spark Academy from Trainers to Support Staff.

I am a Data Analyst by profession and I have found learning Data Science even more interesting and contextual to my career growth. My Experience of taking the course from Spark Academy was fabulous. I have successfully completed my IBM Data Science Certification with few practices. Instructor at Spark Academy continuously guided me from concepts to focus areas of Data Science program. Highly Recommending others to take their Data Science course.

Spark Academy made the course well Organized. Each Case Study is structured and there are Home Assignments for Case Studies. Entire Procedure of training made my learning curve easy. I want to sincerely Thank Spark Academy for providing me the Best In Class Training.

I am glad that I took the course from the Spark Academy. Training provide the most comprehensive and well structured Data Science Program. There are lot of Real World Projects and they Often give additional Projects based on your needs to practice. So far one of the Best Experience of Online Course.

Data Science course was presented in training in most optimal manner. Every Topic was simplified so even a beginner can do the Jump start on complex algorithms. My fear of learning Data Science was resolved with the Training method and approach that Spark Academy took. Definitely the Best Course in the Market to learn and boost the career.