Data Science Architect
The Spark Academy Data Science Architect master's course gives you in-depth knowledge of Data Science, real-time analytics, statistical computing, SQL, parsing machine-generated data and, finally, Deep Learning in Artificial Intelligence. In this program, you will also learn how to leverage Big Data Analytics with Spark for Data Science. The program is designed by industry experts and includes 3 courses with several industry-based projects.
List of Courses Included
Online Instructor-led Courses:
- Data Science with R
- Python for Data Science
- Apache Spark and Scala
Introduction to Data Science with R
What is Data Science, significance of Data Science in today’s digitally-driven world, applications of Data Science, lifecycle of Data Science, components of the Data Science lifecycle, introduction to Machine Learning and Deep Learning, introduction to R programming and RStudio.
Hands-on Exercise – Installation of RStudio, implementing simple mathematical operations and logic using R operators, loops, if statements and switch cases.
Introduction to data exploration, what is exploratory data analysis, importing and exporting data to/from external sources, dataframes, working with dataframes, accessing individual elements, vectors and factors, operators, in-built functions, conditional and looping statements, user-defined functions, matrix, list and array.
Hands-on Exercise – Accessing individual elements of customer churn data, modifying and extracting the results from the dataset using user-defined functions in R.
Need for Data Manipulation, Introduction to the dplyr package, Selecting one or more columns with the select() function, Filtering out records on the basis of a condition with the filter() function, Adding new columns with the mutate() function, Sampling & Counting with the sample_n(), sample_frac() & count() functions, Getting summarized results with the summarise() function, Combining different functions with the pipe operator, Implementing SQL-like operations with sqldf, Text Mining with stringr & wordcloud, Data Manipulation with the data.table package, Working with dates with the lubridate package.
Hands-on Exercise – Implementing dplyr to perform various operations for abstracting over how data is manipulated and stored.
Introduction to visualization, different types of graphs, introduction to the grammar of graphics & the ggplot2 package, understanding categorical distribution with the geom_bar() function, understanding numerical distribution with the geom_histogram() function, building frequency polygons with geom_freqpoly(), making a scatter-plot with the geom_point() function, multivariate analysis with geom_boxplot(), univariate analysis with bar-plot, histogram and density plot, multivariate distribution with scatter-plots and smooth lines, continuous vs categorical with box-plots, subgrouping the plots, working with co-ordinates and themes to make the graphs more presentable, adding themes with the theme() layer, intro to plotly & various plots, visualization with the ggvis package, geographic visualization with ggmap(), building web applications with Shiny.
Hands-on Exercise – Creating data visualizations with ggplot2 and plotly charts to understand the customer churn ratio. You will visualize tenure, monthly charges, total charges and other individual columns by using the scatter plot.
Introduction to Statistics
Why do we need Statistics?, Categories of Statistics, Statistical Terminologies, Types of Data, Measures of Central Tendency, Measures of Spread, Correlation & Covariance, Standardization & Normalization, Probability & Types of Probability, Hypothesis Testing, Chi-Square testing, ANOVA, normal distribution, binomial distribution.
Hands-on Exercise – Building a statistical analysis model that uses quantifications, representations, experimental data for gathering, reviewing, analyzing and drawing conclusions from data.
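The measures of central tendency, measures of spread, and standardization and normalization listed in this module can be illustrated with a small sketch. The course itself works in R; Python's standard statistics module is used here only for illustration, and the sample values are made up.

```python
import statistics

# Hypothetical sample values (not course data)
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)          # measure of central tendency
median = statistics.median(values)      # middle of the sorted values
spread = statistics.pstdev(values)      # population standard deviation

# Standardization (z-scores): subtract the mean, divide by the standard deviation
z_scores = [(v - mean) / spread for v in values]

# Normalization (min-max): rescale the values to the range [0, 1]
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

print(mean, median, spread)
```

Standardized values have mean 0 and standard deviation 1, while min-max normalized values always fall between 0 and 1.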
Introduction to Machine Learning, introduction to Linear Regression, predictive modeling with Linear Regression, simple Linear and multiple Linear Regression, concepts and detailed formulas, assumptions and residual diagnostics in Linear Regression, residuals, qqnorm(), qqline(), understanding the fit of the model, building simple linear model, predicting results and finding p-value, understanding the summary results with Null Hypothesis, p-value & F-statistic, building linear models with multiple independent variables, introduction to logistic regression, comparing linear regression and logistic regression, bivariate & multivariate logistic regression, confusion matrix & accuracy of model, threshold evaluation with ROCR, uses of Poisson Regression, bivariate & multivariate Poisson Regression, implementing Poisson Regression in R.
Hands-on Exercise – Modeling the relationship within the data using linear predictor functions. Implementing Linear & Logistic Regression in R by building a model with ‘tenure’ as the dependent variable and multiple independent variables.
Introduction to Logistic Regression, Logistic Regression Concepts, Linear vs Logistic regression, math behind Logistic Regression, detailed formulas, logit function and odds, Bi-variate logistic Regression, Poisson Regression, building simple “binomial” model and predicting result, confusion matrix and Accuracy, true positive rate, false positive rate, and confusion matrix for evaluating built model, threshold evaluation with ROCR, finding the right threshold by building the ROC plot, cross validation & multivariate logistic regression, building logistic models with multiple independent variables, real-life applications of Logistic Regression.
Hands-on Exercise – Implementing predictive analytics by describing the data and explaining the relationship between one dependent binary variable and one or more independent variables. You will use glm() to build a model and use ‘Churn’ as the dependent variable.
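The logit function, odds, confusion matrix, true positive rate and false positive rate covered in this module can be sketched in a few lines. This is a plain Python illustration rather than the R glm() and ROCR workflow used in the course, and the labels and scores below are invented for the example.

```python
import math

def sigmoid(z):
    """Logistic function: maps log-odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def confusion_counts(actual, predicted):
    """Return (TP, FP, TN, FN) for binary labels coded as 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn

# Hypothetical churn labels (1 = churned) and predicted probabilities
actual = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

def rates_at(threshold):
    """Accuracy, true positive rate and false positive rate at a cut-off."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / len(actual)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return accuracy, tpr, fpr

# Evaluating several thresholds gives the points of an ROC plot
for t in (0.3, 0.5, 0.7):
    print(t, rates_at(t))
```

Plotting the false positive rate against the true positive rate for many thresholds yields the ROC curve used to pick a cut-off.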
Decision Trees & Random Forest
What is classification and different classification techniques, introduction to Decision Tree, algorithm for decision tree induction, building a decision tree in R, creating a perfect Decision Tree, Confusion Matrix, Regression trees vs Classification trees, impurity functions (entropy, information gain and Gini index) and how each is used to find the right split of a node, overfitting & pruning, pre-pruning, post-pruning, cost-complexity pruning, pruning a decision tree and predicting values, introduction to ensemble of trees and bagging, Random Forest concept, implementing Random Forest in R, finding the right number of trees and evaluating performance metrics, what is Naive Bayes, Computing Probabilities, Laplace Correction, implementing Naive Bayes in R, what is the KNN algorithm, implementing KNN in R, what is Support Vector Machine, implementing SVM in R, what is XGBoost, implementing XGBoost in R.
Hands-on Exercise – Implementing Random Forest for both regression and classification problems. You will build a tree, prune it by using ‘churn’ as the dependent variable and build a Random Forest with the right number of trees, using ROCR for performance metrics.
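The impurity functions named in this module (entropy, Gini index and information gain) can be computed directly from class labels. The following is a minimal Python sketch with invented labels, not the R tree-building code used in the course.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with 5 churners and 5 non-churners, split into two children
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(entropy(parent), gini(parent), information_gain(parent, [left, right]))
```

A candidate split is better when it yields a higher information gain, or equivalently purer (lower-impurity) child nodes.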
What is Clustering & its Use Cases, what is K-means Clustering, what is Canopy Clustering, what is Hierarchical Clustering, introduction to Unsupervised Learning, feature extraction & clustering algorithms, k-means clustering algorithm, theoretical aspects of k-means and k-means process flow, K-means in R, implementing K-means on the dataset and finding the right number of clusters using a scree plot, hierarchical clustering & dendrogram, understanding Hierarchical clustering, implementing it in R and having a look at dendrograms, Principal Component Analysis, explanation of Principal Component Analysis in detail, PCA in R, implementing PCA in R.
Hands-on Exercise – Deploying unsupervised learning with R to achieve clustering and dimensionality reduction, K-means clustering for visualizing and interpreting results for the customer churn data.
Association Rule Mining & Market Basket Analysis
Introduction to association rule Mining & Market Basket Analysis, measures of Association Rule Mining: Support, Confidence, Lift, Apriori algorithm & implementing it in R, Introduction to Recommendation Engine, user-based collaborative filtering & Item-Based Collaborative Filtering, implementing Recommendation Engine in R, user-Based and item-Based, Recommendation Use-cases.
Hands-on Exercise – Deploying association analysis as a rule-based machine learning method, identifying strong rules discovered in databases with measures based on interesting discoveries.
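The three measures of association rule mining listed in this module (support, confidence and lift) follow directly from transaction counts. Here is a small Python sketch over a made-up set of baskets; it illustrates the measures only and is not the Apriori implementation used in the course.

```python
# Hypothetical market baskets, one set of items per transaction
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence divided by the baseline support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)

# Rule: bread -> milk
print(support({"bread", "milk"}), confidence({"bread"}, {"milk"}), lift({"bread"}, {"milk"}))
```

A lift above 1 suggests the items occur together more often than expected if they were independent; a lift below 1 suggests the opposite.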
Time Series Analysis
What is Time Series, techniques and applications, components of Time Series, moving average, smoothing techniques, exponential smoothing, univariate time series models, multivariate time series analysis, ARIMA model, Time Series in R, sentiment analysis in R (Twitter sentiment analysis), text analysis.
Hands-on Exercise – Analyzing time series data, sequence of measurements that follow a non-random order to identify the nature of phenomenon and to forecast the future values in the series.
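Two of the smoothing techniques named in this module, the moving average and simple exponential smoothing, can be written in a few lines. The sketch below uses plain Python and an invented series; the smoothing factor alpha is an assumed parameter chosen for illustration.

```python
def moving_average(series, window):
    """Simple moving average over each full window of consecutive values."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [series[0]]  # initialize with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical monthly measurements
sales = [10, 12, 14, 16, 18, 20]

print(moving_average(sales, 3))
print(exponential_smoothing(sales, 0.5))
```

A larger alpha makes the smoothed series react faster to recent values, while a smaller alpha gives more weight to the history.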
Introduction to Artificial Intelligence
Introducing Artificial Intelligence and Deep Learning, what is an Artificial Neural Network, TensorFlow – computational framework for building AI models, fundamentals of building ANN using TensorFlow, working with TensorFlow in R.
Python Course Content
Python Environment Setup and Essentials
Introduction to Python Language, features, the advantages of Python over other programming languages, Python installation, Windows, Mac & Linux distribution for Anaconda Python, deploying Python IDE, basic Python commands, data types, variables, keywords and more.
Hands-on Exercise – Installing Python Anaconda for the Windows, Linux and Mac.
Python language Basic Constructs
Built-in data types in Python, tabs and spaces indentation, code comments with the pound (#) character, variables and names, numeric types (int, float, complex), containers (list, tuple, set, dict), text sequence type str (string), exceptions, instances, classes, modules, the Ellipsis object, the null object (None), Debug, basic operators (comparison, arithmetic, logical, bitwise), slicing and the slice operator, loop and control statements (while, for, if, else, break, continue).
Hands-on Exercise – Write your first Python program, write a Python function (with and without parameters), use a Lambda expression, write a class, create a member function and a variable, create an object, write a for loop to print all odd numbers.
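The tasks in this exercise might look roughly like the following. All names (greet, square, double, Circle) and the range 1 to 10 are arbitrary choices for illustration, not part of the course material.

```python
import math

def greet():
    """Function without parameters."""
    return "Hello, Data Science!"

def square(x):
    """Function with one parameter."""
    return x * x

# Lambda expression: a small anonymous function bound to a name
double = lambda x: 2 * x

class Circle:
    """Class with a member variable (radius) and a member function (area)."""

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

# Create an object and call its member function
unit_circle = Circle(1)
print(greet(), square(4), double(5), unit_circle.area())

# for loop that prints all odd numbers from 1 to 10
odd_numbers = []
for n in range(1, 11):
    if n % 2 == 1:
        odd_numbers.append(n)
        print(n)
```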
OOP concepts in Python and database connection
How to write object-oriented programs in Python, classes and objects in Python, the OOP paradigm, important concepts in OOP like polymorphism, inheritance and encapsulation, Python functions, return types and parameters, Lambda expressions, connecting to a database and pulling the data.
NumPy for mathematical computing
Introduction to arrays and matrices, indexing of arrays, datatypes, broadcasting of array math, standard deviation, conditional probability, correlation and covariance.
Hands-on Exercise – How to import the NumPy module, creating an array using ndarray, calculating standard deviation on an array of numbers, calculating correlation between two variables.
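A minimal sketch of this exercise, assuming NumPy is installed. The two arrays are made-up values, chosen so that the second is an exact multiple of the first.

```python
import numpy as np

# Hypothetical data: charges are exactly 20 times tenure
tenure = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
charges = np.array([20.0, 40.0, 60.0, 80.0, 100.0])

# Standard deviation: population (ddof=0, the default) and sample (ddof=1)
std_population = tenure.std()
std_sample = tenure.std(ddof=1)

# Broadcasting: the scalar mean is subtracted from every element
centered = tenure - tenure.mean()

# Pearson correlation: corrcoef returns a 2x2 matrix, [0, 1] is the pair value
corr = np.corrcoef(tenure, charges)[0, 1]

# Covariance matrix (sample covariance by default); [0, 1] is cov(tenure, charges)
cov = np.cov(tenure, charges)[0, 1]

print(std_population, std_sample, corr, cov)
```

Because the second array is a positive linear function of the first, the correlation comes out as 1 up to floating-point rounding.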
SciPy for scientific computing
Introduction to SciPy and its functions, building on top of NumPy, cluster, linalg, signal, optimize, integrate, subpackages, SciPy with Bayes Theorem.
Hands-on Exercise – Importing of SciPy, applying the Bayes theorem on the given dataset.
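As a reminder of what applying Bayes' theorem involves, here is a worked example in plain Python arithmetic. It does not use SciPy, and the prior and rates are invented numbers for illustration.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).

    P(B) is expanded with the law of total probability:
    P(B) = P(B | A) * P(A) + P(B | not A) * P(not A).
    """
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence

# Hypothetical numbers: 1% of customers churn, a flag fires for 90% of
# churners and for 5% of non-churners. How likely is churn given a flag?
p_churn_given_flag = posterior(prior=0.01, true_positive_rate=0.90, false_positive_rate=0.05)
print(round(p_churn_given_flag, 4))
```

Even with a fairly accurate flag, the posterior stays low here because the prior is small, which is the usual lesson of this kind of example.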
Matplotlib for data visualization
How to plot graphs and charts with Python, various aspects of line, scatter, bar, histogram and 3D plots, the API of Matplotlib, subplots.
Hands-on Exercise – Deploying Matplotlib for creating pie, scatter, line and histogram charts.
Pandas for data analysis and machine learning
Introduction to Python dataframes, importing data from JSON, CSV, Excel, SQL database, NumPy array to dataframe, various data operations like selecting, filtering, sorting, viewing, joining, combining, how to handle missing values, time series analysis, linear regression.
Hands-on Exercise – working on importing data from JSON files, selecting record by a group, applying filter on top, viewing records, analyzing with linear regression, and creation of time series.
Scikit-Learn for Natural Language Processing
What is natural language processing, working with NLP on text data, setting up the environment using Jupyter Notebook, analyzing sentences, the Scikit-Learn machine learning algorithms, bag-of-words model, extracting features from text, searching a grid, model training, multiple parameters, building of a pipeline.
Hands-on Exercise – setting up the Jupyter notebook environment, loading of a dataset in Jupyter, algorithms in Scikit-Learn package for performing machine learning techniques, training a model to search a grid.
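The bag-of-words model mentioned in this module turns each text into a vector of word counts over a shared vocabulary. The sketch below shows the idea in plain Python with the standard library; it is not Scikit-Learn code, and the two sentences are made up.

```python
import re
from collections import Counter

# Hypothetical mini-corpus
docs = ["the cat sat", "the dog sat on the mat"]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Vocabulary: every distinct token in the corpus, in sorted order
vocabulary = sorted({token for doc in docs for token in tokenize(doc)})

def vectorize(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

print(vocabulary)
for doc in docs:
    print(vectorize(doc))
```

Word order is discarded in this representation; only the counts survive, which is what makes the vectors usable as features for a classifier.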
Web scraping with Python
Introduction to web scraping in Python, the various web scraping libraries, the Beautiful Soup and Scrapy Python packages, installing Beautiful Soup, installing the Python parser lxml, creating a soup object with input HTML, searching the tree, full or partial parsing, output print.
Hands-on Exercise – Installation of Beautiful Soup and the lxml Python parser, making a soup object with an input HTML file, navigating using Python objects in the soup tree.
Python deployed for Hadoop
Introduction to Python for Hadoop, the basics of the Hadoop ecosystem, Hadoop common, the architecture of MapReduce and HDFS, deploying Python coding for MapReduce jobs on Hadoop framework.
Hands-on Exercise – How to write a MapReduce job with Python, connecting to the Hadoop framework and performing the tasks.
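The shape of a MapReduce job can be sketched with the classic word count. On a real cluster the mapper and reducer would typically be separate scripts that read standard input (for example through Hadoop Streaming); the version below simulates the map, shuffle and reduce phases locally in one process, and run_local is a hypothetical helper introduced only for this illustration.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1

def reducer(sorted_pairs):
    """Reduce phase: sum the counts of each word; input must be sorted by key."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)

def run_local(lines):
    """Simulate the shuffle/sort step between map and reduce."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))

word_counts = run_local(["big data big ideas", "data science"])
print(word_counts)
```

The sort between the two phases matters: the reducer relies on all pairs for the same key arriving next to each other, which is what the framework's shuffle step guarantees.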
Python for Apache Spark coding
Introduction to Apache Spark, importance of RDD, the Spark libraries, deploying Spark code with Python, the machine learning library of Spark MLlib, deploying Spark MLlib for classification, clustering and regression.
Hands-on Exercise – How to implement Python in a sandbox, working with the HDFS file system.
Tableau Course Content
Introduction to Data Visualization and Power of Tableau
What is data visualization, Comparison and benefits against reading raw numbers, Real usage examples from various business domains, Some quick powerful examples using Tableau without going into the technical details of Tableau, installing Tableau, Tableau interface, connecting to DataSource, Tableau Data Types, data preparation.
Architecture of Tableau
Installation of Tableau Desktop, Architecture of Tableau, Interface of Tableau (Layout, Toolbars, Data Pane, Analytics Pane etc), How to start with Tableau, Ways to share and export the work done in Tableau
Hands-on Exercise – Explore Tableau Desktop to learn its user interface, Share an existing work, Export an existing work
Working with Metadata & Data Blending
Connection to Excels, PDFs and Cubes, Managing Metadata and Extracts, Data Preparation and dealing with NULL values, Data Joins (Inner, Left, Right, Outer) and Union, Cross Database joining, Data Blending, data extraction, refresh extraction, incremental extraction, how to build extract
Hands-on Exercise – Connect to an Excel sheet and import data, Use metadata and extracts, Handle NULL values, Clean up the data before the actual use, Perform various join techniques, Perform data blending from more than one source
Creation of sets
Marks, Highlighting, Sort and Group, Working with Sets (Creation of sets, Editing sets, IN/OUT, Sets in Hierarchies), constant sets, computed Sets, bins
Hands-on Exercise – Create and edit sets using Marks, Highlight desired items, Make groups, Applying sorting on result, Make Hierarchies in the created set
Working with Filters
Filters (Addition and Removal), Filtering continuous dates, dimensions, measures, Interactive Filters, marks card, hierarchies, how to create folders in Tableau, sorting in Tableau, types of sorting, filtering in Tableau, types of filters, filtering order of operations
Hands-on Exercise – Add Filter on data set by date/dimensions/measures, Use interactive filter to views, Remove some filters to see the result
Organizing Data and Visual Analytics
Formatting Data (Labels, Annotations, Tooltips, Edit axes), Formatting Pane (Menu, Settings, Font, Alignment, Copy-Paste), Trend and Reference Lines, Forecasting, k-means Cluster Analysis in Tableau, visual analytics in Tableau, reference lines and bands, confidence interval.
Hands-on Exercise – Apply labels, annotations, tooltips to graphs, Edit the attributes of axes, Set a reference line, Do k-means cluster analysis on a dataset
Working with Mapping
Coordinate points, Plotting Longitude and Latitude, Editing Unrecognized Locations, Custom Geocoding, Polygon Maps, WMS: Web Map Service, Background Image (Add Image, Plot Points on Image, Generate coordinates from Image), map visualization, custom territories, Mapbox, WMS Map, how we can create map projects in Tableau, how to create a Dual Axis Map, how to edit location.
Hands-on Exercise – Plot latitude and longitude on geo map, Edit locations on the map, Create custom geocoding, Use images of a map and plot points on it, find coordinates in the image, Create a polygon map, Use WMS
Working with Calculations & Expressions
Calculation Syntax and Functions in Tableau, Types of Calculations (Table, String, Logic, Date, Number, Aggregate), LOD Expressions (concept and syntax), Aggregation and Replication with LOD Expressions, Nested LOD Expressions, Level of Details, Fixed Level of Details, Lower Level of Details, Higher Level of Details, Quick Table Calculations, how to create Calculated Fields, predefined Calculations, how to validate.
Working with Parameters
Create Parameters, Parameters in Calculations, Using Parameters with Filters, Column Selection Parameters, Chart Selection Parameters, how to use Parameters in Filter Session, how to use parameters in Calculated Fields, how to use parameters in Reference Line.
Hands-on Exercise – Create new parameters to apply on a filter, Pass parameters to filters to select columns, Pass parameters to filters to select charts
Charts and Graphs
Dual Axes Graphs, Histogram (Single and Dual Axes), Box Plot, Pareto Chart, Motion Chart, Funnel Chart, Waterfall Chart, Tree Map, Heat Map, Market Basket analysis, Using Show Me, Types of Charts, Text Table, Highlighted Table, Pie Chart, Bar Chart, Line Chart, Bubble Chart, Bullet Chart, Scatter Chart, Maps, Hands-on Lab, Assignment
Hands-on Exercise – Plot a histogram, heat map, tree map, funnel chart and others using the same data set, Do market basket analysis on a given dataset
Dashboards and Stories
Build and Format a Dashboard (Size, Views, Objects, Legends and Filters), Best Practices for Creative and Interactive Dashboards using Actions, Create Stories (Intro of Story Points, Creating and Updating Story Points, Adding Visuals in Stories, Annotations with Description), what is a Dashboard, Filter Actions, Highlight Actions, URL Actions, Selecting & Clearing values, Dashboard Examples, Tableau Workspace, Tableau Interface, Tableau Joins, Types of Joins, Live vs Extract Connection, Tableau Field Types, Saving and Publishing Data Source, File Types
Hands-on Exercise – Create a dashboard view, Include objects, legends and filters, Make the dashboard interactive, Create and edit a story with visual effects, annotation, description
Introduction to Tableau Prep, how Tableau Prep helps to quickly combine, join, shape and clean data for analysis, create smart experiences with Tableau Prep, get deeper insights into your data with great visual experience, make data preparation simpler and accessible, integrate Tableau Prep with Tableau analytical workflow, seamless process from data preparation to analysis with Tableau Prep
Integration of Tableau with R and Hadoop
Introduction to R Language, Applications and Use Cases of R, Deploying R on Tableau Platform, Learning R functions in Tableau, Integration with Hadoop
Hands-on Exercise – Deploy R on Tableau, Create a line graph using R interface, Connect Tableau with Hadoop and extract data