Data Science Literacy

Authors: Janani Ravi, Axel Sirota, Bismark Adomako

Data science is a diverse field where scientific methods, software programming, and data analytics combine to glean insights from data, communicate those insights, and empower a...

What you will learn:

  • Coping skills for bad or incomplete data
  • Data shaping and munging
  • Application of basic statistics

Prerequisites

This skill assumes the learner has basic computer skills, basic mathematics skills, and rudimentary data skills such as using spreadsheets.

Beginner

This section covers the fundamentals of representing, processing, and shaping data for analysis, and of communicating data to others.

Representing, Processing, and Preparing Data

by Janani Ravi

Jun 19, 2019 / 2h 44m

Description

Data science and data modeling are fast emerging as crucial capabilities that every enterprise and every technologist must possess these days. As the process of actually constructing models becomes democratized, the general view is shifting toward using the right data and using the data right.

In this course, Representing, Processing, and Preparing Data, you will gain the ability to correctly represent information from your domain as numeric data, and get it into a form where the full capabilities of models can be leveraged.

First, you will learn how outliers and missing data can be dealt with in a theoretically sound manner.

Next, you will discover how to use spreadsheets, programming languages, and relational databases to work with your data. You will see the different types of data that you may encounter in the real world and how to collect and integrate data into a common destination to eliminate silos.

Finally, you will round out the course by working with visualization tools that allow every member of an enterprise to work with data and extract meaningful insights.

When you are finished with this course, you will have the skills and knowledge to use the right data sources, cope with data quality issues, and choose the right technologies to extract insights from your enterprise data.

Table of contents
  1. Course Overview
  2. Understanding Data Cleaning and Preparation Techniques
  3. Preparing Data for Analysis Using Spreadsheets and Python
  4. Collecting Data to Extract Insights
  5. Loading and Processing Data Using Relational Databases
  6. Representing Insights Obtained from Data
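
The course description mentions dealing with outliers and missing data. As an illustration only (not taken from the course materials), here is a minimal standard-library sketch of two common choices: filling missing values with the median, and flagging outliers with Tukey's interquartile-range (IQR) fences. The sample readings are hypothetical, and the code assumes Python 3.8+ for `statistics.quantiles`.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with two gaps and one suspicious spike.
readings = [12.0, 14.0, None, 13.0, 15.0, 95.0, 14.0, None, 13.0]
clean = impute_median(readings)
print(clean)
print(iqr_outliers(clean))
```

The median is used here rather than the mean because a single extreme value such as 95.0 would drag a mean-based fill upward; whether to drop, cap, or keep a flagged outlier remains a domain decision.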

Combining and Shaping Data

by Janani Ravi

Jun 21, 2019 / 3h 27m

Description

Connecting the dots between data from different sources is fast becoming one of the most sought-after skills for everyone from business professionals to data scientists.

In this course, Combining and Shaping Data, you will gain the ability to connect the dots by pulling together data from disparate sources and shaping it so that extracting connections and relationships becomes relatively easy.

First, you will learn how the most common constructs in shaping and combining data stay the same across spreadsheets, programming languages, and databases.

Next, you will discover how to use joins and VLOOKUPs to obtain wide datasets, and then use pivots to reshape that data between wide and long forms. You will then see how both long and wide data can be aggregated to obtain higher-level insights. You will work with Excel spreadsheets and SQL as well as Python.

Finally, you will round out the course by integrating data from a variety of sources and working with streaming data, which helps your enterprise gain real-time insights into the world around it.

When you are finished with this course, you will have the skills and knowledge to pull together data from disparate sources, including from streaming sources, to construct integrated data models that truly connect the dots.

Table of contents
  1. Course Overview
  2. Exploring Techniques to Combine and Shape Data
  3. Combining and Shaping Data Using Spreadsheets
  4. Combining and Shaping Data Using SQL
  5. Combining and Shaping Data Using Python
  6. Integrating Data from Disparate Sources into a Data Warehouse
  7. Working with Streaming Data Using a Data Warehouse
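
To make the join, reshape, and aggregate steps in this description concrete, here is a minimal standard-library sketch (an illustration, not the course's own code). It performs a left join on a key, similar in spirit to an exact-match VLOOKUP, unpivots the wide result into long form, and then sums by group. The sales figures, store IDs, and helper names (`join_region`, `melt`, `total_by`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical data: monthly sales per store (wide form) and a store -> region lookup.
sales_wide = [
    {"store_id": 1, "jan": 100, "feb": 120},
    {"store_id": 2, "jan": 80, "feb": 90},
    {"store_id": 3, "jan": 50, "feb": 70},
]
stores = {1: "North", 2: "South", 3: "North"}

def join_region(rows, lookup):
    """Left join: add the region matching each row's store_id (None when there is no match)."""
    return [{**row, "region": lookup.get(row["store_id"])} for row in rows]

def melt(rows, id_cols, value_cols, var_name="month", value_name="sales"):
    """Unpivot wide rows into long form: one output row per (id, value column) pair."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            out = {c: row[c] for c in id_cols}
            out[var_name] = col
            out[value_name] = row[col]
            long_rows.append(out)
    return long_rows

def total_by(rows, key, value):
    """Aggregate: sum the `value` field grouped by the `key` field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

joined = join_region(sales_wide, stores)
long_form = melt(joined, ["store_id", "region"], ["jan", "feb"])
print(total_by(long_form, "region", "sales"))
print(total_by(long_form, "month", "sales"))
```

Long form makes grouping by any column a one-liner, which is why aggregation usually happens after unpivoting; pivoting the totals back to wide form is the reverse step for presentation.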

Communicating Data Insights

by Janani Ravi

Jun 21, 2019 / 2h 26m

Description

Providing crisp, clear, actionable points of view to senior executives is an increasingly important part of the work of data scientists and data professionals. In this course, Communicating Data Insights, you will gain the ability to summarize complex information into such clear and actionable insights.

First, you will learn how to sum up the important descriptive statistics of any numeric dataset.

Next, you will discover how to build and use specialized visual representations such as candlestick charts, Sankey diagrams, and funnel charts in Python. You will then see how the data behind such representations can be fed in from enterprise-wide sources such as data warehouses and ETL pipelines.

Finally, you will round out the course by working with data residing on different public cloud platforms, and even in a hybrid environment, that is, with some of it on-premises and some of it in the cloud.

When you’re finished with this course, you will have the skills and knowledge to pull together data from disparate sources and use nifty visualizations to convey crisp, actionable points of view to a senior executive audience.

Table of contents
  1. Course Overview
  2. Communicating Insights from Statistical Data
  3. Communicating Insights from Business Data
  4. Visualizing Distributions and Relationships in Data
  5. Integrating Data in a Multi-cloud Environment
  6. Integrating Data in a Hybrid Environment
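
As a sketch of "summing up the important descriptive statistics" of a numeric dataset, the function below builds a summary similar to pandas' `describe()` using only Python's `statistics` module (3.8+). It is an illustration, not the course's code; the `describe` name and the order counts are hypothetical. The `"inclusive"` quantile method interpolates linearly between data points.

```python
import statistics

def describe(values):
    """Summarize a numeric dataset: count, mean, sample std, min, quartiles, max."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }

# Hypothetical daily order counts.
daily_orders = [3, 7, 8, 5, 12, 14, 21, 13, 18]
summary = describe(daily_orders)
for name, value in summary.items():
    print(f"{name:>5}: {value:.2f}")
```

Reporting the median and quartiles next to the mean helps an executive audience see at a glance whether a few large values are skewing the average.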

Intermediate

In this section, you will apply descriptive statistics to data, use simple statistical models like regression, and design experiments to gain more insight into your problem domain.

Interpreting Data with Statistical Models

by Axel Sirota

Sep 28, 2020 / 2h 48m

Description

Data is everywhere, from the newspaper you read on the subway to the report you use to analyze yesterday's stock market performance. In this course, Interpreting Data with Statistical Models, you will learn how to tackle the problems that come up in your work, choose the right statistical analysis for each one, and interpret the results to obtain insights.

First, you will learn the very basics of statistics. Next, you will discover hypothesis testing to compare variables. Finally, you will explore how to make multiple comparisons and detect functional relationships with ANOVA and regression.

When you’re finished with this course, you will have the skills and knowledge of data analysis and statistical models needed to make your data speak for itself.

Table of contents
  1. Course Overview
  2. Thinking Like a Statistician
  3. Testing a Hypothesis
  4. Comparing Categorical Values with Frequency Analysis
  5. Analyzing Experiments with ANOVA
  6. Comparing Groups and Effects with ANOVA
  7. Predicting Linear Relationships with Regression
  8. Predicting Non-linear Relationships with Regression
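
The ANOVA modules compare group means by asking whether the variation between groups is large relative to the variation within them. A minimal sketch of the one-way ANOVA F statistic follows, computed by hand with the standard library for illustration (not the course's code; the function name and sample groups are hypothetical). In practice, a library routine such as SciPy's `scipy.stats.f_oneway` also returns the p-value.

```python
import statistics

def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    return ms_between / ms_within

# Hypothetical measurements from three experimental conditions.
f_stat = one_way_anova_f([4, 5, 6], [6, 7, 8], [9, 10, 11])
print(f_stat)
```

A large F (here the group means differ far more than the points within each group) suggests that at least one mean differs; follow-up multiple comparisons identify which.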

Experimental Design for Data Analysis

by Janani Ravi

Jun 20, 2019 / 2h 45m

Description

Providing crisp, clear, actionable points of view to senior executives is an increasingly important part of the work of data scientists and data professionals. A point of view must represent a hypothesis, ideally backed by data. In this course, Experimental Design for Data Analysis, you will gain the ability to construct such hypotheses from data and use rigorous frameworks to test whether they hold true.

First, you will learn how inferential statistics and hypothesis testing form the basis of data modeling and machine learning.

Next, you will discover how building a machine learning model is akin to designing an experiment, and how training and validation techniques help you rigorously evaluate the results of such experiments.

Then, you will study various forms of cross-validation, including both single-split and iterative techniques, to cope with independent, identically distributed data as well as grouped data.

Finally, you will round out the course by learning how to refine your models with hyperparameter tuning using these techniques.

When you’re finished with this course, you will have the skills and knowledge to build and evaluate models, including machine learning models, using rigorous cross-validation frameworks and hyperparameter tuning.

Table of contents
  1. Course Overview
  2. Designing an Experiment for Data Analysis
  3. Building and Training a Machine Learning Model
  4. Understanding and Overcoming Common Problems in Data Modeling
  5. Leveraging Different Validation Strategies in Data Modeling
  6. Tuning Hyperparameters Using Cross Validation Scores
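
Iterative cross-validation, as described above, rotates which slice of the data is held out for validation. The sketch below is an illustration only: it splits indices into k contiguous folds (similar in spirit to scikit-learn's `KFold` without shuffling) and scores a deliberately trivial hypothetical model that always predicts the training mean. Both function names are made up for this example, and it assumes i.i.d. data; grouped data would need splits that keep each group inside a single fold.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample lands in exactly one validation fold."""
    # The first n_samples % k folds get one extra sample so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_val_mse(y, k):
    """Average validation MSE of a baseline model that predicts the training-fold mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train_idx) / len(train_idx)  # "train" on k - 1 folds
        mse = sum((y[i] - prediction) ** 2 for i in val_idx) / len(val_idx)  # validate on 1
        scores.append(mse)
    return sum(scores) / len(scores)

print(cross_val_mse([1, 2, 3, 4, 5, 6], k=3))
```

Because the sample targets are sorted, the unshuffled outer folds are scored pessimistically; shuffling before splitting is the usual remedy for i.i.d. data. Hyperparameter tuning repeats this scoring for each candidate setting and keeps the one with the best average score.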

Summarizing Data and Deducing Probabilities

by Janani Ravi

Jun 20, 2019 / 2h 47m

Description

Data science and data modeling are fast emerging as crucial capabilities that every enterprise and every technologist must possess these days. Increasingly, different organizations are using the same models and the same modeling tools, so what differs is how those models are applied to the data. So, it is really important that you know your data well.

In this course, Summarizing Data and Deducing Probabilities, you will gain the ability to summarize your data using univariate, bivariate, and multivariate statistics in a range of technologies.

First, you will learn how measures of central tendency, such as the mean, can be calculated in Microsoft Excel and Python. Next, you will discover how to use correlations and covariances to explore pairwise relationships. You will then see how those constructs can be generalized to multiple variables using covariance and correlation matrices.

You will also understand and apply Bayes' Theorem, one of the most powerful and widely used results in probability, to build a robust classifier.

Finally, you will use Seaborn, a Python visualization library, to represent statistics visually.

When you are finished with this course, you will have the skills and knowledge to use univariate, bivariate, and multivariate descriptive statistics from Excel and Python in order to find relationships and calculate probabilities.

Table of contents
  1. Course Overview
  2. Understanding Descriptive Statistics for Data Analysis
  3. Performing Exploratory Data Analysis in Spreadsheets
  4. Summarizing Data and Deducing Probabilities Using Python
  5. Understanding and Applying Bayes' Rule
  6. Visualizing Probabilistic and Statistical Data Using Seaborn
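
Two ideas from this course lend themselves to a short worked sketch: Pearson correlation as covariance rescaled by both standard deviations, and Bayes' Theorem, P(H|E) = P(E|H)P(H) / P(E). The code below is illustrative, not from the course; the data points and the spam-filter probabilities are hypothetical. Python 3.10+ also provides `statistics.covariance` and `statistics.correlation`, which can serve as a cross-check.

```python
import statistics

def covariance(xs, ys):
    """Sample covariance: summed co-deviations from the means, divided by n - 1."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both sample standard deviations into [-1, 1]."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule, with P(E) expanded by the law of total probability over H and not-H."""
    evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / evidence

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
print(covariance(xs, ys), correlation(xs, ys))

# Hypothetical spam filter: 20% of mail is spam; a trigger word appears in
# 60% of spam and 5% of legitimate mail. P(spam | word present) = ?
print(posterior(0.2, 0.6, 0.05))
```

The Bayes example shows why priors matter: even a fairly distinctive word only raises the spam probability to 0.75, because most mail is legitimate to begin with. A naive Bayes classifier chains this update across many such features.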

Advanced

This part of the skill helps you apply statistical models to business problems, and to identify and mitigate factors that impact your models.

Interpreting Data with Advanced Statistical Models

by Axel Sirota

Sep 10, 2019 / 3h 9m

Description

At the core of machine learning lie advanced statistical models. In this course, Interpreting Data with Advanced Statistical Models, you will gain the ability to build machine learning applications that address the problems you face at work.

First, you will learn the basics of machine learning. Next, you will discover linear regression in a more general form, extending it to multiple and polynomial features. Then, you will explore how to classify with logistic regression, SVMs, and Bayesian methods. Finally, you will uncover intrinsic patterns in data with unsupervised techniques such as k-means and PCA.

When you’re finished with this course, you will have the skills and knowledge of machine learning needed to apply it in real-world applications.

Table of contents
  1. Course Overview
  2. Getting Started with Machine Learning
  3. Finding Those Models
  4. Predicting Linear Relationships with Regression
  5. Understanding Regression Models in Depth
  6. The Problem of Correct Classification
  7. Large Margin and Bayesian Classification
  8. The Subtle Art of Not Needing Labels: Unsupervised Learning
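
k-means, one of the unsupervised techniques named above, can be sketched in a few lines with Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. This is an illustration, not the course's code. The points and starting centroids are hypothetical, and real implementations usually pick initial centroids with a randomized scheme and run several restarts. `math.dist` requires Python 3.8+.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm from the given starting centroids; returns (centroids, clusters)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged: the assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centers, groups = kmeans(points, [(0, 0), (10, 10)])
print(centers)
```

No labels are supplied anywhere: the two groups emerge purely from the geometry of the points, which is what distinguishes this from the supervised classifiers earlier in the course.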

Building, Training, and Validating Models in Microsoft Azure

by Bismark Adomako

Oct 5, 2020 / 1h 42m

Description

Building machine learning models in Microsoft Azure can appear intimidating. This course, Building, Training, and Validating Models in Microsoft Azure, will help you decide which model to choose and why, by building a model that tries to predict, from the given data, whether a flight will be delayed by more than 15 minutes. First, you will work through a real-world problem to see how Azure ML can solve it, forming a hypothesis against which model performance can be judged.

Next, you will quickly get Azure ML set up and learn why you need to split data for training and testing the models.

Then, you will explore dependent and independent variables: which independent variables to pick and why, as well as feature conversions such as label encoding and feature scaling.

Finally, you will discover which models to choose and why, and then obtain the model's score, which shows how the model can be optimized and re-tested.

When you are finished with this course, you will be ready to put your own model into production and monitor and retrain that model when necessary.

Table of contents
  1. Course Overview
  2. Creating a Hypothesis
  3. Sourcing and Transforming Data Relevant to a Hypothesis
  4. Identifying Features from Raw Data
  5. Building the Model
  6. Monitoring and Managing the Performance of a Model
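
The preprocessing steps this course describes (splitting data for training and testing, label encoding, and feature scaling) are not specific to Azure. Below is a minimal, platform-neutral sketch in plain Python, not the Azure ML API and not the course's code. The flight rows, carrier codes, and helper names (`train_test_split`, `fit_label_encoder`, `fit_min_max`) are hypothetical. The scaler is fitted on the training rows only so that test data does not leak into the model.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then hold out the last test_fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:n_train], shuffled[n_train:]

def fit_label_encoder(categories):
    """Map each distinct category to an integer code (sorted for reproducibility)."""
    return {cat: code for code, cat in enumerate(sorted(set(categories)))}

def fit_min_max(values):
    """Learn min/max from training values; return a function scaling values toward [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant feature
    return lambda v: (v - lo) / span

# Hypothetical rows: (carrier, departure_hour, delayed_more_than_15_min).
flights = [
    ("AA", 6, 0), ("DL", 9, 0), ("UA", 17, 1), ("AA", 19, 1), ("DL", 13, 0),
    ("UA", 7, 0), ("AA", 21, 1), ("DL", 18, 1), ("UA", 11, 0), ("AA", 15, 0),
]
train, test = train_test_split(flights, test_fraction=0.2)
encode = fit_label_encoder(c for c, _, _ in flights)  # carrier list assumed known up front
scale_hour = fit_min_max([h for _, h, _ in train])    # fitted on training data only
X_train = [(encode[c], scale_hour(h)) for c, h, _ in train]
X_test = [(encode[c], scale_hour(h)) for c, h, _ in test]
print(len(X_train), len(X_test), encode)
```

Scaled test values can fall slightly outside [0, 1], which is expected. Integer label codes also imply an order among carriers that does not exist, so one-hot encoding is often preferred for models that treat inputs as magnitudes.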