Machine Learning for Data Analysis

This course is part of the Data Analysis and Interpretation Specialization.

Taught in English

Instructor: Jen Rose

44,922 already enrolled

Rating: 4.2 (322 reviews)

95%

10 hours (approximately)

Flexible schedule

Learn at your own pace

Details to know

Shareable certificate

Add to your LinkedIn profile

When you enroll in this course, you'll also be enrolled in this Specialization.

There are 4 modules in this course

Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Before diving into these machine learning concepts, make sure you are familiar with Course 3 of this Specialization, which introduces students to integral supervised machine learning concepts. Building on Course 3, this course provides an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering. By completing this course, you will learn how to apply, test, and interpret machine learning algorithms as alternative methods for addressing your research questions.

Decision Trees

In this session, you will learn about decision trees, a type of data mining algorithm that can select, from among a large number of variables, those variables and interactions that are most important in predicting the target, or response, variable to be explained. Decision trees create segmentations, or subgroups, in the data by applying a series of simple rules or criteria over and over again, choosing the variable constellations that best predict the target variable.
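To make "applying a series of simple rules over and over again" concrete, here is a minimal from-scratch sketch in plain Python (standard library only). It assumes a categorical target, uses Gini impurity as the splitting criterion, and stops at a hypothetical `max_depth`; the function names and toy data are illustrative assumptions, not the course's SAS or Python code. In practice you would more likely use a library implementation such as scikit-learn's `DecisionTreeClassifier`.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 0.0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Try every simple rule 'row[f] <= t' and return the (f, t) pair with the
    lowest weighted Gini impurity, or None if no rule improves on the parent."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(rows, labels, depth=0, max_depth=3, min_size=2):
    """Re-apply the best rule to each subgroup until a node is pure, too small,
    or max_depth deep; each leaf predicts its majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(labels) < min_size or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    f, t = split
    goes_left = [row[f] <= t for row in rows]
    left_rows = [r for r, g in zip(rows, goes_left) if g]
    left_labels = [y for y, g in zip(labels, goes_left) if g]
    right_rows = [r for r, g in zip(rows, goes_left) if not g]
    right_labels = [y for y, g in zip(labels, goes_left) if not g]
    return {
        "feature": f,
        "threshold": t,
        "left": grow_tree(left_rows, left_labels, depth + 1, max_depth, min_size),
        "right": grow_tree(right_rows, right_labels, depth + 1, max_depth, min_size),
    }


def predict(tree, row):
    """Follow the rules from the root down to a leaf and return its class."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical toy data: two predictors (columns 0 and 1) and a binary target.
rows = [[15, 0], [16, 0], [17, 1], [18, 1], [16, 1], [15, 0]]
labels = [0, 0, 1, 1, 1, 0]
tree = grow_tree(rows, labels)
print(tree)  # {'feature': 1, 'threshold': 0, 'left': 0, 'right': 1}
print(predict(tree, [20, 1]))  # 1
```

Each recursive call runs the same rule search on a smaller subgroup, which is how the segmentation emerges. Real implementations add pruning or other stopping rules to limit overfitting.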

What's included

7 videos, 15 readings, 1 peer review

7 videos (total 39 minutes)

What Is Machine Learning? (2 minutes)
Machine Learning and the Bias Variance Trade-Off (6 minutes)
What Is a Decision Tree? (5 minutes)
What is the Process of Growing a Decision Tree? (4 minutes)
Building a Decision Tree with SAS (9 minutes)
Strengths and Weaknesses of Decision Trees in SAS (4 minutes)
Building a Decision Tree with Python (9 minutes)

15 readings (total 150 minutes)

Some Guidance for Learners New to the Specialization (10 minutes)
SAS or Python - Which to Choose? (10 minutes)
Getting Started with SAS (10 minutes)
Getting Started with Python (10 minutes)
Course Codebooks (10 minutes)
Course Data Sets (10 minutes)
Uploading Your Own Data to SAS (10 minutes)
Data Set for Decision Tree Videos (tree_addhealth.csv) (10 minutes)
SAS Code: Decision Trees (10 minutes)
CART Paper - Prevention Science (10 minutes)
Python Code: Decision Trees (10 minutes)
Installing Graphviz and pydotplus (10 minutes)
Getting Set up for Assignments (10 minutes)
Tumblr Instructions (10 minutes)
Assignment Example (10 minutes)

1 peer review (total 60 minutes)

Running a Classification Tree (60 minutes)

Random Forests

In this session, you will learn about random forests, a type of data mining algorithm that can select, from among a large number of variables, those that are most important in determining the target, or response, variable to be explained. Unlike a single decision tree, which tends to overfit the data it was grown on, a random forest combines many trees grown on different random samples of the data, so its results tend to generalize well to new data.
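The sketch below illustrates the two sources of randomness behind a random forest: each tree is grown on a bootstrap sample of the observations, and each split searches only a random subset of the variables; the trees then vote. It is a simplified, standard-library-only illustration with hypothetical names and toy data (shallow trees, no variable-importance scores), not the course's code. Libraries such as scikit-learn's `RandomForestClassifier` also report variable importance through the `feature_importances_` attribute.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def grow_tree(rows, labels, n_features, rng, depth=0, max_depth=3):
    """Grow one shallow tree. At every split only a random subset of
    n_features candidate variables is searched."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or gini(labels) == 0.0:
        return majority
    best, best_score = None, gini(labels)
    for f in rng.sample(range(len(rows[0])), n_features):
        for t in sorted({r[f] for r in rows}):
            left_y = [y for r, y in zip(rows, labels) if r[f] <= t]
            right_y = [y for r, y in zip(rows, labels) if r[f] > t]
            if left_y and right_y:
                score = (len(left_y) * gini(left_y) + len(right_y) * gini(right_y)) / len(labels)
                if score < best_score:
                    best, best_score = (f, t), score
    if best is None:
        return majority
    f, t = best
    go_left = [r[f] <= t for r in rows]
    left = grow_tree([r for r, g in zip(rows, go_left) if g],
                     [y for y, g in zip(labels, go_left) if g],
                     n_features, rng, depth + 1, max_depth)
    right = grow_tree([r for r, g in zip(rows, go_left) if not g],
                      [y for y, g in zip(labels, go_left) if not g],
                      n_features, rng, depth + 1, max_depth)
    return (f, t, left, right)  # internal node; leaves are plain class labels


def predict_tree(node, row):
    """Follow (feature, threshold, left, right) nodes down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def grow_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Grow each tree on a bootstrap sample: len(rows) rows drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], n_features, rng))
    return forest


def predict_forest(forest, row):
    """Every tree votes; the forest predicts the majority class."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical toy data: column 0 carries the signal, column 1 is unrelated noise.
rows = [[i, (i * 7) % 5] for i in range(20)]
labels = [0] * 10 + [1] * 10
forest = grow_forest(rows, labels)
print(predict_forest(forest, [2, 4]), predict_forest(forest, [17, 1]))
```

Because each tree sees a different bootstrap sample and different candidate variables, the individual trees make different errors, and the vote averages much of that variability away.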

What's included

4 videos, 4 readings, 1 peer review

4 videos (total 25 minutes)

What Is A Random Forest and How Is It "Grown"? (4 minutes)
Building a Random Forest with SAS (7 minutes)
Building a Random Forest with Python (6 minutes)
Validation and Cross-Validation (7 minutes)

4 readings (total 40 minutes)

SAS code: Random Forests (10 minutes)
The HPForest Procedure in SAS (10 minutes)
Python Code: Random Forests (10 minutes)
Assignment Example (10 minutes)

1 peer review (total 60 minutes)

Running a Random Forest (60 minutes)

Lasso Regression

Lasso regression analysis is a shrinkage and variable selection method for linear regression models. The goal of lasso regression is to obtain the subset of predictors that minimizes prediction error for a quantitative response variable. The lasso does this by imposing a constraint on the model parameters (a penalty on the sum of the absolute values of the regression coefficients) that causes the coefficients for some variables to shrink toward zero. Variables whose regression coefficient equals zero after the shrinkage process are excluded from the model. Variables with non-zero regression coefficients are most strongly associated with the response variable. Explanatory variables can be quantitative, categorical, or both.

In this session, you will apply and interpret a lasso regression analysis. You will also gain experience using k-fold cross-validation to select the best-fitting model and to obtain a more accurate estimate of your model's test error rate.

To test a lasso regression model, you will need to identify a quantitative response variable from your data set, if you haven't already done so, and choose a few additional quantitative and categorical predictor (i.e., explanatory) variables to develop a larger pool of predictors. Having a larger pool of predictors to test will maximize your experience with lasso regression analysis. Remember that lasso regression is a machine learning method, so your choice of additional predictors does not necessarily need to depend on a research hypothesis or theory. Take some chances, and try some new variables. The lasso regression analysis will help you determine which of your predictors are most important. Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets: the cross-validation method you apply is designed to eliminate the need to split your data when you have a limited number of observations.
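A minimal sketch of the lasso, assuming the objective (1/(2n)) * sum((y - b0 - X b)^2) + lambda * sum(|b_j|), which is the form scikit-learn documents for its `Lasso` estimator, fitted here by coordinate descent. The soft-thresholding step is what sets some coefficients to exactly zero. A small k-fold cross-validation loop then picks lambda by average held-out error. This is plain Python; the function names, the lambda grid, and the toy data are hypothetical, and it is not the course's code. Libraries such as scikit-learn (`LassoCV`, `LassoLarsCV`) do the same job far more efficiently.

```python
import random
import statistics


def standardize(X):
    """Scale each predictor to mean 0 and SD 1 so the penalty treats them equally."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def soft_threshold(z, lam):
    """Move z toward zero by lam; anything in [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n) * sum((y - b0 - X.b)^2) + lam * sum(|b_j|).
    The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_sweeps):
        b0 = statistics.fmean(y[i] - sum(X[i][k] * b[k] for k in range(p)) for i in range(n))
        for j in range(p):
            # Partial residual: leave out predictor j's current contribution.
            partial = [y[i] - b0 - sum(X[i][k] * b[k] for k in range(p) if k != j) for i in range(n)]
            rho = sum(X[i][j] * partial[i] for i in range(n)) / n
            norm = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / norm if norm else 0.0
    return b0, b


def mse(X, y, b0, b):
    """Mean squared prediction error of an intercept-plus-coefficients model."""
    return statistics.fmean((yi - b0 - sum(x * bj for x, bj in zip(row, b))) ** 2
                            for row, yi in zip(X, y))


def kfold_cv(X, y, lambdas, k=5, seed=0):
    """Average held-out MSE over k folds for each lambda; return the best one."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = {}
    for lam in lambdas:
        errors = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            b0, b = lasso([X[i] for i in train], [y[i] for i in train], lam)
            errors.append(mse([X[i] for i in fold], [y[i] for i in fold], b0, b))
        scores[lam] = statistics.fmean(errors)
    return min(scores, key=scores.get), scores


# Hypothetical toy data: y depends on column 0 only; column 1 is unrelated.
# For simplicity predictors are standardized once on the full data; a stricter
# workflow would standardize inside each training fold.
X = standardize([[i, (i * 7) % 5] for i in range(10)])
y = [2 * i + 1 for i in range(10)]
for lam in (0.0, 1.0, 100.0):
    print(lam, lasso(X, y, lam))  # column 1's coefficient is exactly 0.0 at lam 1.0 and 100.0
best_lam, cv_scores = kfold_cv(X, y, [0.0, 0.5, 5.0, 100.0])
print("lambda chosen by 5-fold CV:", best_lam)
```

As lambda grows, coefficients shrink and drop out one by one; cross-validation chooses how much shrinkage the data support without setting aside a separate test set.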

What's included

5 videos, 3 readings, 1 peer review

5 videos (total 31 minutes)

What is Lasso Regression? (4 minutes)
Testing a Lasso Regression with SAS (10 minutes)
Data Management for Lasso Regression in Python (3 minutes)
Testing a Lasso Regression Model in Python (10 minutes)
Lasso Regression Limitations (2 minutes)

3 readings (total 30 minutes)

SAS Code: Lasso Regression (10 minutes)
Python Code: Lasso Regression (10 minutes)
Assignment Example (10 minutes)

1 peer review (total 60 minutes)

Running a Lasso Regression Analysis (60 minutes)

K-Means Cluster Analysis

Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters, where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based on the similarity of their responses on multiple variables. Clustering variables should be primarily quantitative, but binary variables may also be included.

In this session, we will show you how to use k-means cluster analysis to identify clusters of observations in your data set. You will gain experience interpreting cluster analysis results by using graphing methods to help you determine the number of clusters to interpret, and by examining clustering variable means to evaluate the cluster profiles. Finally, you will get the opportunity to validate your cluster solution by examining differences between clusters on a variable not included in your cluster analysis.

You can use the same variables that you have used in past weeks as clustering variables. If most or all of your previous explanatory variables are categorical, you should identify some additional quantitative clustering variables from your data set. Ideally, most of your clustering variables will be quantitative, although you may also include some binary variables. In addition, you will need to identify a quantitative or binary response variable from your data set that you will not include in your cluster analysis. You will use this variable to validate your clusters by evaluating whether they differ significantly on this response variable, using statistical methods such as analysis of variance or chi-square analysis, which you learned about in Course 2 of the Specialization (Data Analysis Tools). Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets.
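A minimal standard-library sketch of the steps described above: Lloyd's k-means algorithm, the total within-cluster sum of squares you would plot against k to choose the number of clusters (the "elbow" method), and a one-way ANOVA F statistic for checking whether clusters differ on a variable left out of the analysis. Names and toy data are hypothetical, and this is not the course's code. In practice scikit-learn's `KMeans` (whose `inertia_` attribute is the within-cluster sum of squares) and `scipy.stats.f_oneway` cover the same ground, and the latter also returns a p-value.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: assign every observation to its nearest centroid, move
    each centroid to the mean of its members, and repeat until nothing changes.
    Each observation ends up in exactly one cluster."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged: assignments are stable
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def within_ss(points, assign, centroids):
    """Total within-cluster sum of squares; plot it against k to look for an elbow."""
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign))


def anova_f(values, groups):
    """One-way ANOVA F statistic: between-cluster vs within-cluster variance of a
    validation variable that was NOT used to form the clusters."""
    labels = sorted(set(groups))
    grand = sum(values) / len(values)
    ss_between = ss_within = 0.0
    for g in labels:
        vals = [v for v, gg in zip(values, groups) if gg == g]
        m = sum(vals) / len(vals)
        ss_between += len(vals) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in vals)
    df_between, df_within = len(labels) - 1, len(values) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical toy data: two clustering variables per observation.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
for k in (1, 2, 3):
    assign, centroids = kmeans(points, k)
    print(k, within_ss(points, assign, centroids))
assign, centroids = kmeans(points, 2)
outcome = [1, 2, 3, 11, 12, 13]  # hypothetical validation variable
print("F =", anova_f(outcome, assign))  # F = 150.0
```

Because k-means depends on its random starting centroids, real analyses usually rerun it from several starts and keep the solution with the lowest within-cluster sum of squares.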

What's included

6 videos, 3 readings, 1 peer review

6 videos (total 41 minutes)

What Is a k-Means Cluster Analysis? (6 minutes)
Running a k-Means Cluster Analysis in SAS, pt. 1 (8 minutes)
Running a k-Means Cluster Analysis in SAS, pt. 2 (6 minutes)
Running a k-Means Cluster Analysis in Python, pt. 1 (8 minutes)
Running a k-Means Cluster Analysis in Python, pt. 2 (10 minutes)
k-Means Cluster Analysis Limitations (2 minutes)

3 readings (total 30 minutes)

SAS Code: k-Means Cluster Analysis (10 minutes)
Python Code: k-Means Cluster Analysis (10 minutes)
Assignment Example (10 minutes)

1 peer review (total 60 minutes)

Running a k-means Cluster Analysis (60 minutes)

Instructors

Instructor rating: 4.3 (17 ratings)

Jen Rose

Wesleyan University

4 courses, 91,315 learners

Offered by

Wesleyan University

Learner reviews

4.2 (322 reviews)

5 stars: 56.83%
4 stars: 25.46%
3 stars: 7.76%
2 stars: 4.03%
1 star: 5.90%

Frequently asked questions

When will I have access to the lectures and assignments?

Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:

The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.
The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

What will I get if I subscribe to this Specialization?

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page, from which you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

What is the refund policy?

If you subscribed, you get a 7-day free trial during which you can cancel at no penalty. After that, we don't give refunds, but you can cancel your subscription at any time.
