Running a k-Means Cluster Analysis in Python, pt. 1 - K-Means Cluster Analysis | Coursera

Wesleyan University

Machine Learning for Data Analysis

4.2 (322 ratings)

45K Students Enrolled

Course 4 of 5 in the Data Analysis and Interpretation Specialization

Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Make sure to familiarize yourself with course 3 of this specialization before diving into these machine learning concepts. Building on Course 3, which introduces students to integral supervised machine learning concepts, this course will provide an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering. By completing this course, you will learn how to apply, test, and interpret machine learning algorithms as alternative methods for addressing your research questions.

Skills You'll Learn

Data Analysis, Python Programming, Machine Learning, Exploratory Data Analysis

Reviews

4.2 (322 ratings)

5 stars: 56.83%
4 stars: 25.46%
3 stars: 7.76%
2 stars: 4.03%
1 star: 5.90%

EM

Jun 25, 2016

Good introduction with python example for famous algorithm such as random forest and k-mean

DB

Jan 24, 2018

There is some problems because of changes both in SAS and Python after creating the course

From the lesson

K-Means Cluster Analysis

Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters, with each observation belonging to exactly one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based on the similarity of their responses on multiple variables. Clustering variables should be primarily quantitative, but binary variables may also be included.

In this session, we will show you how to use k-means cluster analysis to identify clusters of observations in your data set. You will gain experience interpreting cluster analysis results by using graphing methods to help you determine the number of clusters to interpret, and by examining clustering variable means to evaluate the cluster profiles. Finally, you will have the opportunity to validate your cluster solution by examining differences between clusters on a variable not included in your cluster analysis.

You can use the same variables that you have used in past weeks as clustering variables. If most or all of your previous explanatory variables are categorical, you should identify some additional quantitative clustering variables from your data set. In addition, you will need to identify a quantitative or binary response variable from your data set that you will not include in your cluster analysis. You will use this variable to validate your clusters by testing whether they differ significantly on it, using statistical methods such as analysis of variance or chi-square analysis, which you learned about in Course 2 of the specialization (Data Analysis Tools). Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets.
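
The workflow described in this lesson can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code. It uses only the standard library, the data are synthetic (two made-up quantitative clustering variables plus a made-up response variable), and the helper names `standardize`, `init_centroids`, and `kmeans` are hypothetical. The farthest-first starting centroids are a simple deterministic choice assumed here for reproducibility. In practice, a library implementation such as scikit-learn's `KMeans` would normally be used instead.

```python
import random
import statistics

def standardize(rows):
    """Rescale each column to mean 0 and SD 1 (assumes no constant column)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def dist(a, b):
    """Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def init_centroids(rows, k):
    """Farthest-first start: deterministic, but sensitive to outliers."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(dist(r, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(rows, k, max_iter=100):
    """Alternate assignment and centroid updates until labels stop changing."""
    centroids = init_centroids(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [statistics.mean(col) for col in zip(*members)]
    return labels, centroids

# Synthetic data (an assumption for illustration): two separated groups.
rng = random.Random(42)
data = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)] +
        [[rng.gauss(8, 1), rng.gauss(8, 1)] for _ in range(50)])
scaled = standardize(data)

# Graphing step: average distance to the assigned centroid for each k.
# Plotting these values gives the "elbow" curve used to choose k.
elbow = {}
for k in range(1, 5):
    k_labels, k_centroids = kmeans(scaled, k)
    elbow[k] = statistics.mean(dist(r, k_centroids[lab])
                               for r, lab in zip(scaled, k_labels))
    print("k =", k, "average distance:", round(elbow[k], 3))

# Cluster profiles: clustering variable means per cluster, original units.
labels, centroids = kmeans(scaled, 2)
for j in range(2):
    members = [row for row, lab in zip(data, labels) if lab == j]
    print("cluster", j, "variable means:",
          [round(statistics.mean(col), 2) for col in zip(*members)])

# Validation: compare clusters on a response variable left out of clustering.
response = ([rng.gauss(10, 1) for _ in range(50)] +
            [rng.gauss(12, 1) for _ in range(50)])
for j in range(2):
    vals = [y for y, lab in zip(response, labels) if lab == j]
    print("cluster", j, "mean response:", round(statistics.mean(vals), 2))
```

The last step only compares cluster means descriptively. A formal check of whether the clusters differ on the response variable would use a test such as analysis of variance, as the lesson describes.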

Running a k-Means Cluster Analysis in Python, pt. 1 (8:14)

Running a k-Means Cluster Analysis in Python, pt. 2 (10:01)

Taught By

Jen Rose, Research Professor
Lisa Dierker, Professor
