About this course
4.6
1,471 ratings
260 reviews
Specialization
100% online
Start instantly and learn at your own schedule.
Flexible deadlines
Reset deadlines to fit your schedule.
Approx. 21 hours to complete
Suggested: 6 weeks of study, 5-8 hours/week
Available languages: English
Subtitles: English

Skills you will gain
Data Clustering Algorithms, K-Means Clustering, Machine Learning, K-D Tree

Syllabus - What you will learn from this course

Week 1
1 hour to complete

Welcome

Clustering and retrieval are some of the most high-impact machine learning tools out there. Retrieval is used in almost every application and device we interact with, for example to provide a set of products related to the one a shopper is currently considering, or a list of people you might want to connect with on a social media platform. Clustering can be used to aid retrieval, but it is a more broadly useful tool for automatically discovering structure in data, such as uncovering groups of similar patients. This introduction to the course provides you with an overview of the topics we will cover and the background knowledge and resources we assume you have.
4 videos (Total 25 min), 4 readings
Videos
Course overview (3 min)
Module-by-module topics covered (8 min)
Assumed background (6 min)
Readings
Important Update regarding the Machine Learning Specialization (10 min)
Slides presented in this module (10 min)
Software tools you'll need for this course (10 min)
A big week ahead! (10 min)
Week 2
4 hours to complete

Nearest Neighbor Search

We start the course by considering a retrieval task: fetching a document similar to one someone is currently reading. We cast this problem as nearest neighbor search, a concept we have seen in the Foundations and Regression courses. Here, however, you will take a deep dive into two critical components of the algorithms: the data representation and the metric for measuring similarity between pairs of datapoints. You will examine the computational burden of the naive nearest neighbor search algorithm, and instead implement scalable alternatives using KD-trees for handling large datasets and locality sensitive hashing (LSH) for providing approximate nearest neighbors, even in high-dimensional spaces. You will explore all of these ideas on a Wikipedia dataset, comparing and contrasting the impact of the various choices you can make on the nearest neighbor results produced.
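As a rough illustration of the ideas above (a sketch, not the course's assignment code), a brute-force nearest neighbor search over document vectors might look like this, with the distance metric as a swappable choice:

```python
import numpy as np

def knn(query, X, k=3, metric="euclidean"):
    """Brute-force k-NN: O(N * d) distance computations per query."""
    if metric == "euclidean":
        dists = np.linalg.norm(X - query, axis=1)
    elif metric == "cosine":
        # cosine distance = 1 - cosine similarity; assumes nonzero vectors
        sims = (X @ query) / (np.linalg.norm(X, axis=1) * np.linalg.norm(query))
        dists = 1.0 - sims
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
docs = rng.random((100, 5))       # stand-in for tf-idf document vectors
neighbors = knn(docs[0], docs, k=3)
# a point is always its own nearest neighbor, so neighbors[0] == 0
```

The linear scan over all N points is exactly the cost that KD-trees and LSH, covered in this module, are designed to avoid.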
22 videos (Total 137 min), 4 readings, 5 quizzes
Videos
1-NN algorithm (2 min)
k-NN algorithm (6 min)
Document representation (5 min)
Distance metrics: Euclidean and scaled Euclidean (6 min)
Writing (scaled) Euclidean distance using (weighted) inner products (4 min)
Distance metrics: Cosine similarity (9 min)
To normalize or not and other distance considerations (6 min)
Complexity of brute force search (1 min)
KD-tree representation (9 min)
NN search with KD-trees (7 min)
Complexity of NN search with KD-trees (5 min)
Visualizing scaling behavior of KD-trees (4 min)
Approximate k-NN search using KD-trees (7 min)
Limitations of KD-trees (3 min)
LSH as an alternative to KD-trees (4 min)
Using random lines to partition points (5 min)
Defining more bins (3 min)
Searching neighboring bins (8 min)
LSH in higher dimensions (4 min)
(OPTIONAL) Improving efficiency through multiple tables (22 min)
A brief recap (2 min)
Readings
Slides presented in this module (10 min)
Choosing features and metrics for nearest neighbor search (10 min)
(OPTIONAL) A worked-out example for KD-trees (10 min)
Implementing Locality Sensitive Hashing from scratch (10 min)
Quizzes (5 practice exercises)
Representations and metrics (12 min)
Choosing features and metrics for nearest neighbor search (10 min)
KD-trees (10 min)
Locality Sensitive Hashing (10 min)
Implementing Locality Sensitive Hashing from scratch (10 min)
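The LSH idea from this module, partitioning points with random lines (hyperplanes in higher dimensions), can be sketched as follows; this is a minimal illustration with hypothetical helper names, not the assignment's implementation:

```python
import numpy as np

def lsh_hash(X, planes):
    """Hash each row of X to an integer bin via the signs of its projections
    onto a set of random hyperplanes through the origin."""
    signs = (X @ planes.T) >= 0                       # (N, n_planes) sign pattern
    return signs @ (1 << np.arange(planes.shape[0]))  # encode pattern as an int

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 10))        # 200 points in 10 dimensions
planes = rng.standard_normal((4, 10))     # 4 random hyperplanes -> up to 16 bins
bins = lsh_hash(X, planes)

# to answer a query, search only the candidates sharing the query's bin
query_bin = lsh_hash(X[:1], planes)[0]
candidates = np.where(bins == query_bin)[0]   # point 0 is always a candidate
```

Nearby points tend to fall on the same side of each random hyperplane, so searching one bin (or a few neighboring bins) approximates the full search at a fraction of the cost.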
Week 3
2 hours to complete

Clustering with k-means

In clustering, our goal is to group the datapoints in our dataset into disjoint sets. Motivated by our document analysis case study, you will use clustering to discover thematic groups of articles by "topic". These topics are not provided in this unsupervised learning task; rather, the idea is to output cluster labels that can post facto be associated with known topics like "Science", "World News", etc. Even without such post-facto labels, you will examine how the clustering output can provide insights into the relationships between datapoints in the dataset. The first clustering algorithm you will implement is k-means, the most widely used clustering algorithm out there. To scale up k-means, you will learn about the general MapReduce framework for parallelizing and distributing computations, and then how the iterates of k-means can utilize this framework. You will show that k-means can provide an interpretable grouping of Wikipedia articles when appropriately tuned.
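The assign/recenter iterations of k-means can be sketched in a few lines. This toy version takes the initial centers explicitly for reproducibility; smart initialization (k-means++) is covered in the module:

```python
import numpy as np

def kmeans(X, init_centers, n_iter=20):
    """Plain k-means: alternate assignment and recentering steps."""
    centers = init_centers.astype(float).copy()
    k = len(centers)
    for _ in range(n_iter):
        # assignment step: each point joins its nearest center's cluster
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each center moves to the mean of its cluster
        for j in range(k):
            if np.any(labels == j):      # keep old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),   # blob around (0, 0)
               rng.normal(5, 0.3, (50, 2))])  # blob around (5, 5)
labels, centers = kmeans(X, init_centers=X[[0, 50]])
# each blob ends up in its own cluster
```

Both steps are embarrassingly parallel over datapoints and clusters, which is exactly why k-means maps so naturally onto the MapReduce framework discussed in this module.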
13 videos (Total 79 min), 2 readings, 3 quizzes
Videos
An unsupervised task (6 min)
Hope for unsupervised learning, and some challenge cases (4 min)
The k-means algorithm (7 min)
k-means as coordinate descent (6 min)
Smart initialization via k-means++ (4 min)
Assessing the quality and choosing the number of clusters (9 min)
Motivating MapReduce (8 min)
The general MapReduce abstraction (5 min)
MapReduce execution overview and combiners (6 min)
MapReduce for k-means (7 min)
Other applications of clustering (7 min)
A brief recap (1 min)
Readings
Slides presented in this module (10 min)
Clustering text data with k-means (10 min)
Quizzes (3 practice exercises)
k-means (18 min)
Clustering text data with K-means (16 min)
MapReduce for k-means (10 min)
Week 4
3 hours to complete

Mixture Models

In k-means, each observation is hard-assigned to a single cluster, and these assignments are based just on the cluster centers rather than also incorporating shape information. In our second module on clustering, you will perform probabilistic model-based clustering, which provides (1) a more descriptive notion of a "cluster" and (2) accounts for uncertainty in the assignment of datapoints to clusters via "soft assignments". You will explore and implement a broadly useful algorithm called expectation maximization (EM) for inferring these soft assignments, as well as the model parameters. To gain intuition, you will first consider a visually appealing image clustering task. You will then cluster Wikipedia articles, handling the high dimensionality of the tf-idf document representation considered.
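The E-step/M-step loop described above can be sketched for a two-component 1-D Gaussian mixture; this is a toy version for intuition, not the assignment's code:

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a 2-component 1-D Gaussian mixture, using soft assignments."""
    mu = np.array([x.min(), x.max()])          # crude but deterministic init
    var = np.array([x.var(), x.var()])
    weights = np.array([0.5, 0.5])             # mixture weights pi_k
    for _ in range(n_iter):
        # E-step: responsibility r[i, k] is proportional to pi_k * N(x_i | mu_k, var_k)
        pdf = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = weights * pdf
        r /= r.sum(axis=1, keepdims=True)      # normalize over components
        # M-step: re-estimate parameters from the soft counts
        nk = r.sum(axis=0)
        weights = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return weights, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-4, 1, 300), rng.normal(4, 1, 300)])
weights, mu, var = em_gmm_1d(x)
# mu converges near the true component means, -4 and 4
```

Replacing the responsibilities r with hard 0/1 assignments and fixing equal spherical variances recovers k-means, the relationship explored at the end of this module.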
15 videos (Total 91 min), 4 readings, 3 quizzes
Videos
Aggregating over unknown classes in an image dataset (6 min)
Univariate Gaussian distributions (2 min)
Bivariate and multivariate Gaussians (7 min)
Mixture of Gaussians (6 min)
Interpreting the mixture of Gaussian terms (5 min)
Scaling mixtures of Gaussians for document clustering (5 min)
Computing soft assignments from known cluster parameters (7 min)
(OPTIONAL) Responsibilities as Bayes' rule (5 min)
Estimating cluster parameters from known cluster assignments (6 min)
Estimating cluster parameters from soft assignments (8 min)
EM iterates in equations and pictures (6 min)
Convergence, initialization, and overfitting of EM (9 min)
Relationship to k-means (3 min)
A brief recap (1 min)
Readings
Slides presented in this module (10 min)
(OPTIONAL) A worked-out example for EM (10 min)
Implementing EM for Gaussian mixtures (10 min)
Clustering text data with Gaussian mixtures (10 min)
Quizzes (3 practice exercises)
EM for Gaussian mixtures (18 min)
Implementing EM for Gaussian mixtures (12 min)
Clustering text data with Gaussian mixtures (8 min)
4.6
260 reviews
Career direction: 32% started a new career after completing these courses
Career benefit: 83% got a tangible career benefit from this course

Top reviews

by JM, Jan 17th 2017

Excellent course, well thought out lectures and problem sets. The programming assignments offer an appropriate amount of guidance that allows the students to work through the material on their own.

by AG, Sep 25th 2017

Nice course with all the practical stuffs and nice analysis about each topic but practical part of LDA was restricted for GraphLab users only which is a weak fallback and rest everything is fine.

Instructors

Emily Fox

Amazon Professor of Machine Learning
Statistics
Carlos Guestrin

Amazon Professor of Machine Learning
Computer Science and Engineering

About the University of Washington

Founded in 1861, the University of Washington is one of the oldest state-supported institutions of higher education on the West Coast and is one of the preeminent research universities in the world.

About the Machine Learning Specialization

This Specialization from leading researchers at the University of Washington introduces you to the exciting, high-demand field of Machine Learning. Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data.
Machine Learning

Frequently Asked Questions (FAQ)

  • Once you enroll for a Certificate, you will have access to all videos, quizzes, and programming assignments (if applicable). Peer-reviewed assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing it, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page, from which you can print it or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.