So, those methods can be categorized into bottom-up methods,
top-down high-dimensional clustering methods, or correlation-based methods.
We will also introduce an interesting method called delta-clustering.
Then, for high-dimensional clustering,
there are many methods developed for dimensionality reduction.
Essentially, we can think of the high-dimensional data
as laid out in a table, with the many dimensions as the columns.
And when we perform clustering on such data,
either the columns alone are clustered, or the rows and
columns are clustered together, which we call co-clustering.
There are several typical methods.
One is probabilistic latent semantic indexing, PLSI.
Later, people developed another method called Latent Dirichlet Allocation, LDA.
So PLSI and LDA are typical topic modeling methods for text data.
Essentially, we can think of the text as being clustered into multiple topics.
Each topic is a cluster, and each topic
is associated with a set of words, which we can think of as dimensions,
and also with a set of documents, which you can think of as rows, simultaneously.
And the second popular method is called
nonnegative matrix factorization, NMF.
This is a kind of co-clustering method. You can think of it this way:
you get a nonnegative matrix because the
word frequencies in documents are nonnegative, okay?
They are zero or more, so all the values in the matrix are nonnegative.
For this nonnegative matrix, we can approximately
factorize it into two nonnegative lower-rank matrices,
say U and V, and that will reduce the number of dimensions.
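
To make the factorization concrete, here is a minimal sketch in Python with NumPy. The lecture does not say which algorithm computes U and V, so this sketch assumes the multiplicative update rule of Lee and Seung, one common choice. The function name `nmf`, the parameter `rank`, and the toy document-word matrix are all hypothetical, made up for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix X (docs x words) as U @ V.

    Assumption: multiplicative updates (Lee and Seung), which keep
    U and V nonnegative as long as they start nonnegative.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    U = rng.random((n_docs, rank)) + eps   # docs x rank
    V = rng.random((rank, n_words)) + eps  # rank x words
    for _ in range(n_iter):
        # Each update rescales entries by a nonnegative ratio,
        # so no entry can become negative.
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V

# Hypothetical word-frequency matrix: rows are documents, columns are words.
# Documents 0 and 1 use words 0 and 1; documents 2 and 3 use words 2 and 3.
X = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
], dtype=float)

U, V = nmf(X, rank=2)

# Co-clustering reading: assign each document (row) and each word (column)
# to the factor on which it has the largest weight.
doc_clusters = U.argmax(axis=1)
word_clusters = V.argmax(axis=0)
print("document clusters:", doc_clusters)
print("word clusters:", word_clusters)
print("reconstruction error:", np.linalg.norm(X - U @ V))
```

Here the four word dimensions are reduced to two factors, and reading off the largest weight per row of U and per column of V groups the documents and the words at the same time, which is the co-clustering view described above.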
Another very interesting method we're going to study
is called spectral clustering.
That means we use the spectrum, that is, the eigenvalues, of the similarity matrix of the data
to perform dimensionality reduction.
The high-dimensional data is reduced to
fewer dimensions, and then we can perform clustering in those fewer dimensions.
That's the spectral clustering method.
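
As a rough illustration of that idea, here is a small Python sketch with NumPy for the two-cluster case. Several things here are assumptions not stated in the lecture: the similarity matrix is built with a Gaussian kernel, the spectrum is taken from the unnormalized graph Laplacian derived from that similarity matrix (one common variant), and the final clustering step is just a split on the sign of the second eigenvector, a simple stand-in for running something like k-means in the reduced space. The function name `spectral_bipartition`, the parameter `sigma`, and the toy points are hypothetical.

```python
import numpy as np

def spectral_bipartition(points, sigma=2.0):
    """Split points into two clusters using one eigenvector.

    Assumptions: Gaussian-kernel similarity, unnormalized Laplacian
    L = D - W, and a sign split on the second-smallest eigenvector.
    """
    # Similarity matrix from pairwise squared Euclidean distances.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Graph Laplacian: degree matrix minus similarity matrix.
    D = np.diag(W.sum(axis=1))
    L = D - W

    # eigh handles symmetric matrices and returns eigenvalues
    # in ascending order, with eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(L)

    # The eigenvector of the second-smallest eigenvalue gives a
    # one-dimensional embedding of the points.
    embedding = eigvecs[:, 1]

    # Cluster in the reduced space: here simply by sign.
    labels = (embedding > 0).astype(int)
    return labels, eigvals

# Hypothetical data: two well-separated groups of 2-D points.
points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],
])

labels, eigvals = spectral_bipartition(points)
print("cluster labels:", labels)
print("smallest eigenvalues:", eigvals[:3])
```

Which group gets label 0 and which gets label 1 depends on the sign of the eigenvector returned by the solver, so only the grouping itself is meaningful. For more than two clusters, one would typically keep several eigenvectors and run a clustering algorithm on the resulting low-dimensional coordinates.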