## Coursera Week 1

How do biologists visualize gene expression matrices?

Why do we use the logarithms of expression values rather than the expression values themselves?
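
One property of logarithms that is relevant here is that up- and down-regulation by the same factor land symmetrically around 0. The expression values below are made-up numbers chosen only for illustration.

```python
import math

# Hypothetical expression values for three genes at two time points.
before = [10.0, 10.0, 10.0]
after = [40.0, 2.5, 10.0]

# Raw ratios are asymmetric: a 4-fold increase gives 4.0, a 4-fold decrease gives 0.25.
ratios = [a / b for a, b in zip(after, before)]

# Log2 ratios are symmetric around 0: +2 for up, -2 for down, 0 for unchanged.
log_ratios = [math.log2(r) for r in ratios]
print(ratios)      # [4.0, 0.25, 1.0]
print(log_ratios)  # [2.0, -2.0, 0.0]
```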

Why are we interested in analyzing genes whose expression significantly decreases during the course of an experiment?

How do we solve the k-center clustering problem for k = 1?
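
For k = 1, the problem asks for a single point minimizing the maximum distance to the data points, that is, the center of the smallest ball enclosing all of them. The sketch below uses the Bădoiu–Clarkson iterative approximation, a technique swapped in here for illustration and not necessarily the course's intended answer: it repeatedly moves the center a shrinking fraction of the way toward the farthest point. The function name `one_center` and the iteration count are assumptions.

```python
import math

def one_center(points, iterations=1000):
    """Approximate the 1-center (minimum enclosing ball center) of `points`.

    Bădoiu-Clarkson iteration: at step i, move the current center a fraction
    1/(i+1) of the way toward the point farthest from it.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, farthest)]
    # Radius of the ball around the approximate center that covers all points.
    radius = max(math.dist(p, center) for p in points)
    return center, radius

# Corners of a 2 x 2 square: the optimal center is (1, 1) with radius sqrt(2).
center, radius = one_center([(0, 0), (2, 0), (0, 2), (2, 2)])
print(center, radius)
```

More iterations give a closer approximation; exact algorithms exist as well (for example, Welzl's algorithm in the plane).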

Can I modify FarthestFirstTraversal to solve the k-Means Clustering Problem?
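
For reference when thinking about possible modifications, here is a minimal sketch of FarthestFirstTraversal as a k-center heuristic. It assumes, for determinism, that the first center is the first data point rather than a randomly chosen one.

```python
import math

def distance_to_centers(point, centers):
    """d(point, Centers): distance from point to its closest center."""
    return min(math.dist(point, c) for c in centers)

def farthest_first_traversal(data, k):
    # Start from the first data point (assumed here instead of a random choice).
    centers = [data[0]]
    while len(centers) < k:
        # Add the data point that is farthest from all centers chosen so far.
        next_point = max(data, key=lambda p: distance_to_centers(p, centers))
        centers.append(next_point)
    return centers

data = [(0, 0), (5, 5), (0, 5), (1, 1), (9, 9)]
print(farthest_first_traversal(data, 3))  # [(0, 0), (9, 9), (5, 5)]
```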

Are there measures in addition to the squared error distortion for evaluating clustering quality?
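
This chapter already uses two different quality measures: MaxDistance(Data, Centers) for the k-center problem and the squared error distortion for k-means. A small sketch computing both for the same (hypothetical) data and centers:

```python
import math

def nearest_center_distance(point, centers):
    return min(math.dist(point, c) for c in centers)

def max_distance(data, centers):
    """k-center objective: largest distance from a point to its nearest center."""
    return max(nearest_center_distance(p, centers) for p in data)

def distortion(data, centers):
    """k-means objective: mean squared distance from a point to its nearest center."""
    return sum(nearest_center_distance(p, centers) ** 2 for p in data) / len(data)

data = [(0, 0), (0, 2), (10, 0), (10, 4)]
centers = [(0, 1), (10, 2)]
print(max_distance(data, centers))  # 2.0
print(distortion(data, centers))    # 2.5
```

MaxDistance is determined by the single worst point, while distortion averages over all points, so the two can rank the same pair of clusterings differently.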

If outliers present so many challenges for clustering, why don’t we simply remove outliers before running clustering algorithms?

How do biologists select the value of k in k-means clustering?

What is the running time of the Lloyd algorithm?
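
A minimal sketch of the Lloyd algorithm, annotated with the cost of each step. For n points, k centers, and m dimensions, one iteration takes O(n·k·m) time. The total running time also depends on the number of iterations, which is usually small in practice but can be exponential in n in the worst case. Keeping the old center when a cluster becomes empty is an assumed convention, not the only possible one.

```python
import math

def lloyd(data, centers, max_iterations=100):
    """Lloyd algorithm for k-means clustering.

    Each iteration costs O(n * k * m): every point is compared with every
    center (an O(m) distance each), and recomputing the centers of gravity
    takes O(n * m).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iterations):
        # Centers to Clusters: assign each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in data:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Clusters to Centers: move each center to its cluster's center of gravity.
        # An empty cluster keeps its old center (assumed convention).
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centers)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers

print(lloyd([(0, 0), (0, 2), (10, 0), (10, 4)], [(0, 0), (10, 0)]))  # [(0.0, 1.0), (10.0, 2.0)]
```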

Can the Lloyd algorithm for k-means clustering start from k centers and end up with fewer than k centers?
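
A center can attract no points at all in the "Centers to Clusters" step, in which case its cluster is empty and its center of gravity is undefined. A minimal sketch of one assignment step with deliberately poor (hypothetical) initial centers:

```python
import math

def assign(data, centers):
    """Centers to Clusters step: group points by their nearest center."""
    clusters = [[] for _ in centers]
    for p in data:
        i = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        clusters[i].append(p)
    return clusters

data = [(0,), (1,), (10,), (11,)]
centers = [(0,), (5,), (100,)]  # the center at 100 is far from every point
clusters = assign(data, centers)
print([len(c) for c in clusters])  # [2, 2, 0]
```

An implementation then has to decide what to do, for example drop the empty cluster (ending with fewer than k centers) or reinitialize its center.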

Is it possible for two different clusters to have the same center of gravity during the course of the Lloyd algorithm?

Isn’t k-means++Initializer rather slow? And why is it better at selecting initial centers than FarthestFirstTraversal?
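
A minimal sketch of k-means++ initialization: each new center is sampled from the data with probability proportional to its squared distance from the nearest center chosen so far. Caching each point's nearest squared distance keeps the total cost at O(n·k·m), comparable to a single Lloyd iteration. Unlike FarthestFirstTraversal, which always picks the single farthest point (often an outlier), sampling gives a large group of moderately distant points a better combined chance of receiving a center. The function name and the `rng` parameter are assumptions for this sketch.

```python
import math
import random

def kmeans_plus_plus(data, k, rng=None):
    """k-means++ initialization (assumes data has at least k distinct points)."""
    if rng is None:
        rng = random.Random()
    centers = [rng.choice(data)]
    # Squared distance from each point to its nearest chosen center, updated incrementally.
    nearest_sq = [math.dist(p, centers[0]) ** 2 for p in data]
    while len(centers) < k:
        # Draw one point with probability proportional to its squared distance.
        new_center = rng.choices(data, weights=nearest_sq, k=1)[0]
        centers.append(new_center)
        nearest_sq = [min(d, math.dist(p, new_center) ** 2) for d, p in zip(nearest_sq, data)]
    return centers

data = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
print(kmeans_plus_plus(data, 3, random.Random(0)))
```

Points already chosen have weight 0, so they cannot be chosen again.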

How many partitions of a set of points into k clusters are there?
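
The number of partitions of n points into k nonempty clusters is the Stirling number of the second kind. It satisfies a simple recurrence: the n-th point either joins one of the k clusters of a partition of the other n − 1 points, or forms a cluster by itself. A minimal sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def partitions(n, k):
    """Number of ways to partition n points into k nonempty clusters
    (Stirling number of the second kind)."""
    if n == k:
        return 1  # includes n = k = 0; otherwise each point is its own cluster
    if k == 0 or k > n:
        return 0
    # Point n joins one of the k existing clusters, or forms a singleton.
    return k * partitions(n - 1, k) + partitions(n - 1, k - 1)

print(partitions(4, 2))   # 7
print(partitions(10, 3))  # 9330
```

These counts grow very quickly with n, which is why exhaustively searching over all partitions is impractical.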

Exercise Break: Find a formula for $\left\{ {n \atop 2} \right\}$, the number of ways to partition n points into 2 nonempty clusters, in terms of n.

## Coursera Week 2

If HiddenVector consists of all zeroes, the formula for computing θA in the section “From Coin Flipping to k-Means Clustering” does not work because we have to divide by 0. What should we do?
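
A minimal sketch of the θA estimate, θA = (HiddenVector · Data) / (HiddenVector · (1, …, 1)), with a guard for the all-zero case. When no sequence is assigned to coin A, the data carry no information about θA, so this sketch keeps the previous estimate. That fallback is an assumed convention; choosing any other value (for example, a random one) is also possible. The function name and `previous_theta_a` parameter are hypothetical.

```python
def estimate_theta_a(hidden_vector, data, previous_theta_a):
    """Estimate θA = (HiddenVector · Data) / (HiddenVector · (1, ..., 1)).

    hidden_vector[i] is 1 if coin A was used for sequence i, else 0;
    data[i] is the fraction of heads in sequence i.
    """
    denominator = sum(hidden_vector)
    if denominator == 0:
        # No sequence was assigned to coin A: keep the previous estimate (assumed convention).
        return previous_theta_a
    return sum(h * x for h, x in zip(hidden_vector, data)) / denominator

print(estimate_theta_a([1, 0, 1], [0.25, 0.9, 0.75], 0.5))  # 0.5
print(estimate_theta_a([0, 0, 0], [0.25, 0.9, 0.75], 0.3))  # 0.3
```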

We saw that the Lloyd algorithm does not necessarily converge to an optimal solution to the k-Means Clustering Problem. Does the soft k-means clustering algorithm converge to an optimal solution?

Why is the soft k-means clustering algorithm called “Expectation Maximization”?

Can we use k-means++Initializer for soft k-means clustering?

How do we determine an appropriate value of the stiffness parameter β?
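
To see what the stiffness parameter controls, here is a minimal sketch of the soft assignment of a single point, where the responsibility of center i is proportional to exp(−β · d(point, center i)). Small β spreads a point almost evenly across centers; large β approaches the hard assignment of k-means. The point and centers are hypothetical.

```python
import math

def responsibilities(point, centers, beta):
    """Soft assignment of one point: exp(-beta * d(point, center_i)),
    normalized over all centers."""
    dists = [math.dist(point, c) for c in centers]
    d_min = min(dists)  # subtracting d_min avoids underflow; ratios are unchanged
    weights = [math.exp(-beta * (d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

point = (1.0,)
centers = [(0.0,), (3.0,)]  # distances 1 and 2
for beta in (0.0, 1.0, 10.0):
    print(beta, [round(r, 3) for r in responsibilities(point, centers, beta)])
# 0.0 [0.5, 0.5]
# 1.0 [0.731, 0.269]
# 10.0 [1.0, 0.0]
```

Because β multiplies distances, a sensible value depends on the scale of the data, so it is typically chosen by trying several values and inspecting the results.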

What is the stopping rule for the EM algorithm?
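
One common stopping rule, assumed in the sketch below, is to stop when no center moves farther than a small tolerance in one iteration, with a cap on the total number of iterations. The sketch alternates the E-step (soft assignments with stiffness β) and the M-step (responsibility-weighted centers of gravity). The parameter names `tolerance` and `max_iterations` are assumptions.

```python
import math

def soft_kmeans(data, centers, beta, tolerance=1e-6, max_iterations=1000):
    """Soft k-means clustering via EM.

    Assumed stopping rule: stop when no center moves farther than `tolerance`
    in one iteration, or after `max_iterations` iterations.
    Returns the final centers and the number of iterations performed.
    """
    centers = [list(c) for c in centers]
    m = len(data[0])
    iteration = 0
    while iteration < max_iterations:
        iteration += 1
        # E-step: hidden[i][j] = responsibility of center i for point j.
        hidden = [[0.0] * len(data) for _ in centers]
        for j, p in enumerate(data):
            dists = [math.dist(p, c) for c in centers]
            d_min = min(dists)  # avoids underflow without changing the ratios
            weights = [math.exp(-beta * (d - d_min)) for d in dists]
            total = sum(weights)
            for i, w in enumerate(weights):
                hidden[i][j] = w / total
        # M-step: each center becomes the responsibility-weighted center of gravity.
        new_centers = []
        for row, old in zip(hidden, centers):
            mass = sum(row)
            if mass == 0:
                new_centers.append(old)  # no weight at all: keep the old center
                continue
            new_centers.append([sum(r * p[dim] for r, p in zip(row, data)) / mass
                                for dim in range(m)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tolerance:
            break
    return centers, iteration

data = [(0,), (1,), (10,), (11,)]
centers, iterations = soft_kmeans(data, [(0,), (10,)], beta=2.0)
print(centers, iterations)
```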

How do we decide which horizontal line passing through the hierarchical clustering tree results in the best clustering?
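
There is no single correct cut, but one way to explore the choice is to cut the tree at several heights and see how many clusters each cut produces. The sketch below builds an average-linkage (D_avg) tree bottom-up and stops merging once the closest pair of clusters is farther apart than `threshold`. Because average-linkage merge distances never decrease, stopping early gives the same clusters as cutting the full tree where merges reach that distance. The function names and data are hypothetical.

```python
import math
from itertools import combinations

def average_distance(a, b):
    """D_avg: mean distance between the points of clusters a and b."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def cut_hierarchical_tree(data, threshold):
    """Average-linkage hierarchical clustering, stopped once the closest pair
    of clusters is farther apart than `threshold`."""
    clusters = [[p] for p in data]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]))
        if average_distance(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters

data = [(0,), (1,), (10,), (11,), (30,)]
for t in (0.5, 5, 15, 100):
    print(t, len(cut_hierarchical_tree(data, t)))
```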

In contrast to hierarchical clustering, the Lloyd algorithm is run for a fixed number of clusters k, and it is not clear how to select k in advance. Why would we ever select the Lloyd algorithm over hierarchical clustering?

How does scaling the dataset affect the result of clustering?
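
Scaling every coordinate by the same factor multiplies all Euclidean distances by that factor, so nearest-center assignments do not change. Scaling one coordinate relative to the others (for example, changing its units) can change them. A minimal sketch with hypothetical points:

```python
import math

def nearest_center(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def scale(point, factors):
    """Multiply each coordinate by its own scaling factor."""
    return tuple(x * f for x, f in zip(point, factors))

point = (1, 10)
centers = [(0, 0), (5, 10)]
print(nearest_center(point, centers))  # 1: the second coordinate dominates

# Shrink the second coordinate by a factor of 10 (e.g., a change of units).
factors = (1, 0.1)
scaled_centers = [scale(c, factors) for c in centers]
print(nearest_center(scale(point, factors), scaled_centers))  # 0
```

With a fixed stiffness β, even uniform scaling changes the result of soft k-means, because β multiplies the distances.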

