Suppose you are given information on a large group of people and the social relationships between them. You would like to split them into relatively cohesive groups, such that most pairs of friends lie within groups rather than between them. Such cluster analysis problems are remarkably widespread, arising in subjects ranging from signal detection to controlling the spread of disease. How can one systematically find “good” groupings for this sort of data?
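One natural way to make "most pairs of friends lie within groups" precise is to count the friendships a candidate split cuts. The sketch below, with illustrative names and a toy data set (nothing here comes from a specific library), scores a split of a friendship graph by its cut size:

```python
# Minimal sketch: score a candidate split of people into two groups by
# counting how many friendships are cut (i.e., cross between groups).
# All names and the toy data are illustrative.

def cut_size(edges, group_of):
    """Number of friendships whose endpoints land in different groups."""
    return sum(1 for u, v in edges if group_of[u] != group_of[v])

# Toy example: two tight friend circles joined by a single friendship.
edges = [("a", "b"), ("b", "c"), ("a", "c"),   # circle 1
         ("d", "e"), ("e", "f"), ("d", "f"),   # circle 2
         ("c", "d")]                           # one cross-circle tie
good = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
bad  = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

print(cut_size(edges, good))  # the natural split cuts only 1 friendship
print(cut_size(edges, bad))   # prints 5: a poor split cuts far more
```

A "good" grouping is then one with small cut size relative to the friendships it keeps inside groups; richer objectives (modularity, likelihoods under random-graph models) refine this same idea.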
One idea is to try different splits until one of them “works”, but this is like looking for the proverbial needle in a haystack: the number of possible splits grows exponentially with the number of people, so exhaustive search is infeasible even for relatively small groups. Is there a principled way to speed up this search that can be proven to work? Or is one stuck with a hard and hopeless problem? As it turns out, a wide variety of insights are available from optimization, combinatorics, high-dimensional geometry, computational complexity and statistical mechanics to address this problem when the data is assumed to be random. There is a vast and growing array of exciting problems at the interface of the above areas with the theory of stochastic processes.