Improved clustering and robust moment estimation via sum-of-squares
with Pravesh Kothari, Jacob Steinhardt. STOC 2018.
(merger of arxiv:1711.07465 [cs.DS] and arxiv:1711.11581 [cs.LG])
abstract
We develop efficient algorithms for clustering and robust moment estimation. Via convex sum-of-squares relaxations, our algorithms can exploit bounds on higher-order moments of the underlying distributions. In this way, our algorithms achieve substantially stronger guarantees than previous approaches.
For example, our clustering algorithm can learn mixtures of spherical Gaussians in time as long as the means have minimum separation for any , significantly improving over the minimum separation required by the previous best algorithm due to Vempala and Wang (JCSS'04).
Given corrupted samples that contain a constant fraction of adversarial outliers, our moment estimation algorithms can approximate low-degree moments of unknown distributions, achieving guarantees that match information-theoretic lower-bounds for a broad class of distributions. These algorithms allow us to enhance, in a black-box way, many previous learning algorithms based on the method of moments, e.g., for independent component analysis and latent Dirichlet allocation, such that they become resilient to adversarial outliers.
keywords
- sum-of-squares method
- machine learning
- semidefinite programming