- The paper introduces NeuralSort, a continuous relaxation of the sorting operator that enables effective gradient-based optimization through sorting in machine learning pipelines.
- It employs a novel reparameterized gradient estimator for the Plackett-Luce distribution, reducing variance compared to REINFORCE methods.
- Empirical results on differentiable kNN, quantile regression, and large-MNIST sorting tasks demonstrate significant improvements in accuracy and optimization efficiency.
Stochastic Optimization of Sorting Networks via Continuous Relaxations
The paper "Stochastic Optimization of Sorting Networks via Continuous Relaxations" presents novel methodologies addressing the differentiation issues inherent in sorting operators within computational graphs. Sorting, a fundamental operation in machine learning workflows, poses challenges due to its non-differentiable nature with respect to input data—a problem that hinders the application of gradient-based optimization methods across end-to-end pipelines.
The authors introduce NeuralSort, a continuous relaxation of the sorting operator that replaces hard permutation matrices with unimodal row-stochastic matrices. A unimodal row-stochastic matrix has positive entries, rows that each sum to one, and argmax positions that are distinct across rows, so the row-wise argmax yields a valid permutation. Because this argmax recovers the exact hard permutation at any temperature, the relaxation admits efficient projection onto true permutations and supports straightforward straight-through gradient estimation; a temperature parameter controls the trade-off between the smoothness of the relaxation and its fidelity to the exact sort.
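For reference, the relaxation admits a simple closed form: given scores s in R^n, row i of the relaxed permutation matrix is softmax(((n + 1 - 2i)s - A_s 1) / tau), where A_s is the matrix of absolute pairwise score differences and 1 is the all-ones vector. The following is a minimal PyTorch sketch of this formula; the function name `neural_sort` and the unbatched interface are our own illustrative choices, not the authors' reference implementation.

```python
import torch

def neural_sort(s: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Continuous relaxation of the sorting operator (NeuralSort).

    s:   (n,) tensor of real-valued scores
    tau: temperature > 0; smaller values sharpen rows toward a hard permutation

    Returns an (n, n) unimodal row-stochastic matrix whose row-wise argmax
    recovers the descending sort of s at any tau > 0 (assuming distinct scores).
    """
    n = s.shape[0]
    s = s.reshape(n, 1)
    A = torch.abs(s - s.T)                        # A[i, j] = |s_i - s_j|
    row_sums = A.sum(dim=1, keepdim=True)         # (A_s 1), shape (n, 1)
    i = torch.arange(1, n + 1, dtype=s.dtype).reshape(n, 1)
    scaling = n + 1 - 2 * i                       # coefficient (n + 1 - 2i) for row i
    logits = (scaling @ s.T - row_sums.T) / tau   # (n, n)
    return torch.softmax(logits, dim=-1)          # rows sum to one
```

Since dividing the logits by tau does not change each row's argmax, the hard permutation is recoverable at any temperature: for scores `[1.3, -0.5, 2.1]`, `neural_sort(s).argmax(dim=-1)` yields `[2, 0, 1]`, the indices of the descending sort.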
Moreover, the paper enables stochastic optimization over the space of permutations by deriving a reparameterized gradient estimator for the Plackett-Luce (PL) distribution family. A PL sample can be drawn by perturbing the log-scores with i.i.d. Gumbel noise and sorting the perturbed values; replacing the hard sort with the NeuralSort relaxation makes the sample differentiable, so gradients with respect to model parameters can flow through the sampling step. The authors position this estimator as an improvement over vanilla REINFORCE (score-function) estimators, emphasizing reduced variance and computational tractability.
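A minimal sketch of this sampling path, reusing `neural_sort` from above (the Gumbel-perturb-then-sort construction follows the paper; the function name, the epsilon for numerical stability, and the interface are our own):

```python
def sample_relaxed_pl(log_scores: torch.Tensor, tau: float = 1.0,
                      eps: float = 1e-10) -> torch.Tensor:
    """Reparameterized sample of a relaxed permutation matrix from PL(exp(log_scores)).

    Perturbing the log-scores with i.i.d. Gumbel(0, 1) noise and sorting yields
    an exact PL sample; swapping the hard sort for neural_sort makes the sample
    differentiable with respect to log_scores.
    """
    u = torch.rand_like(log_scores)
    gumbel = -torch.log(-torch.log(u + eps) + eps)  # Gumbel(0, 1) noise
    return neural_sort(log_scores + gumbel, tau)
```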
Empirical evidence supports the effectiveness of NeuralSort on three tasks involving complex, high-dimensional inputs: sorting sequences of multi-digit images from a large-MNIST dataset, quantile regression on the same data, and a differentiable, end-to-end trainable extension of the k-nearest neighbors (kNN) algorithm. On these benchmarks, NeuralSort improves sorting accuracy and downstream task performance.
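As an illustration of the kNN construction, the sketch below uses the first k rows of the relaxed permutation matrix as soft neighbor-selection weights; the helper name `soft_knn_predict`, the squared-Euclidean distance, and the uniform averaging of label vectors are illustrative assumptions, not the paper's exact training objective.

```python
def soft_knn_predict(query: torch.Tensor, candidates: torch.Tensor,
                     labels_onehot: torch.Tensor, k: int = 5,
                     tau: float = 0.1) -> torch.Tensor:
    """Differentiable kNN class scores via a relaxed sort.

    query:         (d,) embedding of the query point
    candidates:    (n, d) embeddings of candidate neighbors
    labels_onehot: (n, c) one-hot labels of the candidates
    """
    # Negate squared distances so the nearest neighbors get the largest scores.
    scores = -((candidates - query) ** 2).sum(dim=-1)   # (n,)
    P_hat = neural_sort(scores, tau)                    # (n, n), nearest-first rows
    # The first k rows softly select the k nearest neighbors; averaging their
    # label vectors gives a differentiable estimate of the class distribution.
    return (P_hat[:k] @ labels_onehot).mean(dim=0)      # (c,)
```

Because every step is differentiable, gradients flow from a classification loss back into the embedding network that produces `query` and `candidates`, which is what makes the extension end-to-end trainable.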
Implications and Future Directions
The research implications are both theoretical and practical. Theoretically, the introduction of unimodal row-stochastic matrices as a relaxation target invites further inquiry into their properties and potential applications. Practically, NeuralSort and the reparameterized PL gradient estimator make it substantially easier to embed sorting-based operations in machine learning pipelines, improving model expressiveness and performance on tasks that require ordered data.
Future work could apply NeuralSort's principles to other classical algorithms, such as beam search, or to modeling distributions over permutations in latent variable models. Other directions include alternative relaxations of the sorting operator and hardware-accelerated, parallel implementations to scale the approach to larger problems.
NeuralSort represents a significant step toward removing non-differentiability bottlenecks in computational graphs, giving researchers and practitioners a practical way to embed sorting operations within stochastic computation graphs.