Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 134 tok/s
Gemini 2.5 Pro 41 tok/s Pro
GPT-5 Medium 33 tok/s Pro
GPT-5 High 26 tok/s Pro
GPT-4o 126 tok/s Pro
Kimi K2 191 tok/s Pro
GPT OSS 120B 430 tok/s Pro
Claude Sonnet 4.5 37 tok/s Pro
2000 character limit reached

Sketching, Embedding, and Dimensionality Reduction for Information Spaces (1503.05225v1)

Published 17 Mar 2015 in cs.DS, cs.CG, cs.IT, and math.IT

Abstract: Information distances like the Hellinger distance and the Jensen-Shannon divergence have deep roots in information theory and machine learning. They are used extensively in data analysis especially when the objects being compared are high dimensional empirical probability distributions built from data. However, we lack common tools needed to actually use information distances in applications efficiently and at scale with any kind of provable guarantees. We can't sketch these distances easily, or embed them in better behaved spaces, or even reduce the dimensionality of the space while maintaining the probability structure of the data. In this paper, we build these tools for information distances---both for the Hellinger distance and Jensen--Shannon divergence, as well as related measures, like the $\chi2$ divergence. We first show that they can be sketched efficiently (i.e. up to multiplicative error in sublinear space) in the aggregate streaming model. This result is exponentially stronger than known upper bounds for sketching these distances in the strict turnstile streaming model. Second, we show a finite dimensionality embedding result for the Jensen-Shannon and $\chi2$ divergences that preserves pair wise distances. Finally we prove a dimensionality reduction result for the Hellinger, Jensen--Shannon, and $\chi2$ divergences that preserves the information geometry of the distributions (specifically, by retaining the simplex structure of the space). While our second result above already implies that these divergences can be explicitly embedded in Euclidean space, retaining the simplex structure is important because it allows us to continue doing inference in the reduced space. In essence, we preserve not just the distance structure but the underlying geometry of the space.

Citations (9)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube