A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures

(2311.10217)
Published Nov 16, 2023 in cs.CL, cs.AI, math.AT, and nlin.CD

Abstract

The present paper introduces a novel object of study: a language fractal structure. We hypothesize that a set of embeddings of all $n$-grams of a natural language constitutes a representative sample of this fractal set. (We use the term Hailonakea to refer to the sum total of all language fractal structures, over all $n$.) The paper estimates the intrinsic (genuine) dimensions of language fractal structures for Russian and English. To this end, we employ methods based on (1) topological data analysis and (2) a minimum spanning tree of a data graph built on the point cloud under consideration (Steele's theorem). For both languages and for all $n$, the intrinsic dimensions appear to be non-integer values (typical of fractal sets), close to 9 for both Russian and English.
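
To make the minimum-spanning-tree approach concrete, here is a minimal sketch of how a Steele-type estimate can be computed. It is not the paper's implementation. It relies on the known scaling result that, for $N$ points sampled from a $d$-dimensional set, the total MST length with edges raised to the power $\alpha$ grows roughly as $N^{(d-\alpha)/d}$, so the slope $m$ of $\log L$ against $\log N$ gives $d \approx \alpha / (1 - m)$. The function names `mst_length` and `estimate_dimension`, the choice of Euclidean distance, the default $\alpha = 1$, and the subsample sizes are all assumptions made for illustration.

```python
import math
import random


def mst_length(points, alpha=1.0):
    """Sum of (edge length ** alpha) over the Euclidean MST, via O(n^2) Prim."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from each point to the tree
    best[0] = 0.0          # start the tree at point 0 (contributes zero length)
    total = 0.0
    for _ in range(n):
        # Add the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** alpha
        # Update the cheapest connection for every remaining point.
        for v in range(n):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if dist < best[v]:
                    best[v] = dist
    return total


def estimate_dimension(points, sizes, alpha=1.0, seed=0):
    """Fit log L(N) against log N over random subsamples; return alpha / (1 - slope)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for n in sizes:
        sample = rng.sample(points, n)
        xs.append(math.log(n))
        ys.append(math.log(mst_length(sample, alpha)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope of the log-log relation.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return alpha / (1.0 - slope)


if __name__ == "__main__":
    # Synthetic check on points drawn uniformly from the unit square.
    rng = random.Random(1)
    square = [(rng.random(), rng.random()) for _ in range(400)]
    print(estimate_dimension(square, [50, 100, 200, 400]))
```

On real data the points would be $n$-gram embedding vectors rather than a synthetic square, and the estimate is sensitive to the range of subsample sizes, so it is best read as a rough diagnostic.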
