Automated languages phylogeny from Levenshtein distance

(0911.3280)
Published Nov 17, 2009 in cs.CL, q-bio.PE, and q-bio.QM

Abstract

Languages evolve over time in a process in which reproduction, mutation, and extinction are all possible, much as happens with living organisms. This similarity makes it possible, in principle, to build family trees that show the degree of relatedness between languages. The method used by modern glottochronology, developed by Swadesh in the 1950s, measures distances from the percentage of words with a common historical origin. The weak point of this method is that subjective judgment plays a significant role. Recently we proposed an automated method that avoids this subjectivity, does not require specific linguistic knowledge, and yields results that can be replicated by any study using the same database. Moreover, the method allows a quick comparison of a large number of languages. We applied our method to the Indo-European and Austronesian families, considering fifty different languages in each case. The resulting trees are similar to those of previous studies, but with some important differences in the position of a few languages and subgroups. We believe that these differences carry new information on the structure of the tree and on the phylogenetic relationships within families.
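
As a rough illustration of how a Levenshtein-based distance between two languages might be computed, here is a minimal Python sketch. The abstract does not spell out the details, so several choices are assumptions made for illustration only: dividing each edit distance by the length of the longer word, averaging over aligned word lists, and the function names `levenshtein`, `normalized_distance`, and `language_distance`. The word lists are toy examples, not data from the paper.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer word's length
    (an assumed normalization, not stated in the abstract)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(words_a: Sequence[str], words_b: Sequence[str]) -> float:
    """Average normalized distance over word lists aligned by meaning."""
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and of equal length")
    total = sum(normalized_distance(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)


if __name__ == "__main__":
    # Toy word pairs with the same meaning (illustrative only).
    english = ["night", "water"]
    german = ["nacht", "wasser"]
    print(language_distance(english, german))
```

For instance, `levenshtein("kitten", "sitting")` evaluates to 3. Computing such a distance for every pair of languages would give a distance matrix, which a standard tree-building method could then turn into a family tree.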
