Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 63 tok/s
Gemini 2.5 Pro 50 tok/s Pro
GPT-5 Medium 30 tok/s Pro
GPT-5 High 18 tok/s Pro
GPT-4o 102 tok/s Pro
Kimi K2 225 tok/s Pro
GPT OSS 120B 450 tok/s Pro
Claude Sonnet 4.5 37 tok/s Pro
2000 character limit reached

Legal document retrieval across languages: topic hierarchies based on synsets (1911.12637v1)

Published 28 Nov 2019 in cs.IR and cs.DL

Abstract: Cross-lingual annotations of legislative texts enable us to explore major themes covered in multilingual legal data and are a key facilitator of semantic similarity when searching for similar documents. Multilingual probabilistic topic models have recently emerged as a group of semi-supervised machine learning models that can be used to perform thematic explorations on collections of texts in multiple languages. However, these approaches require theme-aligned training data to create a language-independent space, which limits the amount of scenarios where this technique can be used. In this work, we provide an unsupervised document similarity algorithm based on hierarchies of multi-lingual concepts to describe topics across languages. The algorithm does not require parallel or comparable corpora, or any other type of translation resource. Experiments performed on the English, Spanish, French and Portuguese editions of JCR-Acquis corpora reveal promising results on classifying and sorting documents by similar content.

Citations (2)

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.