Emergent Mind

The Corpus Replication Task

(1806.07978)

Published Jun 20, 2018 in cs.LG , cs.CL , and stat.ML

Abstract

In the field of NLP, we revisit the well-known word embedding algorithm word2vec. Word embeddings identify words by vectors such that the words' distributional similarity is captured. Unexpectedly, besides semantic similarity even relational similarity has been shown to be captured in word embeddings generated by word2vec, whence two questions arise. Firstly, which kind of relations are representable in continuous space and secondly, how are relations built. In order to tackle these questions we propose a bottom-up point of view. We call generating input text for which word2vec outputs target relations solving the Corpus Replication Task. Deeming generalizations of this approach to any set of relations possible, we expect solving of the Corpus Replication Task to provide partial answers to the questions.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.