Emergent Mind

Parallel Knowledge Embedding with MapReduce on a Multi-core Processor

(1509.01183)
Published Sep 3, 2015 in cs.DC and cs.DB

Abstract

This article firstly attempts to explore parallel algorithms of learning distributed representations for both entities and relations in large-scale knowledge repositories with {\it MapReduce} programming model on a multi-core processor. We accelerate the training progress of a canonical knowledge embedding method, i.e. {\it translating embedding} ({\bf TransE}) model, by dividing a whole knowledge repository into several balanced subsets, and feeding each subset into an individual core where local embeddings can concurrently run updating during the {\it Map} phase. However, it usually suffers from inconsistent low-dimensional vector representations of the same key, which are collected from different {\it Map} workers, and further leads to conflicts when conducting {\it Reduce} to merge the various vectors associated with the same key. Therefore, we try several strategies to acquire the merged embeddings which may not only retain the performance of {\it entity inference}, {\it relation prediction}, and even {\it triplet classification} evaluated by the single-thread {\bf TransE} on several well-known knowledge bases such as Freebase and NELL, but also scale up the learning speed along with the number of cores within a processor. So far, the empirical studies show that we could achieve comparable results as the single-thread {\bf TransE} performs by the {\it stochastic gradient descend} (SGD) algorithm, as well as increase the training speed multiple times via adapting the {\it batch gradient descend} (BGD) algorithm for {\it MapReduce} paradigm.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.