Emergent Mind

Machine Learning and Cloud Computing: Survey of Distributed and SaaS Solutions

(1603.08767)
Published Mar 29, 2016 in cs.DC and cs.LG

Abstract

Applying popular machine learning algorithms to large amounts of data has raised new challenges for ML practitioners. Traditional ML libraries do not support the processing of huge datasets well, so new approaches were needed. Parallelization using modern parallel computing frameworks such as MapReduce, CUDA, or Dryad gained in popularity and acceptance, resulting in new ML libraries developed on top of these frameworks. We briefly introduce the most prominent industrial and academic outcomes, such as Apache Mahout, GraphLab, or Jubatus. We then investigate how the cloud computing paradigm has impacted the field of ML. The first direction is popular statistics tools and libraries (the R system, Python) deployed in the cloud. A second line of products augments existing tools with plugins that allow users to create a Hadoop cluster in the cloud and run jobs on it. Next on the list are libraries of distributed implementations of ML algorithms, and on-premise deployments of complex systems for data analytics and data mining. The last approach covered by this survey is ML as Software-as-a-Service, with several Big Data start-ups (and large companies as well) already opening their solutions to the market.
