
Abstract

Crowdsourcing platforms offer a practical solution to the problem of affordably annotating large datasets for training supervised classifiers. Unfortunately, poor worker performance frequently threatens to compromise annotation reliability, and requesting multiple labels for every instance can lead to large cost increases without guaranteeing good results. Minimizing the required training samples using an active learning selection procedure reduces the labeling requirement but can jeopardize classifier training by focusing on erroneous annotations. This paper presents an active learning approach in which worker performance, task difficulty, and annotation reliability are jointly estimated and used to compute the risk function guiding the sample selection procedure. We demonstrate that the proposed approach, which employs active learning with Bayesian networks, significantly improves training accuracy and correctly ranks the expertise of unknown labelers in the presence of annotation noise.
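The sketch below is only a rough illustration of the idea the abstract describes: estimating per-worker reliability from redundant noisy labels and using a risk score to pick the next instance to query. It assumes a simple symmetric-accuracy worker model with an EM-style update and an uncertainty-based risk score; the function names and the model itself are illustrative assumptions, not the paper's actual Bayesian-network formulation.

```python
# Illustrative sketch (not the authors' implementation): jointly estimate
# per-worker accuracy from redundant binary labels, then use the resulting
# label posteriors to pick the riskiest unlabeled instance to query next.
# The symmetric-accuracy worker model and the EM update are assumptions.

import numpy as np


def estimate_worker_accuracy(labels, n_iters=20):
    """labels: (n_items, n_workers) array of 0/1 annotations (-1 = missing).
    Returns (posterior P(y=1) per item, estimated accuracy per worker)."""
    n_items, n_workers = labels.shape
    accuracy = np.full(n_workers, 0.7)  # optimistic prior on worker skill
    post = np.full(n_items, 0.5)
    for _ in range(n_iters):
        # E-step: posterior over the true label given current worker accuracies
        for i in range(n_items):
            log_odds = 0.0
            for w in range(n_workers):
                if labels[i, w] < 0:
                    continue
                p = np.clip(accuracy[w], 1e-3, 1 - 1e-3)
                log_odds += np.log(p / (1 - p)) if labels[i, w] == 1 else np.log((1 - p) / p)
            post[i] = 1.0 / (1.0 + np.exp(-log_odds))
        # M-step: re-estimate each worker's accuracy against the soft labels
        for w in range(n_workers):
            mask = labels[:, w] >= 0
            if mask.sum() == 0:
                continue
            agree = np.where(labels[mask, w] == 1, post[mask], 1 - post[mask])
            accuracy[w] = agree.mean()
    return post, accuracy


def select_next_instance(posterior, already_labeled):
    """Risk-guided selection: query the unlabeled item whose label posterior is
    closest to 0.5 (highest expected risk under this toy model)."""
    risk = 1.0 - np.abs(posterior - 0.5) * 2.0  # 1 at p=0.5, 0 at p in {0, 1}
    risk[already_labeled] = -np.inf
    return int(np.argmax(risk))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_y = rng.integers(0, 2, size=50)
    skills = np.array([0.95, 0.85, 0.60])  # one noisy worker among three
    labels = np.where(rng.random((50, 3)) < skills, true_y[:, None], 1 - true_y[:, None])
    post, acc = estimate_worker_accuracy(labels)
    print("estimated worker accuracies:", np.round(acc, 2))
    print("next instance to query:", select_next_instance(post, np.zeros(50, bool)))
```

Running the toy example recovers the relative ranking of the three simulated workers' accuracies, which mirrors the abstract's claim about ranking the expertise of unknown labelers; the paper's approach additionally models task difficulty, which this sketch omits.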
