- The paper's main contribution is introducing a capped non-negative risk estimator that prevents negative empirical risks in PU learning.
- The estimator reduces overfitting when training flexible models such as deep neural networks, and is backed by theoretical analyses of its bias, consistency, and estimation error bounds.
- A scalable algorithm based on stochastic optimization enables efficient training on large datasets, and experiments indicate the method remains robust even when the class prior is misspecified.
Positive-Unlabeled Learning with a Non-Negative Risk Estimator
Positive-Unlabeled (PU) learning has emerged as an important approach in scenarios where reliable negative data are scarce or unavailable. This paper by Kiryo et al. addresses a fundamental challenge within PU learning: the tendency of the empirical risk to become negative, particularly when training highly flexible models such as deep neural networks. Although unbiased risk estimators were a milestone for the field, they are prone to producing negative empirical risks, which in turn exacerbates overfitting. The paper introduces a novel non-negative risk estimator designed to overcome this problem.
Core Contributions
The proposed non-negative risk estimator improves on the unbiased estimator by keeping the empirical risk non-negative. The key modification is that the term estimating the risk on the negative class (computed from the unlabeled and positive data), which can drift below zero when a flexible model overfits, is clipped at zero during minimization, as written out below. Theoretical analyses show that this clipped estimator mitigates overfitting while remaining consistent and statistically well-behaved.
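Concretely, in the paper's notation, with $\pi_p$ the class prior, $g$ the decision function, and $\widehat{R}_p^+$, $\widehat{R}_p^-$, $\widehat{R}_u^-$ the empirical risks of positive data treated as positive, positive data treated as negative, and unlabeled data treated as negative, the unbiased estimator and the proposed non-negative estimator take roughly the form

$$
\widehat{R}_{\mathrm{pu}}(g) = \pi_p \widehat{R}_p^+(g) - \pi_p \widehat{R}_p^-(g) + \widehat{R}_u^-(g),
\qquad
\widetilde{R}_{\mathrm{pu}}(g) = \pi_p \widehat{R}_p^+(g) + \max\bigl\{0,\ \widehat{R}_u^-(g) - \pi_p \widehat{R}_p^-(g)\bigr\}.
$$

The second and third terms of the unbiased estimator can jointly dip below zero when a flexible model overfits the positive data; taking the maximum with zero removes exactly this failure mode.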
Key contributions include:
- Robustness Against Overfitting: The non-negative risk estimator allows for the effective use of complex models with limited positive data, which is vital in real-world applications where data collection can be expensive or restricted.
- Theoretical Validation: A comprehensive analysis establishes the bias, consistency, and mean-squared-error properties of the non-negative risk estimator. An estimation error bound is also provided, showing that the proposed method remains theoretically competitive with the unbiased estimator.
- Scalability: A large-scale training algorithm built on stochastic optimization is introduced, making the estimator practical for mini-batch training of deep models on substantial datasets (a minimal sketch follows this list).
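To illustrate how the large-scale variant can look in practice, here is a minimal PyTorch-style sketch of one stochastic update with the non-negative risk. It assumes a binary scorer `model`, the sigmoid loss used in the paper's experiments, and hypothetical mini-batch tensors `x_p` (positive) and `x_u` (unlabeled); `beta` and `gamma` stand in for the slack and defitting step-size factor of the paper's algorithm. This is a sketch under those assumptions, not the authors' reference implementation.

```python
import torch


def sigmoid_loss(scores, y):
    # Sigmoid loss l(z, y) = sigmoid(-y * z); the surrogate loss used in the paper's experiments.
    return torch.sigmoid(-y * scores)


def nnpu_step(model, optimizer, x_p, x_u, prior, beta=0.0, gamma=1.0):
    """One mini-batch update with the non-negative PU risk (hypothetical helper).

    x_p / x_u: mini-batches of positive / unlabeled inputs.
    prior: class-prior probability pi_p (assumed known or pre-estimated).
    beta, gamma: slack and defitting step-size factor, as in the paper's algorithm.
    """
    g_p = model(x_p).view(-1)  # scores on positive data
    g_u = model(x_u).view(-1)  # scores on unlabeled data

    risk_p_pos = prior * sigmoid_loss(g_p, 1.0).mean()    # pi_p * R_p^+(g)
    risk_p_neg = prior * sigmoid_loss(g_p, -1.0).mean()   # pi_p * R_p^-(g)
    risk_u_neg = sigmoid_loss(g_u, -1.0).mean()           # R_u^-(g)

    # This difference estimates the negative-class risk and can dip below zero
    # when a flexible model starts to overfit.
    negative_part = risk_u_neg - risk_p_neg

    optimizer.zero_grad()
    if negative_part.item() < -beta:
        # Defitting step: ascend on the offending term only, scaled by gamma.
        (-gamma * negative_part).backward()
    else:
        # Ordinary descent on the (non-negative) PU risk.
        (risk_p_pos + negative_part).backward()
    optimizer.step()

    # Return the clipped (non-negative) risk for monitoring.
    return (risk_p_pos + torch.clamp(negative_part, min=0.0)).item()
```

When the estimated negative-class term drops below `-beta`, the update temporarily ascends on that term alone, scaled by `gamma`, to pull the model back from overfitting; this mirrors the gradient-switching idea of the paper's large-scale algorithm.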
Experimental Evaluation
Using a series of benchmark datasets, including MNIST, epsilon, 20Newsgroups, and CIFAR-10, the paper demonstrates that the proposed estimator effectively mitigates overfitting in deep neural networks, outperforming both traditional PN learning and unbiased PU learning. Notably, the estimator remains effective even when the class-prior probability is misspecified, an important property for practical PU applications.
Implications and Future Directions
The research has significant practical implications for domains where labeled data, and negative samples in particular, are difficult to acquire, from biomedical data analysis to monitoring and surveillance. The paper also opens avenues for semi-supervised learning, where the proposed estimator could be adapted and combined with additional sources of supervision.
In summary, this paper offers a practical solution to a prevalent problem in PU learning, enabling the scalable and robust use of sophisticated models when labeling is incomplete. The work both advances the state of the art in PU learning and lays a foundation for further exploration of robust learning under label constraints.