On Catastrophic Forgetting and Mode Collapse in Generative Adversarial Networks (1807.04015v8)

Published 11 Jul 2018 in cs.LG and stat.ML

Abstract: In this paper, we show that Generative Adversarial Networks (GANs) suffer from catastrophic forgetting even when they are trained to approximate a single target distribution. We show that GAN training is a continual learning problem in which the sequence of changing model distributions is the sequence of tasks to the discriminator. The level of mismatch between tasks in the sequence determines the level of forgetting. Catastrophic forgetting is interrelated to mode collapse and can make the training of GANs non-convergent. We investigate the landscape of the discriminator's output in different variants of GANs and find that when a GAN converges to a good equilibrium, real training datapoints are wide local maxima of the discriminator. We empirically show the relationship between the sharpness of local maxima and mode collapse and generalization in GANs. We show how catastrophic forgetting prevents the discriminator from making real datapoints local maxima, and thus causes non-convergence. Finally, we study methods for preventing catastrophic forgetting in GANs.

Summary

The paper views GAN training as a continual learning problem, revealing catastrophic forgetting (CF) in the discriminator as a key factor in non-convergence and mode collapse.
Empirical analysis shows that CF prevents the discriminator from keeping real data points as local maxima, and that sharper local maxima at real data points correlate with more severe mode collapse.
Certain GAN variants (like WGAN, GAN-R1) and methods (momentum, gradient penalties) mitigate CF by stabilizing the discriminator landscape, suggesting pathways for more robust GANs.

An Analysis of Catastrophic Forgetting and Mode Collapse in GANs

The paper "On Catastrophic Forgetting and Mode Collapse in Generative Adversarial Networks" by Hoang Thanh-Tung and Truyen Tran is an exploratory study of the training dynamics of Generative Adversarial Networks (GANs). It shows that catastrophic forgetting (CF) arises during GAN training and is closely tied to mode collapse. Rather than treating GAN training as the approximation of a single fixed target, the authors frame it as a continual learning problem in which each new model distribution produced by the generator is a new task for the discriminator. This framing explains how the discriminator's changing tasks lead to CF, and in turn to non-convergence and mode collapse.

The authors start from the observation that GANs suffer from CF, a phenomenon usually associated with continual learning, even when they are trained to approximate a single target distribution. CF arises when the discriminator, which must distinguish real from fake samples, loses what it learned on earlier model distributions as it adapts to later ones. The level of mismatch between consecutive tasks determines the level of forgetting. Such forgetting hampers convergence and aggravates mode collapse, the failure to generate diverse samples, which limits the expressive capability of GANs.

A core component of the paper is an empirical analysis of several GAN variants and loss functions that examines the landscape of the discriminator's output. The analysis indicates that when a GAN converges to a good equilibrium, real training data points are wide local maxima of the discriminator. When CF occurs, the discriminator fails to make real data points local maxima, and training does not converge. The sharpness of these maxima is also tied to mode collapse: sharper maxima correlate with more severe mode collapse.
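One way to make "sharpness of a local maximum" concrete is to estimate the curvature of the discriminator's output along a direction through a real data point. The sketch below is only an illustration, not the paper's measurement procedure: the Gaussian bump `make_bump` is a hypothetical stand-in for a trained discriminator, and `directional_curvature` is an assumed helper based on a second-order central difference.

```python
import math

def directional_curvature(d, x, direction, h=1e-3):
    """Second-order central difference of d at x along a (normalized) direction.

    A strongly negative value indicates a sharp local maximum; a value close
    to zero (but negative) indicates a wide one.
    """
    norm = math.sqrt(sum(u * u for u in direction))
    unit = [u / norm for u in direction]
    forward = [xi + h * ui for xi, ui in zip(x, unit)]
    backward = [xi - h * ui for xi, ui in zip(x, unit)]
    return (d(forward) - 2.0 * d(x) + d(backward)) / (h * h)

def make_bump(center, width):
    """Hypothetical toy discriminator: a Gaussian bump peaked at `center`."""
    def d(x):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        return math.exp(-sq_dist / (2.0 * width ** 2))
    return d

real_point = [0.5, -1.0]          # assumed "real" data point
direction = [1.0, 1.0]            # arbitrary probing direction

sharp_d = make_bump(real_point, width=0.1)   # narrow peak
wide_d = make_bump(real_point, width=1.0)    # broad peak

sharp_curv = directional_curvature(sharp_d, real_point, direction)
wide_curv = directional_curvature(wide_d, real_point, direction)
print("sharp peak curvature:", sharp_curv)
print("wide peak curvature: ", wide_curv)
```

For a real model, `d` would be the trained discriminator, and the probe would typically be averaged over many data points and directions.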

A notable empirical observation is that GAN variants such as Wasserstein GANs and GANs with gradient penalties (e.g., GAN-R1 and GAN-0GP) are more resilient to CF. They keep real data points as wide local maxima, which stabilizes the discriminator landscape and improves sample diversity and quality.
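The two gradient penalties named above can be sketched on a toy one-dimensional discriminator. This is a minimal illustration under stated assumptions: the discriminator `disc` and its parameters are hypothetical, the R1 term is written in its common form (half of `gamma` times the mean squared input-gradient at real samples), and the zero-centered penalty is evaluated at random interpolations between real and fake samples, as it is usually described. The weights `gamma` and `lam` are arbitrary placeholders, not values from the paper.

```python
import math
import random

def disc(x, w=2.0, b=0.0, v=1.5):
    """Hypothetical 1-D discriminator D(x) = v * tanh(w * x + b)."""
    return v * math.tanh(w * x + b)

def disc_grad(x, w=2.0, b=0.0, v=1.5):
    """Analytic derivative dD/dx of the toy discriminator above."""
    t = math.tanh(w * x + b)
    return v * w * (1.0 - t * t)

def r1_penalty(real, gamma=10.0):
    """R1-style term: (gamma / 2) * mean of (dD/dx)^2 at real samples."""
    return 0.5 * gamma * sum(disc_grad(x) ** 2 for x in real) / len(real)

def zero_gp_penalty(real, fake, lam=10.0, rng=None):
    """Zero-centered penalty: lam * mean of (dD/dx)^2 at real/fake interpolates."""
    rng = rng or random.Random(0)
    total = 0.0
    for x_real, x_fake in zip(real, fake):
        alpha = rng.random()                      # uniform in [0, 1)
        x_hat = alpha * x_real + (1.0 - alpha) * x_fake
        total += disc_grad(x_hat) ** 2
    return lam * total / len(real)

real_batch = [0.0, 0.3, -0.2]      # assumed real samples
fake_batch = [1.0, 1.2, 0.8]       # assumed generated samples
print("R1 penalty: ", r1_penalty(real_batch))
print("0-GP penalty:", zero_gp_penalty(real_batch, fake_batch))
```

Both terms push the discriminator's input-gradient toward zero, at real samples for R1 and along the path between real and fake samples for the zero-centered penalty, which is consistent with the flatter, wider maxima described above. In a deep-learning framework the gradient would come from automatic differentiation rather than a hand-written derivative.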

Further, the authors discuss how several stabilization methods mitigate CF. Momentum-based optimizers, imbalanced loss functions that favor real samples, and zero-centered gradient penalties each help the discriminator retain information from earlier tasks during ongoing training, which aligns GAN training with broader continual learning principles.

On the theoretical side, the paper analyzes a high-dimensional analogue of the Dirac GAN. By examining the monotonic and directional attributes of discriminator functions under CF, the authors show how memory-preserving mechanisms can interrupt the forgetting cycle and help training converge.
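For intuition, the original one-dimensional Dirac GAN (the setting that the paper's high-dimensional analogue generalizes) can be simulated in a few lines. The sketch below is an assumption-laden toy, not the paper's construction: the generator is a single point `theta`, the discriminator is linear, `D(x) = psi * x`, the real data sit at 0, and updates are simultaneous gradient steps. Setting `gamma > 0` adds an R1-style penalty on `psi`; the step size and penalty weight are arbitrary choices.

```python
import math

def f_prime(t):
    """Derivative of f(t) = -log(1 + exp(-t)), written to avoid overflow."""
    if t >= 0.0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))

def simulate(theta=1.0, psi=1.0, lr=0.1, gamma=0.0, steps=2000):
    """Simultaneous gradient steps on the 1-D Dirac GAN.

    theta: generator parameter (position of the fake point mass)
    psi:   discriminator slope, D(x) = psi * x
    gamma: weight of an R1-style penalty (gamma / 2) * psi**2; 0 disables it
    """
    for _ in range(steps):
        g = f_prime(theta * psi)
        d_theta = -g * psi                  # generator descends the loss
        d_psi = g * theta - gamma * psi     # discriminator ascends, minus penalty
        theta, psi = theta + lr * d_theta, psi + lr * d_psi
    return theta, psi

theta_u, psi_u = simulate(gamma=0.0, steps=500)
theta_r, psi_r = simulate(gamma=1.0, steps=2000)
print("unregularized distance from equilibrium:", math.hypot(theta_u, psi_u))
print("regularized distance from equilibrium:  ", math.hypot(theta_r, psi_r))
```

In this toy setting the unregularized iterates circle the equilibrium at (0, 0) and drift outward, because the linear discriminator keeps reversing its slope as the generator moves, while the penalized run settles at the equilibrium. This mirrors, in the simplest case, the forgetting cycle discussed above.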

These findings matter for the development of GANs that resist mode collapse and generalize better. They also suggest a direction for future work on GAN architectures and training algorithms: improving generalization by encouraging flatter and wider local maxima in the discriminator.

Overall, the paper contributes to the study of GANs by bringing in insights from continual learning and by providing a framework for understanding and mitigating catastrophic forgetting in adversarial training. It lays groundwork for future work on GAN training dynamics and on performance across complex datasets.