- The paper introduces a robust autoencoder that learns nonlinear data representations to detect anomalies despite significant data corruption.
- It employs an alternating optimization approach to efficiently train the model, demonstrating improved performance on datasets like USPS, CIFAR-10, and restaurant.
- Experimental results, including high AUPRC and AUROC scores, validate the model's inductive anomaly detection capabilities and robustness in noisy environments.
Robust, Deep and Inductive Anomaly Detection
Anomaly detection is critical for identifying unusual instances in datasets. The paper "Robust, Deep and Inductive Anomaly Detection" (arXiv:1704.06743) introduces a novel approach using robust autoencoders to overcome the limitations of traditional methods like PCA and its robust variants. The robust autoencoder learns a nonlinear subspace that captures the majority of data points while accommodating arbitrary corruption, effectively addressing sensitivity to data perturbation and enabling inductive anomaly detection.
Traditional anomaly detection methods, including PCA, are sensitive to data perturbations: a single extreme data point can significantly alter the learned projection and mask genuine anomalies. Robust PCA (RPCA) addresses this by decomposing the data matrix into low-rank and sparse components, but it is limited to linear projections and cannot be used for inductive anomaly detection. The paper reviews several popular methods of this kind: PCA, autoencoders, robust PCA, direct robust matrix factorization, and robust kernel PCA.
Robust Autoencoders: Methodology
The paper proposes a robust autoencoder model that combines the strengths of autoencoders and RPCA. This model uses a nonlinear activation function and multiple hidden layers to learn a more complex representation of the data. The objective function is:
$$\min_{U,V,N} \; \lVert X - (f(XU)V + N) \rVert_F^2 + \frac{\mu}{2}\left(\lVert U \rVert_F^2 + \lVert V \rVert_F^2\right) + \lambda \lVert N \rVert_1$$
where X is the input data, U and V are the encoder and decoder weights, f is a nonlinear activation function, N is a noise matrix capturing gross outliers, and λ and μ are tuning parameters. This formulation allows the model to learn a robust representation of the input data, even in the presence of significant noise. The model can be extended to convolutional autoencoders, making it suitable for image data.
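To make the objective concrete, it can be evaluated directly in NumPy. This is a minimal sketch, assuming a sigmoid activation and a single hidden layer for illustration; the paper's experiments use deeper (convolutional) architectures.

```python
import numpy as np

def robust_ae_objective(X, U, V, N, mu, lam):
    """Value of the robust autoencoder objective for given parameters.

    X: (n, d) data matrix; U: (d, k) encoder weights; V: (k, d) decoder
    weights; N: (n, d) matrix capturing gross outliers.
    """
    f = lambda Z: 1.0 / (1.0 + np.exp(-Z))             # sigmoid activation (an assumption)
    recon = f(X @ U) @ V                               # reconstruction f(XU)V
    fit = np.linalg.norm(X - (recon + N), "fro") ** 2  # squared Frobenius fit term
    reg = (mu / 2.0) * (np.linalg.norm(U, "fro") ** 2
                        + np.linalg.norm(V, "fro") ** 2)
    sparsity = lam * np.abs(N).sum()                   # l1 penalty keeping N sparse
    return fit + reg + sparsity
```

Note that when N exactly absorbs the reconstruction residual and both penalties are switched off, the objective is zero, which matches the decomposition X ≈ f(XU)V + N.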
A key advantage of the robust autoencoder is its ability to perform inductive anomaly detection. Given a new data point $x_*$, the model computes the reconstruction $f(x_*^\top U)V$ and scores the point by the reconstruction error $\lVert x_*^\top - f(x_*^\top U)V \rVert_2^2$. This allows the model to generalize to unseen data points, unlike RPCA.
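This inductive scoring rule can be sketched as a small function. The sigmoid activation here is an illustrative assumption; in practice the activation used during training would be applied.

```python
import numpy as np

def anomaly_score(x_star, U, V):
    """Reconstruction-error anomaly score for an unseen point x_star (shape (d,)).

    Higher scores indicate more anomalous points. U and V are the
    learned encoder/decoder weight matrices.
    """
    f = lambda z: 1.0 / (1.0 + np.exp(-z))  # sigmoid activation (an assumption)
    recon = f(x_star @ U) @ V               # f(x*^T U) V, a length-d row vector
    return float(np.sum((x_star - recon) ** 2))
```

Points are then ranked by score, with the highest-scoring points flagged as anomalies.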
Training and Implementation Details
The training process involves alternating between optimizing the autoencoder parameters and the noise matrix N. For a fixed N, the objective is equivalent to that of a standard autoencoder, which can be optimized using stochastic gradient descent methods like Adam. For fixed autoencoder parameters, the objective can be solved using a soft thresholding operator. This alternating optimization approach allows the model to be trained efficiently, even in online or streaming settings.
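The N-update described above has a closed form: minimising $\lVert A - N \rVert_F^2 + \lambda \lVert N \rVert_1$ over N, where A is the current reconstruction residual, is solved elementwise by soft thresholding. A minimal sketch:

```python
import numpy as np

def soft_threshold(A, lam):
    """Closed-form minimiser over N of ||A - N||_F^2 + lam * ||N||_1.

    Applied elementwise; entries of A with magnitude below lam/2 are
    zeroed, the rest are shrunk towards zero by lam/2.
    """
    return np.sign(A) * np.maximum(np.abs(A) - lam / 2.0, 0.0)

# One outer iteration of the alternating scheme (sketch):
#   1. Fix N; take Adam/SGD steps on the standard autoencoder loss fit to X - N.
#   2. Fix the autoencoder; set N = soft_threshold(X - reconstruction, lam).
```

Because step 1 is an ordinary autoencoder update, it can be run on minibatches, which is what makes the scheme suitable for online or streaming settings.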
Experimental Evaluation
The effectiveness of the proposed Robust Convolutional Autoencoder (RCAE) was evaluated on three real-world datasets: `restaurant`, `usps`, and `cifar-10`. The RCAE was compared against several state-of-the-art anomaly detection methods, including Truncated SVD, RPCA, RKPCA, AE, and CAE.
Anomaly Detection on the Restaurant Dataset
On the `restaurant` dataset, a video dataset used for background modeling and activity detection, a qualitative analysis revealed that RCAE outperforms RPCA in capturing foreground objects. The most anomalous images identified by RCAE contained high foreground activity, and the background reconstructions produced by RCAE were smoother than those produced by RPCA (Figure 1).
Figure 1: Top anomalous images containing the original image (people walking in the lobby) decomposed into background (lobby) and foreground (people) for the `restaurant` dataset.
Anomaly Detection on the USPS Dataset
On the `usps` dataset, created by mixing images of '1's and '7's, RCAE achieved an AUPRC of 0.9614, an AUROC of 0.9988, and a P@10 of 0.9108. These results demonstrate that RCAE can accurately identify anomalies in the `usps` dataset (Figure 2).

Figure 2: Top anomalous images from the `usps` dataset as identified by RCAE.
Anomaly Detection on the CIFAR-10 Dataset
On the `cifar-10` dataset, RCAE also outperformed existing methods, with an AUPRC of 0.9934, an AUROC of 0.6255, and a P@10 of 0.8716. The most anomalous images identified by RCAE were primarily cats, demonstrating its ability to distinguish cats from dogs effectively (Figure 3).
Figure 3: Top anomalous images from the `cifar-10` dataset as identified by RCAE.
Inductive Anomaly Detection
The ability of RCAE to perform inductive anomaly detection was evaluated by training the model on 5000 dog images and testing it on a separate dataset of 500 dogs and 50 cats. RCAE outperformed SVD and AE baselines in this task, demonstrating its ability to generalize to unseen data (Figure 4).
Image Denoising
The model's ability to denoise images was tested by training all models on a set of 5000 images of dogs from `cifar-10` with salt-and-pepper noise added at a rate of 10%. RCAE effectively suppressed the noise, as evidenced by its low reconstruction error. The improvement over the plain CAE was modest, but it suggests there is benefit to explicitly accounting for gross noise.
Figure 4: Mean square error boxplots for various models on the image denoising task using the `cifar-10` dataset.
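For reference, salt-and-pepper corruption of this kind can be sketched as below. The function name, interface, and 50/50 salt-vs-pepper split are illustrative assumptions, not details from the paper.

```python
import numpy as np

def add_salt_pepper(images, rate=0.10, seed=None):
    """Corrupt images (pixel values assumed in [0, 1]) with salt-and-pepper noise.

    Each pixel is independently flipped to 1.0 ("salt") or 0.0 ("pepper")
    with total probability `rate`.
    """
    rng = np.random.default_rng(seed)
    noisy = images.copy()
    corrupt = rng.random(images.shape) < rate  # which pixels get flipped
    salt = rng.random(images.shape) < 0.5      # salt vs pepper, 50/50 split
    noisy[corrupt & salt] = 1.0
    noisy[corrupt & ~salt] = 0.0
    return noisy
```

The denoising models are then trained on the corrupted images and evaluated by mean square error against the clean originals.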
Implications and Future Directions
The robust autoencoder model presented in this paper offers a significant advancement in anomaly detection, combining robustness, nonlinearity, and inductiveness. This approach is particularly valuable for applications involving complex, high-dimensional data where traditional methods may struggle.
Future research directions include extending deep autoencoders for outlier description to explain why a data point is anomalous. Additionally, exploring fast approximations of kernel methods could improve the scalability of robust kernel PCA, providing a competitive alternative to autoencoder-based methods.