Disentangled Conditional Variational Autoencoder for Unsupervised Anomaly Detection
PaperID: BigData693 (Session - D2S4: Big Data Science and Foundations II), 2024 IEEE International Conference on Big Data (IEEE BigData 2024), Washington, DC, USA
December 15 - 18, 2024
Asif Ahmed Neloy
Faculty Member, Douglas College
Adjunct Faculty, University of British Columbia (UBC)
Canada.
neloya@douglascollege.ca
Dr. Maxime Turgeon
Data Scientist, Tesera Systems
Adjunct Professor, Department of Statistics, University of Manitoba
Canada.
max.turgeon@umanitoba.ca

Unsupervised Anomaly Detection using Auto-encoders

  • Why Autoencoders Work Well:
    • Learn compressed representations, making it easier to identify deviations.
    • Capture complex patterns in data without needing labeled examples.
    • Ideal for high-dimensional data like images or sensor data.

  • How Autoencoders Detect Anomalies:
    • Reconstruct input data and flag high reconstruction errors as anomalies (see the sketch after this list).
    • Can adapt to various domains, identifying subtle anomalies in healthcare, cybersecurity, and more.
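As an illustration of this scheme (not code from the paper), a minimal PyTorch sketch that scores each sample by its reconstruction error and flags samples above a threshold fit on normal data; the trained `model` and the data loaders are assumed:

```python
import torch

def anomaly_scores(model, loader, device="cpu"):
    """Per-sample reconstruction error (MSE) under a trained autoencoder."""
    model.eval()
    scores = []
    with torch.no_grad():
        for x, _ in loader:
            x = x.to(device)
            x_hat = model(x)                                  # reconstruct batch
            err = ((x - x_hat) ** 2).flatten(1).mean(dim=1)   # per-sample MSE
            scores.append(err.cpu())
    return torch.cat(scores)

# Fit a threshold on normal training data, then flag test samples above it:
# train_err = anomaly_scores(model, train_loader)
# threshold = torch.quantile(train_err, 0.95)     # e.g. 95th percentile
# is_anomaly = anomaly_scores(model, test_loader) > threshold
```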
[Figure: VAE architecture diagram]

Unsupervised Anomaly Detection using Auto-encoders


  • Autoencoders in Anomaly Detection:

    • Why Autoencoders? They efficiently capture the core structure of the data, making them effective for anomaly detection.

    • Low Reconstruction Error: Normal data points have low reconstruction error, fitting closely to learned patterns.

    • Adaptability: Continuously learn from new data, making them ideal for dynamic, real-time environments.

Recent Advancements in VAE Architectures

  • Feature Disentangling with an Incremental Strategy
    • H. Li et al.: Introduced a Feature Disentangling Autoencoder for targeted anomaly detection in reactor core temperatures.
  • ELBO Convergence to Sum of Entropies
    • S. Damm et al.: Improved VAE optimization stability by showing that the ELBO converges to a sum of entropies.
  • Latent Space Partitioning via Mutual Information
    • M. Vera et al.: Used mutual information regularization for informative latent feature learning in VAEs.
  • Metrics for Measuring Disentanglement
    • M. A. Carbonneau et al.: Surveyed disentanglement metrics, clarifying how feature separation in latent space is measured.
  • Total Correlation (TC) Loss for Reducing Redundancy
    • A. Staffini et al.: Utilized TC loss in a VAE-BiLSTM model, reducing latent dimension redundancy and enhancing anomaly detection for continuous data.

Shortcomings in Auto-encoder Architectures

  • Loss of Essential Information
    • Difficulty balancing disentanglement with retaining crucial details.
    • Results in the omission of key features needed for anomaly detection.
  • Trade-offs with Disentanglement
    • Focus on independent latent factors often reduces reconstruction quality.
    • Lower accuracy in identifying subtle anomalies.
  • Sensitivity to Total Correlation (TC) Loss
    • While TC loss reduces dependencies among latent features, heavy TC penalization weakens reconstruction quality.
  • Challenges in Real-World Applications
    • Difficulty in capturing complex variations within real-world datasets.
    • Subtle anomalies may be misclassified or overlooked.
  These shortcomings motivate the three design goals of dCVAE:
  • Disentanglement: Learn disentangled latent features for more robust anomaly detection.
  • Minimize TC Loss: Reduce redundancy among latent features while maintaining reconstruction quality.
  • Improve Reconstruction: Enhance reconstruction accuracy to better capture subtle anomalies.

Architecture of dCVAE

Disentanglement Learning

  • Builds on β-VAE to better isolate independent latent features.
  • Creates distinct latent representations, aiding precise anomaly detection.

Information Theory Principles

  • Total Correlation (TC) Loss: Minimizes redundancy in latent dimensions.
  • Improves feature diversity for clearer anomaly identification.

Conditioning with CVAE

  • Utilizes a Conditional VAE to refine the latent space toward specific anomaly types.
  • Adapts the model to complex datasets by conditioning on auxiliary variables.

Key Problems dCVAE Solves

  • ✅ Capture Independent Features: Latent variables hold distinct information for enhanced anomaly detection.
  • ✅ Reduce Latent Space Redundancy: Minimizes redundancy, enhancing feature focus.
  • ✅ High-Precision Anomaly Detection: High reconstruction quality distinguishes normal from anomalous data.

Formulation of dCVAE - Part 1

Step 1: Latent Variable Disentanglement (β-VAE)

  • β-VAE: Introduces the β-weighted KL-divergence to enforce disentanglement.
    • Isolates independent latent factors, enhancing the model’s capacity to capture meaningful structure in anomaly detection.
    • Improves feature disentanglement, which helps dCVAE to produce clearer and more distinct anomaly representations.
$$\mathcal{L}_{\beta\text{-VAE}}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \beta\, D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)$$
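For concreteness, a minimal PyTorch sketch of this loss, assuming a diagonal-Gaussian encoder that outputs `mu` and `logvar` and a Bernoulli decoder (so the KL term has the standard closed form):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Negative beta-ELBO: reconstruction term plus beta-weighted KL."""
    # E_{q(z|x)}[log p(x|z)] as a Bernoulli log-likelihood (pixels in [0, 1]).
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # Closed-form KL( q(z|x) || N(0, I) ) for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl   # minimizing this maximizes the beta-ELBO
```

Setting `beta` above 1 strengthens the pressure toward a factorized posterior, which encourages disentanglement at some cost in reconstruction quality.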

Step 2: Mutual Information via CorEx

  • Enhanced Disentanglement: CorEx minimizes total correlation in latent variables, isolating independent features for anomaly detection.
  • Informativeness Maximization: CorEx maximizes the informativeness of latent representations, preserving essential details while reducing noise.
$$\mathcal{L}(\theta; x) = \sum_{i=1}^{d} I_\theta(x_i; z) - \sum_{i=1}^{m} I_\theta(z_i; x) \;\ge\; \sum_{i=1}^{d} H(x_i) + \mathbb{E}_{p_\theta(x,z)}\big[\log q_\phi(x \mid z)\big] - D_{\mathrm{KL}}\big(p_\theta(z \mid x)\,\|\,r_\alpha(z)\big)$$
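The CorEx bound itself is more involved, but the total-correlation quantity it controls can be estimated on a minibatch. The sketch below substitutes the minibatch estimator popularized by β-TCVAE (Chen et al.) as an illustration; this is an assumption, not the paper's exact computation:

```python
import math
import torch

def log_gaussian(z, mu, logvar):
    """Elementwise log-density of a diagonal Gaussian N(mu, exp(logvar))."""
    return -0.5 * (math.log(2 * math.pi) + logvar + (z - mu) ** 2 / logvar.exp())

def total_correlation(z, mu, logvar):
    """Minibatch estimate of TC(z) = KL( q(z) || prod_k q(z_k) ).

    z, mu, logvar: shape (batch, dim). The dataset-size correction of the
    full estimator is omitted; this is adequate as a training signal.
    """
    b = z.size(0)
    # log q(z_i | x_j) for every pair (i, j): shape (batch, batch, dim)
    pair = log_gaussian(z.unsqueeze(1), mu.unsqueeze(0), logvar.unsqueeze(0))
    log_qz = torch.logsumexp(pair.sum(dim=2), dim=1) - math.log(b)          # joint
    log_qz_marg = (torch.logsumexp(pair, dim=1) - math.log(b)).sum(dim=1)   # marginals
    return (log_qz - log_qz_marg).mean()
```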

Formulation of dCVAE - Part 2

Step 3: Conditional Disentanglement via CVAE

  • Uses class labels (c) to enhance disentanglement, ensuring informative latent variables.
  • Optimizes with Conditional Mutual Information to model known sources of data variation.
$$\mathcal{L}(\theta; x, c) = \mathrm{TC}_\theta(x \mid c) - \mathrm{TC}_\theta(x \mid z, c) - \mathrm{TC}_\theta(z \mid c)$$
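A minimal sketch of how label conditioning is typically wired into a CVAE: the one-hot class label c is concatenated to both the encoder and the decoder inputs. Layer sizes are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Conditional VAE: both q(z|x,c) and p(x|z,c) observe the label c."""
    def __init__(self, x_dim=784, c_dim=10, z_dim=16, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + c_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def forward(self, x, c):                    # c: one-hot, shape (batch, c_dim)
        h = self.enc(torch.cat([x, c], dim=1))  # condition the encoder on c
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        x_hat = self.dec(torch.cat([z, c], dim=1))             # condition the decoder
        return x_hat, z, mu, logvar
```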

Step 4: Objective Function of dCVAE

  • Combines disentanglement, mutual information, and conditional variational constraints.
  • Maximizes latent variable informativeness while minimizing redundancy for high-quality reconstruction.
$$\mathcal{L}(\theta; x, c) \;\ge\; \sum_{i=1}^{n} H(x_i \mid c) + \mathbb{E}\big[\log q_\phi(x \mid z, c)\big] - \sum_{i=1}^{m} \beta\, D_{\mathrm{KL}}\big(p(z_i \mid x, c)\,\|\,r(z_i \mid c)\big)$$
  • $\sum_{i=1}^{n} H(x_i \mid c)$: conditional entropy, capturing the variability of x given the class label c.
  • $\mathbb{E}[\log q_\phi(x \mid z, c)]$: expected log-likelihood term, ensuring high-quality reconstruction by the decoder.
  • $\sum_{i=1}^{m} \beta\, D_{\mathrm{KL}}(p(z_i \mid x, c) \,\|\, r(z_i \mid c))$: β-weighted KL-divergence term, regularizing the latent space by reducing redundancy and improving disentanglement.
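Putting the pieces together, a hedged sketch of how the three ingredients might combine into a single training loss; the weights `beta` and `gamma`, and the substitution of the minibatch TC estimator for the full CorEx bound, are simplifying assumptions:

```python
import torch
import torch.nn.functional as F

def dcvae_loss(x, x_hat, z, mu, logvar, beta=4.0, gamma=1.0):
    """Illustrative combined objective: reconstruction + beta-KL + TC penalty."""
    n = x.size(0)
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum") / n
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / n
    tc = total_correlation(z, mu, logvar)   # estimator sketched in Step 2
    return recon + beta * kl + gamma * tc
```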

Experiments

Models

  • VAE
  • CVAE
  • β-VAE
  • Factor-VAE
  • RFVAE

Datasets

  • MNIST
  • Fashion-MNIST (FMNIST)
  • KMNIST
  • EMNIST

Experiment Metrics

  • Downstream Tasks: Reconstructions, Manifold Learning, Latent Representation Analysis

  • Classification Tasks: Anomaly Detection, ROC-AUC, Accuracy Evaluation

  • Used Anomaly Score (𝒜) and Reconstruction Error (ℛ) to assess model performance in anomaly identification; an evaluation sketch follows below.
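A small sketch of how this evaluation can be computed with scikit-learn; the 95th-percentile cutoff used for the accuracy figure is an illustrative choice, not the paper's:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_detection(labels, scores):
    """ROC-AUC plus accuracy at an illustrative anomaly-score cutoff.

    labels: 1 = anomaly, 0 = normal; scores: higher means more anomalous.
    """
    labels, scores = np.asarray(labels), np.asarray(scores)
    auc = roc_auc_score(labels, scores)
    preds = (scores > np.percentile(scores, 95)).astype(int)
    acc = (preds == labels).mean()
    return auc, acc
```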

Results - Reconstruction (EMNIST and KMNIST)

[Figure: Reconstructed EMNIST and KMNIST samples; per-image scores listed below]

Sample   EMNIST (ℛ, 𝒜)   KMNIST (ℛ, 𝒜)
1        -235, 0.97      -171, 0.33
2        -248, 0.90      -199, 0.48
3        -245, 0.91      -180, 0.53
4        -246, 0.91      -185, 0.52
5        -244, 0.92      -195, 0.57
6        -246, 0.92      -190, 0.55
  • Key Observations:
    • dCVAE and FactorVAE demonstrate low ℛ and high 𝒜 scores, indicating efficient anomaly detection.
    • Other models, such as VAE and CVAE, show inconsistent ℛ and 𝒜 scores, suggesting limitations in accurately reconstructing anomalous samples.

Results - Latent Representations

[Figure: Latent representations on EMNIST (top row) and KMNIST (bottom row) for beta-VAE, CVAE, dCVAE, FactorVAE, and RFVAE]
  • Key Observations:
    • dCVAE achieves the most distinct and well-separated clusters across classes in both datasets, showing effective disentanglement.
    • FactorVAE also provides meaningful latent separations but is less consistent compared to dCVAE.
    • beta-VAE and CVAE display less organized latent representations, suggesting challenges in class separation.
    • RFVAE shows dispersed clusters with limited class separation, indicating weaker disentanglement capabilities.

Results - Manifold Embeddings

[Figure: Manifold embeddings on EMNIST (top row) and KMNIST (bottom row) for VAE, beta-VAE, CVAE, dCVAE, FactorVAE, and RFVAE]
  • Key Observations:
    • dCVAE shows the clearest and most consistent latent manifold structures, providing distinct class clusters.
    • FactorVAE maintains class separations but displays some overlapping areas in both datasets.
    • VAE and beta-VAE reveal less structured manifolds, with less pronounced separations between classes.
    • RFVAE demonstrates weaker structure, with dispersed embeddings that do not distinctly cluster by class.
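
The projection behind these 2-D views is not restated here; a common way to produce such plots is to embed the encoder means with t-SNE, as in this hypothetical sketch:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_latent(mu, labels):
    """Embed encoder means in 2-D with t-SNE and color points by class."""
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(mu)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=4, cmap="tab10")
    plt.colorbar(label="class")
    plt.tight_layout()
    plt.show()
```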

Training Time and AUC Analysis

Model Training Time and AUC Scores
Model        EMNIST AUC   EMNIST Time (min)   KMNIST AUC   KMNIST Time (min)
dCVAE        78.98        102                 61.02        95
VAE          67.23        92                  51.13        78
CVAE         66.01        117                 42.35        104
FactorVAE    62.91        138                 49.23        117
β-VAE        65.12        123                 50.01        119
RFVAE        55.03        130                 49.51        132

Training Time & AUC Analysis

  • dCVAE achieves the highest AUC scores on both EMNIST and KMNIST datasets.
  • FactorVAE has the longest training time on EMNIST (138 minutes), indicating a more computationally demanding process.
  • RFVAE has the longest training time on KMNIST (132 minutes) despite its weaker AUC scores.
  • VAE and β-VAE offer moderate AUC scores and training times, but do not match the performance efficiency seen in dCVAE.
  • VAE trains fastest on both datasets (92 and 78 minutes), while CVAE's low KMNIST AUC (42.35) illustrates the trade-off between training efficiency and detection accuracy.

Final Takeaways

  • dCVAE demonstrates superior performance in anomaly detection, pairing the highest AUC scores with distinct latent separations; FactorVAE is its closest competitor in latent-space quality, making both suitable for nuanced anomaly analysis.

  • Trade-off between Training Time and Accuracy: Models like FactorVAE show higher training time but consistent latent clustering, whereas CVAE achieves faster training with some compromise on AUC scores.

  • RFVAE Limitations: While computationally intensive, RFVAE struggles to provide well-separated latent representations, indicating limited effectiveness in anomaly detection applications.

  • Practical Implications: dCVAE is recommended for applications prioritizing accuracy and interpretability, while VAE-based models are more suited for less resource-intensive environments with moderate accuracy needs.

Conclusion


  • dCVAE enhances unsupervised anomaly detection by leveraging disentangled learning, TC loss, and optimized reconstruction quality.

  • The model’s architecture supports diverse applications in controlled image synthesis, molecular design, bio-signal separation, and conditional text generation.

  • Future work will explore reducing posterior-prior distribution gaps and refining loss trade-offs to further improve dCVAE’s robustness and performance.

Acknowledgments


We would like to acknowledge support from the NSERC CREATE grant on the Visual and Automated Disease Analytics program.

MT also acknowledges funding via a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), RGPIN-2021-04073.

AAN acknowledges Supplemental Professional Development (SPD) from Douglas College.

Thank You!