Disentangled Conditional Variational Autoencoder for Unsupervised Anomaly Detection
PaperID: BigData693 (Session - D2S4: Big Data Science and Foundations II), 2024 IEEE International Conference on Big Data (IEEE BigData 2024), Washington, DC, USA
December 15 - 18, 2024
Asif Ahmed Neloy
Faculty Member, Douglas College
Adjunct Faculty, University of British Columbia (UBC)
Canada.
neloya@douglascollege.ca
Dr. Maxime Turgeon
Data Scientist, Tesera Systems
Adjunct Professor, Department of Statistics, University of Manitoba
Canada.
max.turgeon@umanitoba.ca

Unsupervised Anomaly Detection using Auto-encoders

  • Why Autoencoders Work Well:
    • Learn compressed representations, making it easier to identify deviations.
    • Capture complex patterns in data without needing labeled examples.
    • Ideal for high-dimensional data like images or sensor data.

  • How Autoencoders Detect Anomalies:
    • Reconstruct input data and flag high reconstruction errors as anomalies (see the sketch after this list).
    • Can adapt to various domains, identifying subtle anomalies in healthcare, cybersecurity, and more.
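As an illustration of this scheme (not code from the paper), a minimal PyTorch sketch that scores each sample by its reconstruction error and flags samples above a threshold fit on normal data; the trained `model` and the data loaders are assumed:

```python
import torch

def anomaly_scores(model, loader, device="cpu"):
    """Per-sample reconstruction error (MSE) under a trained autoencoder."""
    model.eval()
    scores = []
    with torch.no_grad():
        for x, _ in loader:
            x = x.to(device)
            x_hat = model(x)                                  # reconstruct batch
            err = ((x - x_hat) ** 2).flatten(1).mean(dim=1)   # per-sample MSE
            scores.append(err.cpu())
    return torch.cat(scores)

# Fit a threshold on normal training data, then flag test samples above it:
# train_err = anomaly_scores(model, train_loader)
# threshold = torch.quantile(train_err, 0.95)     # e.g. 95th percentile
# is_anomaly = anomaly_scores(model, test_loader) > threshold
```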
[Figure: VAE architecture diagram]

Unsupervised Anomaly Detection using Auto-encoders


  • Autoencoders in Anomaly Detection:

    • Why Autoencoders? They efficiently capture the core structure of the data, making them effective for anomaly detection.

    • Low Reconstruction Error: Normal data points have low reconstruction error, fitting closely to learned patterns.

    • Adaptability: Continuously learn from new data, making them ideal for dynamic, real-time environments.

Recent Advancements in VAE Architectures

  • Feature Disentangling with an Incremental Strategy
    • H. Li et al.: Introduced a Feature Disentangling Autoencoder for targeted anomaly detection in reactor core temperatures.
  • ELBO Convergence to Sum of Entropies
    • S. Damm et al.: Improved VAE optimization stability by showing that the ELBO converges to a sum of entropies.
  • Latent Space Partitioning via Mutual Information
    • M. Vera et al.: Used mutual information regularization for informative latent feature learning in VAEs.
  • Metrics for Measuring Disentanglement
    • M. A. Carbonneau et al.: Surveyed disentanglement metrics, clarifying how feature separation in latent space is measured.
  • Total Correlation (TC) Loss for Reducing Redundancy
    • A. Staffini et al.: Utilized TC loss in a VAE-BiLSTM model, reducing latent dimension redundancy and enhancing anomaly detection for continuous data.

Shortcomings in Auto-encoder Architectures

  • Loss of Essential Information
    • Difficulty balancing disentanglement with retaining crucial details.
    • Results in the omission of key features needed for anomaly detection.
  • Trade-offs with Disentanglement
    • Focus on independent latent factors often reduces reconstruction quality.
    • Lower accuracy in identifying subtle anomalies.
  • Sensitivity to Total Correlation (TC) Loss
    • While TC loss reduces dependencies among latent features, heavy TC penalization weakens reconstruction quality.
  • Challenges in Real-World Applications
    • Difficulty in capturing complex variations within real-world datasets.
    • Subtle anomalies may be misclassified or overlooked.
  These shortcomings motivate the three design goals of dCVAE:
  • Disentanglement: Learn disentangled latent features for more robust anomaly detection.
  • Minimize TC Loss: Reduce redundancy among latent features while maintaining reconstruction quality.
  • Improve Reconstruction: Enhance reconstruction accuracy to better capture subtle anomalies.

Architecture of dCVAE

Disentanglement Learning

  • Builds on β-VAE to better isolate independent latent features.
  • Creates distinct latent representations, aiding precise anomaly detection.

Information Theory Principles

  • Total Correlation (TC) Loss: Minimizes redundancy in latent dimensions.
  • Improves feature diversity for clearer anomaly identification.

Conditioning with CVAE

  • Utilizes a Conditional VAE to refine the latent space toward specific anomaly types.
  • Adapts the model to complex datasets by conditioning on auxiliary variables.

Key Problems dCVAE Solves

  • ✅ Capture Independent Features: Latent variables hold distinct information for enhanced anomaly detection.
  • ✅ Reduce Latent Space Redundancy: Minimizes redundancy, enhancing feature focus.
  • ✅ High-Precision Anomaly Detection: High reconstruction quality distinguishes normal from anomalous data.

Formulation of dCVAE - Part 1

Step 1: Latent Variable Disentanglement (β-VAE)

  • β-VAE: Introduces the β-weighted KL-divergence to enforce disentanglement.
    • Isolates independent latent factors, enhancing the model’s capacity to capture meaningful structure in anomaly detection.
    • Improves feature disentanglement, which helps dCVAE to produce clearer and more distinct anomaly representations.
$$\mathcal{L}_{\beta\text{-VAE}}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \beta\, D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)$$
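For concreteness, a minimal PyTorch sketch of this loss, assuming a diagonal-Gaussian encoder that outputs `mu` and `logvar` and a Bernoulli decoder (so the KL term has the standard closed form):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Negative beta-ELBO: reconstruction term plus beta-weighted KL."""
    # E_{q(z|x)}[log p(x|z)] as a Bernoulli log-likelihood (pixels in [0, 1]).
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # Closed-form KL( q(z|x) || N(0, I) ) for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl   # minimizing this maximizes the beta-ELBO
```

Setting `beta` above 1 strengthens the pressure toward a factorized posterior, which encourages disentanglement at some cost in reconstruction quality.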

Step 2: Mutual Information via CorEx

  • Enhanced Disentanglement: CorEx minimizes total correlation in latent variables, isolating independent features for anomaly detection.
  • Informativeness Maximization: CorEx maximizes the informativeness of latent representations, preserving essential details while reducing noise.
$$\mathcal{L}(\theta; x) = \sum_{i=1}^{d} I_\theta(x_i; z) - \sum_{i=1}^{m} I_\theta(z_i; x) \;\ge\; \sum_{i=1}^{d} H(x_i) + \mathbb{E}_{p_\theta(x,z)}\big[\log q_\phi(x \mid z)\big] - D_{\mathrm{KL}}\big(p_\theta(z \mid x)\,\|\,r_\alpha(z)\big)$$
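The CorEx bound itself is more involved, but the total-correlation quantity it controls can be estimated on a minibatch. The sketch below substitutes the minibatch estimator popularized by β-TCVAE (Chen et al.) as an illustration; this is an assumption, not the paper's exact computation:

```python
import math
import torch

def log_gaussian(z, mu, logvar):
    """Elementwise log-density of a diagonal Gaussian N(mu, exp(logvar))."""
    return -0.5 * (math.log(2 * math.pi) + logvar + (z - mu) ** 2 / logvar.exp())

def total_correlation(z, mu, logvar):
    """Minibatch estimate of TC(z) = KL( q(z) || prod_k q(z_k) ).

    z, mu, logvar: shape (batch, dim). The dataset-size correction of the
    full estimator is omitted; this is adequate as a training signal.
    """
    b = z.size(0)
    # log q(z_i | x_j) for every pair (i, j): shape (batch, batch, dim)
    pair = log_gaussian(z.unsqueeze(1), mu.unsqueeze(0), logvar.unsqueeze(0))
    log_qz = torch.logsumexp(pair.sum(dim=2), dim=1) - math.log(b)          # joint
    log_qz_marg = (torch.logsumexp(pair, dim=1) - math.log(b)).sum(dim=1)   # marginals
    return (log_qz - log_qz_marg).mean()
```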

Formulation of dCVAE - Part 2

Step 3: Conditional Disentanglement via CVAE

  • Uses class labels (c) to enhance disentanglement, ensuring informative latent variables.
  • Optimizes with Conditional Mutual Information to model known sources of data variation.
$$\mathcal{L}(\theta; x, c) = \mathrm{TC}_\theta(x \mid c) - \mathrm{TC}_\theta(x \mid z, c) - \mathrm{TC}_\theta(z \mid c)$$
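A minimal sketch of how label conditioning is typically wired into a CVAE: the one-hot class label c is concatenated to both the encoder and the decoder inputs. Layer sizes are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Conditional VAE: both q(z|x,c) and p(x|z,c) observe the label c."""
    def __init__(self, x_dim=784, c_dim=10, z_dim=16, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + c_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def forward(self, x, c):                    # c: one-hot, shape (batch, c_dim)
        h = self.enc(torch.cat([x, c], dim=1))  # condition the encoder on c
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        x_hat = self.dec(torch.cat([z, c], dim=1))             # condition the decoder
        return x_hat, z, mu, logvar
```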

Step 4: Objective Function of dCVAE

  • Combines disentanglement, mutual information, and conditional variational constraints.
  • Maximizes latent variable informativeness while minimizing redundancy for high-quality reconstruction.
$$\mathcal{L}(\theta; x, c) \;\ge\; \sum_{i=1}^{n} H(x_i \mid c) + \mathbb{E}\big[\log q_\phi(x \mid z, c)\big] - \sum_{i=1}^{m} \beta\, D_{\mathrm{KL}}\big(p(z_i \mid x, c)\,\|\,r(z_i \mid c)\big)$$
  • $\sum_{i=1}^{n} H(x_i \mid c)$: conditional entropy, capturing the variability of x given the class label c.
  • $\mathbb{E}[\log q_\phi(x \mid z, c)]$: expected log-likelihood term, ensuring high-quality reconstruction by the decoder.
  • $\sum_{i=1}^{m} \beta\, D_{\mathrm{KL}}(p(z_i \mid x, c) \,\|\, r(z_i \mid c))$: β-weighted KL-divergence term, regularizing the latent space by reducing redundancy and improving disentanglement.
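Putting the pieces together, a hedged sketch of how the three ingredients might combine into a single training loss; the weights `beta` and `gamma`, and the substitution of the minibatch TC estimator for the full CorEx bound, are simplifying assumptions:

```python
import torch
import torch.nn.functional as F

def dcvae_loss(x, x_hat, z, mu, logvar, beta=4.0, gamma=1.0):
    """Illustrative combined objective: reconstruction + beta-KL + TC penalty."""
    n = x.size(0)
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum") / n
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / n
    tc = total_correlation(z, mu, logvar)   # estimator sketched in Step 2
    return recon + beta * kl + gamma * tc
```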

Experiments

Models

  • VAE
  • CVAE
  • β-VAE
  • Factor-VAE
  • RFVAE

Datasets

  • MNIST
  • Fashion-MNIST (FMNIST)
  • KMNIST
  • EMNIST

Experiment Metrics

  • Downstream Tasks: Reconstructions, Manifold Learning, Latent Representation Analysis

  • Classification Tasks: Anomaly Detection, ROC-AUC, Accuracy Evaluation

  • Used Anomaly Score (𝒜) and Reconstruction Error (ℛ) to assess model performance in anomaly identification; an evaluation sketch follows below.
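A small sketch of how this evaluation can be computed with scikit-learn; the 95th-percentile cutoff used for the accuracy figure is an illustrative choice, not the paper's:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_detection(labels, scores):
    """ROC-AUC plus accuracy at an illustrative anomaly-score cutoff.

    labels: 1 = anomaly, 0 = normal; scores: higher means more anomalous.
    """
    labels, scores = np.asarray(labels), np.asarray(scores)
    auc = roc_auc_score(labels, scores)
    preds = (scores > np.percentile(scores, 95)).astype(int)
    acc = (preds == labels).mean()
    return auc, acc
```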

Results - Reconstruction (EMNIST and KMNIST)

[Figure: Reconstructed EMNIST and KMNIST samples; per-image scores listed below]

Sample   EMNIST (ℛ, 𝒜)   KMNIST (ℛ, 𝒜)
1        -235, 0.97      -171, 0.33
2        -248, 0.90      -199, 0.48
3        -245, 0.91      -180, 0.53
4        -246, 0.91      -185, 0.52
5        -244, 0.92      -195, 0.57
6        -246, 0.92      -190, 0.55
  • Key Observations:
    • dCVAE and FactorVAE demonstrate low ℛ and high 𝒜 scores, indicating efficient anomaly detection.
    • Other models, such as VAE and CVAE, show inconsistent ℛ and 𝒜 scores, suggesting limitations in accurately reconstructing anomalous samples.

Results - Latent Representations

[Figure: Latent representations on EMNIST (top row) and KMNIST (bottom row) for beta-VAE, CVAE, dCVAE, FactorVAE, and RFVAE]
  • Key Observations:
    • dCVAE achieves the most distinct and well-separated clusters across classes in both datasets, showing effective disentanglement.
    • FactorVAE also provides meaningful latent separations but is less consistent compared to dCVAE.
    • beta-VAE and CVAE display less organized latent representations, suggesting challenges in class separation.
    • RFVAE shows dispersed clusters with limited class separation, indicating weaker disentanglement capabilities.

Results - Manifold Embeddings

[Figure: Manifold embeddings on EMNIST (top row) and KMNIST (bottom row) for VAE, beta-VAE, CVAE, dCVAE, FactorVAE, and RFVAE]
  • Key Observations:
    • dCVAE shows the clearest and most consistent latent manifold structures, providing distinct class clusters.
    • FactorVAE maintains class separations but displays some overlapping areas in both datasets.
    • VAE and beta-VAE reveal less structured manifolds, with less pronounced separations between classes.
    • RFVAE demonstrates weaker structure, with dispersed embeddings that do not distinctly cluster by class.
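
The projection behind these 2-D views is not restated here; a common way to produce such plots is to embed the encoder means with t-SNE, as in this hypothetical sketch:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_latent(mu, labels):
    """Embed encoder means in 2-D with t-SNE and color points by class."""
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(mu)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=4, cmap="tab10")
    plt.colorbar(label="class")
    plt.tight_layout()
    plt.show()
```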

Training Time and AUC Analysis

Model Training Time and AUC Scores
Model        EMNIST AUC   EMNIST Time (min)   KMNIST AUC   KMNIST Time (min)
dCVAE        78.98        102                 61.02        95
VAE          67.23        92                  51.13        78
CVAE         66.01        117                 42.35        104
FactorVAE    62.91        138                 49.23        117
β-VAE        65.12        123                 50.01        119
RFVAE        55.03        130                 49.51        132

Training Time & AUC Analysis

  • dCVAE achieves the highest AUC scores on both EMNIST and KMNIST datasets.
  • FactorVAE has the longest training time on EMNIST (138 minutes), indicating a more computationally demanding process.
  • RFVAE has the longest training time on KMNIST (132 minutes) despite its weaker AUC scores.
  • VAE and β-VAE offer moderate AUC scores and training times, but do not match the performance efficiency seen in dCVAE.
  • VAE trains fastest on both datasets (92 and 78 minutes), while CVAE's low KMNIST AUC (42.35) illustrates the trade-off between training efficiency and detection accuracy.

Final Takeaways

  • dCVAE demonstrates superior performance in anomaly detection, pairing the highest AUC scores with distinct latent separations; FactorVAE is its closest competitor in latent-space quality, making both suitable for nuanced anomaly analysis.

  • Trade-off between Training Time and Accuracy: Models like FactorVAE show higher training time but consistent latent clustering, whereas CVAE achieves faster training with some compromise on AUC scores.

  • RFVAE Limitations: While computationally intensive, RFVAE struggles to provide well-separated latent representations, indicating limited effectiveness in anomaly detection applications.

  • Practical Implications: dCVAE is recommended for applications prioritizing accuracy and interpretability, while VAE-based models are more suited for less resource-intensive environments with moderate accuracy needs.

Conclusion


  • dCVAE enhances unsupervised anomaly detection by leveraging disentangled learning, TC loss, and optimized reconstruction quality.

  • The model’s architecture supports diverse applications in controlled image synthesis, molecular design, bio-signal separation, and conditional text generation.

  • Future work will explore reducing posterior-prior distribution gaps and refining loss trade-offs to further improve dCVAE’s robustness and performance.

Acknowledgments


We would like to acknowledge support from the NSERC CREATE grant on the Visual and Automated Disease Analytics program.

MT also acknowledges funding via a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), RGPIN-2021-04073.

AAN acknowledges Supplemental Professional Development (SPD) from Douglas College.

Thank You!