Asif Ahmed Neloy

I am actively seeking roles in academia and in industry as a Data Scientist / Machine Learning Engineer / AI Consultant

I am a Lead Data Scientist/MLE with GM Training Arc, Vancouver, BC, where I build player-facing analytics for esports training using gameplay telemetry and video-based computer vision.

I am also an Adjunct Faculty member in the Faculty of Land and Food Systems at the University of British Columbia (UBC). I teach programming, algorithms, networking, computer vision, databases, machine learning, and analytics.

I received an MSc in Computer Science from the University of Manitoba under the supervision of Dr. Maxime Turgeon and Dr. Cüneyt Akçora. My thesis studied disentangled VAEs for unsupervised anomaly detection.

I work across end-to-end machine learning systems, from data ingestion and feature engineering to training, evaluation, deployment, and monitoring. My day-to-day tools include Python and SQL, deep learning with PyTorch, and production APIs using FastAPI with containerized workflows. I build reproducible pipelines with Airflow and PySpark, work with Snowflake and PostgreSQL for analytics and feature stores, and deliver decision-ready reporting in Power BI or Tableau. I also build and evaluate retrieval-based large language model assistants using embeddings and vector search, with careful logging, quality checks, and guardrails.

My research focuses on anomaly detection, representation learning, and probabilistic or Bayesian modeling, with an emphasis on unsupervised methods and reproducibility. I study auto-encoder families and variational formulations for high-dimensional data, and I publish practical comparisons that surface efficiency and trade-offs across model classes. My applied work spans natural language processing, computer vision, and governed analytics in population and health settings.

Earlier in my career, I held industry roles in supply planning and forecasting at Advanced Chemical Industries (ACI) Ltd., and in portfolio and leasing analytics in real estate at Daris Properties Ltd. and Forum Inc. I have also contributed to large language model evaluation work, focusing on factuality, safety, and user experience under detailed rubrics.

Email  /  CV  /  Bio  /  Google Scholar  /  Github


Recent News

  • [August 2025] Led a week-long MFRE bootcamp and workshop series on Python and R covering data access, visualization, and coding for economic analysis.
  • [April 2025] Supervised graduate students in the UBC MFRE Summer Program.
  • [November 2024] My paper titled "Disentangled Conditional Variational Autoencoder for Unsupervised Anomaly Detection" was accepted at the IEEE Big Data Conference (IEEE BigData 2024), Washington, D.C., December 15-18, 2024. IEEE Xplore
  • [July 2024] My paper titled "A Comprehensive Study of Auto-Encoders for Anomaly Detection: Efficiency and Trade-offs" was published in Machine Learning with Applications. ScienceDirect
  • [June 2024] Received the Research Dissemination Present and Research Dissemination Publish grants from Douglas College.
  • [December 2023] Joined Douglas College, New Westminster Campus as a Full-time Regular Faculty Member.
  • [August 2023] Started my new journey as a Faculty Member at Vancouver Island University.
  • [May 2023] Promoted to Machine Learning Engineer, Daris Properties Ltd.
  • [February 2023] New conference paper published: Feature Extraction and Prediction of Combined Text and Survey Data using Two-Staged Modeling
  • [January 2023] My MSc dissertation, Dimension Reduction and Anomaly Detection using Unsupervised Machine Learning, is now online
  • [November 2022] Guest Lecture, Introduction to Python and Numpy, STAT 447: Statistical Machine Learning for Data Science, Department of Mathematics and Statistics, University of Saskatchewan
  • [September 2022] Received Graduate Travel Award from University of Manitoba, NSERC CREATE VADA Program

Projects

1. Crosshair and event detection for skill assessment (Training Arc)
Problem statement: Convert raw gameplay video into reliable, frame-level signals so player aim and reaction metrics can be computed consistently across different resolutions, HUD settings, and frame rates.
Project details: Built a computer-vision pipeline to extract frames, detect crosshair and targets, track events over time, and compute stable reaction and correction metrics. Implemented model inference with YOLO-style detectors, lightweight tracking, and robust timestamp alignment to prevent drift. Packaged the workflow as reproducible scripts with config-driven runs and structured JSONL outputs for downstream analytics.
Results/outcome: Enabled repeatable measurement of aim stability, first-shot reaction, and correction behavior from VODs, producing clean artifacts that can be used for benchmarking players and validating training interventions.
My contributions: I owned the end-to-end design and implementation, including data layout, frame timebase logic, detection and tracking integration, metric definitions, and export formats. I validated failure modes across clips and added guardrails for edge cases (missing HUD, occlusion, and partial visibility).
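The frame-timebase idea above can be sketched in a few lines. This is a minimal illustration, not the production code; the function names and the JSONL record layout are my assumptions here. The key point is that each timestamp is derived from the frame index rather than accumulated frame by frame, which prevents floating-point drift over long VODs.

```python
import json

def frame_timestamps(n_frames, fps, start_ts=0.0):
    """Derive a stable per-frame timebase from the container fps.

    Computing each timestamp directly from the frame index (instead of
    summing per-frame deltas) keeps long videos free of cumulative
    floating-point drift.
    """
    return [start_ts + i / fps for i in range(n_frames)]

def to_jsonl(events, fps):
    """Serialize detection events as JSONL with index-derived timestamps."""
    lines = []
    for ev in events:
        record = {
            "frame": ev["frame"],
            "ts": ev["frame"] / fps,  # derived, not accumulated
            "label": ev["label"],
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

One JSONL record per event keeps downstream analytics simple: each line is independently parseable, so partial files from interrupted runs are still usable.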
2. Streaming integrity-aware recommendation engine (ranking service)
Problem statement: Rank candidates reliably under noisy and incomplete signals while keeping latency low enough for real-time product use and maintaining predictable behavior under distribution shifts.
Project details: Designed a ranking pipeline using gradient-boosted models and embedding features, served through a FastAPI layer with caching and feature hydration. Implemented an offline replay framework for consistent evaluation (NDCG, precision at k) and added monitoring hooks to detect drift in top features. Deployed containerized services and data stores to support scalable inference.
Results/outcome: Delivered a stable, low-latency ranking API that improved offline ranking quality and reduced operational friction through automated evaluation and monitoring.
My contributions: I led the model and system design, built the evaluation harness, implemented the serving stack, and set up the deployment workflow. I drove the success-metric alignment and maintained a regression suite to prevent silent quality drops.
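For reference, the NDCG@k metric used in the offline replay framework can be computed as below. This is the standard linear-gain formulation (gain = relevance label, log2 discount); it is a generic sketch, not the project's evaluation harness.

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for a single ranked list.

    `relevances` are graded relevance labels in the order the model
    ranked the items; the ideal ordering sorts them descending.
    """
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

    actual = dcg(relevances[:k])
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```

Averaging this score over a replay log of queries gives a single offline number that can be tracked across model versions to catch ranking regressions.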
3. Lifecycle analytics: churn prediction and intervention prioritization
Problem statement: Identify at-risk users early and prioritize interventions with interpretable drivers so retention actions can be targeted and measurable, not generic.
Project details: Built churn models on product usage, engagement, and subscription signals with a feature pipeline that supports backtesting and leakage checks. Added explainability with SHAP for driver analysis and delivered a scoring workflow that ranks users for outreach and experimentation. Implemented periodic recalibration and monitoring dashboards to keep performance stable over time.
Results/outcome: Produced a repeatable churn and prioritization workflow that improved decision quality for retention targeting and reduced manual review overhead through ranked, explainable outputs.
My contributions: I owned the modeling lifecycle (data preparation, feature engineering, training, thresholding strategy, and explainability outputs). I also set up monitoring and documented playbooks for retraining and governance so the system stays operational after handoff.
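The leakage check mentioned above boils down to windowing: features must be computed strictly before the scoring date and the churn label observed strictly after it. A minimal sketch of that split (window lengths and function name are illustrative assumptions):

```python
from datetime import date, timedelta

def backtest_windows(as_of, feature_days=90, label_days=30):
    """Leakage-safe windows for one churn backtest fold.

    Features come from the `feature_days` before `as_of`; the churn
    label is observed in the `label_days` after it, so no label
    information can leak into the features.
    """
    feature_window = (as_of - timedelta(days=feature_days), as_of)
    label_window = (as_of, as_of + timedelta(days=label_days))
    return feature_window, label_window
```

Sliding `as_of` across historical dates then yields a sequence of non-leaking folds for backtesting and recalibration checks.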
4. Experimentation and KPI layer for product analytics
Problem statement: Eliminate inconsistent metric definitions and slow reporting cycles that lead to rework, misalignment, and low trust in business dashboards.
Project details: Implemented a standardized KPI layer and A/B analysis workflow backed by a warehouse-first approach, with Python-based transformation and validation checks. Built executive dashboards in Power BI and automated scheduled refresh, metric documentation, and alerting for anomalies. Added data-quality gates to catch schema breaks and missing partitions before stakeholders see incorrect numbers.
Results/outcome: Reduced reporting friction and created a single source of truth for KPIs and experimentation readouts, improving dashboard reliability and stakeholder alignment.
My contributions: I led metric standardization end to end, built the transformation and validation layer, implemented the dashboards, and introduced automated checks and documentation to keep the system maintainable.
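The data-quality gates described above reduce to two cheap checks run before any dashboard refresh: are all expected partitions present, and does the loaded schema still contain the required columns? A minimal sketch, with illustrative names:

```python
def check_partitions(present_dates, expected_dates):
    """Fail fast when warehouse partitions are missing, so a refresh
    never picks up an incomplete day of data."""
    missing = sorted(set(expected_dates) - set(present_dates))
    return {"ok": not missing, "missing": missing}

def check_schema(columns, required):
    """Catch schema breaks (dropped or renamed columns) at load time,
    before stakeholders see incorrect numbers."""
    absent = [c for c in required if c not in columns]
    return {"ok": not absent, "missing_columns": absent}
```

Wiring both checks into the scheduled refresh means a failed gate blocks publication and raises an alert instead of silently shipping a wrong KPI.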
5. Retrieval Q&A assistant and rubric-driven evaluation harness
Problem statement: Teams needed fast, grounded answers from internal documents, plus a way to measure language model quality and regressions under clear evaluation criteria.
Project details: Built a retrieval-based assistant using embeddings and vector search, wrapped in an API with structured logging and citation-style output so responses are auditable. Implemented a rubric-driven evaluation harness with test suites, edge-case probes, and regression tracking across prompt variants. Added guardrails to detect unsafe or unsupported completions and route uncertain cases to human review.
Results/outcome: Enabled reliable internal Q&A with traceability, and introduced a measurable evaluation loop that supports iteration without quality drift.
My contributions: I designed the retrieval pipeline, implemented the service layer and logging, and built the evaluation framework end to end. I defined rubric interpretations and drove regression strategy so quality stays stable as prompts and models change.
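At its core, embedding-based retrieval ranks documents by cosine similarity between the query embedding and each document embedding. The toy linear scan below shows the idea; a real service swaps it for an approximate-nearest-neighbor index, and the vectors here are placeholders rather than real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=3):
    """Return the ids of the k documents most similar to the query.

    Linear scan for clarity; production retrieval replaces this with
    a vector index (ANN search) over precomputed embeddings.
    """
    scored = sorted(
        ((cosine(query_vec, v), doc_id) for doc_id, v in doc_vecs.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]
```

The retrieved ids then drive the grounded answer: the assistant quotes or cites only the top-k passages, which is what makes responses auditable.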
6. Disentangled Conditional Variational Autoencoder (DCVAE) for unsupervised anomaly detection
Problem statement: Standard autoencoders often learn entangled latent factors, making anomaly scoring less interpretable and less robust across datasets with complex structure.
Project details: Implemented a PyTorch-based DCVAE that combines conditional structure with disentanglement objectives, including total-correlation regularization. Built a reproducible training and evaluation pipeline, ran qualitative and quantitative studies, and packaged code and experiments for repeatability. Reported performance across benchmark datasets with consistent splits and hyperparameter controls.
Results/outcome: Produced stronger, more interpretable latent representations for anomaly scoring and validated the approach through benchmarking and publication artifacts.
My contributions: I led the research implementation, experimental design, ablations, and reproducibility packaging. I wrote the core training and evaluation codepaths and maintained the repo for release-ready usage.
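For context, the closed-form KL term that β-VAE reweights (and that total-correlation methods further decompose to penalize dependence between latent dimensions) is the divergence between the diagonal-Gaussian posterior and a standard normal prior. A dependency-free sketch of that term, not the DCVAE training code itself:

```python
import math

def kl_diag_gaussian(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions.

    Per dimension: 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2).
    beta-VAE multiplies this whole term by beta; TC-based objectives
    split it into index-code MI, total correlation, and dimension-wise
    KL so the total-correlation part can be weighted separately.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar)
    )
```

In a PyTorch implementation this appears as a vectorized tensor expression added to the reconstruction loss; the scalar form above makes the algebra easy to check.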
7. Autoencoder benchmark suite for anomaly detection
Problem statement: Published anomaly detection comparisons are often hard to reproduce due to inconsistent training setups, missing baselines, and unclear evaluation choices.
Project details: Built a benchmark suite comparing multiple autoencoder families under consistent training budgets and evaluation protocols. Standardized datasets, metrics, and reporting, and added scripts for reconstruction analysis, sampling behavior, latent visualization, and downstream classification. Packaged experiments so results can be rerun with minimal setup and consistent outputs.
Results/outcome: Delivered a reproducible comparison framework that surfaces practical trade-offs across architectures and supports fair benchmarking for future work.
My contributions: I designed the evaluation protocol, implemented the training and reporting pipelines, built the visualization and analysis scripts, and curated reproducibility artifacts so results can be rerun consistently.
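The common scoring step shared by all the benchmarked autoencoders is simple: score each sample by its mean squared reconstruction error and flag samples above a threshold (often a percentile of the training-set scores). A minimal, framework-free sketch of that protocol:

```python
def anomaly_scores(originals, reconstructions):
    """Per-sample mean squared reconstruction error: high error means
    the autoencoder models the sample poorly, i.e. likely anomalous."""
    return [
        sum((a - b) ** 2 for a, b in zip(x, r)) / len(x)
        for x, r in zip(originals, reconstructions)
    ]

def flag_anomalies(scores, threshold):
    """Binary anomaly decisions from a fixed score threshold."""
    return [s > threshold for s in scores]
```

Holding this scoring rule fixed across all architectures is what makes the comparison fair: only the reconstructions differ between models, never the evaluation.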
8. Autonomous robotics perception and navigation
Problem statement: Build reliable perception modules that support navigation and obstacle avoidance in cluttered environments where sensing is noisy and compute budgets are limited.
Project details: Integrated onboard vision sensors with detection and navigation logic for field robotics, including lane and obstacle detection components and basic localization support. Implemented perception modules that run within constrained resources and validated system behavior through iterative testing and documented experiments for publication and demos.
Results/outcome: Enabled stable perception-driven navigation for prototype robots and produced publishable results demonstrating practical operation in real scenarios.
My contributions: I implemented core computer vision components, integrated perception outputs into navigation control loops, and drove testing and debugging in real runs. I contributed to documentation and experiment reporting for dissemination.

Research

My work centers on anomaly detection, representation learning, and probabilistic or Bayesian modeling, with an emphasis on unsupervised methods and reproducibility. I study auto-encoder families and variational formulations for high-dimensional data, build governed analytics for population health settings, and publish practical comparisons that surface efficiency and trade-offs across model classes.

  • Unsupervised anomaly detection and generative modeling. Disentanglement in latent spaces, total-correlation objectives, and conditional VAEs for detecting rare structure in image and tabular data.
  • Representation learning and evaluation. Comparative studies of auto-encoder architectures that quantify reconstruction quality, sampling behavior, latent visualization, and classification accuracy under consistent training setups.
  • Applied health analytics. Population-scale modeling with governance, documentation, and repeatable pipelines as part of the NSERC CREATE VADA program.
  • Earlier research lines. Robotics, recommender systems, and computer vision, including mobile platforms and domain-specific recommenders.

Teaching

  • University of British Columbia
  • Douglas College
    • Summer 2025:
      • CSIS 3300: Database II
      • CSIS 3360: Fundamentals of Data Analytics
      • CSIS 4260: Special Topics in Data Analytics
    • Winter 2025:
      • CSIS 1175: Introduction to Programming I
      • CSIS 2200: Systems Analysis & Design
      • CSIS 3860: Data Visualization
    • Summer 2024:
      • CSIS 2300: Database I
      • CSIS 3300: Database II
      • CSIS 3360: Fundamentals of Data Analytics
    • Winter 2024:
  • Vancouver Island University
    • Fall 2023:
      • CSCI 251: Systems and Networks
      • CSCI 159: Computer Science I
      • CSCI 112: Applications Programming
  • University of Manitoba
    • Winter 2023:
      • DATA 2010: Tools and Techniques for Data Science
    • Fall 2022:
      • COMP 3490: Computer Graphics 1

Guest Lectures and Seminar Presentations


Publications

See my Google Scholar profile for the most recent publications.

dcvae
Disentangled Conditional Variational Autoencoder for Unsupervised Anomaly Detection
Asif Ahmed Neloy*, Maxime Turgeon
Published at the 2024 IEEE International Conference on Big Data (IEEE BigData 2024), 2024
GitHub Repo / Publication

Generative models have recently become an effective approach for anomaly detection by leveraging auto-encoders to model high-dimensional data and identify anomalies based on reconstruction quality. However, a primary challenge in unsupervised anomaly detection (UAD) lies in learning meaningful, disentangled features without losing essential information. In this paper, we introduce a novel generative architecture that combines the frameworks of β-VAE, Conditional Variational Auto-encoder (CVAE), and the principle of total correlation (TC) to enhance feature disentanglement and retain critical information. Our approach improves the separation of latent features, optimizes TC loss more effectively, and enhances the detection of anomalies in complex, high-dimensional datasets such as image data. Through extensive qualitative and quantitative evaluations in benchmark datasets, we demonstrate that our method not only achieves strong performance in anomaly detection but also captures interpretable, disentangled representations, highlighting the importance of feature disentanglement in advancing UAD.

mlwa
A comprehensive study of auto-encoders for anomaly detection: Efficiency and trade-offs
Asif Ahmed Neloy*, Maxime Turgeon
Machine Learning with Applications, 2024
project page / DOI Link

Unsupervised anomaly detection (UAD) is a diverse research area explored across various application domains. Over time, numerous anomaly detection techniques, including clustering, generative, and variational inference-based methods, have been developed to address specific drawbacks and advance state-of-the-art techniques. Deep learning and generative models have recently played a significant role in identifying unique challenges and devising advanced approaches. Auto-encoders (AEs) represent one such powerful technique that combines generative and probabilistic variational modeling with deep architecture. This study systematically reviews 11 Auto-Encoder architectures categorized into three groups, aiming to differentiate their reconstruction ability, sample generation, latent space visualization, and accuracy in classifying anomalous data using the Fashion-MNIST (FMNIST) and MNIST datasets. We conclude by analyzing the experimental results, which guide us in identifying the efficiency and trade-offs among auto-encoders.

icdm
Feature Extraction and Prediction of Combined Text and Survey Data using Two-Staged Modeling
Asif Ahmed Neloy*, Maxime Turgeon
ICDM, 2022
project page / IEEE

This work introduces a two-stage modeling approach to combine classical statistical analysis with NLP problems in a real-world dataset. We lay out a combination of a stacked ensemble classifier and a deep learning framework using convolutional neural networks and bidirectional recurrent neural networks to structure a decomposed architecture with lower computational complexity, followed by ablation and validation experiments.

icmlc
Ensemble learning based rental apartment price prediction model by categorical features factoring
Asif Ahmed Neloy*, HM Sadman Haque, Md Mahmud Ul Islam
ICMLC, 2021
ACM ICMLC

We study apartment rental price prediction using categorical feature factoring and ensemble learning. The work compares tree-based and boosting methods, stacked ensembles, and regularized regression baselines, and reports model behavior across feature groups that drive rental price outcomes.


Python Packages

  • Data Scaler Selector: an open-source Python library to select the appropriate data scaler for your machine learning model.
  • Image to Sketch: an open-source Python library to convert a color or B&W image to a pencil sketch.
  • Data Preparer (In Progress): an open-source Python package to clean and prepare a dataset before applying a machine learning model.


Template credit: Jon Barron!
Last updated: October 20, 2025.



© 2025 Asif. All rights reserved.