Asif Ahmed Neloy

I am actively seeking roles in academia and in industry as a Data Scientist, Machine Learning Engineer, or AI Consultant.

I am a Lead Data Scientist/MLE with GM Training Arc, Vancouver, BC, where I build player-facing analytics for esports training using gameplay telemetry and video-based computer vision.

I am also an Adjunct Faculty member in the Faculty of Land and Food Systems at the University of British Columbia (UBC). I teach programming, algorithms, networking, computer vision, databases, machine learning, and analytics.

I received an MSc in Computer Science from the University of Manitoba under the supervision of Dr. Maxime Turgeon and Dr. Cuneyt Akcora. My thesis studied disentangled VAEs for unsupervised anomaly detection.

I work across end-to-end machine learning systems, from data ingestion and feature engineering to training, evaluation, deployment, and monitoring. My day-to-day tools include Python and SQL, deep learning with PyTorch, and production APIs using FastAPI with containerized workflows. I build reproducible pipelines with Airflow and PySpark, work with Snowflake and PostgreSQL for analytics and feature stores, and deliver decision-ready reporting in Power BI or Tableau. I also build and evaluate retrieval-based large language model assistants using embeddings and vector search, with careful logging, quality checks, and guardrails.

My research focuses on anomaly detection, representation learning, and probabilistic or Bayesian modeling, with an emphasis on unsupervised methods and reproducibility. I study auto-encoder families and variational formulations for high-dimensional data, and I publish practical comparisons that surface efficiency and trade-offs across model classes. My applied work spans natural language processing, computer vision, and governed analytics in population and health settings.

Earlier in my career, I held industry roles in supply planning and forecasting at Advanced Chemical Industries (ACI) Ltd, and in portfolio and leasing analytics in real estate at Daris Properties Ltd. and Forum Inc. I have also contributed to large language model evaluation work, focusing on factuality, safety, and user experience under detailed rubrics.

Python · PyTorch · NLP · RAG · Computer Vision · SQL · FastAPI · LLM

Projects

01
Crosshair and Event Detection for Skill Assessment (Training Arc)
Problem: Convert raw gameplay video into reliable, frame-level signals so that player aim and reaction metrics can be computed consistently across different resolutions, HUD settings, and frame rates.
Details: Built a computer-vision pipeline to extract frames, detect the crosshair and targets, track events over time, and compute stable reaction and correction metrics. Implemented model inference with YOLO-style detectors, lightweight tracking, and robust timestamp alignment to prevent drift. Packaged the workflow as reproducible scripts with config-driven runs and structured JSONL outputs for downstream analytics.
Outcome: Enabled repeatable measurement of aim stability, first-shot reaction, and correction behavior from VODs, producing clean artifacts for benchmarking players and validating training interventions.
Role: I owned the end-to-end design and implementation, including data layout, frame-timebase logic, detection and tracking integration, metric definitions, and export formats. I validated failure modes across clips and added guardrails for edge cases (missing HUD, occlusion, and partial visibility).
02
Intelligent Document Processing and OCR Pipeline
Problem: Organizations handle heterogeneous document formats (PDFs, scanned images, handwritten forms) that require automated extraction and structuring of key information for downstream processing and analytics.
Details: Developed an end-to-end document-processing pipeline combining OCR engines (Tesseract, EasyOCR) with layout analysis and named-entity recognition. Implemented preprocessing steps for deskewing, denoising, and contrast enhancement. Built structured output schemas for invoices, forms, and contracts with field-level confidence scoring. Integrated with vector stores for semantic search across extracted content.
Outcome: Substantially reduced manual data-entry time while maintaining high extraction accuracy. Enabled searchable document archives with semantic retrieval and automated compliance-checking workflows.
Role: Designed the pipeline architecture, selected and fine-tuned OCR models, implemented post-processing logic for entity extraction, and built the API layer for integration with existing document management systems.
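The field-level confidence idea can be sketched in miniature: each extracted field inherits the weakest OCR confidence among the tokens that produced it. The token format and the `total` rule below are illustrative assumptions; the production pipeline works from OCR engine output with layout analysis, not a flat token list.

```python
def extract_fields(tokens: list[tuple[str, float]]) -> dict:
    """tokens: (text, ocr_confidence) pairs from an OCR engine.
    Returns field -> (value, confidence) using a simple keyword rule:
    the value after a 'Total' keyword becomes the 'total' field, with
    confidence = the weakest OCR confidence among the tokens involved."""
    out: dict = {}
    for i, (word, conf) in enumerate(tokens):
        if word.lower() == "total" and i + 1 < len(tokens):
            value, vconf = tokens[i + 1]
            out["total"] = (value, min(conf, vconf))
    return out

tokens = [("Invoice", 0.99), ("Total", 0.97), ("$1,240.00", 0.91)]
print(extract_fields(tokens))  # {'total': ('$1,240.00', 0.91)}
```

Taking the minimum rather than the mean makes a single badly recognized token drag the field's confidence down, which is the conservative behavior you want before routing low-confidence fields to human review.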
03
Enterprise RAG System with Multi-Source Knowledge Integration
Problem: Enterprise knowledge is scattered across wikis, documents, databases, and code repositories, making it difficult for teams to find accurate, contextual answers without extensive manual search.
Details: Built a production-grade RAG (Retrieval-Augmented Generation) system with multi-source connectors for Confluence, SharePoint, GitHub, and internal databases. Implemented hybrid search combining dense embeddings (sentence-transformers) with sparse BM25 retrieval. Designed chunking strategies optimized for different content types. Added re-ranking with cross-encoders and citation generation for traceability. Deployed with FastAPI, Redis caching, and comprehensive logging for quality monitoring.
Outcome: Delivered a reliable internal Q&A system with grounded, auditable responses. Reduced time-to-answer for common queries while maintaining high factual accuracy through retrieval-based grounding.
Role: Architected the retrieval pipeline, implemented embedding and indexing strategies, built the API service layer, and designed the evaluation framework for measuring retrieval quality and answer accuracy.
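One common way to merge dense and sparse result lists without calibrating their raw scores is reciprocal rank fusion (RRF). The sketch below uses hypothetical document IDs and is not necessarily the exact fusion used in the deployed system.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: combine ranked lists from multiple
    retrievers (e.g. dense embeddings and BM25). Each document earns
    1 / (k + rank) from every list it appears in, so agreement across
    retrievers is rewarded and raw scores never need to be comparable."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # embedding-similarity order
bm25  = ["doc_b", "doc_d", "doc_a"]   # keyword-match order
print(rrf_fuse([dense, bm25]))  # doc_b and doc_a surface as the top two
```

The fused list would then typically feed a cross-encoder re-ranker over a short candidate window, since cross-encoders are too expensive to run over the full corpus.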
04
NLP Pipeline for Multi-Domain Text Classification and Sentiment Analysis
Problem: Unstructured text from customer feedback, support tickets, and surveys contains valuable insights but requires scalable, accurate classification and sentiment extraction to be actionable.
Details: Developed a modular NLP pipeline supporting multi-label text classification, aspect-based sentiment analysis, and topic modeling. Fine-tuned transformer models (BERT, RoBERTa) for domain-specific classification tasks. Implemented active learning workflows to efficiently expand training data. Built preprocessing modules for text cleaning, language detection, and PII redaction. Deployed as microservices with batch and real-time inference capabilities.
Outcome: Enabled automated routing of support tickets, real-time sentiment monitoring dashboards, and trend analysis across customer feedback channels. Improved response time and customer satisfaction metrics.
Role: Led model development and fine-tuning, designed the annotation guidelines and active learning strategy, implemented the inference pipeline, and built monitoring dashboards for model performance tracking.
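The active-learning step can be illustrated with least-confidence sampling, one standard selection strategy: send the unlabeled examples the model is least sure about to annotators first. The example IDs and probabilities below are made up.

```python
def uncertainty_sample(probs: dict[str, list[float]], n: int) -> list[str]:
    """Least-confidence sampling: pick the n unlabeled examples whose
    top predicted class probability is lowest, i.e. where the current
    model is most uncertain and a label would be most informative."""
    top_conf = {ex: max(p) for ex, p in probs.items()}
    return sorted(top_conf, key=top_conf.get)[:n]

probs = {
    "ticket_1": [0.98, 0.02],   # confident -> low annotation priority
    "ticket_2": [0.55, 0.45],   # near the decision boundary -> label first
    "ticket_3": [0.70, 0.30],
}
print(uncertainty_sample(probs, 2))  # ['ticket_2', 'ticket_3']
```

In practice this loop alternates with retraining, so each annotation batch targets the model's current weak spots instead of random examples.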
05
Streaming Integrity-Aware Recommendation Engine
Problem: Rank candidates reliably under noisy and incomplete signals while keeping latency low enough for real-time product use and maintaining predictable behavior under distribution shifts.
Details: Designed a ranking pipeline using gradient-boosted models and embedding features, served through a FastAPI layer with caching and feature hydration. Implemented an offline replay framework for consistent evaluation (NDCG, precision at k) and added monitoring hooks to detect drift in top features. Deployed containerized services and data stores to support scalable inference.
Outcome: Delivered a stable, low-latency ranking API that improved offline ranking quality and reduced operational friction through automated evaluation and monitoring.
Role: I led the model and system design, built the evaluation harness, implemented the serving stack, and set up the deployment workflow. I drove the success-metric alignment and maintained a regression suite to prevent silent quality drops.
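The offline replay framework scores rankings with standard metrics such as NDCG@k, which can be computed as below. This is the textbook formula, shown here for reference rather than lifted from the project's code.

```python
import math

def ndcg_at_k(relevances: list[float], k: int) -> float:
    """NDCG@k for a single query. `relevances` are graded relevance
    labels in the order the model ranked the items; the score divides
    the discounted cumulative gain by that of the ideal ordering."""
    def dcg(rels: list[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A ranking that places the most relevant item second instead of first
# scores below 1.0; a perfect ordering scores exactly 1.0.
print(ndcg_at_k([0, 2, 1], k=3))
print(ndcg_at_k([2, 1, 0], k=3))
```

Logging per-query NDCG across replay runs is what makes silent ranking regressions visible when models or features change.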
06
Lifecycle Analytics: Churn Prediction and Intervention Prioritization
Problem: Identify at-risk users early and prioritize interventions with interpretable drivers so retention actions can be targeted and measurable, not generic.
Details: Built churn models on product usage, engagement, and subscription signals with a feature pipeline that supports backtesting and leakage checks. Added explainability with SHAP for driver analysis and delivered a scoring workflow that ranks users for outreach and experimentation. Implemented periodic recalibration and monitoring dashboards to keep performance stable over time.
Outcome: Produced a repeatable churn and prioritization workflow that improved decision quality for retention targeting and reduced manual review overhead through ranked, explainable outputs.
Role: I owned the modeling lifecycle (data preparation, feature engineering, training, thresholding strategy, and explainability outputs). I also set up monitoring and documented playbooks for retraining and governance so the system stays operational after handoff.
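The leakage checks rest on a time-based split: features may only use activity strictly before a cutoff date, while labels come from after it. A minimal sketch, with a hypothetical row format:

```python
from datetime import date

def backtest_split(rows: list[dict], cutoff: date) -> tuple[list, list]:
    """Time-based split for churn backtesting. Rows before the cutoff
    feed feature construction; rows on or after it define the label
    window. Keeping the two disjoint in time prevents outcome
    information from leaking into training features."""
    feature_rows = [r for r in rows if r["event_date"] < cutoff]
    label_rows = [r for r in rows if r["event_date"] >= cutoff]
    return feature_rows, label_rows

rows = [
    {"user": "u1", "event_date": date(2024, 1, 5)},
    {"user": "u2", "event_date": date(2024, 3, 9)},
]
train, labels = backtest_split(rows, date(2024, 2, 1))
print(len(train), len(labels))  # 1 1
```

Sliding the cutoff forward over several dates turns this into a rolling backtest, which is how recalibration schedules can be validated before deployment.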
07
Retrieval Q&A Assistant with Rubric-Driven Evaluation
Problem: Teams needed fast, grounded answers from internal documents, plus a way to measure language model quality and regressions under clear evaluation criteria.
Details: Built a retrieval-based assistant using embeddings and vector search, wrapped in an API with structured logging and citation-style output so responses are auditable. Implemented a rubric-driven evaluation harness with test suites, edge-case probes, and regression tracking across prompt variants. Added guardrails to detect unsafe or unsupported completions and route uncertain cases to human review.
Outcome: Enabled reliable internal Q&A with traceability, and introduced a measurable evaluation loop that supports iteration without quality drift.
Role: I designed the retrieval pipeline, implemented the service layer and logging, and built the evaluation framework end to end. I defined rubric interpretations and drove regression strategy so quality stays stable as prompts and models change.
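At its core, a rubric harness runs named predicate checks over each answer and tracks pass rates across runs, so prompt or model changes can be compared run-over-run. The checks and example answer below are illustrative stand-ins for the real rubric.

```python
def run_rubric(answer: str, checks: dict) -> dict:
    """Score one answer against a rubric of named predicate checks.
    Each check maps the answer to True/False; the report includes a
    pass_rate so runs can be compared for regressions."""
    results = {name: bool(fn(answer)) for name, fn in checks.items()}
    results["pass_rate"] = sum(results.values()) / len(checks)
    return results

checks = {
    "cites_source": lambda a: "[doc" in a,            # has a citation tag
    "non_empty":    lambda a: len(a.strip()) > 0,
    "no_refusal":   lambda a: "cannot answer" not in a.lower(),
}
report = run_rubric("Vacation policy is 20 days [doc:hr-001].", checks)
print(report["pass_rate"])  # 1.0
```

Persisting these reports per prompt variant gives the regression trail: a pass-rate drop on any named check pinpoints which rubric dimension a change broke.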

Research

My work centers on anomaly detection, representation learning, and probabilistic or Bayesian modeling, with an emphasis on unsupervised methods and reproducibility. I study auto-encoder families and variational formulations for high-dimensional data, build governed analytics for population health settings, and publish practical comparisons that surface efficiency and trade-offs across model classes.

Unsupervised Anomaly Detection

Disentanglement in latent spaces, total-correlation objectives, and conditional VAEs for detecting rare structure in image and tabular data.
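For reference, a standard formulation of such a total-correlation objective is the β-TCVAE decomposition, which splits the ELBO's KL term into mutual information, total correlation, and dimension-wise KL components (my models adapt rather than reproduce this exactly):

```latex
\mathcal{L} \;=\;
\mathbb{E}_{q(z \mid x)}\!\left[\log p(x \mid z)\right]
\;-\; \alpha\, I_q(x; z)
\;-\; \beta\, \mathrm{KL}\!\left( q(z) \,\middle\|\, \textstyle\prod_j q(z_j) \right)
\;-\; \gamma \sum_j \mathrm{KL}\!\left( q(z_j) \,\middle\|\, p(z_j) \right)
```

The β-weighted total-correlation term penalizes statistical dependence among latent dimensions, which is the mechanism that encourages disentanglement without sacrificing reconstruction as aggressively as scaling the whole KL term.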

Representation Learning

Comparative studies of auto-encoder architectures that quantify reconstruction quality, sampling behavior, latent visualization, and classification accuracy.

Applied Health Analytics

Population-scale modeling with governance, documentation, and repeatable pipelines as part of the NSERC CREATE VADA program.

NLP & Information Retrieval

RAG systems, document processing, text classification, and evaluation frameworks for language model applications.

Teaching

University of British Columbia
Fall 2025
  • FRE 501: Topics in Food Market Analysis (Co-instructor)
Douglas College
Summer 2025
  • CSIS 3300: Database II
  • CSIS 3360: Fundamentals of Data Analytics
  • CSIS 4260: Special Topics in Data Analytics
Winter 2025
  • CSIS 1175: Introduction to Programming I
  • CSIS 2200: Systems Analysis & Design
  • CSIS 3860: Data Visualization
Summer 2024
  • CSIS 2300: Database I
  • CSIS 3300: Database II
  • CSIS 3360: Fundamentals of Data Analytics
Winter 2024
Vancouver Island University
Fall 2023
  • CSCI 251: Systems and Networks
  • CSCI 159: Computer Science I
  • CSCI 112: Applications Programming
University of Manitoba
Winter 2023
  • DATA 2010: Tools and Techniques for Data Science
Fall 2022
  • COMP 3490: Computer Graphics 1

Guest Lectures and Seminar Presentations

Invited Sessions

ICSA-Canada Chapter 2022 Symposium, Banff Center, Banff, Alberta, Canada.
Topic: Auto-encoders for Anomaly Detection: Efficiency and Trade-Offs.

Lectures

Introduction to Machine Learning, North South University, Dhaka, Bangladesh.

Publications

See my Google Scholar profile for the most recent publications.

DCVAE
Disentangled Conditional Variational Autoencoder for Unsupervised Anomaly Detection
Asif Ahmed Neloy*, Maxime Turgeon
IEEE International Conference on Big Data (IEEE BigData 2024), 2024

A novel generative architecture combining beta-VAE, CVAE, and total correlation to enhance feature disentanglement and improve anomaly detection in high-dimensional datasets.

MLWA
A Comprehensive Study of Auto-Encoders for Anomaly Detection: Efficiency and Trade-offs
Asif Ahmed Neloy*, Maxime Turgeon
Machine Learning with Applications, 2024

Systematic review of 11 Auto-Encoder architectures, analyzing reconstruction ability, sample generation, latent space visualization, and anomaly classification accuracy.

ICDM
Feature Extraction and Prediction of Combined Text and Survey Data using Two-Staged Modeling
Asif Ahmed Neloy*, Maxime Turgeon
ICDM, 2022

Two-stage modeling approach combining stacked ensemble classifiers with CNN and bidirectional RNN for NLP problems in real-world datasets.

ICMLC
Ensemble Learning Based Rental Apartment Price Prediction Model by Categorical Features Factoring
Asif Ahmed Neloy*, HM Sadman Haque, Md Mahmud Ul Islam
ICMLC, 2021

Apartment rental price prediction using categorical feature factoring and ensemble learning, comparing tree-based and boosting methods.

Python Packages

Open-source Python library to select the appropriate data scaler for your Machine Learning model.

Open-source Python library to convert a color or B&W image to a pencil sketch.

Data Preparer (In Progress)

Open-source Python package to clean and prepare your dataset before applying a machine learning model.