Asif Ahmed Neloy

I am actively seeking roles in academia and in industry as a Data Scientist, Machine Learning Engineer, or AI Consultant.

I am a Lead Data Scientist/MLE with GM Training Arc, Vancouver, BC, where I build player-facing analytics for esports training using gameplay telemetry and video-based computer vision.

I am also an Adjunct Faculty member in the Faculty of Land and Food Systems at the University of British Columbia (UBC). I teach programming, algorithms, networking, computer vision, databases, machine learning, and analytics.

I received an MSc in Computer Science from the University of Manitoba under the supervision of Dr. Maxime Turgeon and Dr. Cuneyt Akcora. My thesis studied disentangled VAEs for unsupervised anomaly detection.

I work across end-to-end machine learning systems, from data ingestion and feature engineering to training, evaluation, deployment, and monitoring. My day-to-day tools include Python and SQL, deep learning with PyTorch, and production APIs using FastAPI with containerized workflows. I build reproducible pipelines with Airflow and PySpark, work with Snowflake and PostgreSQL for analytics and feature stores, and deliver decision-ready reporting in Power BI or Tableau. I also build and evaluate retrieval-based large language model assistants using embeddings and vector search, with careful logging, quality checks, and guardrails.

My research focuses on anomaly detection, representation learning, and probabilistic or Bayesian modeling, with an emphasis on unsupervised methods and reproducibility. I study auto-encoder families and variational formulations for high-dimensional data, and I publish practical comparisons that surface efficiency and trade-offs across model classes. My applied work spans natural language processing, computer vision, and governed analytics in population and health settings.

Earlier in my career, I held industry roles in supply planning and forecasting at Advanced Chemical Industries (ACI) Ltd, and in portfolio and leasing analytics in real estate at Daris Properties Ltd. and Forum Inc. I have also contributed to large language model evaluation work, focusing on factuality, safety, and user experience under detailed rubrics.

Python · PyTorch · NLP · RAG · Computer Vision · SQL · FastAPI · LLM

Projects

01
Crosshair and Event Detection for Skill Assessment (Training Arc)
Problem: Convert raw gameplay video into reliable, frame-level signals so that player aim and reaction metrics can be computed consistently across different resolutions, HUD settings, and frame rates.
Details: Built a computer-vision pipeline to extract frames, detect the crosshair and targets, track events over time, and compute stable reaction and correction metrics. Implemented model inference with YOLO-style detectors, lightweight tracking, and robust timestamp alignment to prevent drift. Packaged the workflow as reproducible scripts with config-driven runs and structured JSONL outputs for downstream analytics.
Outcome: Enabled repeatable measurement of aim stability, first-shot reaction, and correction behavior from VODs, producing clean artifacts for benchmarking players and validating training interventions.
Role: I owned the end-to-end design and implementation, including data layout, frame-timebase logic, detection and tracking integration, metric definitions, and export formats. I validated failure modes across clips and added guardrails for edge cases (missing HUD, occlusion, and partial visibility).
02
Intelligent Document Processing and OCR Pipeline
Problem: Organizations handle heterogeneous document formats (PDFs, scanned images, handwritten forms) that require automated extraction and structuring of key information for downstream processing and analytics.
Details: Developed an end-to-end document-processing pipeline combining OCR engines (Tesseract, EasyOCR) with layout analysis and named-entity recognition. Implemented preprocessing steps for deskewing, denoising, and contrast enhancement. Built structured output schemas for invoices, forms, and contracts with field-level confidence scoring. Integrated with vector stores for semantic search across extracted content.
Outcome: Substantially reduced manual data-entry time while maintaining high extraction accuracy. Enabled searchable document archives with semantic retrieval and automated compliance-checking workflows.
Role: Designed the pipeline architecture, selected and fine-tuned OCR models, implemented post-processing logic for entity extraction, and built the API layer for integration with existing document management systems.
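The field-level confidence idea can be sketched in miniature: each extracted field inherits the weakest OCR confidence among the tokens that produced it. The token format and the `total` rule below are illustrative assumptions; the production pipeline works from OCR engine output with layout analysis, not a flat token list.

```python
def extract_fields(tokens: list[tuple[str, float]]) -> dict:
    """tokens: (text, ocr_confidence) pairs from an OCR engine.
    Returns field -> (value, confidence) using a simple keyword rule:
    the value after a 'Total' keyword becomes the 'total' field, with
    confidence = the weakest OCR confidence among the tokens involved."""
    out: dict = {}
    for i, (word, conf) in enumerate(tokens):
        if word.lower() == "total" and i + 1 < len(tokens):
            value, vconf = tokens[i + 1]
            out["total"] = (value, min(conf, vconf))
    return out

tokens = [("Invoice", 0.99), ("Total", 0.97), ("$1,240.00", 0.91)]
print(extract_fields(tokens))  # {'total': ('$1,240.00', 0.91)}
```

Taking the minimum rather than the mean makes a single badly recognized token drag the field's confidence down, which is the conservative behavior you want before routing low-confidence fields to human review.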
03
Enterprise RAG System with Multi-Source Knowledge Integration
Problem: Enterprise knowledge is scattered across wikis, documents, databases, and code repositories, making it difficult for teams to find accurate, contextual answers without extensive manual search.
Details: Built a production-grade RAG (Retrieval-Augmented Generation) system with multi-source connectors for Confluence, SharePoint, GitHub, and internal databases. Implemented hybrid search combining dense embeddings (sentence-transformers) with sparse BM25 retrieval. Designed chunking strategies optimized for different content types. Added re-ranking with cross-encoders and citation generation for traceability. Deployed with FastAPI, Redis caching, and comprehensive logging for quality monitoring.
Outcome: Delivered a reliable internal Q&A system with grounded, auditable responses. Reduced time-to-answer for common queries while maintaining high factual accuracy through retrieval-based grounding.
Role: Architected the retrieval pipeline, implemented embedding and indexing strategies, built the API service layer, and designed the evaluation framework for measuring retrieval quality and answer accuracy.
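One common way to merge dense and sparse result lists without calibrating their raw scores is reciprocal rank fusion (RRF). The sketch below uses hypothetical document IDs and is not necessarily the exact fusion used in the deployed system.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: combine ranked lists from multiple
    retrievers (e.g. dense embeddings and BM25). Each document earns
    1 / (k + rank) from every list it appears in, so agreement across
    retrievers is rewarded and raw scores never need to be comparable."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # embedding-similarity order
bm25  = ["doc_b", "doc_d", "doc_a"]   # keyword-match order
print(rrf_fuse([dense, bm25]))  # doc_b and doc_a surface as the top two
```

The fused list would then typically feed a cross-encoder re-ranker over a short candidate window, since cross-encoders are too expensive to run over the full corpus.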
04
NLP Pipeline for Multi-Domain Text Classification and Sentiment Analysis
Problem: Unstructured text from customer feedback, support tickets, and surveys contains valuable insights but requires scalable, accurate classification and sentiment extraction to be actionable.
Details: Developed a modular NLP pipeline supporting multi-label text classification, aspect-based sentiment analysis, and topic modeling. Fine-tuned transformer models (BERT, RoBERTa) for domain-specific classification tasks. Implemented active learning workflows to efficiently expand training data. Built preprocessing modules for text cleaning, language detection, and PII redaction. Deployed as microservices with batch and real-time inference capabilities.
Outcome: Enabled automated routing of support tickets, real-time sentiment monitoring dashboards, and trend analysis across customer feedback channels. Improved response time and customer satisfaction metrics.
Role: Led model development and fine-tuning, designed the annotation guidelines and active learning strategy, implemented the inference pipeline, and built monitoring dashboards for model performance tracking.
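The active-learning step can be illustrated with least-confidence sampling, one standard selection strategy: send the unlabeled examples the model is least sure about to annotators first. The example IDs and probabilities below are made up.

```python
def uncertainty_sample(probs: dict[str, list[float]], n: int) -> list[str]:
    """Least-confidence sampling: pick the n unlabeled examples whose
    top predicted class probability is lowest, i.e. where the current
    model is most uncertain and a label would be most informative."""
    top_conf = {ex: max(p) for ex, p in probs.items()}
    return sorted(top_conf, key=top_conf.get)[:n]

probs = {
    "ticket_1": [0.98, 0.02],   # confident -> low annotation priority
    "ticket_2": [0.55, 0.45],   # near the decision boundary -> label first
    "ticket_3": [0.70, 0.30],
}
print(uncertainty_sample(probs, 2))  # ['ticket_2', 'ticket_3']
```

In practice this loop alternates with retraining, so each annotation batch targets the model's current weak spots instead of random examples.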
05
Streaming Integrity-Aware Recommendation Engine
Problem: Rank candidates reliably under noisy and incomplete signals while keeping latency low enough for real-time product use and maintaining predictable behavior under distribution shifts.
Details: Designed a ranking pipeline using gradient-boosted models and embedding features, served through a FastAPI layer with caching and feature hydration. Implemented an offline replay framework for consistent evaluation (NDCG, precision at k) and added monitoring hooks to detect drift in top features. Deployed containerized services and data stores to support scalable inference.
Outcome: Delivered a stable, low-latency ranking API that improved offline ranking quality and reduced operational friction through automated evaluation and monitoring.
Role: I led the model and system design, built the evaluation harness, implemented the serving stack, and set up the deployment workflow. I drove the success-metric alignment and maintained a regression suite to prevent silent quality drops.
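The offline replay framework scores rankings with standard metrics such as NDCG@k, which can be computed as below. This is the textbook formula, shown here for reference rather than lifted from the project's code.

```python
import math

def ndcg_at_k(relevances: list[float], k: int) -> float:
    """NDCG@k for a single query. `relevances` are graded relevance
    labels in the order the model ranked the items; the score divides
    the discounted cumulative gain by that of the ideal ordering."""
    def dcg(rels: list[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A ranking that places the most relevant item second instead of first
# scores below 1.0; a perfect ordering scores exactly 1.0.
print(ndcg_at_k([0, 2, 1], k=3))
print(ndcg_at_k([2, 1, 0], k=3))
```

Logging per-query NDCG across replay runs is what makes silent ranking regressions visible when models or features change.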
06
Lifecycle Analytics: Churn Prediction and Intervention Prioritization
Problem: Identify at-risk users early and prioritize interventions with interpretable drivers so retention actions can be targeted and measurable, not generic.
Details: Built churn models on product usage, engagement, and subscription signals with a feature pipeline that supports backtesting and leakage checks. Added explainability with SHAP for driver analysis and delivered a scoring workflow that ranks users for outreach and experimentation. Implemented periodic recalibration and monitoring dashboards to keep performance stable over time.
Outcome: Produced a repeatable churn and prioritization workflow that improved decision quality for retention targeting and reduced manual review overhead through ranked, explainable outputs.
Role: I owned the modeling lifecycle (data preparation, feature engineering, training, thresholding strategy, and explainability outputs). I also set up monitoring and documented playbooks for retraining and governance so the system stays operational after handoff.
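The leakage checks rest on a time-based split: features may only use activity strictly before a cutoff date, while labels come from after it. A minimal sketch, with a hypothetical row format:

```python
from datetime import date

def backtest_split(rows: list[dict], cutoff: date) -> tuple[list, list]:
    """Time-based split for churn backtesting. Rows before the cutoff
    feed feature construction; rows on or after it define the label
    window. Keeping the two disjoint in time prevents outcome
    information from leaking into training features."""
    feature_rows = [r for r in rows if r["event_date"] < cutoff]
    label_rows = [r for r in rows if r["event_date"] >= cutoff]
    return feature_rows, label_rows

rows = [
    {"user": "u1", "event_date": date(2024, 1, 5)},
    {"user": "u2", "event_date": date(2024, 3, 9)},
]
train, labels = backtest_split(rows, date(2024, 2, 1))
print(len(train), len(labels))  # 1 1
```

Sliding the cutoff forward over several dates turns this into a rolling backtest, which is how recalibration schedules can be validated before deployment.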
07
Retrieval Q&A Assistant with Rubric-Driven Evaluation
Problem: Teams needed fast, grounded answers from internal documents, plus a way to measure language model quality and regressions under clear evaluation criteria.
Details: Built a retrieval-based assistant using embeddings and vector search, wrapped in an API with structured logging and citation-style output so responses are auditable. Implemented a rubric-driven evaluation harness with test suites, edge-case probes, and regression tracking across prompt variants. Added guardrails to detect unsafe or unsupported completions and route uncertain cases to human review.
Outcome: Enabled reliable internal Q&A with traceability, and introduced a measurable evaluation loop that supports iteration without quality drift.
Role: I designed the retrieval pipeline, implemented the service layer and logging, and built the evaluation framework end to end. I defined rubric interpretations and drove regression strategy so quality stays stable as prompts and models change.
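At its core, a rubric harness runs named predicate checks over each answer and tracks pass rates across runs, so prompt or model changes can be compared run-over-run. The checks and example answer below are illustrative stand-ins for the real rubric.

```python
def run_rubric(answer: str, checks: dict) -> dict:
    """Score one answer against a rubric of named predicate checks.
    Each check maps the answer to True/False; the report includes a
    pass_rate so runs can be compared for regressions."""
    results = {name: bool(fn(answer)) for name, fn in checks.items()}
    results["pass_rate"] = sum(results.values()) / len(checks)
    return results

checks = {
    "cites_source": lambda a: "[doc" in a,            # has a citation tag
    "non_empty":    lambda a: len(a.strip()) > 0,
    "no_refusal":   lambda a: "cannot answer" not in a.lower(),
}
report = run_rubric("Vacation policy is 20 days [doc:hr-001].", checks)
print(report["pass_rate"])  # 1.0
```

Persisting these reports per prompt variant gives the regression trail: a pass-rate drop on any named check pinpoints which rubric dimension a change broke.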

Research

My work centers on anomaly detection, representation learning, and probabilistic or Bayesian modeling, with an emphasis on unsupervised methods and reproducibility. I study auto-encoder families and variational formulations for high-dimensional data, build governed analytics for population health settings, and publish practical comparisons that surface efficiency and trade-offs across model classes.

Unsupervised Anomaly Detection

Disentanglement in latent spaces, total-correlation objectives, and conditional VAEs for detecting rare structure in image and tabular data.
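For reference, a standard formulation of such a total-correlation objective is the β-TCVAE decomposition, which splits the ELBO's KL term into mutual information, total correlation, and dimension-wise KL components (my models adapt rather than reproduce this exactly):

```latex
\mathcal{L} \;=\;
\mathbb{E}_{q(z \mid x)}\!\left[\log p(x \mid z)\right]
\;-\; \alpha\, I_q(x; z)
\;-\; \beta\, \mathrm{KL}\!\left( q(z) \,\middle\|\, \textstyle\prod_j q(z_j) \right)
\;-\; \gamma \sum_j \mathrm{KL}\!\left( q(z_j) \,\middle\|\, p(z_j) \right)
```

The β-weighted total-correlation term penalizes statistical dependence among latent dimensions, which is the mechanism that encourages disentanglement without sacrificing reconstruction as aggressively as scaling the whole KL term.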

Representation Learning

Comparative studies of auto-encoder architectures that quantify reconstruction quality, sampling behavior, latent visualization, and classification accuracy.

Applied Health Analytics

Population-scale modeling with governance, documentation, and repeatable pipelines as part of the NSERC CREATE VADA program.

NLP & Information Retrieval

RAG systems, document processing, text classification, and evaluation frameworks for language model applications.

Teaching

University of British Columbia
Fall 2025
  • FRE 501: Topics in Food Market Analysis (Co-instructor)
Douglas College
Summer 2025
  • CSIS 3300: Database II
  • CSIS 3360: Fundamentals of Data Analytics
  • CSIS 4260: Special Topics in Data Analytics
Winter 2025
  • CSIS 1175: Introduction to Programming I
  • CSIS 2200: Systems Analysis & Design
  • CSIS 3860: Data Visualization
Summer 2024
  • CSIS 2300: Database I
  • CSIS 3300: Database II
  • CSIS 3360: Fundamentals of Data Analytics
Winter 2024
Vancouver Island University
Fall 2023
  • CSCI 251: Systems and Networks
  • CSCI 159: Computer Science I
  • CSCI 112: Applications Programming
University of Manitoba
Winter 2023
  • DATA 2010: Tools and Techniques for Data Science
Fall 2022
  • COMP 3490: Computer Graphics 1

Guest Lectures and Seminar Presentations

Invited Sessions

ICSA-Canada Chapter 2022 Symposium, Banff Center, Banff, Alberta, Canada.
Topic: Auto-encoders for Anomaly Detection: Efficiency and Trade-Offs.

Lectures

Introduction to Machine Learning, North South University, Dhaka, Bangladesh.

Publications

See my Google Scholar profile for the most recent publications.

DCVAE
Disentangled Conditional Variational Autoencoder for Unsupervised Anomaly Detection
Asif Ahmed Neloy*, Maxime Turgeon
IEEE International Conference on Big Data (IEEE BigData 2024), 2024

A novel generative architecture combining beta-VAE, CVAE, and total correlation to enhance feature disentanglement and improve anomaly detection in high-dimensional datasets.

MLWA
A Comprehensive Study of Auto-Encoders for Anomaly Detection: Efficiency and Trade-offs
Asif Ahmed Neloy*, Maxime Turgeon
Machine Learning with Applications, 2024

Systematic review of 11 Auto-Encoder architectures, analyzing reconstruction ability, sample generation, latent space visualization, and anomaly classification accuracy.

ICDM
Feature Extraction and Prediction of Combined Text and Survey Data using Two-Staged Modeling
Asif Ahmed Neloy*, Maxime Turgeon
ICDM, 2022

Two-stage modeling approach combining stacked ensemble classifiers with CNN and bidirectional RNN for NLP problems in real-world datasets.

ICMLC
Ensemble Learning Based Rental Apartment Price Prediction Model by Categorical Features Factoring
Asif Ahmed Neloy*, HM Sadman Haque, Md Mahmud Ul Islam
ICMLC, 2021

Apartment rental price prediction using categorical feature factoring and ensemble learning, comparing tree-based and boosting methods.

Python Packages

Open-source Python library to select the appropriate data scaler for your Machine Learning model.

Open-source Python library to convert a color or B&W image to a pencil sketch.

Data Preparer (In Progress)

Open-source Python package to clean and prepare your dataset before applying a machine learning model.