Rehan Fazal rhnfzl

About Me

Hey, I'm Rehan, an ML/AI engineer who designs and ships production machine learning and Natural Language Processing (NLP) systems end-to-end, from training custom language models and embedding pipelines to deploying real-time classification APIs at scale. I architect across the full stack: transformer fine-tuning, semantic search, RAG systems, agentic AI, LLM orchestration, and ML infrastructure. I currently lead an R&D team building multilingual NLP systems for English, Dutch, and German.

MSc in Computer Science & Data Science · Eindhoven University of Technology (TU/e)

When not training models: runner, boulderer, speed listener, and occasional traveler.

How I Work

If something has to be done twice, the third time it's automated into a pipeline. I'd rather spend a day automating than a week doing the same thing manually.
I don't believe in guessing thresholds; I measure them both quantitatively and qualitatively. When multiple approaches each get something right, I combine them and let the data decide. Everything gets tested before it ships.
I build models from scratch and put them in production. If it can't run unattended at scale, it's not done.

What I Build

RAG, Agents & LLM Systems
Agentic AI architectures with tool orchestration and MCP servers auto-generated from API specs. Multi-LLM judge-based evaluation with cross-review. RAG pipelines with ChromaDB, LanceDB, Qdrant, and FAISS. Prompt optimization with structured output.

Architecture & MLOps
Designed an AI infrastructure layer over an existing SaaS platform: event-driven multi-agent orchestration, MCP tooling, SSE streaming, observability, and EKS deployment. Domain-agnostic shared infrastructure across production classifiers. SageMaker endpoints, Redis caching, MLflow, CI/CD pipelines.

Semantic Classification Systems
Production multilingual classifiers using transformer embeddings. Low-latency inference at scale, deployed as FastAPI services on AWS.

Automation & Developer Tooling
Auto-generated MCP tooling from API specs, autonomous Jira-to-Slack digests, meeting transcript processing pipelines, CI/CD workflows with quality gates.

Custom Language Models
Domain-specific BERT, RoBERTa, and ModernBERT variants trained from scratch with custom tokenizers. Contrastive ranking models for relevance filtering. Deployed on SageMaker GPU endpoints.

Entity Similarity & Embeddings
Multi-algorithm score fusion, embedding aggregation via attention and Set Transformer, empirically calibrated thresholds, LLM-based quality evaluation.

Information Extraction & NLP Pipelines
Document parsing with LLM post-processing, NER-based extraction (GLiNER, ONNX ensembles), PII anonymization, multilingual taxonomy enrichment, clustering and outlier detection.

Data & Performance Engineering
Large-scale multilingual data processing with language detection. FAISS index benchmarking, async caching layers, stress testing, and latency optimization for production classifiers.

Currently Learning

Exploring advanced RAG architectures, LLM training and fine-tuning, agentic AI, Model Context Protocol (MCP), and Rust for systems programming.

Open Source

SqueakyCleanText - Text cleaning and NER pipeline for NLP. Multi-backend entity recognition (ONNX, PyTorch, GLiNER), PII anonymization, and multilingual support.

reddit-stash-insights - Semantic search and RAG-powered chat over your Reddit archive. Hybrid retrieval (BGE-M3 embeddings + BM25) with multi-LLM support.

reddit-stash - Automated Reddit backup via GitHub Actions. Archives saved posts, comments, and upvotes to local storage, Dropbox, or S3.
