Hey, I'm Rehan. ML/AI Engineer designing and shipping production machine learning and Natural Language Processing (NLP) systems end-to-end, from training custom language models and embedding pipelines to deploying real-time classification APIs at scale. Architecting across the full stack: transformer fine-tuning, semantic search, RAG systems, agentic AI, LLM orchestration, and ML infrastructure. Currently leading an R&D team building multilingual NLP systems for English, Dutch, and German.
MSc in Computer Science & Data Science · Eindhoven University of Technology (TU/e)
When not training models: runner, boulderer, speed listener, and occasional traveler.
- If something has to be done twice, the third time it's automated into a pipeline. I'd rather spend a day automating than a week doing the same thing manually.
- Don't believe in guessing thresholds. Measure them from both quantitative and qualitative perspectives. When multiple approaches each get something right, combine them and let the data decide. Everything gets tested before it ships.
- Build models from scratch and put them in production. If it can't run unattended at scale, it's not done.
RAG, Agents & LLM Systems Architecture & MLOps Semantic Classification Systems Automation & Developer Tooling
Custom Language Models Entity Similarity & Embeddings Information Extraction & NLP Pipelines Data & Performance Engineering
Exploring advanced RAG architectures, LLM training and fine-tuning, agentic AI, Model Context Protocol (MCP), and Rust for systems programming.
SqueakyCleanText - Text cleaning and NER pipeline for NLP. Multi-backend entity recognition (ONNX, PyTorch, GLiNER), PII anonymization, and multilingual support.
reddit-stash-insights - Semantic search and RAG-powered chat over your Reddit archive. Hybrid retrieval (BGE-M3 embeddings + BM25) with multi-LLM support.
reddit-stash - Automated Reddit backup via GitHub Actions. Archives saved posts, comments, and upvotes to local storage, Dropbox, or S3.




