I'm an AI Engineer based in Cape Town, South Africa. I specialize in Multi-Agent Reinforcement Learning, LLM Agents & Engineering, and Goal-Conditioned Reinforcement Learning.
I'm an AI Research Engineer with experience spanning Multi-Agent Reinforcement Learning and LLM Agents & Engineering. At InstaDeep, I develop MARL algorithms using contrastive learning and curriculum strategies. During my Master's, I built autonomous LLM agents for ML engineering, designed inference-time scaling strategies, and fine-tuned large language models — bridging the gap between RL and modern LLM systems.
Currently at InstaDeep working with the Mava team on cutting-edge MARL research. I hold a Master's degree from AIMS South Africa through the AI for Science program in partnership with DeepMind.
Multi-Agent RL • Goal-Conditioned RL • Contrastive Learning • Curriculum Learning • LLM Agents • Inference-Time Scaling
Python • JAX • PyTorch • vLLM • HuggingFace • TRL • Unsloth • LangGraph • LangSmith • LiteLLM • Hydra • TPU/GPU
Autonomous LLM Agents • Agentic Workflows • Inference-Time Scaling • SFT/Fine-tuning • vLLM Serving • MLE-Bench
Working with the Mava team on multi-agent reinforcement learning research. Developing novel approaches combining contrastive learning, goal-conditioned RL, and curriculum learning for complex multi-agent environments. Also built Tinkerer — an autonomous LLM-powered multi-agent system for automated scientific discovery, featuring a closed-loop workflow of idea generation, code implementation, experiment scheduling, and result collection.
University of Cape Town & AIMS South Africa
AI for Science program in partnership with DeepMind. Focused on reinforcement learning, deep learning, and their applications to scientific problems.
Worked on the development of agriAI, a field monitoring application providing real-time insights into crop health, soil conditions, and environmental factors.
Built a crop classification pipeline using Random Forest with Sentinel-2 multispectral bands, achieving 89.22% accuracy in categorizing key crops in El Gezira. Contributed to the agriAI project — an intelligent farming assistant providing real-time crop health and soil analysis.
A selection of my original research and engineering projects
Fine-Grained Credit Assignment for RL Training. Token-level reward assignment to improve training stability and sample efficiency for LLMs.
Combining contrastive reinforcement learning with unsupervised environment design for multi-agent curriculum learning.
Autonomous LLM agent for end-to-end ML engineering via tree search. Implements inference-time scaling strategies (Self-Reflection, Planner-Coder, Self-Consistency) to make open-source LLMs competitive with GPT-4 on MLE-Bench. Served locally via vLLM.
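As a rough illustration of one of the strategies named above, Self-Consistency can be sketched as sampling several candidate answers and keeping the most frequent one. This is a minimal sketch and not the project's actual code: `self_consistency`, `sample_answer`, and `make_fake_sampler` are hypothetical names, and the sampler stands in for a real LLM call (for example, a request to a locally served model).

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def self_consistency(
    sample_answer: Callable[[str], str],
    prompt: str,
    n_samples: int = 5,
) -> Tuple[str, float]:
    """Sample n_samples answers and return the majority answer with its vote share.

    sample_answer is a hypothetical stand-in for one stochastic LLM call.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Normalize whitespace so trivially different strings count as the same vote.
    answers = [sample_answer(prompt).strip() for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


def make_fake_sampler(outputs: Iterable[str]) -> Callable[[str], str]:
    """Build a deterministic sampler that replays canned outputs (for demos and tests)."""
    iterator = iter(outputs)

    def sampler(prompt: str) -> str:
        return next(iterator)

    return sampler


if __name__ == "__main__":
    fake = make_fake_sampler(["A", "B", "A ", "A", "C"])
    answer, share = self_consistency(fake, "Which model should we submit?", n_samples=5)
    print(answer, share)
```

In an agent setting, the vote would typically run over a normalized final answer or a chosen plan rather than raw generated text, since free-form outputs rarely match exactly.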
LLM-powered multi-agent workflow for automated scientific discovery. A closed-loop system where an AI Scientist generates research ideas, an Engineer implements them in code, a Scheduler runs experiments, and a Collector gathers results — autonomously iterating on ML research. Built with LiteLLM, Claude Code, Hydra, and Neptune.
AI coding agent that generates ML ideas, implements them in code, auto-debugs using stack traces, and launches experiments on cloud compute. Acts as an autonomous research intern exploring the solution space. Built with OpenAI API, LiteLLM, Flask, and Docker.
Detecting GenAI-generated content and sophisticated manipulation in public media using machine learning.
Supervised fine-tuning pipeline for DeepSeek-7B. Custom training data curation, LoRA/full fine-tuning experiments, and evaluation for specialized ML engineering tasks.
Benchmarking inference-time scaling strategies on MLE-bench. Measuring how well AI agents perform at ML engineering.
AIDE: The Machine Learning CodeGen Agent. Automated ML engineering through intelligent code generation.
Neural machine translation system for Arabic to Swahili, addressing low-resource language pair challenges.
I'm always interested in discussing research collaborations, new opportunities, or just chatting about RL and AI.