Asim Osman

Hi, I'm Asim

I'm an AI Engineer based in Cape Town, South Africa. I specialize in Multi-Agent Reinforcement Learning, LLM Agents & Engineering, and Goal-Conditioned Reinforcement Learning.

About Me

I'm an AI Research Engineer with experience spanning Multi-Agent Reinforcement Learning and LLM Agents & Engineering. At InstaDeep, I develop MARL algorithms using contrastive learning and curriculum strategies. During my Master's, I built autonomous LLM agents for ML engineering, designed inference-time scaling strategies, and fine-tuned large language models, work that bridges RL and modern LLM systems.

I'm currently at InstaDeep, working with the Mava team on MARL research. I hold a Master's degree in Artificial Intelligence from the University of Cape Town and AIMS South Africa, earned through the AI for Science program in partnership with DeepMind.

Research Interests

Multi-Agent RL • Goal-Conditioned RL • Contrastive Learning • Curriculum Learning • LLM Agents • Inference-Time Scaling

Tech Stack

Python • JAX • PyTorch • vLLM • HuggingFace • TRL • Unsloth • LangGraph • LangSmith • LiteLLM • Hydra • TPU/GPU

LLM Engineering

Autonomous LLM Agents • Agentic Workflows • Inference-Time Scaling • Supervised Fine-Tuning (SFT) • vLLM Serving • MLE-bench

Experience

AI Research Engineer

InstaDeep

Jul 2025 - Present

Working with the Mava team on multi-agent reinforcement learning research, developing approaches that combine contrastive learning, goal-conditioned RL, and curriculum learning for complex multi-agent environments. Also built Tinkerer, an autonomous LLM-powered multi-agent system for automated scientific discovery with a closed-loop workflow of idea generation, code implementation, experiment scheduling, and result collection.

Master's in Artificial Intelligence

University of Cape Town & AIMS South Africa

Sep 2024 - Jul 2025

AI for Science program in partnership with DeepMind. Focused on reinforcement learning, deep learning, and their applications to scientific problems.

Research Engineer Intern / Research Assistant

Enigma AI Research and Development Team

May 2024 - Aug 2024

Worked on the development of agriAI, a field monitoring application providing real-time insights into crop health, soil conditions, and environmental factors.

Jul 2022 - Mar 2023

Built a crop classification pipeline using a Random Forest classifier on Sentinel-2 multispectral bands, achieving 89.22% accuracy in classifying key crops in El Gezira. Also contributed to agriAI, an intelligent farming assistant providing real-time crop health and soil analysis.

Featured Projects

A selection of my original research and engineering projects

Research

VinePPO

Fine-grained credit assignment for RL training of LLMs: assigning reward at the token level to improve training stability and sample efficiency.

JAX • LLM • RL
Research • Private

Contrastive-RL-UED

Combining contrastive reinforcement learning with unsupervised environment design for multi-agent curriculum learning.

MARL • Curriculum
Research • Master's Thesis

AIDE Agent

Autonomous LLM agent for end-to-end ML engineering via tree search. Implements inference-time scaling strategies (Self-Reflection, Planner-Coder, Self-Consistency) to make open-source LLMs competitive with GPT-4 on MLE-bench, with models served locally via vLLM.

LLM Agents • Tree Search • vLLM • Inference Scaling
LLM Engineering • InstaDeep

Tinkerer: AI Scientific Discovery

LLM-powered multi-agent workflow for automated scientific discovery. In this closed-loop system, an AI Scientist generates research ideas, an Engineer implements them in code, a Scheduler runs experiments, and a Collector gathers results, so the system iterates on ML research autonomously. Built with LiteLLM, Claude Code, Hydra, and Neptune.

Multi-Agent • LLM • Agentic Workflows • LiteLLM • Hydra
LLM Engineering • InstaDeep

Tinkerer: ML Research Assistant

AI coding agent that generates ML ideas, implements them in code, auto-debugs using stack traces, and launches experiments on cloud compute. Acts as an autonomous research intern exploring the solution space. Built with OpenAI API, LiteLLM, Flask, and Docker.

LLM Agents • Auto-Debug • Flask • Docker
Private

Digital Integrity Detection

Using machine learning to detect AI-generated content and sophisticated manipulation in public media.

GenAI Detection
LLM Engineering

DeepSeek-SFT

Supervised fine-tuning pipeline for DeepSeek-7B. Custom training data curation, LoRA/full fine-tuning experiments, and evaluation for specialized ML engineering tasks.

LLM • SFT • DeepSeek • HuggingFace
Research

ITS-Bench

Benchmarking inference-time scaling strategies on MLE-bench to measure how well AI agents perform at ML engineering.

Benchmark • Agents
Research

AIDE-DS

AIDE, the machine learning code-generation agent: automated ML engineering through LLM-driven code generation.

LLM • CodeGen • Agents
Graduation Project

Arabic-Swahili MT

Neural machine translation system for Arabic to Swahili, addressing low-resource language pair challenges.

NLP • Translation

Let's Connect

I'm always interested in discussing research collaborations, new opportunities, or just chatting about RL and AI.