2026 Projects

Things I Plan to Build

Fundamentals

Backpropagation Engine

Planned

Micrograd-style autograd.

Transformer from Scratch

Planned

NumPy/PyTorch, no nn.Transformer.

BPE Tokenizer from Scratch

Planned

Byte-pair encoding implementation.

GPT from Scratch

Planned

Decoder-only transformer.

Train a Small LLM

Planned

100M-1B params, real data pipeline, proper training loop.

Inference & Systems

Basic Inference Server

Planned

Batching, concurrent requests, measure bottlenecks.

CUDA Kernel — Matrix Multiplication

Planned

Simple CUDA matmul implementation.

CUDA Kernel — Fused Softmax

Planned

Fused softmax kernel.

CUDA Kernel — Attention

Planned

Custom attention kernel.

FlashAttention from Scratch

Planned

Full FlashAttention implementation.

KV Cache Implementation

Planned

KV cache with optimization.

Quantization — GPTQ

Planned

Implement GPTQ from scratch.

Quantization — AWQ

Planned

Implement AWQ from scratch.

Speculative Decoding

Planned

Speculative decoding implementation.

PagedAttention

Planned

PagedAttention implementation.

Continuous Batching System

Planned

Dynamic batching for inference.

Full Inference Engine

Planned

Combine all the above, benchmark against vLLM.

ML Compiler

Planned

PyTorch-like API → optimized GPU kernels, TinyGrad-style.

Multimodal & Research

Fine-tuning — LoRA from Scratch

Planned

Implement LoRA from scratch.

Fine-tuning — QLoRA from Scratch

Planned

Implement QLoRA from scratch.

Embedding Similarity Search

Planned

Implement HNSW or IVF from scratch.

RAG System End-to-End

Planned

Full retrieval-augmented generation pipeline.

Image Diffusion — DDPM

Planned

Implement DDPM from scratch.

Image Diffusion — DDIM

Planned

Implement DDIM.

Image Diffusion — Latent Diffusion

Planned

Implement latent diffusion.

Consistency Models

Planned

Few-step generation.

Vision Encoder — ViT from Scratch

Planned

Implement ViT from scratch.

Vision Encoder — CLIP-style Training

Planned

Contrastive training.

Train Your Own VLM

Planned

Vision encoder + LLM + projection layer.

OCR / Document Understanding Model

Planned

Document understanding.

Video Diffusion from Scratch

Planned

Temporal consistency, 3D attention.

World Model / Frame Predictor

Planned

Game environment, action-conditioned.

Neural Video Codec

Planned

Learned video compression.

Multimodal Tokenizer

Planned

Unified text/image/video representation.

VLA for Robotics

Planned

Deploy on real or simulated hardware.

Deep Systems

x86 Assembly Fundamentals

Planned

Write non-trivial programs.

Bootloader

Planned

BIOS/UEFI, get to protected mode, load a kernel.

OS — Interrupt Handling

Planned

Interrupt handling implementation.

OS — Physical Memory Manager

Planned

Physical memory management.

OS — Virtual Memory and Paging

Planned

Virtual memory implementation.

OS — Process Abstraction and Scheduling

Planned

Process scheduler.

OS — System Calls

Planned

Syscall interface.

OS — Basic Userspace

Planned

Run ELF binaries.

Filesystem — Basic Implementation

Planned

Basic filesystem.

Filesystem — Journaling

Planned

Add journaling support.

Network Stack — TCP/IP

Planned

Implement TCP/IP from scratch.

POSIX-ish OS

Planned

Port something real (bash, vim, etc.).

Hypervisor — Type-2

Planned

Run Linux inside your hypervisor.

Hypervisor — VT-x/AMD-V

Planned

Implement VT-x/AMD-V support.

GPU Driver Basics

Planned

Talk to hardware, submit commands.

Custom Accelerator Design

Planned

Define your own ML accelerator ISA.

Cycle-accurate Simulator

Planned

For your accelerator.

Overlap

End-to-End Inference Stack

Planned

Your compiler → your runtime → your kernels → beat PyTorch.

On-Device VLM

Planned

Phone/edge, requires quantization + architecture search + kernel optimization.

Browser-based Inference

Planned

WebGPU/WASM, real model, acceptable perf.

Distributed Training System

Planned

From scratch: all-reduce, gradient compression, fault tolerance.

Self-Improving Code Agent

Planned

Modifies its own inference code based on profiling.

Training & Scaling

Distributed Training from Scratch

Planned

DDP, FSDP, pipeline parallelism.

Gradient Checkpointing

Planned

Implementation from scratch.

Mixed Precision Training

Planned

Implement loss scaling, understand numerics.

Data Loading at Scale

Planned

Don't let your GPUs starve.

Curriculum Learning

Planned

Data mixing strategies.

Pretraining a Real Model

Planned

Not just fine-tuning.

RLHF/DPO from Scratch

Planned

Full implementation.

Scaling Laws

Planned

Run experiments, understand compute-optimal training.

Debug Training Runs

Planned

Loss spikes, instabilities, diagnose what went wrong.

Agents & Reasoning

Tool Use Framework

Planned

From scratch.

Code Execution Sandbox

Planned

Safe code execution environment.

Planning Algorithms

Planned

MCTS, tree search with LLMs.

Multi-agent Systems

Planned

Coordination and communication.

Long-horizon Task Completion

Planned

Extended task execution.

Memory Systems

Planned

Beyond RAG — working memory, episodic memory.

Self-reflection and Error Correction

Planned

Correction loops.

Benchmark Your Agents

Planned

Rigorously, not just vibes.

Evals & Interpretability

Eval Harnesses from Scratch

Planned

Build evaluation frameworks.

Design Your Own Benchmarks

Planned

Harder than it sounds.

Probing Classifiers

Planned

Understand what representations encode.

Activation Patching

Planned

Causal tracing.

Sparse Autoencoders

Planned

On activations.

Attention Head Analysis

Planned

Induction heads, etc.

Circuit Analysis

Planned

Find algorithms in weights.

Red-teaming

Planned

Adversarial robustness.

Data & Infrastructure

Large-scale Data Pipelines

Planned

Crawling, filtering, deduplication.

Data Quality Classifiers

Planned

Quality filtering models.

PII Detection and Scrubbing

Planned

Privacy-preserving data processing.

Synthetic Data Generation

Planned

Generate training data.

Dataset Curation

Planned

Underrated — taste matters.

Cluster Management

Planned

Slurm, Kubernetes for ML.

Experiment Tracking

Planned

Reproducibility.

Cost Optimization

Planned

Spot instances, efficient scheduling.

Applied / Product

Prompt Engineering at Depth

Planned

Not just tricks — systematic optimization.

Structured Output

Planned

Constrained generation.

Streaming and Real-time UX

Planned

Real-time user experience.

Caching Strategies

Planned

Semantic cache, KV cache reuse across requests.

Guardrails and Content Filtering

Planned

Safety systems.

Latency Optimization

Planned

For production.

A/B Testing ML Systems

Planned

Experimentation infrastructure.

Build a Product

Planned

People actually use.