Haoran Zhang
CS @ University of Michigan · systems reliability & ML infrastructure
Hi, I’m Haoran Zhang
I’m a senior undergraduate in Computer Science at the University of Michigan, also pursuing a dual degree in Mechanical Engineering at Shanghai Jiao Tong University.
My research interests lie broadly in systems reliability, systems for machine learning, and machine learning for systems. I am currently working on related projects in agentic distributed systems and MoE inference acceleration.
Research & project interests
- Distributed systems reliability and fail-slow behavior
- Runtime systems for GPU-heavy ML inference and training
- Tooling for observing, diagnosing, and mitigating production incidents
Selected projects
Agentic Distributed System Ops
Agent-based auto-mitigation loop (reproduce → measure → decide → mitigate) on ZooKeeper, with chaos injection, HAProxy/Resilience4j mitigations, and Prometheus/JMX observability for overload and network faults.
View on projects page
CUDA Proxy Player (Hybrid CUDA Runtime)
Hybrid CUDA runtime combining CUDA Graphs with persistent kernels to cut launch overhead and smooth tail latency on bursty MoE-style inference while keeping routing flexible.
View on projects page
COCONUT Replication
Course project on latent reasoning for LLMs (GSM8K / ProsQA) extending the COCONUT framework; instrumented prompts and beam search to study trade-offs among token efficiency, accuracy, and hallucination.