Haoran Zhang

CS @ University of Michigan · systems reliability & ML infrastructure

Hi, I’m Haoran Zhang

I’m a senior undergraduate in Computer Science at the University of Michigan, also pursuing a dual degree in Mechanical Engineering at Shanghai Jiao Tong University.

My research interests lie broadly in Systems Reliability, Systems for Machine Learning, and Machine Learning for Systems. I am currently working on related projects in agentic distributed systems and MoE inference acceleration, among others.

Research & project interests

  • Distributed systems reliability and fail-slow behavior
  • Runtime systems for GPU-heavy ML inference and training
  • Tooling for observing, diagnosing, and mitigating production incidents

Selected projects

Agentic Distributed System Ops

Agent-based auto-mitigation loop (reproduce → measure → decide → mitigate) on ZooKeeper, with chaos injection, HAProxy/Resilience4j mitigations, and Prometheus/JMX observability for overload and network faults.

CUDA Proxy Player (Hybrid CUDA Runtime)

Hybrid CUDA runtime that combines CUDA Graphs with persistent kernels to cut kernel-launch overhead and smooth tail latency for bursty MoE-style inference while keeping routing flexible.

COCONUT Replication

Course project on latent reasoning for LLMs (GSM8K / ProsQA) that extends the COCONUT framework; instrumented prompts and beam search to study the trade-offs between token efficiency, accuracy, and hallucination.