Hosted Large Language Model Distributed Inference Workshop at Harvard
Published:
I recently hosted a hands-on workshop at Harvard University, focused on the technical challenges and solutions for Large Language Model (LLM) Distributed Inference. As both a Lead ML Research Engineer and a PyTorch Ambassador, my goal was to bridge the gap between theoretical AI models and the high-performance computing (HPC) infrastructure required to serve them at scale.
The workshop provided a deep dive into the LLM lifecycle, specifically focusing on the Deployment and Monitoring phase. We explored several key technical areas:
Inference Engines: How vLLM functions within the AI ecosystem to solve bottlenecks like serial execution and static VRAM allocation.
PagedAttention: A look at how vLLM’s core innovation eliminates memory waste by partitioning the KV cache into non-contiguous physical blocks.
Multi-GPU Scaling: Strategies for sharding massive models, such as Llama 3 405B, using Tensor Parallelism (TP) and Pipeline Parallelism (PP) across multi-node clusters.
Continuous Batching: Demonstrating how dynamic request injection and the removal of sync barriers significantly improve GPU occupancy and reduce latency.
During the hands-on labs, participants accessed the Kempner HPC cluster to run offline batch inference and deploy their own vLLM inference servers, interacting with model endpoints via OpenAI-compatible APIs and monitoring real-time activity with nvtop.
All materials, including the lab guides and code, are available in the public repository:
Workshop Materials: https://github.com/KempnerInstitute/distributed-inference-vllm
Full Presentation: LLM Distributed Inference Slides
Special Thanks
This workshop would not have been possible without the support and dedication of the following individuals:
Associate Director of Education
- Denise Yoon
Teaching Assistants (listed alphabetically)
- Bala Desinghu
- Yasin Mazloumi
- Nihal Vivekanand Nayak
- Timothy Ngotiaoco
Their expertise and commitment to helping participants navigate the hands-on labs made this workshop a success.

Workshop Flyer: Download PDF
