Hosted Large Language Model Distributed Inference Workshop at Harvard

I recently hosted a hands-on workshop at Harvard University focused on the technical challenges and solutions for Large Language Model (LLM) distributed inference. As both a Lead ML Research Engineer and a PyTorch Ambassador, I aimed to bridge the gap between theoretical AI models and the high-performance computing (HPC) infrastructure required to serve them at scale.

The workshop provided a deep dive into the LLM lifecycle, specifically focusing on the Deployment and Monitoring phase. We explored several key technical areas:

  • Inference Engines: Where vLLM fits in the serving stack and how it addresses bottlenecks such as serial request execution and static VRAM allocation.

  • PagedAttention: A look at how vLLM’s core innovation eliminates memory waste by partitioning the KV cache into non-contiguous physical blocks (see the block-table sketch after this list).

  • Multi-GPU Scaling: Strategies for sharding massive models, such as Llama 3 405B, using Tensor Parallelism (TP) and Pipeline Parallelism (PP) across multi-node clusters (see the sharding sketch after this list).

  • Continuous Batching: Demonstrating how dynamic request injection and the removal of sync barriers significantly improve GPU occupancy and reduce latency (a toy scheduler sketch follows this list).
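
To make the PagedAttention idea concrete, here is a minimal, hypothetical block-table sketch. The names (`BlockTable`, `BLOCK_SIZE`, `append_token`) are illustrative and are not vLLM's actual internals; the point is that a sequence's logical KV-cache blocks can map to whatever physical blocks happen to be free, so no large contiguous region needs to be reserved per request.

```python
# Toy illustration of PagedAttention-style KV-cache paging.
# All names and sizes are hypothetical; vLLM's real block manager
# is far more sophisticated.

BLOCK_SIZE = 16  # tokens stored per physical KV block (illustrative)

class BlockTable:
    """Maps a sequence's logical blocks to non-contiguous physical blocks."""

    def __init__(self, num_physical_blocks: int):
        self.free_blocks = list(range(num_physical_blocks))
        self.tables: dict[str, list[int]] = {}  # seq_id -> physical block ids

    def append_tokens(self, seq_id: str, num_tokens: int) -> None:
        """Allocate a new physical block only when the last one fills up."""
        table = self.tables.setdefault(seq_id, [])
        blocks_needed = -(-num_tokens // BLOCK_SIZE)  # ceiling division
        while len(table) < blocks_needed:
            table.append(self.free_blocks.pop())  # any free block will do

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.tables.pop(seq_id, []))

bt = BlockTable(num_physical_blocks=1024)
bt.append_tokens("req-1", num_tokens=40)  # 40 tokens occupy 3 blocks
print(bt.tables["req-1"])                 # blocks need not be contiguous
bt.free("req-1")                          # memory is reclaimed immediately
```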
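For the multi-GPU strategies, here is a sketch of how they can be expressed through vLLM's offline API. The model ID, parallel sizes, and prompt are placeholders rather than the lab's exact configuration; the rule of thumb is that TP size times PP size equals the number of GPUs used.

```python
# Sketch: sharding a large model with vLLM's offline API.
# Model ID and parallel sizes are placeholders; choose values that
# match your cluster (TP size x PP size = total GPUs).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # illustrative model ID
    tensor_parallel_size=8,    # shard each layer across 8 GPUs in a node
    pipeline_parallel_size=2,  # split the layer stack across 2 nodes
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```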
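Finally, this toy event loop (pure Python, unrelated to vLLM's actual scheduler) illustrates the core of continuous batching: arriving requests are injected into the running batch between decode steps instead of waiting for the whole batch to drain.

```python
# Toy event loop illustrating continuous batching. Purely illustrative;
# arrival steps and token counts below are made up.
from collections import deque

# (arrival_step, request_id, tokens_to_generate)
incoming = deque([(0, "req-A", 5), (2, "req-B", 3), (3, "req-C", 4)])
running: list[list] = []

step = 0
while incoming or running:
    # Dynamic request injection: anything that has arrived joins the
    # batch immediately -- no sync barrier at batch boundaries.
    while incoming and incoming[0][0] <= step:
        _, rid, todo = incoming.popleft()
        running.append([rid, todo])
        print(f"step {step}: {rid} joins mid-flight")

    # One decode step: each running sequence produces one token.
    for seq in running:
        seq[1] -= 1
    for seq in [s for s in running if s[1] == 0]:
        print(f"step {step}: {seq[0]} done, slot freed for new work")
    running = [s for s in running if s[1] > 0]
    step += 1
```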

During the hands-on labs, participants accessed the Kempner HPC cluster to run offline batch inference and deploy their own vLLM inference servers, interacting with model endpoints via OpenAI-compatible APIs and monitoring GPU activity in real time with nvtop. A minimal client sketch follows.
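
For anyone reproducing the lab flow outside the cluster, here is a minimal client sketch. It assumes a vLLM server is already running on the default port 8000 (e.g., started with `vllm serve <model>`) and uses the standard `openai` Python package; the model name is a placeholder, and the API key is ignored unless the server enforces one.

```python
# Query a locally deployed vLLM server through its OpenAI-compatible API.
# Assumes a server started with something like:
#   vllm serve meta-llama/Llama-3.1-8B-Instruct
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default endpoint
    api_key="EMPTY",  # unused unless the server configures a key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "What is continuous batching?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```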

All materials, including the lab guides and code, are available in the public repository:


Special Thanks

This workshop would not have been possible without the support and dedication of the following individuals:

Associate Director of Education

  • Denise Yoon

Teaching Assistants (listed alphabetically)

  • Bala Desinghu
  • Yasin Mazloumi
  • Nihal Vivekanand Nayak
  • Timothy Ngotiaoco

Their expertise and commitment to helping participants navigate the hands-on labs made this workshop a success.


vLLM Workshop Attendees

Workshop Flyer: Download PDF