Publications
TECHCON '25
SpliDT: Partitioned Decision Trees for Scalable Stateful ML Inference at Line Rate
Marilyn Rego, Murayyiam Parvez, Annus Zulfiqar, Roman Beltiukov, Shir Landau Feibish, Walter Willinger, Arpit Gupta, Muhammad Shahbaz
NSDI '25 (Poster)
A Smart Cache for a SmartNIC!
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
NSDI '25 (Poster)
BranchPipe: Scalable Decision Trees for Stateful Processing at Line Rate
Murayyiam Parvez, Annus Zulfiqar, Roman Beltiukov, Shir Landau Feibish, Walter Willinger, Arpit Gupta, Muhammad Shahbaz
Euro S&P '25
O'MINE: A Novel Collaborative DDoS Detection Mechanism for Programmable Data-Planes
Enkeleda Bardhi, Chenxing Ji, Ali Imran, Muhammad Shahbaz, Riccardo Lazzeretti, Mauro Conti, Fernando Kuipers
ISCA '25
HardHarvest: Hardware-Supported Core Harvesting for Microservices
Jovan Stojkovic, Chunao Liu, Muhammad Shahbaz, Josep Torrellas
ASPLOS '25
Gigaflow: Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz

Computer Science and Engineering

Streamlining cloud traffic with a Gigaflow Cache

Gigaflow improves virtual switches for programmable SmartNICs, delivering a 51% higher hit rate and 90% lower misses.

Tech Xplore

Gigaflow cache streamlines cloud traffic, with 51% higher hit rate and 90% lower misses for programmable SmartNICs

A new way to temporarily store memory, Gigaflow, helps direct heavy traffic in cloud data centers caused by AI and machine learning workloads, according to a study led by University of Michigan researchers.

YouTube

Gigaflow - Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs (ASPLOS 2025)

Learn about Gigaflow: a high hit rate, SmartNIC-native cache for virtual switches (like OVS) that expands rule space coverage by two orders of magnitude and reduces cache misses by up to 90%. This work was presented at ASPLOS '25.

NSDI '25
OptiReduce: Resilient and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud
Ertza Warraich, Omer Shabtai, Khalid Manaa, Shay Vargaftik, Yonatan Piasetzky, Matty Kadosh, Lalith Suresh, Muhammad Shahbaz
Google Research Scholar Award

Computer Science and Engineering

Perfect is the enemy of good for distributed deep learning in the cloud

Leveraging deep learning’s resilience, approximating data lost by allowing some servers to time out speeds up model training while preserving performance.

Tech Xplore

Perfect is the enemy of good for distributed deep learning in the cloud

A new communication-collective system, OptiReduce, speeds up AI and machine learning training across multiple cloud servers by setting time boundaries rather than waiting for every server to catch up, ...

Compound AI Systems Workshop & PACMI '24
Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents
Qizheng Zhang, Ali Imran, Enkeleda Bardhi, Tushar Swamy, Muhammad Shahbaz, Kunle Olukotun
APSys '24 (Poster)
EdgeScaler: Smart (Auto-)Scaling for the 5G Edge
Lauren Trinks (Lead Undergraduate Student), Bilal Saleem, Muhammad Shahbaz
TECHCON '24
GigaFlow: A Scalable and Efficient Hardware Fast-Path for Open vSwitch
Venkat Kunaparaju, Annus Zulfiqar, Ali Imran, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
HotChips '24
A Smart Cache for a SmartNIC!
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
OSDI '24
Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents
Qizheng Zhang, Ali Imran, Enkeleda Bardhi, Tushar Swamy, Nathan Zhang, Muhammad Shahbaz, Kunle Olukotun
SRC JUMP 2.0 Best Paper Award
IMC '23
Modeling and Generating Control-Plane Traffic for Cellular Networks
Jiayi Meng, Jingqi Huang, Y. Charlie Hu, Yaron Koral, Xiaojun Lin, Muhammad Shahbaz, Abhigyan Sharma
Best Paper Award (Runner Up)
TECHCON '23
Towards a Performant and Scalable Cloud-Native 5G Mobile Core Architecture
Jingqi Huang*, Jiayi Meng*, Bilal Saleem*, Iftekhar Alam, Ajay Thakur, Muhammad Shahbaz, Christian Maciocco, Y. Charlie Hu (*co-primary)
SIGCOMM CCR '23
The Slow Path Needs an Accelerator Too!
Annus Zulfiqar, Ben Pfaff, William Tu, Gianni Antichi, Muhammad Shahbaz
ISCA '23
μManycore: A Cloud-Native CPU for Tail at Scale
Jovan Stojkovic, Chunao Liu, Muhammad Shahbaz, Josep Torrellas
IEEE Micro Top Picks (Honorable Mention)
ASPLOS '23
Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks
Tushar Swamy, Annus Zulfiqar, Muhammad Shahbaz, Luigi Nardi, Kunle Olukotun
ASPLOS Distinguished Artifact Award
YArch '23
Hardware Support for Efficient and Secure Resource Harvesting in the Cloud
Jovan Stojkovic, Chunao Liu, Muhammad Shahbaz, Josep Torrellas
arXiv '22
Enabling the Reflex Plane with the nanoPU
Stephen Ibanez, Alex Mallery, Serhat Arslan, Theo Jepsen, Muhammad Shahbaz, Changhoon Kim, Nick McKeown
NativeNI '22
The Case for Native Multi-Node In-Network Machine Learning
Lorenzo Bracciale, Tushar Swamy, Muhammad Shahbaz, Pierpaolo Loreti, Stefano Salsano, Hesham Elbakoury
OSDI '22 (Poster)
Ultima: Robust and Tail-Optimal All-Reduce for Distributed Deep Learning
Ertza Warraich, Leonard Liu, Omer Shabtai, Yonatan Piasetzky, Shay Vargaftik, Matty Kadosh, Lalith Suresh, Muhammad Shahbaz
P4 Workshop '22
Accelerating 5G (Mobile Core) Control Plane using P4
Jingqi Huang*, Jiayi Meng*, Iftekhar Alam, Christian Maciocco, Y. Charlie Hu, Muhammad Shahbaz (*co-primary)
P4 Workshop '22
Primitives for Finite Field Arithmetic in Network Switches
Daniel Seara, Bernardo Conde, Eduard Marin, Muhammad Shahbaz, Muriel Medard, Fernando Ramos
P4 Workshop '22 & NVMW '22
PMNet: In-Network Data Persistence
Korakit Seemakhupt, Sihang Liu, Yasas Senevirathne, Muhammad Shahbaz, Samira Khan
ASPLOS '22
Taurus: A Data Plane Architecture for Per-Packet ML
Tushar Swamy, Alexander Rucker, Muhammad Shahbaz, Ishan Gaur, Kunle Olukotun
IETF/IRTF ANRP Prize
IEEE Micro Top Picks (Honorable Mention)

APNIC Blog

Taurus: A data plane architecture for per-packet machine learning | APNIC Blog

Taurus bridges the gap between speed and intelligence by running machine-learning models directly in the network on every packet.

IEEE Computer Architecture Letters (CAL) '21
Chopping Off the Tail: Bounded Non-Determinism for Real-Time Accelerators
Alexander Rucker, Muhammad Shahbaz, Kunle Olukotun
Best of CAL Paper Award
Facebook Research Award
SIGCOMM '21 (Poster)
Constructing the Face of Network Data
Ertza Warraich, Muhammad Shahbaz
OSDI '21
The nanoPU: A Nanosecond RPC Stack for Data Centers
Stephen Ibanez, Alex Mallery, Serhat Arslan, Theo Jepsen, Muhammad Shahbaz, Changhoon Kim, Nick McKeown
Google Faculty Award

The Next Platform

Forget Microservices: A NIC-CPU Co-Design For The Nanoservices Era

The remote procedure call, or RPC, might be the single most important invention in the history of modern computing. The ability to reach out from a

YouTube

nanoPU: Redesigning the CPU-Network Interface to Minimize RPC Tail Latency - Steve Ibanez, Stanford

Presentation for the ONF 5G Connected Edge Cloud for Industry 4.0 Transformation.

ISCA '21
SARA: Scaling a Reconfigurable Dataflow Accelerator
Yaqi Zhang, Nathan Zhang, Tian Zhao, Matt Vilim, Muhammad Shahbaz, Kunle Olukotun
ISCA '21
PMNet: In-Network Data Persistence
Korakit Seemakhupt, Sihang Liu, Yasas Senevirathne, Muhammad Shahbaz, Samira Khan

YouTube

PMNet: In-Network Data Persistence

This is a talk for our ISCA paper, PMNet: In-Network Data Persistence. The paper: https://www.cs.virginia.edu/~smk9u/PMNet_ISCA2021.pdf