NextGArch Lab: Publications

NSDI '26 > Poster
mrLLM: Fast Multi-Region LLM Inference using Learned Adaptors
Marilyn Rego, Maxwell Kumbong, Hermann Kumbong, Ertza Warraich, Muhammad Shahbaz
NSDI '26
SpliDT: Partitioned Decision Trees for Scalable Stateful Inference at Line Rate
Murayyiam Parvez*, Annus Zulfiqar*, Roman Beltiukov, Shir Landau Feibish, Walter Willinger, Arpit Gupta, Muhammad Shahbaz (*co-primary)
PDF
Website
Artifact
YouTube
SpliDT: Partitioned Decision Trees for Scalable Stateful Inference at Line Rate || P4 Developer Day
Machine learning is increasingly used in programmable data planes, such as switches and smartNICs, to enable real-time traffic analysis and security monitoring at line rate. Decision trees (DTs) are particularly well-suited for these tasks due to their interpretability and compatibility with the Reconfigurable Match-Action Table (RMT) architecture. However, current DT implementations require collecting all features upfront, which limits scalability and accuracy due to constrained data plane reso


IEEE Computer Architecture Letters (CAL) '25
Reimagining RDMA Through the Lens of ML
Ertza Warraich, Ali Imran, Annus Zulfiqar, Shay Vargaftik, Sonia Fahmy, Muhammad Shahbaz
PDF
P4 Workshop '25 > Demo
Gigaflow: Pipeline-Aware Caching in Virtual Switches with P4
Advay Singh, Annus Zulfiqar, Ali Imran, Muhammad Shahbaz
MICRO '25
NetSparse: In-Network Acceleration of Distributed Sparse Kernels
Gerasimos Gerogiannis, Charles Block, Dimitrios Merkouriadis, Annus Zulfiqar, Filippos Tofalos, Muhammad Shahbaz, Josep Torrellas
SIGCOMM '25 > Shorts
SpliDT: Partitioned Decision Trees for Scalable Stateful Inference at Line Rate
Murayyiam Parvez, Annus Zulfiqar, Roman Beltiukov, Shir Landau Feibish, Walter Willinger, Arpit Gupta, Muhammad Shahbaz
PDF
SIGCOMM '25 > Poster
Kairo: Incremental View Maintenance for Scalable Virtual Switch Caching
Annus Zulfiqar, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
PDF
TECHCON '25
SpliDT: Partitioned Decision Trees for Scalable Stateful ML Inference at Line Rate
Marilyn Rego, Murayyiam Parvez, Annus Zulfiqar, Roman Beltiukov, Shir Landau Feibish, Walter Willinger, Arpit Gupta, Muhammad Shahbaz
Euro S&P '25
O'MINE: A Novel Collaborative DDoS Detection Mechanism for Programmable Data-Planes
Enkeleda Bardhi, Chenxing Ji, Ali Imran, Muhammad Shahbaz, Riccardo Lazzeretti, Mauro Conti, Fernando Kuipers
PDF
ISCA '25
HardHarvest: Hardware-Supported Core Harvesting for Microservices
Jovan Stojkovic, Chunao Liu, Muhammad Shahbaz, Josep Torrellas
PDF
ASPLOS '25
Gigaflow: Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
PDF
Website
Artifact
p4.org
Gigaflow: Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs – P4 – Language Consortium
Figure 1: (a) A traversal is a complete sequence of table lookups through the vSwitch pipeline that generates a Megaflow rule. (b) A sub-traversal is a subset of these lookups within a traversal, capturing smaller, reusable segments shared across multiple flows.
Computer Science and Engineering
Streamlining cloud traffic with a Gigaflow Cache
Gigaflow improves virtual switches for programmable SmartNICs, delivering a 51% higher hit rate and 90% lower misses.
Tech Xplore
Gigaflow cache streamlines cloud traffic, with 51% higher hit rate and 90% lower misses for programmable SmartNICs
A new way to temporarily store memory, Gigaflow, helps direct heavy traffic in cloud data centers caused by AI and machine learning workloads, according to a study led by University of Michigan researchers.
YouTube
Gigaflow - Pipeline-Aware Sub-Traversal Caching for Modern SmartNICs (ASPLOS 2025)
Learn about Gigaflow: a high hit rate, SmartNIC-native cache for virtual switches (like OVS) that expands rule space coverage by two orders of magnitude and reduces cache misses by up to 90%. 

This work was presented at ASPLOS '25.
NSDI '25
OptiReduce: Resilient and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud 
Ertza Warraich, Omer Shabtai, Khalid Manaa, Shay Vargaftik, Yonatan Piasetzky, Matty Kadosh, Lalith Suresh, Muhammad Shahbaz
Google Research Scholar Award
PDF
Website
Artifact
Computer Science and Engineering
Perfect is the enemy of good for distributed deep learning in the cloud
Leveraging deep learning’s resilience, approximating data lost by allowing some servers to time out speeds up model training while preserving performance.
Tech Xplore
Perfect is the enemy of good for distributed deep learning in the cloud
A new communication-collective system, OptiReduce, speeds up AI and machine learning training across multiple cloud servers by setting time boundaries rather than waiting for every server to catch up, ...
NSDI '25 > Poster
A Smart Cache for a SmartNIC!
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
PDF
NSDI '25 > Poster
BranchPipe: Scalable Decision Trees for Stateful Processing at Line Rate
Murayyiam Parvez, Annus Zulfiqar, Roman Beltiukov, Shir Landau Feibish, Walter Willinger, Arpit Gupta, Muhammad Shahbaz
PDF


Compound AI Systems Workshop & PACMI '24
Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents
Qizheng Zhang, Ali Imran, Enkeleda Bardhi, Tushar Swamy, Muhammad Shahbaz, Kunle Olukotun
APSys '24 > Poster
EdgeScaler: Smart (Auto-)Scaling for the 5G Edge
Lauren Trinks (Lead Undergraduate Student), Bilal Saleem, Muhammad Shahbaz
TECHCON '24
GigaFlow: A Scalable and Efficient Hardware Fast-Path for Open vSwitch
Venkat Kunaparaju, Annus Zulfiqar, Ali Imran, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
HotChips '24
A Smart Cache for a SmartNIC!
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
OSDI '24
Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents
Qizheng Zhang, Ali Imran, Enkeleda Bardhi, Tushar Swamy, Nathan Zhang, Muhammad Shahbaz, Kunle Olukotun
SRC JUMP 2.0 Best Paper Award
PDF
Artifact


IMC '23
Modeling and Generating Control-Plane Traffic for Cellular Networks
Jiayi Meng, Jingqi Huang, Y. Charlie Hu, Yaron Koral, Xiaojun Lin, Muhammad Shahbaz, Abhigyan Sharma
Best Paper Award (Runner Up)
PDF
Artifact
TECHCON '23
Towards a Performant and Scalable Cloud-Native 5G Mobile Core Architecture
Jingqi Huang*, Jiayi Meng*, Bilal Saleem*, Iftekhar Alam, Ajay Thakur, Muhammad Shahbaz, Christian Maciocco, Y. Charlie Hu (*co-primary)
SIGCOMM CCR '23
The Slow Path Needs an Accelerator Too!
Annus Zulfiqar, Ben Pfaff, William Tu, Gianni Antichi, Muhammad Shahbaz
PDF
ISCA '23
μManycore: A Cloud-Native CPU for Tail at Scale
Jovan Stojkovic, Chunao Liu, Muhammad Shahbaz, Josep Torrellas
IEEE Micro Top Picks (Honorable Mention)
PDF
ASPLOS '23
Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks
Tushar Swamy, Annus Zulfiqar, Muhammad Shahbaz, Luigi Nardi, Kunle Olukotun
ASPLOS Distinguished Artifact Award
PDF
Artifact
YArch '23
Hardware Support for Efficient and Secure Resource Harvesting in the Cloud
Jovan Stojkovic, Chunao Liu, Muhammad Shahbaz, Josep Torrellas


arXiv '22
Enabling the Reflex Plane with the nanoPU
Stephen Ibanez, Alex Mallery, Serhat Arslan, Theo Jepsen, Muhammad Shahbaz, Changhoon Kim, Nick McKeown
Preprint
NativeNI '22
The Case for Native Multi-Node In-Network Machine Learning
Lorenzo Bracciale, Tushar Swamy, Muhammad Shahbaz, Pierpaolo Loreti, Stefano Salsano, Hesham Elbakoury
PDF
OSDI '22 > Poster
Ultima: Robust and Tail-Optimal All-Reduce for Distributed Deep Learning
Ertza Warraich, Leonard Liu, Omer Shabtai, Yonatan Piasetzky, Shay Vargaftik, Matty Kadosh, Lalith Suresh, Muhammad Shahbaz
PDF
P4 Workshop '22
Accelerating 5G (Mobile Core) Control Plane using P4
Jingqi Huang*, Jiayi Meng*, Iftekhar Alam, Christian Maciocco, Y. Charlie Hu, Muhammad Shahbaz (*co-primary)
P4 Workshop '22
Primitives for Finite Field Arithmetic in Network Switches
Daniel Seara, Bernardo Conde, Eduard Marin, Muhammad Shahbaz, Muriel Medard, Fernando Ramos
P4 Workshop '22 & NVMW '22
PMNet: In-Network Data Persistence
Korakit Seemakhupt, Sihang Liu, Yasas Senevirathne, Muhammad Shahbaz, Samira Khan
ASPLOS '22
Taurus: A Data Plane Architecture for Per-Packet ML
Tushar Swamy, Alexander Rucker, Muhammad Shahbaz, Ishan Gaur, Kunle Olukotun
IETF/IRTF ANRP Prize 
IEEE Micro Top Picks (Honorable Mention)
PDF
Artifact
Tutorial
APNIC Blog
Taurus: A data plane architecture for per-packet machine learning | APNIC Blog
Taurus bridges the gap between speed and intelligence by running machine-learning models directly in the network on every packet.


IEEE Computer Architecture Letters (CAL) '21
Chopping Off the Tail: Bounded Non-Determinism for Real-Time Accelerators
Alexander Rucker, Muhammad Shahbaz, Kunle Olukotun
Best of CAL Paper Award 
Facebook Research Award
PDF
SIGCOMM '21 > Poster
Constructing the Face of Network Data
Ertza Warraich, Muhammad Shahbaz
PDF
OSDI '21
The nanoPU: A Nanosecond RPC Stack for Data Centers
Stephen Ibanez, Alex Mallery, Serhat Arslan, Theo Jepsen, Muhammad Shahbaz, Changhoon Kim, Nick McKeown
Google Faculty Award
PDF
Artifact
The Next Platform
Forget Microservices: A NIC-CPU Co-Design For The Nanoservices Era
The remote procedure call, or RPC, might be the single most important invention in the history of modern computing. The ability to reach out from a
YouTube
nanoPU: Redesigning the CPU-Network Interface to Minimize RPC Tail Latency - Steve Ibanez, Stanford
Presentation for the ONF 5G Connected Edge Cloud for Industry 4.0 Transformation.
ISCA '21
SARA: Scaling a Reconfigurable Dataflow Accelerator
Yaqi Zhang, Nathan Zhang, Tian Zhao, Matt Vilim, Muhammad Shahbaz, Kunle Olukotun
PDF
ISCA '21
PMNet: In-Network Data Persistence
Korakit Seemakhupt, Sihang Liu, Yasas Senevirathne, Muhammad Shahbaz, Samira Khan
PDF
YouTube
PMNet: In-Network Data Persistence
This is a talk for our ISCA paper, PMNet: In-Network Data Persistence.
The paper: https://www.cs.virginia.edu/~smk9u/PMNet_ISCA2021.pdf
2020 and earlier …