Packrat: Automatic Reconfiguration for Latency Minimization in CPU-based DNN Serving

Packrat is a serving system for online DNN inference on CPUs. It automatically determines how many threads to allocate to each model instance in order to minimize inference latency.

Cite this as

Ankit Bhardwaj, Amar Phanishayee, Deepak Narayanan, Mihail Tarta, Ryan Stutsman (2025). Dataset: Packrat: Automatic Reconfiguration for Latency Minimization in CPU-based DNN Serving. https://doi.org/10.57702/7d1mx8dz

DOI retrieved: January 2, 2025

Additional Info

Created: January 2, 2025
Last update: January 2, 2025
Defined In: https://doi.org/10.48550/arXiv.2311.18174
Author: Ankit Bhardwaj
More Authors: Amar Phanishayee, Deepak Narayanan, Mihail Tarta, Ryan Stutsman
Homepage: https://arxiv.org/abs/2203.12379