Packrat: Automatic Reconfiguration for Latency Minimization in CPU-based DNN ...
Packrat is a serving system for online inference that automatically determines how many threads to allocate to model instances in order to minimize inference latency.
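To make the idea of choosing a thread allocation concrete, here is a minimal sketch of one way such a choice could be made. It is not Packrat's actual algorithm, which the summary above does not describe. The function names `pick_configuration` and `toy_latency_ms`, the even split of a batch across instances, and the latency model are all hypothetical and exist only for illustration.

```python
import math
from typing import Callable, Tuple


def pick_configuration(
    total_threads: int,
    batch_size: int,
    latency_ms: Callable[[int, int], float],
) -> Tuple[int, int, float]:
    """Exhaustively search (instances, threads-per-instance) pairs.

    Assumption: the batch is split evenly across instances that run in
    parallel, so the batch latency equals the latency of one instance
    processing its share with its share of the threads.
    """
    best = None
    for instances in range(1, min(total_threads, batch_size) + 1):
        threads = total_threads // instances              # threads per instance
        per_instance_batch = math.ceil(batch_size / instances)
        latency = latency_ms(threads, per_instance_batch)
        if best is None or latency < best[2]:
            best = (instances, threads, latency)
    return best


def toy_latency_ms(threads: int, batch: int) -> float:
    # Hypothetical model: each item has a serial part (1 ms) and a
    # parallelizable part (8 ms), plus a per-thread synchronization cost.
    # A real system would use measured profiles instead.
    return batch * (1.0 + 8.0 / threads) + 0.5 * threads


if __name__ == "__main__":
    print(pick_configuration(16, 8, toy_latency_ms))
```

Under this toy model, giving all 16 threads to a single instance is not the fastest option: several smaller instances with fewer threads each finish the batch sooner. That illustrates why the thread count per instance is worth tuning rather than fixing.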