module onnxrt.validate.validate_latency
Short summary
module mlprodict.onnxrt.validate.validate_latency
Command line for validating a prediction runtime.
Functions

function | truncated documentation |
---|---|
latency | Measures the latency of a model (python API). |
random_feed | Creates a dictionary of random inputs. |
Documentation
- mlprodict.onnxrt.validate.validate_latency._random_input(typ, shape, batch)
- mlprodict.onnxrt.validate.validate_latency.latency(model, law='normal', size=1, number=10, repeat=10, max_time=0, runtime='onnxruntime', device='cpu', profiling=None)
Measures the latency of a model (python API).
- Parameters:
model – ONNX graph
law – random law used to generate fake inputs
size – batch size, it replaces the first dimension of every input if it is left unknown
number – number of calls to measure
repeat – number of times to repeat the experiment
max_time – if greater than 0, the model runs as many iterations as possible during that period of time
runtime – runtime used to execute the model ('onnxruntime' by default)
device – 'cpu', 'cuda:0' or a list of providers such as 'CPUExecutionProvider, CUDAExecutionProvider'
profiling – if set, profiles the execution of every node; the results can be sorted by node name or node type, so the value must be one of (None, 'name', 'type')
- Returns:
dictionary, or a tuple (dictionary, dataframe) if profiling is enabled
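When profiling is set to 'name' or 'type', per-node timings are summarized by node name or by node type. The sketch below illustrates that kind of aggregation on a hypothetical list of profiling records. The record layout (name, op_type and dur keys) and the helper aggregate_profile are assumptions made for the example, not the format mlprodict actually returns.

```python
from collections import defaultdict


def aggregate_profile(records, key="name"):
    """Sums node durations grouped by node name or by node type.

    Hypothetical helper: each record is assumed to look like
    {"name": "Add1", "op_type": "Add", "dur": 12}.
    """
    if key not in ("name", "type"):
        raise ValueError("key must be 'name' or 'type'")
    field = "name" if key == "name" else "op_type"
    totals = defaultdict(float)
    for rec in records:
        totals[rec[field]] += rec["dur"]
    # Slowest groups first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


records = [
    {"name": "Add1", "op_type": "Add", "dur": 12},
    {"name": "MatMul1", "op_type": "MatMul", "dur": 30},
    {"name": "Add2", "op_type": "Add", "dur": 20},
]
print(aggregate_profile(records, key="type"))
# [('Add', 32.0), ('MatMul', 30.0)]
```

Grouping by type answers "which operator kind costs the most", while grouping by name points at individual nodes.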
Measures model latency
The command generates random inputs and calls the model many times on these inputs. It returns the processing time of one iteration.
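The measurement loop can be sketched in plain Python: a callable is invoked number times per repetition, the elapsed time is divided by number to obtain the duration of one iteration, and repetitions stop after repeat rounds or, when max_time is greater than 0, once that time budget is spent. The helper measure_latency and its result keys are hypothetical, max_time is assumed to be in seconds, and a real run would call the model on the random inputs instead of the placeholder lambda.

```python
import statistics
import time


def measure_latency(fct, number=10, repeat=10, max_time=0):
    """Times fct() and returns statistics about one call (hypothetical helper)."""
    durations = []  # average duration of one call, one entry per repetition
    begin = time.perf_counter()
    while True:
        start = time.perf_counter()
        for _ in range(number):
            fct()
        durations.append((time.perf_counter() - start) / number)
        if max_time > 0:
            # Time-budget mode: stop once max_time seconds have elapsed.
            if time.perf_counter() - begin >= max_time:
                break
        elif len(durations) >= repeat:
            break
    return {
        "average": statistics.mean(durations),
        "deviation": statistics.pstdev(durations),
        "min_exec": min(durations),
        "max_exec": max(durations),
        "repeat": len(durations),
        "number": number,
    }


stats = measure_latency(lambda: sum(range(1000)), number=5, repeat=3)
print(stats["repeat"], stats["number"])  # 3 5
```

Dividing by number amortizes timer overhead over several calls, and repeating the experiment gives a spread (deviation, min, max) rather than a single noisy value.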
Example:
python -m mlprodict latency --model "model.onnx"
Running python -m mlprodict latency --help prints:
usage: latency [-h] [-m MODEL] [--law LAW] [-s SIZE] [-n NUMBER] [-r REPEAT]
               [-ma MAX_TIME] [-ru RUNTIME] [-d DEVICE] [--fmt FMT]
               [-p PROFILING] [-pr PROFILE_OUTPUT]

Measures the latency of a model (python API).

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        ONNX graph (default: None)
  --law LAW             random law used to generate fake inputs (default: normal)
  -s SIZE, --size SIZE  batch size, it replaces the first dimension of every input if it is left unknown (default: 1)
  -n NUMBER, --number NUMBER
                        number of calls to measure (default: 10)
  -r REPEAT, --repeat REPEAT
                        number of times to repeat the experiment (default: 10)
  -ma MAX_TIME, --max_time MAX_TIME
                        if it is > 0, it runs as many time during that period of time (default: 0)
  -ru RUNTIME, --runtime RUNTIME
                        available runtime (default: onnxruntime)
  -d DEVICE, --device DEVICE
                        device, `cpu`, `cuda:0` or a list of providers `CPUExecutionProvider, CUDAExecutionProvider (default: cpu)
  --fmt FMT             None or `csv`, it then returns a string formatted like a csv file (default: )
  -p PROFILING, --profiling PROFILING
                        if True, profile the execution of every node, if can be sorted by name or type, the value for this parameter should e in `(None, 'name', 'type')` (default: )
  -pr PROFILE_OUTPUT, --profile_output PROFILE_OUTPUT
                        output name for the profiling if profiling is specified (default: profiling.csv)
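With --fmt csv, the command returns the measurements as CSV text instead of a dictionary. A minimal sketch of such a conversion with the standard csv module, assuming the result is a flat dictionary of scalar values; to_csv is a hypothetical helper, not part of mlprodict.

```python
import csv
import io


def to_csv(results):
    """Formats a flat result dictionary as a header line plus one data line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(results), lineterminator="\n")
    writer.writeheader()
    writer.writerow(results)
    return buffer.getvalue()


print(to_csv({"average": 0.0012, "deviation": 0.0001, "repeat": 10}), end="")
# average,deviation,repeat
# 0.0012,0.0001,10
```

One row per run keeps the output easy to append to a file when comparing several models or runtimes.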
- mlprodict.onnxrt.validate.validate_latency.random_feed(inputs, batch=10, empty_dimension=1)
Creates a dictionary of random inputs.
- Parameters:
batch – dimension to use as batch dimension if unknown
empty_dimension – if a dimension is null, it is replaced by this value
- Returns:
dictionary
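A stdlib-only sketch of the idea behind random_feed, under stated assumptions: inputs are described by hypothetical (name, shape) pairs in which None or 0 marks an unknown dimension; an unknown first dimension becomes batch and any other null dimension becomes empty_dimension. Values are nested lists of floats drawn from a normal or uniform law. The names random_feed_sketch and random_tensor are illustrative only; the real function works on the model's input descriptions and may return typed arrays.

```python
import random


def random_tensor(shape, law="normal"):
    """Builds a nested list of random floats with the given shape."""
    samplers = {"normal": lambda: random.gauss(0.0, 1.0), "uniform": random.random}
    if law not in samplers:
        raise ValueError("unsupported law %r" % law)
    sample = samplers[law]

    def build(dims):
        if not dims:
            return sample()
        return [build(dims[1:]) for _ in range(dims[0])]

    return build(tuple(shape))


def random_feed_sketch(inputs, batch=10, empty_dimension=1, law="normal"):
    """Returns {input name: random nested list} (hypothetical helper)."""
    feed = {}
    for name, shape in inputs:
        resolved = []
        for i, dim in enumerate(shape):
            if dim:  # known, non-null dimension
                resolved.append(dim)
            elif i == 0:  # unknown first dimension -> batch size
                resolved.append(batch)
            else:  # other null dimensions -> empty_dimension
                resolved.append(empty_dimension)
        feed[name] = random_tensor(resolved, law)
    return feed


feed = random_feed_sketch([("X", (None, 3)), ("Y", (2, 0))], batch=4)
print(len(feed["X"]), len(feed["X"][0]))  # 4 3
print(len(feed["Y"]), len(feed["Y"][0]))  # 2 1
```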