module cli.latency_cli
Short summary
module mlprodict.cli.latency_cli
Command line about validation of prediction runtime.
Functions
function | truncated documentation |
---|---|
latency | Measures the latency of a model (python API). |
_random_input | |
random_feed | Creates a dictionary of random inputs. |
Documentation
- mlprodict.cli.latency_cli._random_input(typ, shape, batch)
- mlprodict.cli.latency_cli.latency(model, law='normal', size=1, number=10, repeat=10, max_time=0, runtime='onnxruntime', device='cpu', fmt=None, profiling=None, profile_output='profiling.csv')
Measures the latency of a model (python API).
- Parameters
model – ONNX graph
law – random law used to generate fake inputs
size – batch size; it replaces the first dimension of every input when that dimension is left unknown
number – number of calls to measure
repeat – number of times to repeat the experiment
max_time – if > 0, the model is run as many times as possible during that period of time
runtime – available runtime
device – device, such as cpu or cuda:0
fmt – None or csv; with csv, the function returns a string formatted like a CSV file
profiling – if True, profiles the execution of every node; it can be by name or type
profile_output – output name for the profiling if profiling is specified
Measures model latency
The command generates random inputs and calls the model many times on these inputs. It returns the processing time for one iteration.
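The measurement described above can be pictured with a small timing loop. This is only a sketch of the general idea, not mlprodict's implementation: `measure_latency` and `fake_model` are hypothetical names, the returned keys are illustrative, and treating `max_time` as a budget in seconds is an assumption.

```python
import time


def measure_latency(fn, number=10, repeat=10, max_time=0):
    """Time `fn` and return per-call statistics.

    Hypothetical helper: it mirrors the meaning of the `number`,
    `repeat` and `max_time` parameters, not mlprodict's actual code.
    """

    def one_experiment():
        # Call `fn` `number` times and return the average time of one call.
        begin = time.perf_counter()
        for _ in range(number):
            fn()
        return (time.perf_counter() - begin) / number

    per_call = []
    if max_time > 0:
        # Assumption: `max_time` is a time budget in seconds.
        # Run at least one experiment, then continue until the budget is spent.
        deadline = time.perf_counter() + max_time
        while not per_call or time.perf_counter() < deadline:
            per_call.append(one_experiment())
    else:
        for _ in range(repeat):
            per_call.append(one_experiment())

    return {
        "average": sum(per_call) / len(per_call),
        "min": min(per_call),
        "max": max(per_call),
        "repeat": len(per_call),
        "number": number,
    }


def fake_model():
    # Stand-in for one model call on random inputs.
    return sum(i * i for i in range(1000))


stats = measure_latency(fake_model, number=5, repeat=3)
print(stats)
```

Averaging over `number` calls smooths out timer resolution, while repeating the experiment gives a sense of the spread between runs.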
Example:
python -m mlprodict latency --model "model.onnx"
<<<
python -m mlprodict latency --help
>>>
usage: latency [-h] [-m MODEL] [--law LAW] [-s SIZE] [-n NUMBER] [-r REPEAT]
               [-ma MAX_TIME] [-ru RUNTIME] [-d DEVICE] [--fmt FMT]
               [-p PROFILING] [-pr PROFILE_OUTPUT]

Measures the latency of a model (python API).

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        ONNX graph (default: None)
  --law LAW             random law used to generate fake inputs (default: normal)
  -s SIZE, --size SIZE  batch size, it replaces the first dimension of every input if it is left unknown (default: 1)
  -n NUMBER, --number NUMBER
                        number of calls to measure (default: 10)
  -r REPEAT, --repeat REPEAT
                        number of times to repeat the experiment (default: 10)
  -ma MAX_TIME, --max_time MAX_TIME
                        if it is > 0, it runs as many time during that period of time (default: 0)
  -ru RUNTIME, --runtime RUNTIME
                        available runtime (default: onnxruntime)
  -d DEVICE, --device DEVICE
                        device, `cpu`, `cuda:0` (default: cpu)
  --fmt FMT             None or `csv`, it then returns a string formatted like a csv file (default: )
  -p PROFILING, --profiling PROFILING
                        if True, profile the execution of every node, if can be by name or type. (default: )
  -pr PROFILE_OUTPUT, --profile_output PROFILE_OUTPUT
                        output name for the profiling if profiling is specified (default: profiling.csv)
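When the command is launched from a script, the argument list can be assembled programmatically. The sketch below uses only the long option names shown in the help output; `build_latency_command` is a hypothetical helper, not part of mlprodict, and it only builds the list without running anything.

```python
import sys


def build_latency_command(model, **options):
    """Return the argument list for `python -m mlprodict latency`.

    Hypothetical helper. `options` maps long option names from the help
    text (without the leading dashes) to values, e.g. number=20.
    """
    known = {"law", "size", "number", "repeat", "max_time", "runtime",
             "device", "fmt", "profiling", "profile_output"}
    unknown = set(options) - known
    if unknown:
        raise ValueError("unknown option(s): %s" % ", ".join(sorted(unknown)))

    cmd = [sys.executable, "-m", "mlprodict", "latency", "--model", model]
    # Sort the options so the resulting command line is deterministic.
    for name in sorted(options):
        cmd.extend(["--" + name, str(options[name])])
    return cmd


cmd = build_latency_command("model.onnx", number=20, repeat=5, fmt="csv")
print(" ".join(cmd[1:]))

# Running it requires mlprodict and an existing model file:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with model paths that contain spaces.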
- mlprodict.cli.latency_cli.random_feed(inputs, batch=10)
Creates a dictionary of random inputs.
- Parameters
batch – size to use for the batch dimension when it is unknown
- Returns
dictionary
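The idea of a random feed can be illustrated with the standard library alone. The sketch below is not the library's code: the type of `inputs` is not documented here, so `random_feed_sketch` (a hypothetical name) assumes a list of `(name, shape)` pairs in which an unknown dimension is written `None`, and it returns nested lists of floats drawn from a normal law.

```python
import random


def random_feed_sketch(inputs, batch=10, seed=None):
    """Build {name: nested list of random floats} from (name, shape) pairs.

    Hypothetical helper: every dimension given as None (unknown) is
    replaced by `batch`.
    """
    rng = random.Random(seed)

    def fill(shape):
        # Recursively build a nested list with the requested shape.
        if not shape:
            return rng.gauss(0.0, 1.0)
        return [fill(shape[1:]) for _ in range(shape[0])]

    feed = {}
    for name, shape in inputs:
        concrete = [batch if dim is None else dim for dim in shape]
        feed[name] = fill(concrete)
    return feed


feed = random_feed_sketch([("X", (None, 4)), ("mask", (None,))], batch=3, seed=0)
print({name: len(value) for name, value in feed.items()})
# → {'X': 3, 'mask': 3}
```

A fixed `seed` makes the generated inputs reproducible, which helps when comparing latency between runs.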