module `onnxrt.validate.validate_latency`#

Short summary#

module mlprodict.onnxrt.validate.validate_latency

Command line about validation of prediction runtime.

Functions#

function	truncated documentation
`_random_input`
`latency`	Measures the latency of a model (python API).
`random_feed`	Creates a dictionary of random inputs.

Documentation#

Command line about validation of prediction runtime.

mlprodict.onnxrt.validate.validate_latency._random_input(typ, shape, batch)#

mlprodict.onnxrt.validate.validate_latency.latency(model, law='normal', size=1, number=10, repeat=10, max_time=0, runtime='onnxruntime', device='cpu', profiling=None)#

Measures the latency of a model (python API).

Parameters:

model – ONNX graph
law – random law used to generate fake inputs
size – batch size; it replaces the first dimension of every input if that dimension is left unknown
number – number of calls to measure
repeat – number of times to repeat the experiment
max_time – if it is > 0, the model is called as many times as possible during that period of time
runtime – available runtime
device – device to run on, such as cpu or cuda:0
profiling – if set, profiles the execution of every node; the results can be sorted by name or type, so the value for this parameter should be in (None, 'name', 'type')

Returns:

dictionary, or a tuple (dictionary, dataframe) if profiling is enabled
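The interplay of number, repeat and max_time can be illustrated with a small timing loop. This is a minimal sketch using only the standard library, not the implementation in mlprodict: `measure_latency`, `fake_model` and the keys of the returned dictionary are hypothetical names, and max_time is assumed to be expressed in seconds.

```python
import time
from statistics import mean


def measure_latency(fct, number=10, repeat=10, max_time=0):
    """Times ``fct``; hypothetical helper, not part of mlprodict.

    One experiment calls ``fct`` ``number`` times. The experiment is
    repeated ``repeat`` times, or, when ``max_time > 0``, until
    ``max_time`` seconds (assumed unit) have elapsed.
    """
    per_call = []
    begin = time.perf_counter()
    while True:
        start = time.perf_counter()
        for _ in range(number):
            fct()
        # Average time of a single call within this experiment.
        per_call.append((time.perf_counter() - start) / number)
        if max_time > 0:
            if time.perf_counter() - begin >= max_time:
                break
        elif len(per_call) >= repeat:
            break
    # The key names below are illustrative assumptions.
    return {
        "average": mean(per_call),
        "min": min(per_call),
        "max": max(per_call),
        "number": number,
        "repeat": len(per_call),
    }


def fake_model():
    # Stand-in for one inference call of a runtime session.
    return sum(i * i for i in range(1000))


stats = measure_latency(fake_model, number=5, repeat=3)
print(stats)
```

Dividing each experiment by number gives a per-call time, which matches the idea of reporting the processing time for one iteration; repeating the experiment makes it possible to see how much that figure varies.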

Measures model latency

The command generates random inputs and calls the model many times on these inputs. It returns the processing time for one iteration.

Example:

python -m mlprodict latency --model "model.onnx"

<<<

python -m mlprodict latency --help

>>>

usage: latency [-h] [-m MODEL] [--law LAW] [-s SIZE] [-n NUMBER] [-r REPEAT]
               [-ma MAX_TIME] [-ru RUNTIME] [-d DEVICE] [--fmt FMT]
               [-p PROFILING] [-pr PROFILE_OUTPUT]

Measures the latency of a model (python API).

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        ONNX graph (default: None)
  --law LAW             random law used to generate fake inputs (default:
                        normal)
  -s SIZE, --size SIZE  batch size, it replaces the first dimension of every
                        input if it is left unknown (default: 1)
  -n NUMBER, --number NUMBER
                        number of calls to measure (default: 10)
  -r REPEAT, --repeat REPEAT
                        number of times to repeat the experiment (default: 10)
  -ma MAX_TIME, --max_time MAX_TIME
                        if it is > 0, it runs as many time during that period
                        of time (default: 0)
  -ru RUNTIME, --runtime RUNTIME
                        available runtime (default: onnxruntime)
  -d DEVICE, --device DEVICE
                        device, `cpu`, `cuda:0` or a list of providers
                        `CPUExecutionProvider, CUDAExecutionProvider (default:
                        cpu)
  --fmt FMT             None or `csv`, it then returns a string formatted like
                        a csv file (default: )
  -p PROFILING, --profiling PROFILING
                        if True, profile the execution of every node, if can
                        be sorted by name or type, the value for this
                        parameter should e in `(None, 'name', 'type')`
                        (default: )
  -pr PROFILE_OUTPUT, --profile_output PROFILE_OUTPUT
                        output name for the profiling if profiling is
                        specified (default: profiling.csv)
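When the command has to be launched from a script, the argument list can be assembled programmatically. The sketch below only uses flags listed in the help output above; `build_latency_command` is a hypothetical helper, and the commented-out call assumes that mlprodict is installed and that `model.onnx` exists.

```python
import shlex
import sys


def build_latency_command(model, size=1, number=10, repeat=10, fmt=None):
    """Builds the argv list for ``python -m mlprodict latency``.

    Hypothetical helper; only flags shown in the help output are used.
    """
    cmd = [
        sys.executable, "-m", "mlprodict", "latency",
        "--model", model,
        "--size", str(size),
        "--number", str(number),
        "--repeat", str(repeat),
    ]
    if fmt:
        # ``--fmt csv`` asks for a string formatted like a csv file.
        cmd += ["--fmt", fmt]
    return cmd


cmd = build_latency_command("model.onnx", size=32, number=20, fmt="csv")
# Show the command without the interpreter path.
print(shlex.join(cmd[1:]))

# To actually run it (requires mlprodict and the model file):
# import subprocess
# out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```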

mlprodict.onnxrt.validate.validate_latency.random_feed(inputs, batch=10, empty_dimension=1)#

Creates a dictionary of random inputs.

Parameters:

batch – dimension to use as batch dimension if unknown
empty_dimension – if a dimension is null, replaces it by this value

Returns:

dictionary
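The role of batch and empty_dimension can be sketched without any ONNX dependency. The code below is an illustration under stated assumptions, not the function from mlprodict: inputs are described by a plain dictionary mapping names to shapes, any dimension that is not a positive integer is treated as unknown or null, and nested lists stand in for arrays. `resolve_shape`, `random_values` and `sketch_random_feed` are hypothetical names.

```python
import random


def resolve_shape(shape, batch=10, empty_dimension=1):
    """Replaces unknown dimensions; hypothetical helper.

    Assumption: a dimension that is not a positive integer
    (None, 0, a symbolic name) is unknown or null.
    """
    resolved = []
    for i, dim in enumerate(shape):
        if not isinstance(dim, int) or dim <= 0:
            # The first dimension is taken as the batch dimension.
            resolved.append(batch if i == 0 else empty_dimension)
        else:
            resolved.append(dim)
    return tuple(resolved)


def random_values(shape, rng):
    """Builds nested lists of normally distributed floats."""
    if len(shape) == 0:
        return rng.gauss(0.0, 1.0)
    return [random_values(shape[1:], rng) for _ in range(shape[0])]


def sketch_random_feed(input_shapes, batch=10, empty_dimension=1, seed=0):
    """Maps every input name to random values of the resolved shape."""
    rng = random.Random(seed)
    return {
        name: random_values(resolve_shape(shape, batch, empty_dimension), rng)
        for name, shape in input_shapes.items()
    }


feed = sketch_random_feed({"X": (None, 4), "mask": ("N", 0)}, batch=3)
print(len(feed["X"]), len(feed["X"][0]))  # prints: 3 4
```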
