module onnxrt.validate.validate_latency

Short summary

module mlprodict.onnxrt.validate.validate_latency

Command line about validation of prediction runtime.

source on GitHub

Functions

function          truncated documentation

_random_input
latency           Measures the latency of a model (python API).
random_feed       Creates a dictionary of random inputs.

Documentation

mlprodict.onnxrt.validate.validate_latency._random_input(typ, shape, batch)

mlprodict.onnxrt.validate.validate_latency.latency(model, law='normal', size=1, number=10, repeat=10, max_time=0, runtime='onnxruntime', device='cpu', profiling=None)

Measures the latency of a model (python API).

Parameters:
  • model – ONNX graph

  • law – random law used to generate fake inputs

  • size – batch size; it replaces the first dimension of every input when that dimension is left unknown

  • number – number of calls to measure

  • repeat – number of times to repeat the experiment

  • max_time – if > 0, the experiment is run as many times as possible during that period of time

  • runtime – available runtime

  • device – device to run on, e.g. cpu or cuda:0

  • profiling – if set, profiles the execution of every node; the results can be sorted by name or type, so the value of this parameter should be in (None, 'name', 'type')

Returns:

a dictionary, or a tuple (dictionary, dataframe) if profiling is enabled
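
The way number, repeat and max_time interact can be illustrated with a small standalone sketch. It is not mlprodict's implementation: measure_latency is a hypothetical helper, the keys of the returned dictionary are made up for the illustration, and max_time is assumed to be expressed in seconds.

```python
import time


def measure_latency(fn, number=10, repeat=10, max_time=0):
    """Time `fn` in experiments of `number` calls each.

    Hypothetical helper, not mlprodict's code. With max_time > 0
    (assumed to be seconds), experiments are repeated until that
    period has elapsed instead of a fixed `repeat` count.
    """
    per_call = []  # average duration of one call, one entry per experiment
    start = time.perf_counter()
    while True:
        begin = time.perf_counter()
        for _ in range(number):
            fn()
        per_call.append((time.perf_counter() - begin) / number)
        if max_time > 0:
            # Time-bounded mode: stop once the period is over.
            if time.perf_counter() - start >= max_time:
                break
        elif len(per_call) >= repeat:
            # Count-bounded mode: stop after `repeat` experiments.
            break
    return {
        "average": sum(per_call) / len(per_call),
        "min": min(per_call),
        "max": max(per_call),
        "number": number,
        "repeat": len(per_call),
    }


# A cheap stand-in for a model call.
stats = measure_latency(lambda: sum(range(1000)), number=10, repeat=5)
print(stats)
```

Dividing each experiment by number gives a per-call time, which matches the idea of reporting the processing time for one iteration rather than for the whole loop.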

Measures model latency

The command generates random inputs and calls the model many times on these inputs. It returns the processing time for one iteration.

Example:

python -m mlprodict latency --model "model.onnx"

<<<

python -m mlprodict latency --help

>>>

usage: latency [-h] [-m MODEL] [--law LAW] [-s SIZE] [-n NUMBER] [-r REPEAT]
               [-ma MAX_TIME] [-ru RUNTIME] [-d DEVICE] [--fmt FMT]
               [-p PROFILING] [-pr PROFILE_OUTPUT]

Measures the latency of a model (python API).

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        ONNX graph (default: None)
  --law LAW             random law used to generate fake inputs (default:
                        normal)
  -s SIZE, --size SIZE  batch size, it replaces the first dimension of every
                        input if it is left unknown (default: 1)
  -n NUMBER, --number NUMBER
                        number of calls to measure (default: 10)
  -r REPEAT, --repeat REPEAT
                        number of times to repeat the experiment (default: 10)
  -ma MAX_TIME, --max_time MAX_TIME
                        if it is > 0, it runs as many time during that period
                        of time (default: 0)
  -ru RUNTIME, --runtime RUNTIME
                        available runtime (default: onnxruntime)
  -d DEVICE, --device DEVICE
                        device, `cpu`, `cuda:0` or a list of providers
                        `CPUExecutionProvider, CUDAExecutionProvider (default:
                        cpu)
  --fmt FMT             None or `csv`, it then returns a string formatted like
                        a csv file (default: )
  -p PROFILING, --profiling PROFILING
                        if True, profile the execution of every node, if can
                        be sorted by name or type, the value for this
                        parameter should e in `(None, 'name', 'type')`
                        (default: )
  -pr PROFILE_OUTPUT, --profile_output PROFILE_OUTPUT
                        output name for the profiling if profiling is
                        specified (default: profiling.csv)

mlprodict.onnxrt.validate.validate_latency.random_feed(inputs, batch=10, empty_dimension=1)

Creates a dictionary of random inputs.

Parameters:
  • batch – size to use for the batch dimension when it is unknown

  • empty_dimension – value that replaces any null dimension

Returns:

dictionary
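
The roles of batch and empty_dimension can be sketched in pure Python. This is an illustration under stated assumptions, not the library's code: resolve_shape, random_values and random_feed_sketch are hypothetical names, inputs are given as (name, shape) pairs, the batch dimension is assumed to be the first one, and None or 0 is assumed to mark an unknown dimension. A real feed would hold arrays rather than nested lists.

```python
import random


def resolve_shape(shape, batch=10, empty_dimension=1):
    """Replace unknown dimensions in `shape` (hypothetical helper).

    An unknown first dimension (None or 0) becomes `batch`;
    any other unknown dimension becomes `empty_dimension`.
    """
    resolved = []
    for axis, dim in enumerate(shape):
        if dim in (None, 0):
            dim = batch if axis == 0 else empty_dimension
        resolved.append(dim)
    return tuple(resolved)


def random_values(shape, rng):
    """Build a nested list of normally distributed floats with `shape`."""
    if not shape:
        return rng.gauss(0.0, 1.0)
    return [random_values(shape[1:], rng) for _ in range(shape[0])]


def random_feed_sketch(inputs, batch=10, empty_dimension=1, seed=0):
    """Map each input name to random data of the resolved shape."""
    rng = random.Random(seed)
    feed = {}
    for name, shape in inputs:
        full_shape = resolve_shape(shape, batch, empty_dimension)
        feed[name] = random_values(full_shape, rng)
    return feed


# "X" has an unknown batch dimension, "mask" has a null second dimension.
feed = random_feed_sketch([("X", (None, 3)), ("mask", (2, 0))], batch=4)
print(len(feed["X"]), len(feed["X"][0]))
print(len(feed["mask"]), len(feed["mask"][0]))
```

Resolving the shape first and generating values second keeps the two concerns separate, so the shape rule can be checked without producing any data.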