module cli.latency_cli

Short summary

module mlprodict.cli.latency_cli

Command line for validating the prediction runtime.


Functions

  • _random_input

  • latency – Measures the latency of a model (python API).

  • random_feed – Creates a dictionary of random inputs.

Documentation


mlprodict.cli.latency_cli._random_input(typ, shape, batch)
mlprodict.cli.latency_cli.latency(model, law='normal', size=1, number=10, repeat=10, max_time=0, runtime='onnxruntime', device='cpu', fmt=None, profiling=None, profile_output='profiling.csv')

Measures the latency of a model (python API).

Parameters
  • model – ONNX graph

  • law – random law used to generate fake inputs

  • size – batch size; it replaces the first dimension of every input when that dimension is unknown

  • number – number of calls to measure

  • repeat – number of times to repeat the experiment

  • max_time – if > 0, the model is called as many times as possible within that period of time

  • runtime – runtime to use, one of the available runtimes

  • device – device to run the model on, such as cpu or cuda:0

  • fmt – None or csv; with csv, the function returns a string formatted as a CSV file

  • profiling – if True, profiles the execution of every node; profiling can also be done by node name or by node type

  • profile_output – name of the output file for the profiling results, used only when profiling is specified
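The profiling parameter can group per-node timings by node name or by node type. The sketch below shows that kind of aggregation on hypothetical records made of (node name, operator type, duration in microseconds). The record layout and the `aggregate_profile` helper are assumptions for illustration. They are not the structure mlprodict or onnxruntime actually produces.

```python
from collections import defaultdict

# Hypothetical profiling records: (node name, operator type, duration in microseconds).
RECORDS = [
    ("linreg_mm", "MatMul", 120.0),
    ("linreg_add", "Add", 15.0),
    ("scaler_mul", "Mul", 10.0),
    ("scaler_add", "Add", 5.0),
]


def aggregate_profile(records, by="name"):
    """Sums durations per node name or per operator type (hypothetical helper)."""
    if by not in ("name", "type"):
        raise ValueError("by must be 'name' or 'type'")
    index = 0 if by == "name" else 1
    totals = defaultdict(float)
    for record in records:
        totals[record[index]] += record[2]
    # Sort by decreasing total duration so the slowest entries come first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for key, total in aggregate_profile(RECORDS, by="type"):
        print(f"{key}: {total:.1f} microseconds")
```

Grouping by type shows which operator kinds dominate the cost. Grouping by name points to individual nodes.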

Measures model latency

The command generates random inputs and calls the model many times on them. It returns the processing time of one iteration.
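The measurement loop described above can be sketched with the standard library only. This is a minimal sketch and not mlprodict's implementation. The `measure_latency` helper, its returned keys (`average`, `deviation`, `runs`) and the `fake_model` stand-in are hypothetical. The sketch also assumes two things: `max_time` is expressed in seconds, and `repeat` is ignored when `max_time > 0`.

```python
import random
import statistics
import time


def measure_latency(fn, inputs, number=10, repeat=10, max_time=0.0):
    """Returns the mean and standard deviation of the time of one call.

    Hypothetical helper mimicking the number/repeat/max_time parameters.
    """
    timings = []
    start = time.perf_counter()
    while True:
        begin = time.perf_counter()
        for _ in range(number):
            fn(inputs)
        # Average time of one call within this batch of `number` calls.
        timings.append((time.perf_counter() - begin) / number)
        if max_time > 0:
            # In this sketch, keep running until max_time seconds have elapsed.
            if time.perf_counter() - start >= max_time:
                break
        elif len(timings) >= repeat:
            break
    mean = statistics.mean(timings)
    std = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return {"average": mean, "deviation": std, "runs": len(timings)}


def fake_model(x):
    # Stand-in for a model: a weighted sum with a fixed weight.
    return sum(v * 0.5 for v in x)


if __name__ == "__main__":
    # Random input of 100 features drawn from a normal law (as with law='normal').
    x = [random.gauss(0.0, 1.0) for _ in range(100)]
    print(measure_latency(fake_model, x, number=10, repeat=5))
```

Dividing each batch by `number` gives the time of one iteration. Repeating the batches gives a spread that the mean alone would hide.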

Example:

python -m mlprodict latency --model "model.onnx"

python -m mlprodict latency --help

usage: latency [-h] [-m MODEL] [--law LAW] [-s SIZE] [-n NUMBER] [-r REPEAT]
               [-ma MAX_TIME] [-ru RUNTIME] [-d DEVICE] [--fmt FMT]
               [-p PROFILING] [-pr PROFILE_OUTPUT]

Measures the latency of a model (python API).

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        ONNX graph (default: None)
  --law LAW             random law used to generate fake inputs (default:
                        normal)
  -s SIZE, --size SIZE  batch size, it replaces the first dimension of every
                        input if it is left unknown (default: 1)
  -n NUMBER, --number NUMBER
                        number of calls to measure (default: 10)
  -r REPEAT, --repeat REPEAT
                        number of times to repeat the experiment (default: 10)
  -ma MAX_TIME, --max_time MAX_TIME
                        if it is > 0, it runs as many time during that period
                        of time (default: 0)
  -ru RUNTIME, --runtime RUNTIME
                        available runtime (default: onnxruntime)
  -d DEVICE, --device DEVICE
                        device, `cpu`, `cuda:0` (default: cpu)
  --fmt FMT             None or `csv`, it then returns a string formatted like
                        a csv file (default: )
  -p PROFILING, --profiling PROFILING
                        if True, profile the execution of every node, if can
                        be by name or type. (default: )
  -pr PROFILE_OUTPUT, --profile_output PROFILE_OUTPUT
                        output name for the profiling if profiling is
                        specified (default: profiling.csv)


mlprodict.cli.latency_cli.random_feed(inputs, batch=10)

Creates a dictionary of random inputs.

Parameters

  • batch – dimension to use as the batch dimension when it is unknown

Returns

a dictionary of random inputs
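The idea behind random_feed can be sketched as follows: replace an unknown batch dimension with `batch`, then fill each input with random values. This is a hypothetical illustration and not mlprodict's code. `random_feed_sketch` and `_random_nested` are invented names. Inputs are assumed to be `(name, shape)` pairs in which `None` marks an unknown dimension, whereas the real function receives model input descriptions. This sketch only allows the first dimension to be unknown.

```python
import random


def _random_nested(rnd, shape):
    # Builds a nested list with the given shape, filled with N(0, 1) samples.
    if not shape:
        return rnd.gauss(0.0, 1.0)
    return [_random_nested(rnd, shape[1:]) for _ in range(shape[0])]


def random_feed_sketch(inputs, batch=10, seed=None):
    """Returns {input name: nested list of random floats} (hypothetical helper)."""
    rnd = random.Random(seed)
    feed = {}
    for name, shape in inputs:
        # Replace an unknown first (batch) dimension with `batch`.
        resolved = tuple(
            batch if (i == 0 and dim is None) else dim
            for i, dim in enumerate(shape))
        if any(dim is None for dim in resolved):
            raise ValueError(f"input {name!r} has an unknown non-batch dimension")
        feed[name] = _random_nested(rnd, resolved)
    return feed


if __name__ == "__main__":
    feed = random_feed_sketch([("X", (None, 3))], batch=4, seed=0)
    print(len(feed["X"]), len(feed["X"][0]))
```

Seeding the generator makes the fake inputs reproducible between benchmark runs.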
