Parallelization
onnxruntime already parallelizes the computation
on multiple cores when the execution runs on CPU, and obviously
on GPU. Recent machines have several GPUs, but onnxruntime
usually runs on a single one. The following examples try to take
advantage of such a configuration. The first one parallelizes the
execution of the same model on every GPU, assuming a single GPU
can host the whole model (see the first sketch below). The second
one explores a way to split the model into pieces when the whole
model does not fit on a single GPU. This is done with the function
split_onnx (see the second sketch below).
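As a preview of the first approach, here is a minimal sketch, not taken
from the tutorial itself: it assumes a model stored in a hypothetical
file model.onnx with a single input named X, and two available CUDA
devices. One InferenceSession is created per device and the batch is
split between them.

<<<
# Minimal sketch of the first approach (hypothetical file 'model.onnx',
# input name 'X' and two CUDA devices are assumptions): one
# InferenceSession per GPU, the batch is split and every piece runs
# on its own device in a separate thread.
from concurrent.futures import ThreadPoolExecutor
import numpy
from onnxruntime import InferenceSession

n_gpus = 2
sessions = [
    InferenceSession(
        "model.onnx",
        providers=[("CUDAExecutionProvider", {"device_id": i})])
    for i in range(n_gpus)]

def run_on_gpu(sess, x):
    # every session is bound to one device through its provider options
    return sess.run(None, {"X": x})[0]

x = numpy.random.randn(128, 10).astype(numpy.float32)
pieces = numpy.array_split(x, n_gpus)
with ThreadPoolExecutor(max_workers=n_gpus) as pool:
    results = list(pool.map(run_on_gpu, sessions, pieces))
y = numpy.vstack(results)
>>>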
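For the second approach, the sketch below only illustrates the intended
workflow: the import path onnxcustom.tools.onnx_split and the parameter
n_parts are assumptions to be checked against the onnxcustom
documentation, and the resulting pieces still have to be chained so
that the output of one feeds the next.

<<<
# Rough sketch of the second approach. Assumptions: split_onnx lives in
# onnxcustom.tools.onnx_split and split_onnx(model, n_parts=2) returns
# a list of consecutive ONNX models; 'model.onnx' is a hypothetical
# file name.
import onnx
from onnxruntime import InferenceSession
from onnxcustom.tools.onnx_split import split_onnx

model = onnx.load("model.onnx")
parts = split_onnx(model, n_parts=2)

# every piece is placed on its own GPU; at inference time, the output
# of one part becomes the input of the next one
sessions = [
    InferenceSession(
        part.SerializeToString(),
        providers=[("CUDAExecutionProvider", {"device_id": i})])
    for i, part in enumerate(parts)]
>>>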
The tutorial was tested with the following versions:
<<<
import sys
import numpy
import scipy
import onnx
import onnxruntime
import onnxcustom
import sklearn
import torch

# print the version of every module used in the tutorial
print("python {}".format(sys.version_info))
mods = [numpy, scipy, sklearn, onnx,
        onnxruntime, onnxcustom, torch]
mods = [(m.__name__, m.__version__) for m in mods]
mx = max(len(_[0]) for _ in mods) + 1
for name, vers in sorted(mods):
    print("{}{}{}".format(name, " " * (mx - len(name)), vers))
>>>
python sys.version_info(major=3, minor=9, micro=1, releaselevel='final', serial=0)
numpy 1.24.1
onnx 1.13.0
onnxcustom 0.4.344
onnxruntime 1.14.92+cpu
scipy 1.10.0
sklearn 1.2.0
torch 1.13.1+cu117