2022-12-02 Build onnxruntime on WSL (Windows Subsystem for Linux)

I tried to build onnxruntime-training for GPU on WSL (Windows Subsystem for Linux), using the Ubuntu 22.04 distribution. Paths should be adjusted to match your installation.

Some useful commands once everything is installed:

nvidia-smi
nsys

Let’s assume WSL is installed; otherwise, here are some useful commands.

# see all local distributions
wsl -l -v

# see available distributions online
wsl --list --online

# install a distribution (downloads it first if needed)
wsl --install -d Ubuntu-22.04
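
Once installed, the distribution can be started and set as the default:

wsl -d Ubuntu-22.04
wsl --set-default Ubuntu-22.04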

The CUDA driver must be installed as well. It can be downloaded from NVIDIA Driver Downloads; make sure you pick the one matching your graphics card. Then the required packages can be installed:

sudo apt-get update
sudo apt-get upgrade -y
sudo apt autoremove -y
sudo apt-get install -y cmake zlib1g-dev libssl-dev python3-dev libhwloc-dev libevent-dev libcurl4-openssl-dev libopenmpi-dev clang unzip

Let’s install :epkg:`gcc`:

sudo apt-get update
sudo apt-get upgrade -y
sudo apt autoremove -y
sudo apt install -y libcurl4 ca-certificates
sudo apt-get install -y gcc g++
gcc --version

Installation of ninja:

wget https://github.com/ninja-build/ninja/releases/download/v1.11.1/ninja-linux.zip
unzip ninja-linux.zip
sudo cp ./ninja /usr/local/bin/
sudo chmod a+x /usr/local/bin/ninja
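
A quick check that ninja is visible from the PATH:

ninja --version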

Installation of cmake:

mkdir install
cd install
curl -OL https://github.com/Kitware/CMake/releases/download/v3.25.1/cmake-3.25.1.tar.gz
tar -zxvf cmake-3.25.1.tar.gz
cd cmake-3.25.1
./bootstrap --system-curl
make
sudo make install
export PATH=~/install/cmake-3.25.1/bin/:$PATH
cmake --version
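
To keep this PATH change across sessions, it can be appended to ~/.bashrc (assuming bash is the default shell):

echo 'export PATH=~/install/cmake-3.25.1/bin/:$PATH' >> ~/.bashrc
source ~/.bashrc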

Installation of CUDA (choose a version compatible with pytorch, 11.8 for example; the commands below install 12.0, adjust the variables accordingly).

See the CUDA on WSL User Guide.

export CUDA_VERSION=12.0
export CUDA_VERSION_=12-0
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/${CUDA_VERSION}.0/local_installers/cuda-repo-wsl-ubuntu-${CUDA_VERSION_}-local_${CUDA_VERSION}.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-${CUDA_VERSION_}-local_${CUDA_VERSION}.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-${CUDA_VERSION_}-local/cuda-2E27EA96-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

Now you may run nvidia-smi -L to list the available GPUs.
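
The toolkit binaries are not automatically added to the PATH. A minimal sketch, assuming the default installation prefix /usr/local/cuda-${CUDA_VERSION}:

export PATH=/usr/local/cuda-${CUDA_VERSION}/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-${CUDA_VERSION}/lib64:$LD_LIBRARY_PATH
nvcc --version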

Installation of cudnn (after the local repository package is downloaded from NVIDIA):

sudo dpkg -i cudnn-local-repo-ubuntu2204-8.7.0.84_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-8.7.0.84/cudnn-local-BF23AD8A-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install libcudnn8  libcudnn8-dev
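
To confirm which cudnn version was installed, the packages can be listed; the header check below assumes the headers landed in /usr/include, which may vary with the package layout:

dpkg -l | grep cudnn
grep -m 1 -A 2 CUDNN_MAJOR /usr/include/cudnn_version.h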

Installation of nccl.

See Install NCCL.

sudo dpkg -i nccl-local-repo-ubuntu2204-2.15.5-cuda11.8_1.0-1_amd64.deb
sudo cp /var/nccl-local-repo-ubuntu2204-2.15.5-cuda11.8/nccl-local-1F5D0FB9-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt install libnccl2 libnccl-dev
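
A quick way to confirm the nccl packages are installed:

dpkg -l | grep nccl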

Installation of pip and update python packages:

sudo apt-get install -y python3-pybind11 libpython3.10-dev
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
sudo python3 -m pip install --upgrade numpy jupyter pandas statsmodels scipy scikit-learn pybind11 cython flatbuffers mpi4py notebook nbconvert flatbuffers pylint autopep8 sphinx sphinx-gallery cffi black py-spy fire pytest

Installation of pytorch if it is available for CUDA 11.8:

python3 -m pip install torch torchvision torchaudio
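
If the default wheel does not match the local CUDA version, pytorch publishes wheels per CUDA version; for CUDA 11.8, the dedicated index can be used:

python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118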

Otherwise, it has to be built from sources:

wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
bash Anaconda3-2022.10-Linux-x86_64.sh
conda create -p ~/install/acond10 python=3.10
conda activate ~/install/acond10
conda install -y astunparse numpy ninja pyyaml setuptools cmake cffi typing_extensions future six requests dataclasses
conda install -y mkl mkl-include
conda install -c pytorch magma-cuda118
mkdir ~/github
cd ~/github
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# python tools/amd_build/build_amd.py  (only needed for a ROCm build)
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export CUDA_VERSION=11.8
export CUDACXX=/usr/local/cuda-${CUDA_VERSION}/bin/nvcc
export USE_ITT=0
export USE_KINETO=0
export BUILD_TEST=0
export USE_MPI=0
export BUILD_CAFFE2=0
export BUILD_CAFFE2_OPS=0
export USE_DISTRIBUTED=0
export MAX_JOBS=1
python setup.py build
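
Once the build finishes, it can be installed into the active conda environment; develop mode keeps the sources editable:

python setup.py develop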

Then, check that CUDA is available:

import torch
print(torch.cuda.is_available())
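
A slightly more complete check runs a small computation on the GPU (assuming the previous step printed True):

python3 -c "import torch; print(torch.cuda.get_device_name(0))"
python3 -c "import torch; x = torch.rand(4, 4).cuda(); print((x @ x).sum())"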

Build onnxruntime-training before onnx so that protobuf is built as well. The following command runs from the root of the onnxruntime repository:

alias python=python3
export CUDA_VERSION=11.8
export CUDACXX=/usr/local/cuda-${CUDA_VERSION}/bin/nvcc
export CMAKE_CUDA_COMPILER=/usr/local/cuda-${CUDA_VERSION}/bin/nvcc
python3 ./tools/ci_build/build.py --skip_tests --build_dir ./build/linux_gpu --config Release --use_mpi true --enable_training --enable_training_torch_interop --use_cuda --cuda_version=${CUDA_VERSION} --cuda_home /usr/local/cuda-${CUDA_VERSION}/ --cudnn_home /usr/local/cuda-${CUDA_VERSION}/ --build_wheel --parallel

Option --parallel 1 can be used to limit the parallelism while building onnxruntime. Option --use_mpi false can be replaced by --mpi_home /usr/local/lib/openmpi.
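
Since --build_wheel is specified, the build also produces a wheel under the build directory. It can be installed with pip, as an alternative to the PYTHONPATH approach used at the end of this post:

python3 -m pip install ./build/linux_gpu/Release/dist/*.whl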

Another option is to use a docker container: Running Existing GPU Accelerated Containers on WSL 2.

Then onnx is built in place:

git clone https://github.com/onnx/onnx.git
cd onnx
python setup.py build
python setup.py build_ext --inplace
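
A quick check that the in-place build is importable, with PYTHONPATH pointing to the clone:

PYTHONPATH=~/github/onnx python3 -c "import onnx; print(onnx.__version__)"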

Finally, some useful commands to make both packages importable:

export PYTHONPATH=~/github/onnx:~/github/onnxruntime/build/linux_gpu/Release/Release
export PYTHONPATH=$PYTHONPATH:~/github/onnxcustom:~/github/mlprodict
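
With these variables set, a final check verifies that the CUDA provider is registered (assuming the GPU build succeeded):

python3 -c "import onnxruntime; print(onnxruntime.get_available_providers())"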