module benchmark.bench_helper

Short summary

module pymlbenchmark.benchmark.bench_helper

Implements helper functions for performance benchmarks.


Functions

  • bench_pivot – Merges all results for one set of parameters into one row.

  • enumerate_options – Enumerates all combinations of option values.

  • remove_almost_nan_columns – Automatically removes columns with more than 1/3 NaN values.

Documentation


pymlbenchmark.benchmark.bench_helper.bench_pivot(data, experiment='lib', value='mean', index=None)

Merges all results for one set of parameters into one row.

Parameters:
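  • data – DataFrame

  • experiment – column which identifies an experiment; its values become the columns of the result (lib gives ort and skl in the example below)

  • value – column whose values fill the pivoted table (the value to plot)

  • index – columns (parameters) which identify one experiment; if None, the function guesses them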

Returns:

DataFrame

<<<

from pymlbenchmark.datasets import experiment_results
from pymlbenchmark.benchmark.bench_helper import bench_pivot

df = experiment_results('onnxruntime_LogisticRegression')
piv = bench_pivot(df)
print(piv.head())

>>>

    lib                                           ort       skl
    N count dim fit_intercept method                           
    1 100   1   False         predict        0.000021  0.000041
                              predict_proba  0.000023  0.000049
                True          predict        0.000025  0.000071
                              predict_proba  0.000026  0.000051
            5   False         predict        0.000022  0.000042
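To make the reshaping concrete, here is a plain-pandas sketch of a comparable pivot. It is not necessarily how bench_pivot is implemented: it assumes every column other than the experiment column (lib) and the value column (mean) is a parameter, which only approximates the guessing done when index is None, and it uses made-up toy numbers rather than real benchmark results.

```python
import pandas

# Toy benchmark results (made-up numbers, for illustration only).
df = pandas.DataFrame({
    "lib": ["ort", "skl", "ort", "skl"],
    "N": [1, 1, 1, 1],
    "dim": [1, 1, 5, 5],
    "method": ["predict"] * 4,
    "mean": [2.0e-5, 4.0e-5, 2.2e-5, 4.2e-5],
})

# Assumption: every column except the experiment column and the value
# column is a parameter (a rough stand-in for index=None guessing).
index = [c for c in df.columns if c not in ("lib", "mean")]

# One row per parameter set, one column per library.
piv = df.pivot_table(index=index, columns="lib", values="mean")
print(piv)
```

pivot_table averages duplicate entries by default; with exactly one row per (parameters, lib) pair, each cell simply holds the original value.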


pymlbenchmark.benchmark.bench_helper.enumerate_options(options, filter_fct=None)

Enumerates all combinations of option values.

Parameters:
  • options – dictionary {name: list of values}

  • filter_fct – optional function called with each configuration as keyword arguments; a configuration is kept only if it returns True

Returns:

list of dictionaries {name: value}, one per retained combination

<<<

from pymlbenchmark.benchmark.bench_helper import enumerate_options
options = dict(c1=[0, 1], c2=["aa", "bb"])
for row in enumerate_options(options):
    print("no-filter", row)


def filter_out(**opt):
    return not (opt["c1"] == 1 and opt["c2"] == "aa")


for row in enumerate_options(options, filter_out):
    print("filter", row)

>>>

    no-filter {'c1': 0, 'c2': 'aa'}
    no-filter {'c1': 0, 'c2': 'bb'}
    no-filter {'c1': 1, 'c2': 'aa'}
    no-filter {'c1': 1, 'c2': 'bb'}
    filter {'c1': 0, 'c2': 'aa'}
    filter {'c1': 0, 'c2': 'bb'}
    filter {'c1': 1, 'c2': 'bb'}
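The enumeration behaves like a Cartesian product of the value lists followed by the filter. The sketch below reproduces the filtered output above with itertools.product; enumerate_options_sketch is a hypothetical name, and iterating over the keys in sorted order is an assumption chosen to match the order shown.

```python
import itertools


def enumerate_options_sketch(options, filter_fct=None):
    """Hypothetical re-implementation: yields one dict per combination."""
    # Assumption: keys are processed in sorted order.
    keys = sorted(options)
    for values in itertools.product(*(options[k] for k in keys)):
        opt = dict(zip(keys, values))
        # Keep the configuration unless the filter rejects it.
        if filter_fct is None or filter_fct(**opt):
            yield opt


options = dict(c1=[0, 1], c2=["aa", "bb"])
rows = list(enumerate_options_sketch(
    options, lambda **opt: not (opt["c1"] == 1 and opt["c2"] == "aa")))
print(rows)
```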


pymlbenchmark.benchmark.bench_helper.remove_almost_nan_columns(df, keep=None, fill_keep=True)

Automatically removes columns with more than 1/3 NaN values.

Parameters:
  • df – dataframe

  • keep – columns never removed, whatever their proportion of NaN values

  • fill_keep – if not None, fills NaN values

Returns:

cleaned DataFrame
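A minimal sketch of the 1/3 rule, assuming a column is dropped when its NaN count is strictly greater than one third of the number of rows, and that columns listed in keep are never dropped. remove_almost_nan_columns_sketch is a hypothetical name; it ignores fill_keep, whose exact filling strategy is not described here.

```python
import numpy
import pandas


def remove_almost_nan_columns_sketch(df, keep=None):
    """Hypothetical sketch: drops columns where more than 1/3 of values are NaN."""
    keep = set(keep or [])
    threshold = df.shape[0] / 3
    nan_counts = df.isnull().sum()
    # Columns in keep are exempt from the check (fill_keep is not modelled).
    to_drop = [c for c in df.columns
               if c not in keep and nan_counts[c] > threshold]
    return df.drop(columns=to_drop)


df = pandas.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [numpy.nan, numpy.nan, 1.0],  # 2 NaN out of 3 rows: dropped
    "c": [numpy.nan, 2.0, 3.0],        # 1 NaN out of 3 rows: kept
})
print(remove_almost_nan_columns_sketch(df).columns.tolist())  # ['a', 'c']
print(remove_almost_nan_columns_sketch(df, keep=["b"]).columns.tolist())  # ['a', 'b', 'c']
```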
