module onnx_conv.helpers.lgbm_helper#

Short summary#

module mlprodict.onnx_conv.helpers.lgbm_helper

Helpers to speed up the conversion of LightGBM models or to transform them.

source on GitHub

Functions#

function – truncated documentation

  • dump_booster_model – Dumps Booster to JSON format. Parameters: self (booster), num_iteration (int or None, optional) …

  • dump_lgbm_booster – Dumps a LightGBM booster into JSON.

  • modify_tree_for_rule_in_set – LightGBM sometimes produces a tree with a node whose rule is == against a set of values (= in set), the values …

  • restore_lgbm_info – Restores speed-up information to help modify the structure of the tree.

Documentation#

mlprodict.onnx_conv.helpers.lgbm_helper.dump_booster_model(self, num_iteration=None, start_iteration=0, importance_type='split', verbose=0)#

Dumps Booster to JSON format.

Parameters#

self : booster

num_iteration : int or None, optional (default=None)

Index of the iteration that should be dumped. If None, the best iteration is dumped if it exists; otherwise, all iterations are dumped. If <= 0, all iterations are dumped.

start_iteration : int, optional (default=0)

Start index of the iteration that should be dumped.

importance_type : string, optional (default="split")

What type of feature importance should be dumped. If "split", the result contains the number of times the feature is used in a model. If "gain", the result contains the total gains of the splits which use the feature.

verbose : displays progress (useful for big trees)

Returns#

json_repr : dict

JSON format of Booster.
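The two values of importance_type can be illustrated on a dumped tree. The sketch below is not how lightgbm computes importances internally; it only applies the two definitions above to a nested dictionary that uses the same keys as the tree shown later on this page (split_feature, split_gain, left_child, right_child). The helper name feature_importance and the toy tree are hypothetical.

```python
from collections import defaultdict


def feature_importance(tree, importance_type="split"):
    """Accumulate per-feature importance over one dumped tree (nested dicts)."""
    if importance_type not in ("split", "gain"):
        raise ValueError("importance_type must be 'split' or 'gain'")
    totals = defaultdict(float)
    stack = [tree]
    while stack:
        node = stack.pop()
        if "split_feature" not in node:
            continue  # a leaf carries no split
        if importance_type == "split":
            # "split": count how many times the feature is used
            totals[node["split_feature"]] += 1
        else:
            # "gain": sum the gains of the splits using the feature
            totals[node["split_feature"]] += node["split_gain"]
        stack.append(node["left_child"])
        stack.append(node["right_child"])
    return dict(totals)


# Toy tree (made-up values): feature 0 is used twice, feature 1 once.
toy_tree = {
    "split_feature": 0, "split_gain": 2.5,
    "left_child": {"leaf_value": 0.1},
    "right_child": {
        "split_feature": 1, "split_gain": 1.0,
        "left_child": {"leaf_value": 0.2},
        "right_child": {
            "split_feature": 0, "split_gain": 0.5,
            "left_child": {"leaf_value": 0.3},
            "right_child": {"leaf_value": 0.4},
        },
    },
}

split_importance = feature_importance(toy_tree, "split")
gain_importance = feature_importance(toy_tree, "gain")
print(split_importance)
print(gain_importance)
```

On this toy tree, feature 0 gets a split importance of 2 and a gain importance of 2.5 + 0.5 = 3.0.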

Note

This function is inspired by lightgbm (dump_model). It creates an intermediate structure to speed up the conversion of such a model into ONNX. The function overrides json.load to extract nodes quickly.
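One way to collect nodes while the JSON is being parsed, instead of walking the tree afterwards, is to pass an object_hook to json.loads. The sketch below only illustrates that idea; it is an assumption, not a description of what dump_booster_model actually does. The helper name load_with_node_index and the toy dump are hypothetical.

```python
import json


def load_with_node_index(text):
    """Parse a JSON model dump and collect every split node in the same pass."""
    nodes = []

    def hook(obj):
        # json calls the hook once per decoded object, innermost objects first.
        if "split_feature" in obj:
            nodes.append(obj)
        return obj

    model = json.loads(text, object_hook=hook)
    return model, nodes


# Toy dump (made-up values) with one tree and two split nodes.
dump = json.dumps({
    "tree_info": [{
        "tree_structure": {
            "split_feature": 24, "decision_type": "==",
            "threshold": "10||12||13",
            "left_child": {"leaf_index": 18, "leaf_value": 0.0035},
            "right_child": {
                "split_feature": 3, "decision_type": "<=", "threshold": 0.5,
                "left_child": {"leaf_index": 25, "leaf_value": 0.0123},
                "right_child": {"leaf_index": 26, "leaf_value": 0.02},
            },
        }
    }]
})

model, nodes = load_with_node_index(dump)
print([n["split_feature"] for n in nodes])
```

The collected entries are the same dictionary objects as those stored in the parsed model, so modifying a node through the list also modifies the tree.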

mlprodict.onnx_conv.helpers.lgbm_helper.dump_lgbm_booster(booster, verbose=0)#

Dumps a LightGBM booster into JSON.

Parameters:
  • booster – LightGBM booster

  • verbose – verbosity

Returns:

json, dictionary with more information

mlprodict.onnx_conv.helpers.lgbm_helper.modify_tree_for_rule_in_set(gbm, use_float=False, verbose=0, count=0, info=None)#

LightGBM sometimes produces a tree with a node whose rule is == against a set of values (= in set); the values are separated by ||. This function unfolds these nodes.

Parameters:
  • gbm – a tree coming from lightgbm dump

  • use_float – use float; otherwise try int first, then float if that does not work

  • verbose – verbosity, use tqdm to show progress

  • count – number of nodes already changed (origin) before this call

  • info – additional information to speed up this search

Returns:

number of changed nodes (including count)

A child looks like the following:

<<<

import pprint
# Import from the module documented on this page.
from mlprodict.onnx_conv.helpers.lgbm_helper import modify_tree_for_rule_in_set

tree = {'decision_type': '==',
        'default_left': True,
        'internal_count': 6805,
        'internal_value': 0.117558,
        'left_child': {'leaf_count': 4293,
                       'leaf_index': 18,
                       'leaf_value': 0.003519117642745049},
        'missing_type': 'None',
        'right_child': {'leaf_count': 2512,
                        'leaf_index': 25,
                        'leaf_value': 0.012305307958365394},
        'split_feature': 24,
        'split_gain': 12.233599662780762,
        'split_index': 24,
        'threshold': '10||12||13'}

modify_tree_for_rule_in_set(tree)

pprint.pprint(tree)

>>>

    {'decision_type': '==',
     'default_left': True,
     'internal_count': 6805,
     'internal_value': 0.117558,
     'left_child': {'leaf_count': 4293,
                    'leaf_index': 18,
                    'leaf_value': 0.003519117642745049},
     'missing_type': 'None',
     'right_child': {'decision_type': '==',
                     'default_left': True,
                     'internal_count': 6805,
                     'internal_value': 0.117558,
                     'left_child': {'leaf_count': 4293,
                                    'leaf_index': 18,
                                    'leaf_value': 0.003519117642745049},
                     'missing_type': 'None',
                     'right_child': {'decision_type': '==',
                                     'default_left': True,
                                     'internal_count': 6805,
                                     'internal_value': 0.117558,
                                     'left_child': {'leaf_count': 4293,
                                                    'leaf_index': 18,
                                                    'leaf_value': 0.003519117642745049},
                                     'missing_type': 'None',
                                     'right_child': {'leaf_count': 2512,
                                                     'leaf_index': 25,
                                                     'leaf_value': 0.012305307958365394},
                                     'split_feature': 24,
                                     'split_gain': 12.233599662780762,
                                     'split_index': 24,
                                     'threshold': 13},
                     'split_feature': 24,
                     'split_gain': 12.233599662780762,
                     'split_index': 24,
                     'threshold': 12},
     'split_feature': 24,
     'split_gain': 12.233599662780762,
     'split_index': 24,
     'threshold': 10}
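The transformation shown above can be sketched without the library. This is a minimal illustration, not mlprodict's implementation: it handles a single node, does not recurse into the children, and ignores the count and info arguments. The helper names parse_values and unfold_in_set are hypothetical. It follows the use_float description: integers are tried first, then floats.

```python
import copy


def parse_values(threshold, use_float=False):
    """Split 'a||b||c' into numbers: int first, then float, unless use_float."""
    parts = threshold.split("||")
    if not use_float:
        try:
            return [int(p) for p in parts]
        except ValueError:
            pass  # at least one value is not an integer
    return [float(p) for p in parts]


def unfold_in_set(node, use_float=False):
    """Rewrite one '== in set' node, in place, as a chain of single '==' tests.

    Returns the number of nodes changed (0 or 1).
    """
    threshold = node.get("threshold")
    if (node.get("decision_type") != "=="
            or not isinstance(threshold, str)
            or "||" not in threshold):
        return 0
    values = parse_values(threshold, use_float)
    # Build the chain from the last value backwards; the original right
    # child ends up at the bottom of the chain.
    chain = node["right_child"]
    for value in reversed(values[1:]):
        link = copy.deepcopy(
            {k: v for k, v in node.items() if k != "right_child"})
        link["threshold"] = value
        link["right_child"] = chain
        chain = link
    node["threshold"] = values[0]
    node["right_child"] = chain
    return 1


tree = {
    "decision_type": "==",
    "threshold": "10||12||13",
    "split_feature": 24,
    "left_child": {"leaf_index": 18},
    "right_child": {"leaf_index": 25},
}
changed = unfold_in_set(tree)
print(changed, tree["threshold"], tree["right_child"]["threshold"])
```

Each link of the chain keeps a copy of the left child, so a value matching any element of the set reaches the same leaf, while a value matching none of them falls through to the original right child.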

mlprodict.onnx_conv.helpers.lgbm_helper.restore_lgbm_info(tree)#

Restores speed-up information to help modify the structure of the tree.