# module ml.roc

## Short summary

module mlstatpy.ml.roc

source on GitHub

## Classes

| class | truncated documentation |
| --- | --- |
| `ROC` | Helper to draw a ROC curve |

## Properties

| property | truncated documentation |
| --- | --- |
| `Data` | Returns the underlying dataframe |

## Methods

| method | truncated documentation |
| --- | --- |
| `__init__` | Initialisation with a dataframe and two or three columns: |
| `__len__` | usual |
| `__repr__` | Shows the first elements and the precision rate |
| `__str__` | Shows the first elements and the precision rate |
| `auc` | Computes the area under the curve. |
| `auc_interval` | Determines a confidence interval for the AUC with bootstrap. |
| `compute_roc_curve` | Computes a ROC curve with nb points; if nb == -1, there are as many points as the data contains, … |
| `confusion` | Computes the confusion matrix for a specific score or all if score is None. |
| `plot` | Plots a ROC curve. |
| `precision` | Computes the precision. |
| `random_cloud` | Resamples the data |
| `roc_intersect` | The ROC curve is defined by a set of points. This function interpolates those points to determine … |
| `roc_intersect_interval` | Computes a confidence interval for the value returned by roc_intersect(). |

## Documentation

class mlstatpy.ml.roc.ROC(y_true=None, y_score=None, sample_weight=None, df=None)

Bases: object

Helper to draw a ROC curve.

Initialisation with a dataframe and two or three columns:

• column 1: score (y_score)
• column 2: expected answer (boolean) (y_true)
• column 3: weight (optional) (sample_weight)

Parameters:

• y_true – if df is None, y_true, y_score, and sample_weight must be filled; y_true indicates whether or not the answer is true, that is, whether the prediction is right
• y_score – score of the prediction
• sample_weight – weights
• df – dataframe, array, or list; it must contain 2 or 3 columns, always in the same order

class CurveType

Bases: enum.Enum

Curve types:

• PROBSCORE: 1 - False Positive / True Positive
• ERRPREC: error / recall
• RECPREC: precision / recall
• ROC: False Positive / True Positive
• SKROC: False Positive / True Positive (scikit-learn)

Data

Returns the underlying dataframe.

__init__(y_true=None, y_score=None, sample_weight=None, df=None)

Initialisation with a dataframe and two or three columns:

• column 1: score (y_score)
• column 2: expected answer (boolean) (y_true)
• column 3: weight (optional) (sample_weight)

Parameters:

• y_true – if df is None, y_true, y_score, and sample_weight must be filled; y_true indicates whether or not the answer is true, that is, whether the prediction is right
• y_score – score of the prediction
• sample_weight – weights
• df – dataframe, array, or list; it must contain 2 or 3 columns, always in the same order

__len__()

Usual length method.

__repr__()

Shows the first elements and the precision rate.

__str__()

Shows the first elements and the precision rate.

auc(cloud=None)

Computes the area under the curve.

Parameters:

• cloud – data, or None to use self.data; the function assumes the data is sorted

Returns: AUC

The first column is the label, the second one is the score, the third one is the weight.
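
To make the quantity concrete, here is a minimal sketch of one standard definition of the AUC: the weighted probability that a randomly drawn positive example receives a higher score than a randomly drawn negative one. The function name `pairwise_auc` and its arguments are hypothetical, and this quadratic loop is an illustration only, not the implementation used by `ROC.auc`.

```python
def pairwise_auc(y_true, y_score, sample_weight=None):
    """Weighted AUC from its pairwise definition (hypothetical helper).

    Counts, over all (positive, negative) pairs, how often the positive
    has the higher score. A tie counts for one half.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    rows = list(zip(y_true, y_score, sample_weight))
    pos = [(s, w) for t, s, w in rows if t]
    neg = [(s, w) for t, s, w in rows if not t]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    total = sum(w for _, w in pos) * sum(w for _, w in neg)
    wins = 0.0
    for sp, wp in pos:
        for sn, wn in neg:
            if sp > sn:
                wins += wp * wn
            elif sp == sn:
                wins += 0.5 * wp * wn
    return wins / total


# Three of the four (positive, negative) pairs are correctly ordered.
example_auc = pairwise_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])  # 0.75
```

The loop is quadratic in the number of observations; working on sorted data, as the parameter description suggests the method does, avoids that cost.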

auc_interval(bootstrap=10, alpha=0.95)

Determines a confidence interval for the AUC with bootstrap.

Parameters:

• bootstrap – number of random estimations
• alpha – defines the confidence interval

Returns: dictionary of values
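
The bootstrap idea can be sketched as follows: resample the observations with replacement, recompute the AUC on each resample, and read the interval bounds from the empirical quantiles. Everything here is an assumption made for illustration: the helper names, the `level` parameter, and the keys of the returned dictionary are hypothetical and may differ from what `auc_interval` returns.

```python
import random


def simple_auc(rows):
    """AUC of (label, score) pairs from its pairwise definition."""
    pos = [s for t, s in rows if t]
    neg = [s for t, s in rows if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(rows, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap interval for the AUC (hypothetical helper)."""
    if len({bool(t) for t, _ in rows}) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    estimates = []
    while len(estimates) < n_boot:
        sample = [rng.choice(rows) for _ in rows]
        # Skip resamples that lost one class: the AUC is undefined there.
        if len({bool(t) for t, _ in sample}) < 2:
            continue
        estimates.append(simple_auc(sample))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (n_boot - 1))]
    upper = estimates[int(round((1.0 - tail) * (n_boot - 1)))]
    return {"auc": simple_auc(rows), "lower": lower, "upper": upper}


data = [(1, 0.9), (1, 0.8), (0, 0.7), (1, 0.6), (0, 0.4), (0, 0.2)]
interval = bootstrap_auc_interval(data, n_boot=100)
```

With only a handful of resamples (the documented default is `bootstrap=10`), the bounds are coarse; more resamples give a smoother estimate at a proportional cost.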

compute_roc_curve(nb=100, curve=<CurveType.ROC: 5>, bootstrap=False)

Computes a ROC curve with nb points. If nb == -1, there are as many points as the data contains. If bootstrap == True, it draws random numbers to build a confidence interval based on the bootstrap method.

Parameters:

• nb – number of points for the curve
• curve – see CurveType
• bootstrap – builds the curve after resampling

Returns: DataFrame (metrics and threshold)

If curve is SKROC, the parameter nb is not taken into account. It should be set to 0.
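
As an illustration of what the points of such a curve represent, the sketch below sweeps score thresholds and records the false positive rate and true positive rate at each one. The function `roc_points` is hypothetical, and its handling of `nb` (one point per distinct score when `nb == -1`, evenly spaced picks otherwise) is an assumption, not the library's exact rule.

```python
def roc_points(y_true, y_score, nb=100):
    """Return (threshold, fpr, tpr) tuples, highest threshold first."""
    n_pos = sum(1 for t in y_true if t)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")

    ranked = sorted(set(y_score), reverse=True)
    if nb == -1 or nb >= len(ranked):
        thresholds = ranked  # one point per distinct score
    else:
        # Pick nb thresholds evenly spaced along the ranked scores.
        step = (len(ranked) - 1) / (nb - 1) if nb > 1 else 0
        thresholds = [ranked[round(i * step)] for i in range(nb)]

    # Start at the origin: with an infinite threshold nothing is accepted.
    points = [(float("inf"), 0.0, 0.0)]
    for th in thresholds:
        tp = sum(1 for t, s in zip(y_true, y_score) if t and s >= th)
        fp = sum(1 for t, s in zip(y_true, y_score) if not t and s >= th)
        points.append((th, fp / n_neg, tp / n_pos))
    return points


curve_points = roc_points([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], nb=-1)
```

The last point reaches (1.0, 1.0) because the lowest threshold accepts every observation.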

confusion(score=None, nb=10, curve=<CurveType.ROC: 5>, bootstrap=False)

Computes the confusion matrix for a specific score or all if score is None.

Parameters:

• score – score or None
• nb – number of scores (if score is None)
• curve – see CurveType
• bootstrap – builds the curve after resampling

Returns: one row if score is specified, many rows if score is None
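
The one-row versus many-rows behaviour can be pictured with a small sketch: one function counts the four confusion cells at a given threshold, another repeats that for several thresholds. The function names, the dictionary keys, and the evenly spaced thresholds are hypothetical choices made for this example and do not describe the columns returned by `confusion`.

```python
def confusion_at(y_true, y_score, threshold):
    """Confusion counts when scores >= threshold are predicted positive."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, score in zip(y_true, y_score):
        if score >= threshold:
            counts["TP" if truth else "FP"] += 1
        else:
            counts["FN" if truth else "TN"] += 1
    return counts


def confusion_table(y_true, y_score, nb=10):
    """One row per threshold, nb thresholds from the lowest to the highest score."""
    lo, hi = min(y_score), max(y_score)
    rows = []
    for i in range(nb):
        th = lo + (hi - lo) * i / (nb - 1) if nb > 1 else lo
        rows.append({"threshold": th, **confusion_at(y_true, y_score, th)})
    return rows


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
single = confusion_at(labels, scores, 0.5)
table = confusion_table(labels, scores, nb=3)
```

Here `single` holds one cell count per outcome, and `table` holds three rows, the first of which accepts every observation.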

plot(nb=100, curve=<CurveType.ROC: 5>, bootstrap=0, ax=None, thresholds=False, **kwargs)

Plots a ROC curve.

Parameters:

• nb – number of points
• curve – see CurveType
• bootstrap – number of curves for the bootstrap (0 for none)
• ax – axis
• thresholds – use thresholds for the X axis
• kwargs – sent to pandas.plot

Returns: ax

precision()

Computes the precision.

random_cloud()

Resamples the data.

Returns: DataFrame

roc_intersect(roc, x)

The ROC curve is defined by a set of points. This function interpolates those points to determine y for any x.

Parameters:

• roc – ROC curve
• x – x

Returns: y
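
A minimal sketch of the interpolation step, assuming the curve is given as a list of (x, y) pairs and that linear interpolation between neighbouring points is acceptable. The function `interpolate_curve` is hypothetical; the interpolation scheme used by `roc_intersect` is not specified here.

```python
from bisect import bisect_left


def interpolate_curve(points, x):
    """Linearly interpolate y at x on a curve given as (x, y) pairs."""
    pts = sorted(points)
    xs = [p[0] for p in pts]
    # Outside the known range, fall back to the nearest end point.
    if x <= xs[0]:
        return pts[0][1]
    if x >= xs[-1]:
        return pts[-1][1]
    i = bisect_left(xs, x)  # guarantees xs[i - 1] < x <= xs[i]
    x0, y0 = pts[i - 1]
    x1, y1 = pts[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


sample_curve = [(0.0, 0.0), (0.5, 0.8), (1.0, 1.0)]
y_quarter = interpolate_curve(sample_curve, 0.25)
```

Because `bisect_left` returns an index whose left neighbour is strictly smaller than `x`, the denominator `x1 - x0` is never zero, even when the curve contains vertical segments.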

roc_intersect_interval(x, nb, curve=<CurveType.ROC: 5>, bootstrap=10, alpha=0.05)

Computes a confidence interval for the value returned by roc_intersect.

Parameters:

• roc – ROC curve
• x – x
• curve – see CurveType

Returns: dictionary
