module ml.roc

Inheritance diagram of mlstatpy.ml.roc

Short summary

module mlstatpy.ml.roc

About ROC

Classes

class   truncated documentation
ROC     Helper to draw a ROC curve

Properties

property   truncated documentation
Data       Returns the underlying dataframe

Methods

method                  truncated documentation
__init__                Initialisation with a dataframe and two or three columns:
__len__                 usual
__repr__                Shows the first elements and the precision rate.
__str__                 Shows the first elements and the precision rate.
auc                     Computes the area under the curve.
auc_interval            Determines a confidence interval for the AUC with bootstrap.
compute_roc_curve       Computes a ROC curve with nb points; if nb == -1, there are as many points as the data contains, …
confusion               Computes the confusion matrix for a specific score or all if score is None.
plot                    Plots a ROC curve.
precision               Computes the precision.
random_cloud            Resamples the data.
roc_intersect           The ROC curve is defined by a set of points. This function interpolates those points to determine …
roc_intersect_interval  Computes a confidence interval for the value returned by roc_intersect().

Documentation

class mlstatpy.ml.roc.ROC(y_true=None, y_score=None, sample_weight=None, df=None)[source]

Bases: object

Helper to draw a ROC curve

Initialisation with a dataframe of two or three columns:

  • column 1: score (y_score)
  • column 2: expected answer (boolean) (y_true)
  • column 3: weight (optional) (sample_weight)
Parameters:
  • y_true – if df is None, y_true, y_score and sample_weight must be filled; y_true tells whether or not the answer is true, that is, whether the prediction is right
  • y_score – score of the prediction
  • sample_weight – weights
  • df – dataframe, array or list; it must contain 2 or 3 columns, always in the same order

class CurveType[source]

Bases: enum.Enum

Curve types

  • PROBSCORE: 1 - False Positive / True Positive
  • ERRPREC: error / recall
  • RECPREC: precision / recall
  • ROC: False Positive / True Positive
  • SKROC: False Positive / True Positive (scikit-learn)

Data

Returns the underlying dataframe.

__init__(y_true=None, y_score=None, sample_weight=None, df=None)[source]

See the class description above for the expected columns and parameters.

__len__()[source]

usual

__repr__()[source]

Shows the first elements and the precision rate.

__str__()[source]

Shows the first elements and the precision rate.

auc(cloud=None)[source]

Computes the area under the curve.

Parameters: cloud – data or None to use self.data; the function assumes the data is sorted.
Returns: AUC

The first column is the label, the second one is the score, the third one is the weight.
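To illustrate what the area under the curve measures, the sketch below computes an unweighted AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count for one half). It is a pure-Python illustration and not the mlstatpy implementation; the function name `auc_pairwise` and the sample data are hypothetical, and weights are ignored.

```python
def auc_pairwise(labels, scores):
    """Unweighted AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical sample: True means the answer is right.
labels = [True, True, False, True, False]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
print(auc_pairwise(labels, scores))
```

The double loop is quadratic in the number of examples; it is used here only because it makes the definition explicit.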

auc_interval(bootstrap=10, alpha=0.95)[source]

Determines a confidence interval for the AUC with bootstrap.

Parameters:
  • bootstrap – number of random estimations
  • alpha – defines the confidence interval
Returns:

dictionary of values
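The bootstrap idea can be sketched as follows: resample the data with replacement, recompute the AUC on each resample, and read the interval bounds from the sorted estimates. This is a minimal sketch under stated assumptions, not the library's code: `alpha` is assumed to be the confidence level, the percentile method is assumed, and the function names, the `seed` parameter and the dictionary keys are all hypothetical.

```python
import math
import random


def auc(labels, scores):
    """Unweighted AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(labels, scores, bootstrap=10, alpha=0.95, seed=0):
    """Percentile bootstrap interval; alpha is assumed to be the confidence level."""
    if all(labels) or not any(labels):
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < bootstrap:
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        ys = [labels[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # AUC is undefined when a resample contains a single class
        values.append(auc(ys, [scores[i] for i in idx]))
    values.sort()
    tail = (1.0 - alpha) / 2.0
    last = len(values) - 1
    return {
        "auc": auc(labels, scores),
        "lower": values[math.floor(tail * last)],
        "upper": values[math.ceil((1.0 - tail) * last)],
    }


labels = [True, True, False, True, False]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
result = bootstrap_auc_interval(labels, scores, bootstrap=200, seed=1)
print(result)
```

With only ten resamples (the default shown in the signature) the bounds are coarse; a larger value of `bootstrap` gives a smoother estimate at a higher cost.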

compute_roc_curve(nb=100, curve=<CurveType.ROC: 5>, bootstrap=False)[source]

Computes a ROC curve with nb points; if nb == -1, there are as many points as the data contains. If bootstrap is True, it draws random numbers to build a confidence interval based on the bootstrap method.

Parameters:
  • nb – number of points for the curve
  • curve – see CurveType
  • bootstrap – builds the curve after resampling
Returns:

DataFrame (metrics and threshold)

If curve is SKROC, the parameter nb is not taken into account. It should be set to 0.
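As an illustration of how such a curve can be built, the sketch below produces one (threshold, false positive rate, true positive rate) triple per distinct score, which loosely mirrors the nb == -1 case. It is a pure-Python sketch, not the mlstatpy implementation: the function name `roc_points` is hypothetical, weights are ignored, and an example is assumed to be predicted positive when its score is greater than or equal to the threshold.

```python
def roc_points(labels, scores):
    """Return (threshold, FPR, TPR) triples, one per distinct score, plus the origin."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    points = [(float("inf"), 0.0, 0.0)]  # nothing is predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and not y)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


labels = [True, True, False, True, False]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
points = roc_points(labels, scores)
for threshold, fpr, tpr in points:
    print(threshold, fpr, tpr)
```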

confusion(score=None, nb=10, curve=<CurveType.ROC: 5>, bootstrap=False)[source]

Computes the confusion matrix for a specific score or all if score is None.

Parameters:
  • score – score or None
  • nb – number of scores (if score is None)
  • curve – see CurveType
  • bootstrap – builds the curve after resampling
Returns:

One row if score is specified, many rows if score is None
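For a single score, the confusion matrix reduces to four counts. The sketch below shows that computation for one threshold; it is an unweighted pure-Python illustration, the function name `confusion_at` and the dictionary keys are hypothetical, and an example is assumed to be predicted positive when its score is greater than or equal to the threshold.

```python
def confusion_at(labels, scores, threshold):
    """Count true/false positives and negatives for one score threshold."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y:
            counts["TP"] += 1
        elif predicted and not y:
            counts["FP"] += 1
        elif y:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


labels = [True, True, False, True, False]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
matrix = confusion_at(labels, scores, 0.5)
print(matrix)
```

Calling the function once per candidate threshold yields one row per score, which is the shape described for the case where score is None.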

plot(nb=100, curve=<CurveType.ROC: 5>, bootstrap=0, ax=None, thresholds=False, **kwargs)[source]

Plot a ROC curve.

Parameters:
  • nb – number of points
  • curve – see CurveType
  • bootstrap – number of curves for the bootstrap (0 for none)
  • ax – axis
  • thresholds – use thresholds for the X axis
  • kwargs – sent to pandas.plot
Returns:

ax

precision()[source]

Computes the precision.

random_cloud()[source]

Resamples the data.

Returns: DataFrame
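A resampling step of this kind can be sketched as drawing as many rows as the data contains, with replacement. Drawing with replacement is an assumption made here because it is the usual choice for the bootstrap; the function name `resample`, the `seed` parameter and the sample rows are hypothetical, and plain tuples stand in for dataframe rows.

```python
import random


def resample(rows, seed=None):
    """Draw len(rows) rows with replacement from rows."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(len(rows))]


# Rows follow the (score, expected answer) column order.
rows = [(0.9, True), (0.8, True), (0.7, False), (0.4, True), (0.2, False)]
cloud = resample(rows, seed=0)
print(cloud)
```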

roc_intersect(roc, x)[source]

The ROC curve is defined by a set of points. This function interpolates those points to determine y for any x.

Parameters:
  • roc – ROC curve
  • x – x
Returns:

y
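The interpolation can be sketched with a piecewise linear rule between consecutive points of the curve. Linear interpolation is an assumption, since the description above does not say which scheme is used; the function name `interpolate_curve` is hypothetical, and the curve is assumed to be a list of (x, y) pairs sorted by increasing x.

```python
def interpolate_curve(points, x):
    """Linearly interpolate y at x from (x, y) pairs sorted by increasing x."""
    if not points:
        raise ValueError("the curve needs at least one point")
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Vertical segments (x0 == x1) are skipped; a neighbouring segment covers x.
        if x0 <= x <= x1 and x1 > x0:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]  # x lies beyond the last point


curve = [(0.0, 0.0), (0.5, 0.8), (1.0, 1.0)]
print(interpolate_curve(curve, 0.25))
print(interpolate_curve(curve, 0.75))
```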

roc_intersect_interval(x, nb, curve=<CurveType.ROC: 5>, bootstrap=10, alpha=0.05)[source]

Computes a confidence interval for the value returned by roc_intersect.

Parameters:
  • roc – ROC curve
  • x – x
  • curve – see CurveType
Returns:

dictionary
