module ml.roc

Short summary

module mlstatpy.ml.roc

About ROC

Classes

  • ROC – Helper to draw a ROC curve.

Properties

  • Data – Returns the underlying dataframe.

Methods

  • __init__ – Initialisation with a dataframe and two or three columns.

  • __len__ – Returns the number of rows in the underlying data.

  • __repr__ – Shows first elements, precision rate.

  • __str__ – Shows first elements, precision rate.

  • auc – Computes the area under the curve (AUC).

  • auc_interval – Determines a confidence interval for the AUC with bootstrap.

  • compute_roc_curve – Computes a ROC curve with nb points; if nb == -1, there are as many points as the data contains, …

  • confusion – Computes the confusion matrix for a specific score or all if score is None.

  • plot – Plots a ROC curve.

  • precision – Computes the precision.

  • random_cloud – Resamples among the data.

  • roc_intersect – The ROC curve is defined by a set of points. This function interpolates those points to determine …

  • roc_intersect_interval – Computes a confidence interval for the value returned by roc_intersect().

Documentation

About ROC

class mlstatpy.ml.roc.ROC(y_true=None, y_score=None, sample_weight=None, df=None)

Bases: object

Helper to draw a ROC curve.

Initialisation with a dataframe and two or three columns:

  • column 1: score (y_score)

  • column 2: expected answer (boolean) (y_true)

  • column 3: weight (optional) (sample_weight)

Parameters
  • y_true – if df is None, y_true and y_score must be filled (sample_weight stays optional); y_true tells whether or not the answer is true, that is, whether the prediction is right.

  • y_score – score prediction

  • sample_weight – weights

  • df – dataframe, array or list; it must contain 2 or 3 columns, always in the same order

class CurveType

Bases: enum.Enum

Curve types:

  • PROBSCORE: 1 - False Positive / True Positive

  • ERRPREC: error / recall

  • RECPREC: precision / recall

  • ROC: False Positive / True Positive

  • SKROC: False Positive / True Positive (scikit-learn)
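
The quantities paired by these curve types all derive from the four confusion counts at a given threshold. The sketch below is plain Python and independent of mlstatpy; the helper name `rates` is hypothetical, and it assumes the usual definitions of the rates, which this page does not spell out.

```python
def rates(tp, fp, tn, fn):
    """Standard rates derived from confusion counts (illustrative helper)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0        # true positive rate, also called recall
    fpr = fp / (fp + tn) if fp + tn else 0.0        # false positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are right
    return {"fpr": fpr, "tpr": tpr, "recall": tpr, "precision": precision}


r = rates(tp=40, fp=10, tn=40, fn=10)
```

Under these assumptions, a ROC point is the pair (fpr, tpr) and a precision/recall point is the pair (recall, precision).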

property Data

Returns the underlying dataframe.

__init__(y_true=None, y_score=None, sample_weight=None, df=None)

Initialisation with a dataframe and two or three columns:

  • column 1: score (y_score)

  • column 2: expected answer (boolean) (y_true)

  • column 3: weight (optional) (sample_weight)

Parameters
  • y_true – if df is None, y_true and y_score must be filled (sample_weight stays optional); y_true tells whether or not the answer is true, that is, whether the prediction is right.

  • y_score – score prediction

  • sample_weight – weights

  • df – dataframe, array or list; it must contain 2 or 3 columns, always in the same order

__len__()

Returns the number of rows in the underlying data.

__repr__()

Shows first elements, precision rate.

__str__()

Shows first elements, precision rate.

auc(cloud=None)

Computes the area under the curve (AUC).

Parameters

cloud – data or None to use self.data; the function assumes the data is sorted.

Returns

AUC

The first column is the label, the second one is the score, the third one is the weight.
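
To make the quantity concrete, the AUC can be read as the share of (positive, negative) pairs in which the positive item gets the higher score. The sketch below is a minimal, unweighted illustration in plain Python; it is not claimed to be the algorithm used by `auc`, and the name `auc_pairwise` is hypothetical.

```python
def auc_pairwise(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count for one half. Cost is O(n_pos * n_neg), fine for small samples.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute an AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
value = auc_pairwise(labels, scores)  # 8 of the 9 pairs are ranked correctly
```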

auc_interval(bootstrap=10, alpha=0.95)

Determines a confidence interval for the AUC with bootstrap.

Parameters
  • bootstrap – number of bootstrap resamples (random estimations)

  • alpha – defines the confidence interval

Returns

dictionary of values
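
A percentile bootstrap is one common way to build such an interval: resample the data with replacement, recompute the AUC each time, and read the bounds off the sorted values. The sketch below is plain Python under stated assumptions: `alpha` is read as the confidence level, the function names and the keys of the returned dictionary are hypothetical, and none of it is taken from the mlstatpy implementation.

```python
import random


def auc_pairwise(labels, scores):
    """Share of (positive, negative) pairs ranked correctly, ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(labels, scores, bootstrap=200, alpha=0.95, seed=0):
    """Percentile bootstrap interval; assumes both classes are present in the data."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < bootstrap:
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        ys = [labels[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # skip resamples that lost one of the two classes
        values.append(auc_pairwise(ys, [scores[i] for i in idx]))
    values.sort()
    tail = (1.0 - alpha) / 2.0
    low = values[int(tail * (len(values) - 1))]
    high = values[int((1.0 - tail) * (len(values) - 1))]
    return {"auc": auc_pairwise(labels, scores), "interval": (low, high)}


labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2, 0.75, 0.3]
result = bootstrap_auc_interval(labels, scores)
```

With only a handful of resamples (the default `bootstrap=10` in the signature above), the bounds of such an interval are coarse; larger values give a smoother estimate at a higher cost.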

compute_roc_curve(nb=100, curve=<CurveType.ROC: 5>, bootstrap=False)

Computes a ROC curve with nb points. If nb == -1, there are as many points as the data contains. If bootstrap == True, it draws random numbers to create a confidence interval based on the bootstrap method.

Parameters
  • nb – number of points for the curve

  • curve – see CurveType

  • bootstrap – builds the curve after resampling

Returns

DataFrame (metrics and threshold)

If curve is SKROC, the parameter nb is not taken into account. It should be set to 0.
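
As an illustration of what the points of a ROC curve are, the sketch below builds one point per distinct score, which is loosely comparable to the nb == -1 case described above. It is plain Python, the name `roc_points` is hypothetical, and the layout of the result (a list of tuples) is an assumption that differs from the DataFrame returned by `compute_roc_curve`.

```python
def roc_points(labels, scores):
    """One (threshold, fpr, tpr) triple per distinct score, highest threshold first.

    Assumes both classes are present, otherwise a rate would divide by zero.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        # Every item whose score is at least t is predicted positive.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and not y)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
```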

confusion(score=None, nb=10, curve=<CurveType.ROC: 5>, bootstrap=False)

Computes the confusion matrix for a specific score or all if score is None.

Parameters
  • score – score or None.

  • nb – number of scores (if score is None)

  • curve – see CurveType

  • bootstrap – builds the curve after resampling

Returns

One row if score is specified, many rows if score is None
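
For a single score, a confusion matrix reduces to four counts. The sketch below shows that computation in plain Python, assuming an item is predicted positive when its score is at least the threshold; the name `confusion_at` and the dictionary layout are hypothetical and do not reproduce the row format returned by `confusion`.

```python
def confusion_at(labels, scores, threshold):
    """Confusion counts when every item with score >= threshold is predicted positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y:
            counts["tp"] += 1
        elif predicted and not y:
            counts["fp"] += 1
        elif not predicted and y:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
row = confusion_at(labels, scores, threshold=0.7)
```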

plot(nb=100, curve=<CurveType.ROC: 5>, bootstrap=0, ax=None, thresholds=False, **kwargs)

Plots a ROC curve.

Parameters
  • nb – number of points

  • curve – see CurveType

  • bootstrap – number of curves for the bootstrap (0 for none)

  • ax – axis

  • thresholds – use thresholds for the X axis

  • kwargs – sent to pandas.plot

Returns

ax

precision()

Computes the precision.

random_cloud()

Resamples among the data.

Returns

DataFrame

roc_intersect(roc, x)

The ROC curve is defined by a set of points. This function interpolates those points to determine y for any x.

Parameters
  • roc – ROC curve

  • x – x

Returns

y
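
The text above only says that the points are interpolated. A minimal sketch, assuming linear interpolation between neighbouring points and clamping outside the range of the curve, could look as follows; the name `interpolate` and the list-of-pairs input are hypothetical and not the signature of `roc_intersect`.

```python
from bisect import bisect_left


def interpolate(points, x):
    """Linear interpolation of y at x over (x, y) points sorted by increasing x."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]   # clamp on the left
    if x >= xs[-1]:
        return points[-1][1]  # clamp on the right
    i = bisect_left(xs, x)    # first index with xs[i] >= x, so xs[i - 1] < x
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


points = [(0.0, 0.0), (0.5, 0.8), (1.0, 1.0)]
y_quarter = interpolate(points, 0.25)
y_three_quarters = interpolate(points, 0.75)
```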

roc_intersect_interval(x, nb, curve=<CurveType.ROC: 5>, bootstrap=10, alpha=0.05)

Computes a confidence interval for the value returned by roc_intersect.

Parameters
  • x – x

  • nb – number of points for the curve

  • curve – see CurveType

  • bootstrap – number of bootstrap resamples

  • alpha – defines the confidence interval

Returns

dictionary
