Benchmark#

Links: notebook, html, PDF, python, slides, GitHub

Ce notebook compare différents modèles depuis un notebook.

from jyquickhelper import add_notebook_menu
add_notebook_menu()

Si le message Widget Javascript not detected. It may not be installed or enabled properly. apparaît, vous devriez exécuter la commande jupyter nbextension enable --py --sys-prefix widgetsnbextension depuis la ligne de commande. Le code suivant vous permet de vérifier que cela a été fait.

from tqdm import tnrange, tqdm_notebook
from time import sleep

for i in tnrange(3, desc='1st loop'):
    for j in tqdm_notebook(range(20), desc='2nd loop'):
        sleep(0.01)

%matplotlib inline

Petit bench sur le clustering #

Définition du bench #

import dill
from tqdm import tnrange
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from mlstatpy.ml import MlGridBenchMark

params = [dict(model=lambda : KMeans(n_clusters=3), name="KMeans-3", shortname="km-3"),
          dict(model=lambda : AgglomerativeClustering(), name="AgglomerativeClustering", shortname="aggclus")]

datasets = [dict(X=make_blobs(100, centers=3)[0], Nclus=3,
                 name="blob-100-3", shortname="b-100-3", no_split=True),
            dict(X=make_blobs(100, centers=5)[0], Nclus=5,
                 name="blob-100-5", shortname="b-100-5", no_split=True) ]

bench = MlGridBenchMark("TestName", datasets, fLOG=None, clog=None,
                        cache_file="cache.pickle", pickle_module=dill,
                        repetition=3, progressbar=tnrange,
                        graphx=["_time", "time_train", "Nclus"],
                        graphy=["silhouette", "Nrows"])

Lancer le bench #

bench.run(params)

0/|/2017-03-19 20:11:11 [BenchMark.run] number of cached run: 4:   0%|| 0/4 [00:00<?, ?it/s]

3/|/2017-03-19 20:11:13 [BenchMark.run] done.:  75%|| 3/4 [00:02<00:00,  1.10it/s]                                     11it/s]_train': 0.02142968022685221, 'time_test': 0.0025012412126208527, '_btry': 'aggclus-b-100-5', '_iexp': 2, 'model_name': 'AgglomerativeClustering', 'ds_name': 'blob-100-5', 'Nrows': 100, 'Nfeat': 2, 'Nclus': 5, 'no_split': True, '_date': datetime.datetime(2017, 3, 19, 20, 11, 11, 647355), '_time': 0.1007650830318858, '_span': datetime.timedelta(0, 0, 112581), '_i': 3, '_name': 'TestName'}:  75%|| 3/4 [00:00<00:00,  4.22it/s]]0:00,  3.53it/s]]

Récupérer les résultats #

df = bench.to_df()
df

	_btry	_date	_i	_iexp	_name	_span	_time	Nclus	Nfeat	Nrows	ds_name	model_name	no_split	own_score	silhouette	time_preproc	time_test	time_train
0	km-3-b-100-3	2017-03-19 20:11:11.132135	0	0	TestName	0:00:00.147610	0.147594	3	2	100	blob-100-3	KMeans-3	True	-175.396944	0.700618	0.009154	0.003195	0.044693
1	km-3-b-100-3	0:00:00.147610	0	1	TestName	2017-03-19 20:11:11.140141	0.147594	3	2	100	blob-100-3	KMeans-3	True	-175.396944	0.700618	0.006068	0.002803	0.037633
2	km-3-b-100-3	2017-03-19 20:11:11.140141	0	2	TestName	0:00:00.155620	0.147594	3	2	100	blob-100-3	KMeans-3	True	-175.396944	0.700618	0.006230	0.002630	0.035106
3	aggclus-b-100-3	2017-03-19 20:11:11.317283	1	0	TestName	0:00:00.096081	0.096700	3	2	100	blob-100-3	AgglomerativeClustering	True	NaN	0.662345	0.008147	0.002508	0.026997
4	aggclus-b-100-3	0:00:00.096081	1	1	TestName	2017-03-19 20:11:11.325288	0.096700	3	2	100	blob-100-3	AgglomerativeClustering	True	NaN	0.662345	0.009511	0.004156	0.016807
5	aggclus-b-100-3	2017-03-19 20:11:11.325288	1	2	TestName	0:00:00.106088	0.096700	3	2	100	blob-100-3	AgglomerativeClustering	True	NaN	0.662345	0.007018	0.003252	0.018227
6	km-3-b-100-5	2017-03-19 20:11:11.452688	2	0	TestName	0:00:00.145130	0.145012	5	2	100	blob-100-5	KMeans-3	True	-466.829200	0.790511	0.007587	0.002748	0.033610
7	km-3-b-100-5	0:00:00.145130	2	1	TestName	2017-03-19 20:11:11.463199	0.145012	5	2	100	blob-100-5	KMeans-3	True	-466.829200	0.790511	0.007471	0.002278	0.036098
8	km-3-b-100-5	2017-03-19 20:11:11.463199	2	2	TestName	0:00:00.153136	0.145012	5	2	100	blob-100-5	KMeans-3	True	-466.829200	0.790511	0.011576	0.004463	0.039103
9	aggclus-b-100-5	2017-03-19 20:11:11.640351	3	0	TestName	0:00:00.101573	0.100765	5	2	100	blob-100-5	AgglomerativeClustering	True	NaN	0.636241	0.009483	0.002418	0.020562
10	aggclus-b-100-5	0:00:00.101573	3	1	TestName	2017-03-19 20:11:11.647355	0.100765	5	2	100	blob-100-5	AgglomerativeClustering	True	NaN	0.636241	0.011532	0.001634	0.021456
11	aggclus-b-100-5	2017-03-19 20:11:11.647355	3	2	TestName	0:00:00.112581	0.100765	5	2	100	blob-100-5	AgglomerativeClustering	True	NaN	0.636241	0.009643	0.002501	0.021430

df.plot(x="time_train", y="silhouette", kind="scatter")

<matplotlib.axes._subplots.AxesSubplot at 0x122b8004748>

Dessin, Graphs #

bench.plot_graphs(figsize=(12,12))

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x00000122B8269A90>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000122B82E1DA0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x00000122B83512E8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000122B83A1828>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x00000122B8409D68>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000122B8462588>]], dtype=object)

Liens

Contenu

Information

Sujet précédent

Sujet suivant

Benchmark#

Petit bench sur le clustering #

Définition du bench #

Lancer le bench #

Récupérer les résultats #

Dessin, Graphs #

Liens

Contenu

Information

Sujet précédent

Sujet suivant

Benchmark#

Petit bench sur le clustering#

Définition du bench#

Lancer le bench#

Récupérer les résultats#

Dessin, Graphs#

Petit bench sur le clustering #

Définition du bench #

Lancer le bench #

Récupérer les résultats #

Dessin, Graphs #