{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 2A.ml - Machine Learning and encrypted data\n", "\n", "How can machine learning be done on encrypted data? This notebook illustrates one principle described in [CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy](http://proceedings.mlr.press/v48/gilad-bachrach16.pdf)."]}, {"cell_type": "code", "execution_count": 1, "metadata": {"collapsed": true}, "outputs": [], "source": ["%matplotlib inline"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Principle\n", "\n", "Machine learning on encrypted data relies on [homomorphic encryption](https://en.wikipedia.org/wiki/Homomorphic_encryption) ([chiffrement homomorphe](https://fr.wikipedia.org/wiki/Chiffrement_homomorphe) in French). The first fully homomorphic scheme was constructed by Craig Gentry (see [Fully Homomorphic Encryption Using Ideal Lattices](https://www.cs.cmu.edu/~odonnell/hits09/gentry-homomorphic-encryption.pdf) and [Fully Homomorphic Encryption over the Integers](https://eprint.iacr.org/2009/616.pdf)). Let $x \\rightarrow \\varepsilon(x)$ denote a fully homomorphic encryption function. It satisfies:\n", "\n", "$$\\begin{array}{l}\\varepsilon(x+y) = \\varepsilon(x) + \\varepsilon(y) \\\\ \\varepsilon(x*y) = \\varepsilon(x) * \\varepsilon(y)\\end{array}$$\n", "\n", "In the example below, the encryption scheme only needs to be [partially homomorphic](https://fr.wikipedia.org/wiki/Chiffrement_homomorphe#Syst.C3.A8mes_partiellement_homomorphes): only addition is preserved once the integer is encrypted.\n", "\n", "An example: $\\varepsilon:\\mathbb{N} \\rightarrow \\mathbb{Z}/n\\mathbb{Z}$ with $\\varepsilon(x) = (x * a) \\mod n$, where $a$ must be coprime with $n$ so that $\\varepsilon$ can be inverted. 
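
A minimal Python sketch of this example, with arbitrary illustrative values for $n$ and $a$ (the names `N`, `A`, `modinv`, `encrypt` and `decrypt` are hypothetical, not taken from a library). Decryption multiplies by the inverse of $a$ modulo $n$, computed here with the extended Euclidean algorithm, and adding two ciphertexts modulo $n$ yields the ciphertext of the sum. Results are only correct modulo $n$, so $n$ has to be larger than any value the computation can reach.

```python
from math import gcd

# Illustrative parameters (assumed): N = 2**31 - 1 is prime, so any
# 0 < A < N is coprime with N and therefore invertible modulo N.
N = 2 ** 31 - 1
A = 12345
assert gcd(A, N) == 1


def modinv(a, n):
    # Extended Euclidean algorithm: returns u such that (a * u) % n == 1.
    r0, r1 = n, a % n
    u0, u1 = 0, 1
    while r1:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        u0, u1 = u1, u0 - q * u1
    if r0 != 1:
        raise ValueError('a is not invertible modulo n')
    return u0 % n


A_INV = modinv(A, N)


def encrypt(x):
    # epsilon(x) = (x * a) mod n
    return (x * A) % N


def decrypt(c):
    # Multiplying by the inverse of a recovers x modulo n.
    return (c * A_INV) % N


x, y = 1234, 5678
cx, cy = encrypt(x), encrypt(y)
# The sum of the ciphertexts is the ciphertext of the sum.
assert (cx + cy) % N == encrypt(x + y)
print(decrypt((cx + cy) % N))  # 6912
```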
This means that one can encrypt data, compute with it, and decrypt a result that is the same (modulo $n$ and up to rounding) as if the computations had been done on the unencrypted data."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 1: write two functions, one to encrypt and one to decrypt\n", "\n", "Choose $n$ and $a$ carefully to implement the encryption function:\n", "$\\varepsilon:\\mathbb{N} \\rightarrow \\mathbb{Z}/n\\mathbb{Z}$ with $\\varepsilon(x) = (x * a) \\mod n$. Then check that it preserves addition modulo $n$."]}, {"cell_type": "code", "execution_count": 3, "metadata": {"collapsed": true}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 2: train a linear regression"]}, {"cell_type": "code", "execution_count": 4, "metadata": {"collapsed": true}, "outputs": [], "source": ["from sklearn.datasets import load_diabetes\n", "data = load_diabetes()"]}, {"cell_type": "code", "execution_count": 5, "metadata": {"collapsed": true}, "outputs": [], "source": ["X = data.data\n", "Y = data.target"]}, {"cell_type": "code", "execution_count": 6, "metadata": {"collapsed": true}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 3: rewrite the prediction function of a linear regression\n"]}, {"cell_type": "code", "execution_count": 7, "metadata": {"collapsed": true}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 4: put it all together\n", "\n", "Take one observation, encrypt it, predict, decrypt the prediction and compare it with the unencrypted version. 
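
One possible sketch of the whole pipeline, under assumptions that are not in the original: a scikit-learn `LinearRegression` fitted on the diabetes data, and a fixed-point encoding where each real value is multiplied by a hypothetical scale factor `SCALE` and rounded to an integer. Only the features are encrypted; the coefficients stay in clear, so the encrypted prediction needs only ciphertext additions and multiplications of a ciphertext by a plain integer $k$, which gives the ciphertext of $k x$ in this additive scheme. Each product carries a factor `SCALE ** 2`, so the intercept is encoded with that factor and the decrypted result is divided by it; negative values are mapped into the residues modulo $n$ by Python's `%` and recovered by the hypothetical helper `to_signed`.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# Assumed parameters: N = 2**61 - 1 is prime, so A is invertible modulo N,
# and N is far larger than any intermediate value, so nothing wraps around.
N = 2 ** 61 - 1
A = 123456789
SCALE = 10 ** 6  # fixed-point scale: a real v is encoded as round(v * SCALE)


def encrypt(x):
    return (x * A) % N


def decrypt(c):
    # Fermat inverse of A, valid because N is prime.
    return (c * pow(A, N - 2, N)) % N


def to_signed(v):
    # Residues above N // 2 represent negative integers.
    return v - N if v > N // 2 else v


data = load_diabetes()
X, Y = data.data, data.target
model = LinearRegression().fit(X, Y)

# The model stays in clear: coefficients are scaled once, the intercept twice
# because each product coefficient * feature carries a factor SCALE ** 2.
coef_int = [int(round(c * SCALE)) for c in model.coef_]
intercept_int = int(round(model.intercept_ * SCALE ** 2))

# Client side: encode and encrypt one observation.
x = X[0]
enc_x = [encrypt(int(round(v * SCALE))) for v in x]

# Server side: linear combination computed on ciphertexts only.
enc_pred = encrypt(intercept_int)
for c, e in zip(coef_int, enc_x):
    enc_pred = (enc_pred + c * e) % N

# Client side: decrypt, restore the sign, undo the fixed-point scale.
pred_encrypted = to_signed(decrypt(enc_pred)) / SCALE ** 2
pred_clear = model.predict(x.reshape(1, -1))[0]
print(pred_encrypted, pred_clear)
```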
You will probably need a small trick, since the encryption function applies to integers while the prediction model works with real numbers."]}, {"cell_type": "code", "execution_count": 8, "metadata": {"collapsed": true}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Questions\n", "\n", "* Under what condition can a model also be trained on encrypted data?\n", "* What about decision trees?"]}, {"cell_type": "code", "execution_count": 9, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1"}}, "nbformat": 4, "nbformat_minor": 2}