{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Survival analysis"]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Some data\n", "\n", "We use the data published on *data.gouv.fr*, [Donn\u00e9es hospitali\u00e8res relatives \u00e0 l'\u00e9pid\u00e9mie de COVID-19](https://www.data.gouv.fr/fr/datasets/donnees-hospitalieres-relatives-a-lepidemie-de-covid-19/) (hospital data on the COVID-19 epidemic). These data are not enough to build a [Kaplan-Meier](https://fr.wikipedia.org/wiki/Estimateur_de_Kaplan-Meier) curve: we know how many people enter and leave hospital each day, but not when a person who left on, say, April 1st was admitted. The cells below therefore simulate an admission date for each exit recorded in the data."]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
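<table><thead><tr><th></th><th>jour</th><th>rad</th><th>dc</th></tr></thead><tbody>
<tr><th>0</th><td>2020-03-18</td><td>NaN</td><td>NaN</td></tr>
<tr><th>1</th><td>2020-03-19</td><td>695.0</td><td>207.0</td></tr>
<tr><th>2</th><td>2020-03-20</td><td>806.0</td><td>248.0</td></tr>
<tr><th>3</th><td>2020-03-21</td><td>452.0</td><td>151.0</td></tr>
<tr><th>4</th><td>2020-03-22</td><td>608.0</td><td>210.0</td></tr></tbody></table>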
\n", "
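"], "text/plain": [" jour rad dc\n", "0 2020-03-18 NaN NaN\n", "1 2020-03-19 695.0 207.0\n", "2 2020-03-20 806.0 248.0\n", "3 2020-03-21 452.0 151.0\n", "4 2020-03-22 608.0 210.0"]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["import numpy.random as rnd\n", "\n", "import pandas\n", "\n", "# cumulative hospital counts per department and per day\n", "df = pandas.read_csv(\"https://www.data.gouv.fr/en/datasets/r/63352e38-d353-4b54-bfd1-f1b3ee1cabd7\", sep=\";\")\n", "# assumption: sexe == 0 holds both sexes combined (1 and 2 split it),\n", "# keeping only those rows avoids counting each patient twice\n", "if \"sexe\" in df.columns:\n", "    df = df[df[\"sexe\"] == 0]\n", "# rad (returned home) and dc (deaths) are cumulative, so the\n", "# day-to-day difference gives the daily numbers of discharges and deaths\n", "gr = df[[\"jour\", \"rad\", \"dc\"]].groupby([\"jour\"]).sum()\n", "diff = gr.diff().reset_index(drop=False)\n", "diff.head()"]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"text/html": ["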
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
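<table><thead><tr><th></th><th>entree</th><th>sortie</th><th>issue</th></tr></thead><tbody>
<tr><th>518488</th><td>-200</td><td>19</td><td>0</td></tr>
<tr><th>541408</th><td>-192</td><td>27</td><td>0</td></tr>
<tr><th>476735</th><td>-187</td><td>2</td><td>0</td></tr>
<tr><th>587013</th><td>-185</td><td>42</td><td>0</td></tr>
<tr><th>476057</th><td>-180</td><td>1</td><td>0</td></tr></tbody></table>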
\n", "
"], "text/plain": [" entree sortie issue\n", "518488 -200 19 0\n", "541408 -192 27 0\n", "476735 -187 2 0\n", "587013 -185 42 0\n", "476057 -180 1 0"]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}], "source": ["\n", "def donnees_artificielles(hosp, mu=14, nu=21):\n", "    # hosp: one row per day with columns jour, rad (left alive), dc (deaths)\n", "    dt = pandas.to_datetime(hosp['jour'])\n", "    res = []\n", "    for i in range(hosp.shape[0]):\n", "        # exit day as a day of the year; it restarts at 1 in 2021,\n", "        # which does not change the durations sortie - entree\n", "        date = dt[i].dayofyear\n", "        # people who left alive: length of stay ~ exponential with mean mu days\n", "        h = hosp.iloc[i, 1]\n", "        delay = rnd.exponential(mu, int(h))\n", "        for j in range(delay.shape[0]):\n", "            res.append([date - int(delay[j]), date, 1])\n", "        # deaths: length of stay ~ exponential with mean nu days\n", "        h = hosp.iloc[i, 2]\n", "        delay = rnd.exponential(nu, int(h))\n", "        for j in range(delay.shape[0]):\n", "            res.append([date - int(delay[j]), date, 0])\n", "    return pandas.DataFrame(res, columns=[\"entree\", \"sortie\", \"issue\"])\n", "\n", "\n", "# the first row of diff is NaN (no previous day), hence diff[1:]\n", "data = donnees_artificielles(diff[1:].reset_index(drop=True)).sort_values('entree')\n", "data.head()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Each row is one simulated patient: `entree` is the day of admission to hospital and `sortie` the day of exit, both as day-of-year numbers, and `issue` is the outcome, 0 for death and 1 for alive."]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
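<table><thead><tr><th></th><th>entree</th><th>sortie</th><th>issue</th></tr></thead><tbody>
<tr><th>count</th><td>624130.000000</td><td>624130.000000</td><td>624130.000000</td></tr>
<tr><th>mean</th><td>169.704510</td><td>184.532815</td><td>0.806729</td></tr>
<tr><th>std</th><td>125.420957</td><td>124.343186</td><td>0.394864</td></tr>
<tr><th>min</th><td>-200.000000</td><td>1.000000</td><td>0.000000</td></tr>
<tr><th>25%</th><td>53.000000</td><td>84.000000</td><td>1.000000</td></tr>
<tr><th>50%</th><td>133.000000</td><td>144.000000</td><td>1.000000</td></tr>
<tr><th>75%</th><td>301.000000</td><td>315.000000</td><td>1.000000</td></tr>
<tr><th>max</th><td>366.000000</td><td>366.000000</td><td>1.000000</td></tr></tbody></table>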
\n", "
"], "text/plain": [" entree sortie issue\n", "count 624130.000000 624130.000000 624130.000000\n", "mean 169.704510 184.532815 0.806729\n", "std 125.420957 124.343186 0.394864\n", "min -200.000000 1.000000 0.000000\n", "25% 53.000000 84.000000 1.000000\n", "50% 133.000000 144.000000 1.000000\n", "75% 301.000000 315.000000 1.000000\n", "max 366.000000 366.000000 1.000000"]}, "execution_count": 5, "metadata": {}, "output_type": "execute_result"}], "source": ["data.describe()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Il y a environ 80% de survie dans ces donn\u00e9es."]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": ["import numpy\n", "duree = data.sortie - data.entree\n", "deces = (data.issue == 0).astype(numpy.int32)"]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [{"data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlMAAAEGCAYAAABB6hAxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAgMUlEQVR4nO3de5TdZX3v8fc3FzMRQkZy6dFJxiCNEUxsaOc4IK7ULEGBFTJobRPGnpYjC6SFVq1lLUCPXEQrhyNWlpeVeECQgizEg6QaDCVCOSeGwKTFAIGYAZIwYzQXmZCETBiS5/wxew+byVx2Zu+ZfZn3ay1W9u8yv99DfmsPH77P83ueSCkhSZKkoRlT6gZIkiRVMsOUJElSAQxTkiRJBTBMSZIkFcAwJUmSVIBxpbrx1KlT06xZs0p1e0mSpLytX79+V0ppWl/HShamZs2aRUtLS6luL0mSlLeI2NrfMbv5JEmSCmCYkiRJKoBhSpIkqQCDjpmKiFuBRcCOlNLcPo4H8E3gHOBV4IKU0n8Uu6GSJFWrrq4u2tra6OzsLHVTRr2amhpmzJjB+PHj8/6ZfAag3wZ8C/hBP8fPBmZn/mkEvpv5U5Ik5aGtrY1JkyYxa9YsumsUKoWUErt376atrY0TTjgh758btJsvpfQo8PsBTmkCfpC6PQbURsTb826BJEmjXGdnJ1OmTDFIlVhEMGXKlKOuEBZjzFQd8FLOdltm3xEi4uKIaImIlp07dxbh1pIkVQeDVHkYynMY0XmmUkrLgeUADQ0NaTjvde2/PsPG37zSs900v47mxvrhvKUkSRqFihGm2oGZOdszMvtK7pXOLgC27n6VVzq7DFOSJKnoitHNtwL4q+h2KrAnpbS9CNctyNXnvpcHPrOABz6zgHl1k9m6+1WWLFvLXeu2lbppkiSVnWOPPbbn88qVK3n3u9/N1q1bueaaa4gIWltbe47/8z//MxFRtJVMfvKTn7Bx48ae7S996Us89NBDBV+3o6OD73znOwVfZzCDhqmI+CGwFpgTEW0RcWFEXBIRl2ROWQm8ALQC3wP+dthaO0RN8+t455S38lT7Hu54bEupmyNJUtlavXo1f//3f88DDzzAO9/5TgD
mzZvH3Xff3XPOj370I9773vcW7Z69w9R1113HGWecUfB1RypMDdrNl1I6f5DjCbi0aC0aBs2N9TQ31rNk2Vqeat/DkmVrHUMlSSpLvcf8FsPJ7ziOq88dPPw8+uijXHTRRaxcuZITTzyxZ/95553H/fffzxe/+EWef/55Jk+ePOg8TA8++CBXX301Bw8e5MQTT+T73/8+xx57LFdccQUrVqxg3LhxfOQjH+HjH/84K1as4N///d+5/vrr+fGPf8yXv/xlFi1axCc+8QlmzZrF+eefzwMPPMC4ceNYvnw5V155Ja2trVx++eVccskl7Nu3j6amJl5++WW6urq4/vrraWpq4oorruD5559n/vz5nHnmmdx4443ceOON3HPPPRw8eJCPfexjXHvttQX//ZZsoeNSaJpfxyudXTzVvscxVJIk5Th48CDnnXcejzzyCO95z3vedOy4445j5syZPP3009x///0sWbKE73//+/1ea9euXVx//fU89NBDHHPMMdxwww3cdNNNXHrppdx3330899xzRAQdHR3U1tayePHinvDUl/r6ep588kk+97nPccEFF7BmzRo6OzuZO3cul1xyCTU1Ndx3330cd9xx7Nq1i1NPPZXFixfzta99jaeffponn3wS6A54mzdv5vHHHyelxOLFi3n00UdZsGBBQX93oypMWaGSJJW7fCpIw2H8+PF84AMf4JZbbuGb3/zmEceXLl3K3XffzapVq1i9evWAYeqxxx5j48aNnH766QC89tprnHbaaUyePJmamhouvPBCFi1axKJFi/Jq2+LFi4Hu7sZ9+/YxadIkJk2axIQJE+jo6OCYY47hqquu4tFHH2XMmDG0t7fzu9/97ojrPPjggzz44IOccsopAOzbt4/NmzcbpobCCpUkSW82ZswY7rnnHj784Q/z1a9+lauuuupNxxctWsTll19OQ0MDxx133IDXSilx5pln8sMf/vCIY48//jirV6/m3nvv5Vvf+ha/+MUvBm3bhAkTetqY/Zzdfv3117nzzjvZuXMn69evZ/z48cyaNavPiTdTSlx55ZV8+tOfHvSeR2NULnTc3Fh/xFt+vuknSRrt3vrWt/Kzn/2MO++8k1tuueWIYzfccANf+MIXBr3Oqaeeypo1a3reANy/fz+//vWv2bdvH3v27OGcc87hG9/4Br/61a8AmDRpEnv37h1yu/fs2cP06dMZP348Dz/8MFu3bu3zuh/96Ee59dZb2bdvHwDt7e3s2LFjyPfNGpWVqaxsheqVzi7nopIkCTj++OP5+c9/zoIFC5g2bdqbji1dujSva0ybNo3bbruN888/n4MHDwJw/fXXM2nSJJqamujs7CSlxE033dRz3Ysuuoibb76Ze++996jb/MlPfpJzzz2XefPm0dDQ0DPma8qUKZx++unMnTuXs88+mxtvvJFnn32W0047DeieDuJf/uVfmD59+lHfM1d0v4w38hoaGlKx5qcohiXL1rKhbQ91tTWcPnsq1y6eW+omSZJGiWeffZaTTjqp1M1QRl/PIyLWp5Qa+jp/VFemcjXNr6PjQBebfruX1p37eW77XgenS5KkQRmmMrJv+t21bhu3/fJFNrR10HHAbj9JkgbS2NjY05WXdccddzBv3rwStWjkGaZ6aW6s5+21NXzlpxvZlhmcboVKkjTcUkpERKmbcdTWrVtX6iYU1VCGP43Kt/kGs3DOdD71wXcx8/iJbGjr4Pa1W0rdJElSFaupqWH37t1D+g+5iielxO7du6mpqTmqn7My1Y++KlSAVSpJUtHNmDGDtrY2du7cWeqmjHo1NTXMmDHjqH7GMDWAhXOms72jkzse2+L0CZKkYTN+/HhOOOGEUjdDQ+TUCEchuwzNvLrJVqgkSRpFnBqhSFyGRpIk9eYA9KPQ1zI0LkEjSdLoZmVqCKxQSZKkLCtTQ2CFSpIkZVmZKoAVKkmSZGWqAFaoJEmSlakisEIlSdLo5TxTRZQ7DxU4W7okSdX
CeaZGSLZC5WzpkiSNHlamhomzpUuSVD0Gqkw5AH2YNM2vY8bbJrKhrYPb124pdXMkSdIwMUwNk+bGeq485yTqaieyzTf9JEmqWoapYbRwznQ+9cF3MfN4K1SSJFUrw9Qws0IlSVJ1M0yNACtUkiRVL8PUCMmtUJXqDUpJklR8hqkRtHDOdKYcO4GXfn+AM77+CGd8/RGuXvF0qZslSZIK4KSdI6xpfh0dB7o4dOgw7R0HWPv87lI3SZIkFSCvylREnBURmyKiNSKu6ON4fUQ8HBH/GREbIuKc4je1OjQ31rPqswt46PMf4n0zanuqVFaoJEmqTIOGqYgYC3wbOBs4GTg/Ik7uddoXgXtSSqcAS4HvFLuh1ahpfh31U95qhUqSpAqWT2Xq/UBrSumFlNJrwN1AU69zEnBc5vNk4DfFa2L1ylaprFBJklS58hkzVQe8lLPdBjT2Ouca4MGI+DvgGOCMvi4UERcDFwPU17tWXVZ2HNW23futUEmSVGGK9Tbf+cBtKaUZwDnAHRFxxLVTSstTSg0ppYZp06YV6daVzwqVJEmVK5/KVDswM2d7RmZfrguBswBSSmsjogaYCuwoRiNHCytUkiRVnnwqU08AsyPihIh4C90DzFf0Omcb8GGAiDgJqAF2FrOho0FfFSqrVJIklbdBK1Mppdcj4jJgFTAWuDWl9ExEXAe0pJRWAJ8HvhcRn6N7MPoFyWm+h8y5qCRJqhxRqszT0NCQWlpaSnLvSrJk2Vo2tO2hrraG02dP5drFc0vdJEmSRp2IWJ9SaujrmDOglznHUUmSVN5cm6/M+aafJEnlzcpUhbBCJUlSebIyVSGsUEmSVJ6sTFUYK1SSJJUX3+arULlv+QG+6SdJ0jDybb4q5FxUkiSVBytTVWDJsrU81b6HeXWTaZpfR3Oji0hLklRMVqaqXNP8Ol7p7OKp9j280tllmJIkaQT5Nl8VaG6s54HPLGBe3WS27n6VJcvWcte6baVuliRJo4KVqSpihUqSpJFnZaqKWKGSJGnkGaaqUNP8Oma8bSIb2jq4fe2WUjdHkqSqZpiqQs2N9Vx5zknU1U5kmxUqSZKGlWGqSi2cM51PffBdzDzeCpUkScPJAehVrLmxnrfX1vCVn27sqVABzkUlSVIRGaaq3MI509ne0cltv3yR3fsO0t5xgI4DvuknSVKxGKZGgWyFau+BLm5evZlSzXovSVI1cszUKLFwznQWz69jyrETeOn3Bzjj649w9YqnS90sSZIqnpWpUSa7QPK23ftdHFmSpCKwMjXKNDfWs+qzC3jfjForVJIkFYGVqVHKCpUkScVhZWqUskIlSVJxWJka5axQSZJUmCjVa/INDQ2ppaWlJPfWkZYsW8uGtj3U1dYAcPrsqVy7eG6JWyVJUnmIiPUppYa+jlmZEvBGherQocO0dxywSiVJUp6sTOkIS5at5an2Pcyrm+zSM5IkYWVKR6lpfh2vdHbxVPseXul06RlJkgbi23w6QnNjPQ98ZgHz6iazNbNA8l3rtpW6WZIklSXDlPrVNL+OGW+byIa2Dm5fu6XUzZEkqSwZptSv5sZ6rjznJOpqJ7LNCpUkSX3KK0xFxFkRsSkiWiPiin7O+YuI2BgRz0TEXcVtpkpl4ZzpfOqD72Lm8VaoJEnqy6AD0CNiLPBt4EygDXgiIlaklDbmnDMbuBI4PaX0ckRMH64Ga+Q1N9bz9toavvLTjZTq7U9JkspVPpWp9wOtKaUXUkqvAXcDTb3OuQj4dkrpZYCU0o7iNlOltnDOdKYcO6Fn6RmXn5EkqVs+YaoOeClnuy2zL9e7gXdHxJqIeCwizurrQhFxcUS0RETLzp07h9ZilUzT/Drqp7wVwIk9JUnKKNYA9HHAbOBDwPnA9yKitvdJKaXlKaWGlFLDtGnTinRrjZTs4sgPff5DLpAsSVJGPmGqHZiZsz0jsy9XG7AipdSVUnoR+DXd4UpVKlulskIlSRrt8glTTwCzI+KEiHg
LsBRY0eucn9BdlSIiptLd7fdC8ZqpcpOtUlmhkiSNdoO+zZdSej0iLgNWAWOBW1NKz0TEdUBLSmlF5thHImIjcAi4PKVkuWIUyC6QvG33fitUkqRRyYWOVRQujixJqmYDLXTsDOgqCpeekSSNVoN280n5yJ3Yc9vu/Zzx9UcAOH32VK5dPLe0jZMkaRhZmVLRZJeeqaudyPhxY/jNnk7WbN5V6mZJkjSsrEypqLIVqr0Hurh59WbaOzpZsmyt46gkSVXLMKWiWzine2nGfQcPcdsvX2RDWwcdB7oMU5KkqmQ3n4ZNc2M9V55zEnW1E9m2+1WWLFvLXeu2lbpZkiQVlWFKwyo7jmrm8b7pJ0mqToYpDTsrVJKkamaY0oiwQiVJqlaGKY2Y3ApVqWbelySp2AxTGlEL50xnyrETXBxZklQ1nBpBI87FkSVJ1cSFjlUyS5atZUPbHupqawCXnpEkla+BFjq2MqWSyVaoDh06THvHAatUkqSKZGVKZSG3SmWFSpJUbqxMqew5jkqSVKl8m09lobmxnlWfXcD7ZtT6pp8kqaJYmVJZyVaoNv12L6079/Pc9r00za9zkWRJUtkyTKmsNDfW09xYz13rtnHbL19kQ1sHHQe6DFOSpLJlN5/KkrOlS5IqhWFKZav3bOmOo5IklSO7+VTWcueiat2533FUkqSyY5hSWcuOoQIcRyVJKkt286li5I6j2rb7VZYsW8td67aVulmSpFHOMKWKsnDOdD71wXcx8/iJbGjr4Pa1W0rdJEnSKGeYUsWxQiVJKieGKVUkK1SSpHLhAHRVrObGet5eW8NXfrqxp0IF+KafJGlEGaZU0RbOmc72jk5u++WL7N53kPaOA77pJ0kaUYYpVbxshWrvgS5uXr25p0plhUqSNBIMU6oKC+dMB2DfwUPORSVJGlF5DUCPiLMiYlNEtEbEFQOc92cRkSKioXhNlPLnm36SpJE2aJiKiLHAt4GzgZOB8yPi5D7OmwR8BlhX7EZKR8M3/SRJIymfytT7gdaU0gsppdeAu4GmPs77MnAD0FnE9klD8uYK1X4XSZYkDZt8wlQd8FLOdltmX4+I+GNgZkrpZwNdKCIujoiWiGjZuXPnUTdWOhrZClVd7UR+s6eTNZt3lbpJkqQqVPAA9IgYA9wEXDDYuSml5cBygIaGhlTovaXB5M5F1d5xgDO+/ggAp8+eyrWL55a2cZKkqpBPmGoHZuZsz8jsy5oEzAUeiQiA/wKsiIjFKaWWYjVUGqrsXFS3r93CoUOHad25n9ad+3lu+16nT5AkFSyfMPUEMDsiTqA7RC0FmrMHU0p7gKnZ7Yh4BPhHg5TKSXNjfU9oumvdNqdPkCQVzaBjplJKrwOXAauAZ4F7UkrPRMR1EbF4uBsoFZvTJ0iSiimvMVMppZXAyl77vtTPuR8qvFnS8MpdhsYKlSSpEHlN2ilVIytUkqRiMExpVHOCT0lSoQxTGvWc4FOSVAgXOpZ4YwzVrf/vBSf4lCQdFcOUlOEEn5KkobCbT8qRuwTN+HFjrFJJkgZlZUrqJVuh2nugi5tXb+6pUlmhkiT1xTAl9WHhnOkA7Dt4iNvXbmHTb/fSunM/azbvMlRJkt7EMCUNILsMzV3rtjk4XZLUJ8dMSXlobqznC4tO5h2Ta3q6/Zw+QZIEVqakvDl9giSpL4Yp6Sg4fYIkqTfDlHSUcitU48eN4dntDk6XpNHMMCUNQe70CWtad/HElpft+pOkUSpSSiW5cUNDQ2ppaSnJvaVie3jTjp6uv7raiVaoJKnKRMT6lFJDX8esTElFkNv117pzP6079/Pc9r00za+jubG+1M2TJA0jw5RUJNmuv58/tZ31W19mQ1sHHQe6DFOSVOXs5pOGQe9uP/CNP0mqZHbzSSOsvzf+7PqTpOpjmJKGSe83/uz6k6TqZDefNEJ840+SKpfdfFIZyHb93b52C5t+60SfklQtDFPSCGpurKe5sZ671m1zjT9JqhKGKak
EXONPkqqHYUoqkdxuv0OHDvdM9mnXnyRVFgegS2Ui2/XXunM/AH847RhDlSSVCQegSxUgdwZ1F06WpMoxptQNkPSGhXOmc8Mn/oj/ce7JvGNyTc94qqtXPF3qpkmS+mFlSipDfS2c7FgqSSpPhimpTPXu9ssNVeCbf5JULgxTUhlbOGc6C+dM5+FNO3pC1fhxY9i6+1XHU0lSmchrzFREnBURmyKiNSKu6OP4P0TExojYEBGrI+KdxW+qNHrljqX6mz890fFUklRGBq1MRcRY4NvAmUAb8ERErEgpbcw57T+BhpTSqxHxN8D/BJYMR4Ol0WzhnOkA7Dt4yGVpJKlM5NPN936gNaX0AkBE3A00AT1hKqX0cM75jwF/WcxGSnqz3svSOEhdkkonnzBVB7yUs90GNA5w/oXAA30diIiLgYsB6uvr82yipP44SF2SSq+oA9Aj4i+BBuBP+zqeUloOLIfuGdCLeW9ptOpvkPqz2+0ClKSRkE+Yagdm5mzPyOx7k4g4A/gC8KcppYPFaZ6kfOWGqr0HuljTuuuIapWhSpKKL58w9QQwOyJOoDtELQWac0+IiFOAZcBZKaUdRW+lpLxlB6kvnl/3pmqVoUqShsegYSql9HpEXAasAsYCt6aUnomI64CWlNIK4EbgWOBHEQGwLaW0eBjbLSkPfXUBuuafJBVXpFSaoUsNDQ2ppaWlJPeWRquHN+3gKz/dSHvHAepqJwIOUpekfETE+pRSQ1/HnAFdGkVy1/xzkLokFYdhShplstMpOEhdkorDMCWNQg5Sl6TiMUxJo1xfg9QNVZKUP8OUJODIULV+qzOqS1I+DFOS3iQbqoCetf8crC5J/TNMSeqXg9UlaXCGKUkDcrC6JA3MMCUpb4MNVs9lwJI0WhimJB21vkLV+HFjeo5nx1c9t30vTfPraG6sL2FrJWl4uZyMpII9vGkHew909Wyvad3V8zYgwB9OO8ZKlaSK5nIykoZVdlxV1kDjq8AuQEnVxTAlaVj01xXoFAuSqo1hStKwyg1VTrEgqRoZpiSNiHymWMhlwJJUKQxTkkZcX0vXABxOMGH8m7sCswxXksqVYUpSyeQuXQMc0RWYnW7BcVaSyplhSlLZ6N0VmJ1uoa9xVmPHjqF24ngA57KSVFLOMyWpIuSOs5owfgwHuw4zJnAuK0kjwnmmJFW83m8FZvVVtcoyXEkaCYYpSRVloAlCcweyv7Cr77cEs92Ddg1KKha7+SRVnd7hCt54U/Bg12Fe2PVG16BjryTlY6BuPsOUpFGhrwHt/Y29yrKbUFKWY6YkjXq53YO93xYEBpyOIZfdhJJ6szIlSRm9q1e9JxMFjugmzLKKJVU3K1OSlIfe1ausvqpY+Qx2z7KaJVU3K1OSVID+lsTJ1V81K5eVLam8WZmSpGHS35I4vfXXbQj9j8/KsrIllTcrU5I0wgbqNszKZ5xWX3KnegCne5CKxakRJKnCHG3gyspO9QB9T/fQl94BLJdhTOpmmJKkKtRfl2JWXwEsV+5EptkAlmuwMDZQCMsyjKlaFBymIuIs4JvAWOB/p5S+1uv4BOAHwJ8Au4ElKaUtA13TMCVJpTWUMNa7+7GvEJaVb2UsK59w1pthTSOloDAVEWOBXwNnAm3AE8D5KaWNOef8LfC+lNIlEbEU+FhKaclA1zVMSVLlGSyA5RqsMtbbmDHB2DHBocOJw4cH/x/9ow1ruYYS3MqB4bF0Cg1TpwHXpJQ+mtm+EiCl9E8556zKnLM2IsYBvwWmpQEubpiSJPUl38B2tGEtVza4FUu+AbAQhYTHcjCcAfbkdxzH1ee+d1iunVXo1Ah1wEs5221AY3/npJRej4g9wBTgTe/5RsTFwMUA9fUma0nSkXKnmhhI7sSqpXY0FbuhKiQ8loOUEq90Ds/f0fY9ncNy3XyN6DxTKaXlwHLorkyN5L0lSRou+QbAQpRTeNSbjRn8FNqBmTnbMzL7+jwn0803me6B6JIkSVUtnzD
1BDA7Ik6IiLcAS4EVvc5ZAfx15vMngF8MNF5KkiSpWgzazZcZA3UZsIruqRFuTSk9ExHXAS0ppRXALcAdEdEK/J7uwCVJklT18hozlVJaCazste9LOZ87gT8vbtMkSZLKXz7dfJIkSeqHYUqSJKkAhilJkqQCGKYkSZIKkNdCx8Ny44idwNZhvs1Ues3Crori86tsPr/K5vOrXD674fHOlNK0vg6ULEyNhIho6W8dHZU/n19l8/lVNp9f5fLZjTy7+SRJkgpgmJIkSSpAtYep5aVugAri86tsPr/K5vOrXD67EVbVY6YkSZKGW7VXpiRJkoaVYUqSJKkAVRumIuKsiNgUEa0RcUWp26PBRcSWiHgqIp6MiJbMvuMj4t8iYnPmz7eVup3qFhG3RsSOiHg6Z1+fzyu63Zz5Pm6IiD8uXcvVz7O7JiLaM9+/JyPinJxjV2ae3aaI+GhpWq2siJgZEQ9HxMaIeCYiPpPZ7/evRKoyTEXEWODbwNnAycD5EXFyaVulPC1MKc3PmSPlCmB1Smk2sDqzrfJwG3BWr339Pa+zgdmZfy4GvjtCbVTfbuPIZwfwjcz3b35KaSVA5nfnUuC9mZ/5TuZ3rErndeDzKaWTgVOBSzPPye9fiVRlmALeD7SmlF5IKb0G3A00lbhNGpom4PbM59uB80rXFOVKKT0K/L7X7v6eVxPwg9TtMaA2It4+Ig3VEfp5dv1pAu5OKR1MKb0ItNL9O1YlklLanlL6j8znvcCzQB1+/0qmWsNUHfBSznZbZp/KWwIejIj1EXFxZt8fpJS2Zz7/FviD0jRNeervefmdrAyXZbqBbs3pUvfZlbGImAWcAqzD71/JVGuYUmX6YErpj+kuSV8aEQtyD6bueTycy6NC+LwqzneBE4H5wHbg6yVtjQYVEccCPwY+m1J6JfeY37+RVa1hqh2YmbM9I7NPZSyl1J75cwdwH91dCb/LlqMzf+4oXQuVh/6el9/JMpdS+l1K6VBK6TDwPd7oyvPZlaGIGE93kLozpfR/Mrv9/pVItYapJ4DZEXFCRLyF7sGTK0rcJg0gIo6JiEnZz8BHgKfpfm5/nTntr4H7S9NC5am/57UC+KvMW0WnAntyuiNUBnqNofkY3d8/6H52SyNiQkScQPcg5sdHun16Q0QEcAvwbErpppxDfv9KZFypGzAcUkqvR8RlwCpgLHBrSumZEjdLA/sD4L7u3xGMA+5KKf08Ip4A7omIC4GtwF+UsI3KERE/BD4ETI2INuBq4Gv0/bxWAufQPXj5VeC/j3iD1aOfZ/ehiJhPd9fQFuDTACmlZyLiHmAj3W+RXZpSOlSCZusNpwP/DXgqIp7M7LsKv38l43IykiRJBajWbj5JkqQRYZiSJEkqgGFKkiSpAIYpSZKkAhimJEmSCmCYklRSEVEbEX+b+fyOiLi3SNe9JiL+MfP5uog4oxjXlaTenBpBUkll1hb7aUppbpGvew2wL6X0v4p5XUnqzcqUpFL7GnBiRDwZET+KiKcBIuKCiPhJRPxbRGyJiMsi4h8i4j8j4rGIOD5z3okR8fPMAtn/NyLe0/sGEXFbRHwi83lLRFwbEf8REU9lz8/Mwn9rRDyeuUfTCP4dSKpghilJpXYF8HxKaT5wea9jc4GPA/8V+ArwakrpFGAt8FeZc5YDf5dS+hPgH4Hv5HHPXZlFtb+b+RmALwC/SCm9H1gI3JhZ2kiSBlSVy8lIqhoPp5T2AnsjYg/wr5n9TwHvi4hjgQ8AP8osRQQwIY/rZheGXU93WIPu9SAXZ8dZATVAPfBsYf8KkqqdYUpSOTuY8/lwzvZhun9/jQE6MlWtoVz3EG/8Hgzgz1JKm4bWVEmjld18kkptLzBpKD+YUnoFeDEi/hwguv3RENuxCvi7yJS4IuKUIV5H0ihjmJJUUiml3cCazMDzG4dwiU8CF0bEr4BngKEOHP8yMB7YEBHPZLYlaVBOjSBJklQAK1OSJEkFMExJkiQVwDAlSZJ
UAMOUJElSAQxTkiRJBTBMSZIkFcAwJUmSVID/D60sxofYkOBnAAAAAElFTkSuQmCC\n", "text/plain": ["
"]}, "metadata": {"needs_background": "light"}, "output_type": "display_data"}], "source": ["import matplotlib.pyplot as plt\n", "from lifelines import KaplanMeierFitter\n", "\n", "# Kaplan-Meier estimate: duree is the length of stay in days, deces = 1 when\n", "# the death is observed, 0 when the patient left alive (censored observation)\n", "fig, ax = plt.subplots(1, 1, figsize=(10, 4))\n", "kmf = KaplanMeierFitter()\n", "kmf.fit(duree, event_observed=deces)\n", "kmf.plot(ax=ax)\n", "ax.legend();"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Cox regression\n", "\n", "We reuse the artificially generated data and add two covariates derived from the length of stay: `X1` is 0.57 times the duration for the patients who died (0 for the others) plus a standard Gaussian noise, small compared to the durations, and `X2` is the same signal with the opposite sign."]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
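<table><thead><tr><th></th><th>duree</th><th>deces</th><th>X1</th><th>X2</th></tr></thead><tbody>
<tr><th>518488</th><td>219</td><td>1</td><td>125.653230</td><td>-125.666662</td></tr>
<tr><th>541408</th><td>219</td><td>1</td><td>126.006024</td><td>-125.327549</td></tr>
<tr><th>476735</th><td>189</td><td>1</td><td>107.920779</td><td>-108.358230</td></tr>
<tr><th>587013</th><td>227</td><td>1</td><td>129.788930</td><td>-130.045019</td></tr>
<tr><th>476057</th><td>181</td><td>1</td><td>103.642440</td><td>-103.793008</td></tr></tbody></table>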
\n", "
"], "text/plain": [" duree deces X1 X2\n", "518488 219 1 125.653230 -125.666662\n", "541408 219 1 126.006024 -125.327549\n", "476735 189 1 107.920779 -108.358230\n", "587013 227 1 129.788930 -130.045019\n", "476057 181 1 103.642440 -103.793008"]}, "execution_count": 8, "metadata": {}, "output_type": "execute_result"}], "source": ["import numpy\n", "import pandas\n", "\n", "# X1: 0.57 x duration for the patients who died, 0 otherwise, plus N(0, 1) noise\n", "# X2: the same signal with the opposite sign\n", "data_simple = pandas.DataFrame({\n", "    'duree': duree,\n", "    'deces': deces,\n", "    'X1': duree * 0.57 * deces + numpy.random.randn(duree.shape[0]),\n", "    'X2': duree * (-0.57) * deces + numpy.random.randn(duree.shape[0]),\n", "})\n", "data_simple.head()"]}, {"cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": ["from sklearn.model_selection import train_test_split\n", "\n", "# test_size=0.8: only 20% of the rows are used to fit the model\n", "data_train, data_test = train_test_split(data_simple, test_size=0.8)"]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["\r", "Iteration 1: norm_delta = 0.13943, step_size = 0.9000, log_lik = -250658.36250, newton_decrement = 889.93933, seconds_since_start = 0.0\n", "\r", "Iteration 2: norm_delta = 0.00660, step_size = 0.9000, log_lik = -249862.37270, newton_decrement = 2.81312, seconds_since_start = 0.0\n", "\r", "Iteration 3: norm_delta = 0.00073, step_size = 0.9000, log_lik = -249859.57376, newton_decrement = 0.03357, seconds_since_start = 0.1\n", "\r", "Iteration 4: norm_delta = 0.00000, step_size = 1.0000, log_lik = -249859.54017, newton_decrement = 0.00000, seconds_since_start = 0.1\n", "Convergence success after 4 iterations.\n"]}, {"data": {"text/plain": [""]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["from lifelines import CoxPHFitter\n", "\n", "# proportional hazards model on X1 only; deces is the event indicator\n", "cox = CoxPHFitter()\n", "cox.fit(data_train[['duree', 'deces', 'X1']], duration_col=\"duree\", event_col=\"deces\",\n", "        show_progress=True)"]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
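<table><tbody><tr><th>model</th><td>lifelines.CoxPHFitter</td></tr>
<tr><th>duration col</th><td>'duree'</td></tr>
<tr><th>event col</th><td>'deces'</td></tr>
<tr><th>baseline estimation</th><td>breslow</td></tr>
<tr><th>number of observations</th><td>124826</td></tr>
<tr><th>number of events observed</th><td>24072</td></tr>
<tr><th>partial log-likelihood</th><td>-249859.54</td></tr>
<tr><th>time fit was run</th><td>2021-02-24 23:48:57 UTC</td></tr></tbody></table>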
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
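<table><thead><tr><th></th><th>coef</th><th>exp(coef)</th><th>se(coef)</th><th>coef lower 95%</th><th>coef upper 95%</th><th>exp(coef) lower 95%</th><th>exp(coef) upper 95%</th><th>z</th><th>p</th><th>-log2(p)</th></tr></thead><tbody>
<tr><th>X1</th><td>0.02</td><td>1.02</td><td>0.00</td><td>0.02</td><td>0.02</td><td>1.02</td><td>1.02</td><td>42.23</td><td>&lt;0.005</td><td>inf</td></tr></tbody></table>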

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
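<table><tbody><tr><th>Concordance</th><td>0.69</td></tr>
<tr><th>Partial AIC</th><td>499721.08</td></tr>
<tr><th>log-likelihood ratio test</th><td>1597.64 on 1 df</td></tr>
<tr><th>-log2(p) of ll-ratio test</th><td>inf</td></tr></tbody></table>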
\n", "
"], "text/latex": ["\\begin{tabular}{lrrrrrrrrrr}\n", "\\toprule\n", "{} & coef & exp(coef) & se(coef) & coef lower 95\\% & coef upper 95\\% & exp(coef) lower 95\\% & exp(coef) upper 95\\% & z & p & -log2(p) \\\\\n", "covariate & & & & & & & & & & \\\\\n", "\\midrule\n", "X1 & 0.02 & 1.02 & 0.00 & 0.02 & 0.02 & 1.02 & 1.02 & 42.23 & 0.00 & inf \\\\\n", "\\bottomrule\n", "\\end{tabular}\n"], "text/plain": ["\n", " duration col = 'duree'\n", " event col = 'deces'\n", " baseline estimation = breslow\n", " number of observations = 124826\n", "number of events observed = 24072\n", " partial log-likelihood = -249859.54\n", " time fit was run = 2021-02-24 23:48:57 UTC\n", "\n", "---\n", " coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%\n", "covariate \n", "X1 0.02 1.02 0.00 0.02 0.02 1.02 1.02\n", "\n", " z p -log2(p)\n", "covariate \n", "X1 42.23 <0.005 inf\n", "---\n", "Concordance = 0.69\n", "Partial AIC = 499721.08\n", "log-likelihood ratio test = 1597.64 on 1 df\n", "-log2(p) of ll-ratio test = inf"]}, "metadata": {}, "output_type": "display_data"}], "source": ["cox.print_summary()"]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["\r", "Iteration 1: norm_delta = 0.13946, step_size = 0.9000, log_lik = -250658.36250, newton_decrement = 888.92089, seconds_since_start = 0.0\n", "\r", "Iteration 2: norm_delta = 0.00667, step_size = 0.9000, log_lik = -249863.61089, newton_decrement = 2.86434, seconds_since_start = 0.0\n", "\r", "Iteration 3: norm_delta = 0.00074, step_size = 0.9000, log_lik = -249860.76079, newton_decrement = 0.03426, seconds_since_start = 0.1\n", "\r", "Iteration 4: norm_delta = 0.00000, step_size = 1.0000, log_lik = -249860.72650, newton_decrement = 0.00000, seconds_since_start = 0.1\n", "Convergence success after 4 iterations.\n"]}, {"data": {"text/html": ["
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
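<table><tbody><tr><th>model</th><td>lifelines.CoxPHFitter</td></tr>
<tr><th>duration col</th><td>'duree'</td></tr>
<tr><th>event col</th><td>'deces'</td></tr>
<tr><th>baseline estimation</th><td>breslow</td></tr>
<tr><th>number of observations</th><td>124826</td></tr>
<tr><th>number of events observed</th><td>24072</td></tr>
<tr><th>partial log-likelihood</th><td>-249860.73</td></tr>
<tr><th>time fit was run</th><td>2021-02-24 23:48:59 UTC</td></tr></tbody></table>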
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
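<table><thead><tr><th></th><th>coef</th><th>exp(coef)</th><th>se(coef)</th><th>coef lower 95%</th><th>coef upper 95%</th><th>exp(coef) lower 95%</th><th>exp(coef) upper 95%</th><th>z</th><th>p</th><th>-log2(p)</th></tr></thead><tbody>
<tr><th>X2</th><td>-0.02</td><td>0.98</td><td>0.00</td><td>-0.02</td><td>-0.02</td><td>0.98</td><td>0.98</td><td>-42.21</td><td>&lt;0.005</td><td>inf</td></tr></tbody></table>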

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
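<table><tbody><tr><th>Concordance</th><td>0.69</td></tr>
<tr><th>Partial AIC</th><td>499723.45</td></tr>
<tr><th>log-likelihood ratio test</th><td>1595.27 on 1 df</td></tr>
<tr><th>-log2(p) of ll-ratio test</th><td>inf</td></tr></tbody></table>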
\n", "
"], "text/latex": ["\\begin{tabular}{lrrrrrrrrrr}\n", "\\toprule\n", "{} & coef & exp(coef) & se(coef) & coef lower 95\\% & coef upper 95\\% & exp(coef) lower 95\\% & exp(coef) upper 95\\% & z & p & -log2(p) \\\\\n", "covariate & & & & & & & & & & \\\\\n", "\\midrule\n", "X2 & -0.02 & 0.98 & 0.00 & -0.02 & -0.02 & 0.98 & 0.98 & -42.21 & 0.00 & inf \\\\\n", "\\bottomrule\n", "\\end{tabular}\n"], "text/plain": ["\n", " duration col = 'duree'\n", " event col = 'deces'\n", " baseline estimation = breslow\n", " number of observations = 124826\n", "number of events observed = 24072\n", " partial log-likelihood = -249860.73\n", " time fit was run = 2021-02-24 23:48:59 UTC\n", "\n", "---\n", " coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95%\n", "covariate \n", "X2 -0.02 0.98 0.00 -0.02 -0.02 0.98 0.98\n", "\n", " z p -log2(p)\n", "covariate \n", "X2 -42.21 <0.005 inf\n", "---\n", "Concordance = 0.69\n", "Partial AIC = 499723.45\n", "log-likelihood ratio test = 1595.27 on 1 df\n", "-log2(p) of ll-ratio test = inf"]}, "metadata": {}, "output_type": "display_data"}], "source": ["cox2 = CoxPHFitter()\n", "cox2.fit(data_train[['duree', 'deces', 'X2']], duration_col=\"duree\", event_col=\"deces\",\n", " show_progress=True)\n", "cox2.print_summary()"]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
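<table><thead><tr><th></th><th>621725</th><th>110352</th><th>139986</th><th>72623</th><th>248121</th></tr></thead><tbody>
<tr><th>0.0</th><td>0.008806</td><td>0.008673</td><td>0.008543</td><td>0.008717</td><td>0.008661</td></tr>
<tr><th>1.0</th><td>0.018218</td><td>0.017942</td><td>0.017673</td><td>0.018035</td><td>0.017918</td></tr>
<tr><th>2.0</th><td>0.026735</td><td>0.026330</td><td>0.025936</td><td>0.026466</td><td>0.026295</td></tr>
<tr><th>3.0</th><td>0.036002</td><td>0.035457</td><td>0.034926</td><td>0.035640</td><td>0.035409</td></tr>
<tr><th>4.0</th><td>0.045543</td><td>0.044854</td><td>0.044182</td><td>0.045085</td><td>0.044793</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>184.0</th><td>2.055736</td><td>2.024623</td><td>1.994277</td><td>2.035070</td><td>2.021872</td></tr>
<tr><th>189.0</th><td>2.092002</td><td>2.060340</td><td>2.029459</td><td>2.070971</td><td>2.057541</td></tr>
<tr><th>197.0</th><td>2.138687</td><td>2.106318</td><td>2.074747</td><td>2.117186</td><td>2.103457</td></tr>
<tr><th>201.0</th><td>2.205513</td><td>2.172133</td><td>2.139576</td><td>2.183341</td><td>2.169182</td></tr>
<tr><th>217.0</th><td>2.330629</td><td>2.295356</td><td>2.260952</td><td>2.307199</td><td>2.292237</td></tr></tbody></table>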
\n", "

165 rows \u00d7 5 columns

\n", "
"], "text/plain": [" 621725 110352 139986 72623 248121\n", "0.0 0.008806 0.008673 0.008543 0.008717 0.008661\n", "1.0 0.018218 0.017942 0.017673 0.018035 0.017918\n", "2.0 0.026735 0.026330 0.025936 0.026466 0.026295\n", "3.0 0.036002 0.035457 0.034926 0.035640 0.035409\n", "4.0 0.045543 0.044854 0.044182 0.045085 0.044793\n", "... ... ... ... ... ...\n", "184.0 2.055736 2.024623 1.994277 2.035070 2.021872\n", "189.0 2.092002 2.060340 2.029459 2.070971 2.057541\n", "197.0 2.138687 2.106318 2.074747 2.117186 2.103457\n", "201.0 2.205513 2.172133 2.139576 2.183341 2.169182\n", "217.0 2.330629 2.295356 2.260952 2.307199 2.292237\n", "\n", "[165 rows x 5 columns]"]}, "execution_count": 13, "metadata": {}, "output_type": "execute_result"}], "source": ["cox.predict_cumulative_hazard(data_test[:5])"]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
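<table><thead><tr><th></th><th>621725</th><th>110352</th><th>139986</th><th>72623</th><th>248121</th></tr></thead><tbody>
<tr><th>0.0</th><td>0.991233</td><td>0.991365</td><td>0.991494</td><td>0.991321</td><td>0.991377</td></tr>
<tr><th>1.0</th><td>0.981947</td><td>0.982218</td><td>0.982482</td><td>0.982127</td><td>0.982242</td></tr>
<tr><th>2.0</th><td>0.973619</td><td>0.974013</td><td>0.974398</td><td>0.973881</td><td>0.974048</td></tr>
<tr><th>3.0</th><td>0.964638</td><td>0.965164</td><td>0.965677</td><td>0.964988</td><td>0.965211</td></tr>
<tr><th>4.0</th><td>0.955478</td><td>0.956137</td><td>0.956780</td><td>0.955916</td><td>0.956196</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>184.0</th><td>0.127999</td><td>0.132044</td><td>0.136112</td><td>0.130671</td><td>0.132407</td></tr>
<tr><th>189.0</th><td>0.123440</td><td>0.127411</td><td>0.131407</td><td>0.126063</td><td>0.127768</td></tr>
<tr><th>197.0</th><td>0.117809</td><td>0.121685</td><td>0.125588</td><td>0.120370</td><td>0.122034</td></tr>
<tr><th>201.0</th><td>0.110194</td><td>0.113934</td><td>0.117705</td><td>0.112664</td><td>0.114271</td></tr>
<tr><th>217.0</th><td>0.097235</td><td>0.100726</td><td>0.104251</td><td>0.099540</td><td>0.101040</td></tr></tbody></table>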
\n", "

165 rows \u00d7 5 columns

\n", "
"], "text/plain": [" 621725 110352 139986 72623 248121\n", "0.0 0.991233 0.991365 0.991494 0.991321 0.991377\n", "1.0 0.981947 0.982218 0.982482 0.982127 0.982242\n", "2.0 0.973619 0.974013 0.974398 0.973881 0.974048\n", "3.0 0.964638 0.965164 0.965677 0.964988 0.965211\n", "4.0 0.955478 0.956137 0.956780 0.955916 0.956196\n", "... ... ... ... ... ...\n", "184.0 0.127999 0.132044 0.136112 0.130671 0.132407\n", "189.0 0.123440 0.127411 0.131407 0.126063 0.127768\n", "197.0 0.117809 0.121685 0.125588 0.120370 0.122034\n", "201.0 0.110194 0.113934 0.117705 0.112664 0.114271\n", "217.0 0.097235 0.100726 0.104251 0.099540 0.101040\n", "\n", "[165 rows x 5 columns]"]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["cox.predict_survival_function(data_test[:5])"]}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.7"}}, "nbformat": 4, "nbformat_minor": 4}