.. image:: pystat.png
    :height: 20
    :alt: Statistics
    :target: http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/td_2a_notions.html#pour-un-profil-plutot-data-scientist

.. _l-td2a-reinforcement-learning:

Reinforcement Learning
++++++++++++++++++++++

in French, *apprentissage par renforcement*

*(next year)*

*Readings*

* `Deep Reinforcement Learning through Policy Optimization <http://people.eecs.berkeley.edu/~pabbeel/nips-tutorial-policy-optimization-Schulman-Abbeel.pdf>`_
  (seen in `Highlights of NIPS 2016: Adversarial learning, Meta-learning, and more <http://sebastianruder.com/highlights-nips-2016/index.html>`_)
* `The Nuts and Bolts of Deep RL Research <http://rll.berkeley.edu/deeprlcourse/docs/nuts-and-bolts.pdf>`_
* `A Comprehensive Survey on Safe Reinforcement Learning <http://www.jmlr.org/papers/volume16/garcia15a/garcia15a.pdf>`_
* `RLPy: A Value-Function-Based Reinforcement Learning Framework for Education and Research <http://www.jmlr.org/papers/volume16/geramifard15a/geramifard15a.pdf>`_
* `UCL Course on RL <http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html>`_
* `Reinforcement Learning Part I <http://www.labri.fr/perso/nrougier/downloads/Chile-2014-Lecture-1.pdf>`_
  `Reinforcement Learning Part II <http://www.labri.fr/perso/nrougier/downloads/Chile-2014-Lecture-2.pdf>`_
* `Strategic Attentive Writer for Learning Macro-Actions <https://arxiv.org/pdf/1606.04695.pdf>`_
* `Temporal difference learning <https://en.wikipedia.org/wiki/Temporal_difference_learning>`_
* `Renewal Monte Carlo: Renewal theory based reinforcement learning <https://arxiv.org/abs/1804.01116>`_
* `AlphaGo Zero - How and Why it Works <http://tim.hibal.org/blog/alpha-zero-how-and-why-it-works/>`_
* `OpenAI Spinning Up <https://spinningup.openai.com/en/latest/index.html>`_

*Readings on applications*

* `Universal Planning Networks <https://arxiv.org/abs/1804.00645>`_:
  uses reinforcement learning to guide the motion of a robot arm
  toward grasping a part when the path is obstructed.

*Examples*

* `pg-pong.py <https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5>`_
* `Reinforcement_Toys <https://github.com/JbRemy/Reinforcement_Toys>`_,
  notebooks for discovering reinforcement learning

*Modules*

* `Ray RLlib <http://ray.readthedocs.io/en/latest/rllib.html>`_ (ray - rllib)
* `keras-rl <https://github.com/keras-rl/keras-rl>`_
* `gym <https://github.com/openai/gym>`_

*Environments*

* `OpenAI Gym <https://gym.openai.com/>`_ provides a common
  formalization for testing reinforcement learning algorithms,
  either on your own experiments or on predefined
  settings and games.
  This can lead to experiments such as
  `OpenAI Gym for NES games + DQN with Keras to learn Mario Bros. from raw pixels <https://naereen.github.io/gym-nes-mario-bros/>`_.
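
The formalization mentioned above can be illustrated with a minimal sketch.
It does not use the gym package: ``CorridorEnv``, ``run_episode`` and
``always_right`` are hypothetical names chosen for this example, and the
``reset``/``step`` pair only imitates the classic convention in which ``step``
returns an observation, a reward, a termination flag and an info dictionary.
The exact signatures differ between Gym versions, so treat this as an
assumption rather than the library's API.

```python
class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # action: 0 moves left, 1 moves right; walls clip the move.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached = self.position == self.length - 1
        reward = 1.0 if reached else 0.0
        # The episode ends at the goal or when the step budget is spent.
        done = reached or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Run one episode with `policy` (observation -> action), return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


def always_right(obs):
    # Trivial policy used only to exercise the loop.
    return 1


if __name__ == "__main__":
    print(run_episode(CorridorEnv(), always_right))
```

Because the agent only sees ``reset`` and ``step``, the same ``run_episode``
loop can be reused unchanged when the toy corridor is swapped for another
environment exposing the same two methods.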