{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 2A.ml - Sentiment analysis\n", "\n", "Sentiment analysis is now a classic machine learning problem. The input is a piece of text and the target is a rating, most often binary (positive or negative), although it could also be graded on a scale."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["%matplotlib inline"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"data": {"text/html": ["
run previous cell, wait for 2 seconds
\n", ""], "text/plain": [""]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## The data\n", "\n", "The data comes from the UCI repository, [Sentiment Labelled Sentences Data Set](https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences); it can be downloaded from there or loaded with the function ``load_sentiment_dataset``."]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"text/html": ["
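<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>sentance</th>\n",
"      <th>sentiment</th>\n",
"      <th>source</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>So there is no way for me to plug it in here i...</td>\n",
"      <td>0</td>\n",
"      <td>amazon_cells_labelled</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>Good case, Excellent value.</td>\n",
"      <td>1</td>\n",
"      <td>amazon_cells_labelled</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>Great for the jawbone.</td>\n",
"      <td>1</td>\n",
"      <td>amazon_cells_labelled</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>Tied to charger for conversations lasting more...</td>\n",
"      <td>0</td>\n",
"      <td>amazon_cells_labelled</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>The mic is great.</td>\n",
"      <td>1</td>\n",
"      <td>amazon_cells_labelled</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>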
"], "text/plain": [" sentance sentiment source\n", "0 So there is no way for me to plug it in here i... 0 amazon_cells_labelled\n", "1 Good case, Excellent value. 1 amazon_cells_labelled\n", "2 Great for the jawbone. 1 amazon_cells_labelled\n", "3 Tied to charger for conversations lasting more... 0 amazon_cells_labelled\n", "4 The mic is great. 1 amazon_cells_labelled"]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}], "source": ["from ensae_teaching_cs.data import load_sentiment_dataset\n", "df = load_sentiment_dataset()\n", "df.head()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercice 1 : approche td-idf\n", "\n", "La cible est la colonne *sentiment*, les deux autres colonnes sont les features. Il faudra utiliser les pr\u00e9traitements [LabelEncoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html), [OneHotEncoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html), [TF-IDF](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html). 
L'un d'entre eux n'est pas n\u00e9cessaire depuis la version [0.20.0](http://scikit-learn.org/stable/whats_new.html#sklearn-preprocessing) de *scikit-learn*."]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercice 2 : word2vec\n", "\n", "On utilise l'approche [word2vec](https://en.wikipedia.org/wiki/Word2vec) du module [gensim](https://radimrehurek.com/gensim/models/word2vec.html) ou [spacy](https://spacy.io/usage/vectors-similarity)."]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercice 3 : comparer les deux approches\n", "\n", "Avec une courbe [ROC](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html) par exemple."]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0"}}, "nbformat": 4, "nbformat_minor": 2}