.. _pigparamsazurecorrectionrst:

=======================================
PIG and Parameters (Azure) (correction)
=======================================

.. only:: html

    **Links:** :download:`notebook `, :downloadlink:`html `, :download:`PDF `, :download:`python `, :downloadlink:`slides `, :githublink:`GitHub|_doc/notebooks/pig_hive/pig_params_azure_correction.ipynb|*`

Correction.

.. code:: ipython3

    from jyquickhelper import add_notebook_menu
    add_notebook_menu()

.. contents::
    :local:

Connecting to the cluster
-------------------------

We use the `Cloudera `__ cluster. The following script must be run so
that the notebook knows the variable ``params`` exists; the HTML form it
opens is used to fill in the values.

.. code:: ipython3

    from pyquickhelper.ipythonhelper import open_html_form
    params = {"blob_storage": "", "password1": "",
              "hadoop_server": "", "password2": "",
              "username": "alias"}
    open_html_form(params=params, title="server + hadoop + credentials", key_save="blobhp")
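The same dictionary can also be filled without the interactive form, which is convenient for running the notebook non-interactively. A minimal sketch, assuming the credentials are exposed through environment variables (the variable names below are hypothetical, not part of the notebook):

```python
import os

# Build the same dictionary the HTML form would save under "blobhp".
# The environment variable names are an assumption for this sketch.
blobhp = {
    "blob_storage": os.environ.get("AZURE_BLOB_STORAGE", ""),
    "password1": os.environ.get("AZURE_BLOB_PASSWORD", ""),
    "hadoop_server": os.environ.get("AZURE_HADOOP_SERVER", ""),
    "password2": os.environ.get("AZURE_HADOOP_PASSWORD", ""),
    "username": os.environ.get("AZURE_USERNAME", "alias"),
}
print(sorted(blobhp))
```

Keeping the same keys means the rest of the notebook can read ``blobhp`` unchanged.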
.. code:: ipython3

    import pyensae
    %load_ext pyensae
    %load_ext pyenbc
    blobstorage = blobhp["blob_storage"]
    blobpassword = blobhp["password1"]
    hadoop_server = blobhp["hadoop_server"]
    hadoop_password = blobhp["password2"]
    username = blobhp["username"]
    client, bs = %hd_open
    client, bs

Exercise 1: min, max
--------------------

We add two parameters ``a`` and ``b`` to build the histogram between two
values. Appending these two parameters to the output file name with
underscores may look reasonable, but the interpreter then fails to
identify the parameters (``Undefined parameter : bins_``). We use dashes
instead.

.. code:: ipython3

    %%PIG histogramab.pig

    values = LOAD '$CONTAINER/$PSEUDO/random/random.sample.txt'
             USING PigStorage('\t') AS (x:double);
    values_f = FILTER values BY x >= $a AND x <= $b ;  -- added line
    values_h = FOREACH values_f GENERATE x, ((int)(x / $bins)) * $bins AS h ;
    hist_group = GROUP values_h BY h ;
    hist = FOREACH hist_group GENERATE group, COUNT(values_h) AS nb ;
    STORE hist INTO '$CONTAINER/$PSEUDO/random/histo_$bins-$a-$b.txt' USING PigStorage('\t') ;

.. code:: ipython3

    if client.exists(bs, client.account_name, "$PSEUDO/random/histo_0.1-0.2-0.8.txt"):
        r = client.delete_folder(bs, client.account_name,
                                 "$PSEUDO/random/histo_0.1-0.2-0.8.txt")
        print(r)

.. code:: ipython3

    jid = client.pig_submit(bs, client.account_name, "histogramab.pig",
                            params=dict(bins="0.1", a="0.2", b="0.8"))
    jid

.. parsed-literal::

    {'id': 'job_1416874839254_0202'}

.. code:: ipython3

    st = %hd_job_status jid["id"]
    st["id"], st["percentComplete"], st["status"]["jobComplete"]

.. parsed-literal::

    ('job_1416874839254_0202', '100% complete', True)

.. code:: ipython3

    %hd_tail_stderr jid["id"]

.. parsed-literal::

    Job DAG:
    job_1416874839254_0203


    2014-12-03 22:17:28,903 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - No FileSystem for scheme: wasb. Not creating success file
    2014-12-03 22:17:28,903 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at headnodehost/100.74.20.101:9010
    2014-12-03 22:17:28,965 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2014-12-03 22:17:29,784 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

    

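Why underscores break the file name can be seen by emulating Pig-style ``$parameter`` substitution in plain Python (a simplified model, not Pig's actual parser): a parameter name may contain underscores, so in ``histo_$bins_$a`` the token after ``$`` is read as ``bins_``, which is not defined, whereas a dash cannot belong to a name and therefore terminates it.

```python
import re

def substitute(template, params):
    """Replace $name tokens, mimicking Pig's parameter substitution."""
    def repl(m):
        name = m.group(1)
        if name not in params:
            raise KeyError("Undefined parameter : " + name)
        return params[name]
    # \w+ matches letters, digits and underscores, like Pig parameter names
    return re.sub(r"\$(\w+)", repl, template)

params = dict(bins="0.1", a="0.2", b="0.8")

# dashes separate the names cleanly
print(substitute("histo_$bins-$a-$b.txt", params))  # -> histo_0.1-0.2-0.8.txt

# with underscores, the token "$bins_" swallows the underscore and is undefined
try:
    substitute("histo_$bins_$a.txt", params)
except KeyError as exc:
    print(exc)
```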
.. code:: ipython3

    import os
    if os.path.exists("histo.txt"):
        os.remove("histo.txt")
    %blob_downmerge /$PSEUDO/random/histo_0.1-0.2-0.8.txt histo.txt

.. parsed-literal::

    'histo.txt'

.. code:: ipython3

    import matplotlib.pyplot as plt
    plt.style.use('ggplot')
    import pandas
    df = pandas.read_csv("histo.txt", sep="\t", names=["bin", "nb"])
    df.plot(x="bin", y="nb", kind="bar")

.. image:: pig_params_azure_correction_12_1.png
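As a local sanity check (a sketch, not part of the original job), the same filter and binning can be reproduced in pandas on a synthetic uniform sample, which stands in for ``random.sample.txt``:

```python
import numpy as np
import pandas

a, b, bins = 0.2, 0.8, 0.1

# synthetic uniform sample standing in for random.sample.txt (assumption)
rnd = np.random.RandomState(0)
df = pandas.DataFrame(dict(x=rnd.uniform(0, 1, 10000)))

# same filter and binning as the Pig script:
# FILTER ... BY x >= $a AND x <= $b, then ((int)(x / $bins)) * $bins
kept = df[(df.x >= a) & (df.x <= b)].copy()
kept["h"] = (kept.x / bins).astype(int) * bins

# GROUP ... BY h, then COUNT per group
hist = kept.groupby("h").size().reset_index(name="nb")
print(hist)
```

Every kept row falls into exactly one bin, so the bin counts sum back to the number of filtered rows.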