module filehelper.pig_helper

Short summary

module pyenbc.filehelper.pig_helper

Hadoop uses a Java implementation of Python: Jython. This module provides helpers around that.

Functions

function – truncated documentation
  • download_pig_standalone – Downloads the standalone :epkg:`jython`. If it does not exist, we should use version HADOOP_VERSION by default …

  • get_hadoop_jars – Returns the list of jars to include into the command line in order to run :epkg:`HADOOP`.

  • get_hadoop_path – This function assumes that a folder pig hadoopjar is present in this directory and returns that folder.

  • get_pig_jars – Returns the list of jars to include into the command line in order to run :epkg:`PIG`.

  • get_pig_path – This function assumes that a folder pig pigjar is present in this directory and returns that folder.

  • run_pig – Runs a :epkg:`pig` script and returns the standard output and error.

Documentation

Hadoop uses a Java implementation of Python: Jython. This module provides helpers around that.

New in version 1.1.

source on GitHub

pyenbc.filehelper.pig_helper.download_pig_standalone(pig_version='0.17.0', hadoop_version='2.9.1', fLOG=<function noLOG>)

Downloads the standalone :epkg:`jython`. If it does not exist, we should use version HADOOP_VERSION by default in order to fit the cluster’s version.

Parameters
  • pig_version – pig_version

  • hadoop_version – hadoop_version

  • fLOG – logging function

Returns

location

This function might need to be run twice: the first try can fail, possibly because of very long paths when unzipping the downloaded file.

:epkg:`Hadoop` is downloaded from one of the websites referenced at Apache Software Foundation. Check the source to see which one was chosen.
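
The advice to run the function a second time can be automated with a small retry wrapper. The following is a minimal sketch and not part of pyenbc: `call_with_retry` is a hypothetical helper, and catching any `Exception` is an assumption, since the documentation does not say which error a failed unzip raises.

```python
def call_with_retry(func, attempts=2, **kwargs):
    """Call func(**kwargs), trying again on failure, at most `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func(**kwargs)
        except Exception as exc:  # assumption: the exact error type is not documented
            last_error = exc
    # Every attempt failed: surface the last error to the caller.
    raise last_error


# Intended use (requires pyenbc, so it is left commented out here):
# from pyenbc.filehelper.pig_helper import download_pig_standalone
# location = call_with_retry(download_pig_standalone,
#                            pig_version="0.17.0", hadoop_version="2.9.1")
```

Two attempts mirror the note above; raising the last error keeps a persistent failure visible instead of hiding it.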


pyenbc.filehelper.pig_helper.get_hadoop_jars()

Returns the list of jars to include into the command line in order to run :epkg:`HADOOP`.

Returns

list of jars
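
A list of jars like this one is usually handed to `java` through its `-cp` option. The sketch below shows one way to assemble such a command line. It assumes the jars are plain file paths; `build_java_command` is a hypothetical helper, and the main class is left as a parameter because this page does not say which class is invoked.

```python
import os


def build_java_command(jars, main_class, args=()):
    """Build a java command line whose classpath contains the given jars."""
    # The classpath separator is ';' on Windows and ':' elsewhere.
    classpath = os.pathsep.join(jars)
    return ["java", "-cp", classpath, main_class] + list(args)
```

Returning a list rather than a single string avoids quoting problems when the command is later passed to `subprocess`.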


pyenbc.filehelper.pig_helper.get_hadoop_path()

This function assumes that a folder pig hadoopjar is present in this directory and returns that folder.

Returns

absolute path


pyenbc.filehelper.pig_helper.get_pig_jars()

Returns the list of jars to include into the command line in order to run :epkg:`PIG`.

Returns

list of jars


pyenbc.filehelper.pig_helper.get_pig_path()

This function assumes that a folder pig pigjar is present in this directory and returns that folder.

Returns

absolute path
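
Once such a folder is known, the jars it contains can be enumerated. The following is a minimal sketch, assuming the jars may sit anywhere below the folder; `list_jars` is a hypothetical helper and not necessarily how get_pig_jars or get_hadoop_jars are implemented.

```python
import glob
import os


def list_jars(folder):
    """Return the sorted absolute paths of all .jar files found under folder."""
    folder = os.path.abspath(folder)
    if not os.path.isdir(folder):
        raise FileNotFoundError("folder not found: %s" % folder)
    # '**' with recursive=True matches the folder itself and all subfolders.
    pattern = os.path.join(folder, "**", "*.jar")
    return sorted(glob.glob(pattern, recursive=True))
```

Failing early when the folder is missing gives a clearer message than an empty jar list discovered later on the command line.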


pyenbc.filehelper.pig_helper.run_pig(pigfile, argv=None, pig_path=None, hadoop_path=None, jython_path=None, timeout=None, logpath='logs', pig_version='0.17.0', hadoop_version='2.9.1', jar_no_hadoop=True, fLOG=<function noLOG>)

Runs a :epkg:`pig` script and returns the standard output and error.

Parameters
  • pigfile – pig file

  • argv – arguments to send to the command line

  • pig_path – path to pig 0.XX.0

  • hadoop_path – path to hadoop

  • timeout – timeout

  • logpath – path to the logs

  • pig_version – PIG version (if pig_path is not defined)

  • hadoop_version – Hadoop version (if hadoop_path is not defined)

  • jar_no_hadoop – use :epkg:`pig` without :epkg:`hadoop`

  • fLOG – logging function

Returns

out, err

If pig_path is None, the function looks into this directory.
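
The general pattern of running an external command with a timeout and returning its standard output and error can be sketched with the standard library. This is an illustration only, not the code of run_pig: `run_command` is a hypothetical helper, and the timeout is in seconds because that is what `subprocess.run` expects (this page does not state the unit used by run_pig).

```python
import subprocess
import sys


def run_command(cmd, timeout=None):
    """Run cmd and return (out, err) as text.

    Raises subprocess.TimeoutExpired if the command exceeds `timeout` seconds.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return completed.stdout, completed.stderr


# Demonstration with the current Python interpreter instead of a Pig script.
out, err = run_command([sys.executable, "-c", "print('ok')"])
```

With pyenbc installed, the documented equivalent for a Pig script follows the signature above: `out, err = run_pig(pigfile)`.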
