module faq.faq_web

Short summary

module ensae_teaching_cs.faq.faq_web

A few functions for web scraping

source on GitHub

Functions

function                truncated documentation
_get_selenium_browser   Returns the associated driver with some custom settings. The function automatically gets chromedriver if not present …
webhtml                 Uses the module selenium to retrieve the html content of a website.
webshot                 Uses the module selenium to take a picture of a website. If url and img are lists, the function goes …

Documentation

A few functions for web scraping


ensae_teaching_cs.faq.faq_web._get_selenium_browser(navigator, fLOG=<function noLOG>)[source]

Returns the associated driver with some custom settings.

The function automatically gets chromedriver if it is not present (Windows only). On Linux, the package chromium-driver must be installed first: apt-get install chromium-driver.

Issue with Selenium and Firefox

Firefox >= v47 does not work on Windows. See Selenium WebDriver and Firefox 47.

See ChromeDriver download and Error message: “chromedriver” executable needs to be available in the path.

See Selenium - Remote WebDriver example, see also Running the remote driver with Selenium and python.
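The idea of returning a driver with custom settings can be sketched as follows. This is a minimal illustration and not the code of _get_selenium_browser: the helper name make_browser, the headless flag, and the restriction to Chrome and Firefox are assumptions made for the example, and it does not download chromedriver.

```python
def make_browser(navigator="chrome", headless=True):
    """Return a Selenium driver for a browser name (hypothetical helper)."""
    name = navigator.lower()
    if name not in ("chrome", "firefox"):
        raise ValueError(f"unsupported navigator: {navigator!r}")

    # Imported lazily so the name check above works without selenium installed.
    from selenium import webdriver

    if name == "chrome":
        options = webdriver.ChromeOptions()
        if headless:
            options.add_argument("--headless")
        # Requires a chromedriver executable that Selenium can locate.
        return webdriver.Chrome(options=options)

    options = webdriver.FirefoxOptions()
    if headless:
        options.add_argument("-headless")
    return webdriver.Firefox(options=options)
```

The caller is responsible for calling quit() on the returned driver once it is no longer needed.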


ensae_teaching_cs.faq.faq_web.webhtml(url, navigator='opera', fLOG=<function noLOG>)[source]

Uses the module selenium to retrieve the html content of a website.

Parameters
  • url – URL of the page to retrieve

  • navigator – browser to use: firefox, chrome, or ie (ie does not work well); the signature defaults to 'opera'

  • fLOG – logging function

Returns

list of tuples [(url, html)]

Check the list of available webdrivers at selenium/webdriver and add one to the code if needed.
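A minimal sketch of retrieving HTML through a Selenium driver is shown below. It is not the implementation of webhtml: the helper name fetch_html is hypothetical, and the example assumes the caller passes an already created driver instead of a navigator name. It relies only on the driver's get, page_source, and quit members.

```python
def fetch_html(urls, driver):
    """Return [(url, html)] for one URL or a list of URLs (hypothetical helper)."""
    if isinstance(urls, str):
        urls = [urls]
    results = []
    try:
        for url in urls:
            driver.get(url)  # load the page in the browser
            results.append((url, driver.page_source))  # HTML as rendered
    finally:
        driver.quit()  # always close the browser, even if a page fails
    return results
```

Passing the driver in keeps the function independent of any particular browser, so a driver for another browser can be substituted without changing it.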


ensae_teaching_cs.faq.faq_web.webshot(img, url, navigator='opera', add_date=False, size=None, fLOG=<function noLOG>)[source]

Uses the module selenium to take a picture of a website. If url and img are lists, the function goes through all the URLs and saves a webshot for each of them.

Parameters
  • img – image filename, or list of image filenames

  • url – URL, or list of URLs

  • navigator – browser to use: firefox, chrome, or ie (ie does not work well); the signature defaults to 'opera'

  • add_date – add a date to the image filename

  • size – to resize the webshot (if not None)

  • fLOG – logging function

Returns

list of tuples [(url, image name)]

Check the list of available webdriver at selenium/webdriver and add one to the code if needed.

Chrome requires the chromedriver. See function install_chromedriver.
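The loop over URLs and image names can be sketched as follows. This is an illustration under stated assumptions, not the code of webshot: the helpers dated_name and take_webshots are hypothetical, the date format in the filename is invented for the example, and size is assumed to be a (width, height) pair applied to the browser window before capture, which may differ from how webshot resizes the image.

```python
import datetime
import os


def dated_name(img, when):
    """Insert an ISO date before the file extension (assumed naming scheme)."""
    root, ext = os.path.splitext(img)
    return f"{root}.{when.isoformat()}{ext}"


def take_webshots(img, url, driver, add_date=False, size=None):
    """Save one screenshot per URL and return [(url, image name)]."""
    imgs = [img] if isinstance(img, str) else list(img)
    urls = [url] if isinstance(url, str) else list(url)
    if len(imgs) != len(urls):
        raise ValueError("img and url must have the same length")
    if size is not None:
        # Assumption: size is applied to the window, not to the saved image.
        driver.set_window_size(*size)
    today = datetime.date.today()
    results = []
    for name, u in zip(imgs, urls):
        if add_date:
            name = dated_name(name, today)
        driver.get(u)
        driver.save_screenshot(name)  # writes a PNG file
        results.append((u, name))
    return results
```

The driver is left open on return so that the caller can reuse it for further pages and decide when to call quit().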
