module faq.faq_web

Short summary

module ensae_teaching_cs.faq.faq_web

A few functions about web scraping

source on GitHub

Functions

function               truncated documentation
_get_selenium_browser  Returns the associated driver with some custom settings. The function automatically gets chromedriver if not present …
webhtml                Uses the module selenium to retrieve the html of a website (or the module …
webshot                Uses the module selenium to take a picture of a website (or the module …

Documentation

ensae_teaching_cs.faq.faq_web._get_selenium_browser(navigator, fLOG=<function noLOG>)

Returns the associated driver with some custom settings.

The function automatically gets chromedriver if it is not present (Windows only). On Linux, the package chromium-driver should be installed first: apt-get install chromium-driver.

Issue with Selenium and Firefox

Firefox >= v47 does not work on Windows. See Selenium WebDriver and Firefox 47.

See ChromeDriver download and Error message: “chromedriver” executable needs to be available in the path.
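The error message above means the driver executable cannot be found on the PATH. A minimal sketch of a check that can be run before starting a browser is given here; the helper names find_driver and require_driver are hypothetical and are not part of ensae_teaching_cs, and the sketch only relies on the standard library function shutil.which.

```python
import shutil


def find_driver(executable="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    return shutil.which(executable)


def require_driver(executable="chromedriver"):
    """Return the driver path, or raise if the executable is not on PATH.

    Hypothetical helper: the message mirrors the Selenium error quoted above.
    """
    path = find_driver(executable)
    if path is None:
        raise FileNotFoundError(
            "'{0}' executable needs to be available in the path".format(executable)
        )
    return path
```

Calling require_driver() before any scraping code gives an early, explicit failure instead of an error raised while the browser is starting.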

ensae_teaching_cs.faq.faq_web.webhtml(url, navigator='opera', module='selenium', fLOG=<function noLOG>)

Uses the module selenium to retrieve the HTML of a website (or the module splinter, which does not work with IE). The function was only tested with Firefox.

Parameters:
  • url – url
  • navigator – browser to use: firefox, chrome (ie does not work well)
  • module – module to use (selenium or splinter, or None to use the first one available)
  • fLOG – logging function
Returns:

list of [(url, html)]

Check the list of available webdriver at selenium/webdriver and add one to the code if needed.
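A minimal usage sketch, assuming ensae_teaching_cs, selenium and a driver for the chosen browser are installed. The helper names html_by_url and fetch_html are hypothetical; only webhtml and its documented parameters come from this page. The import is done inside the function so that the indexing helper can be used and tested without a browser.

```python
def html_by_url(pairs):
    """Index the documented return value, a list of (url, html) pairs, by url."""
    return {url: html for url, html in pairs}


def fetch_html(url, navigator="firefox"):
    """Retrieve the html of one url and return it as a {url: html} dictionary.

    Hypothetical wrapper: firefox is chosen because the documentation says
    the function was only tested with Firefox.
    """
    # Imported lazily: requires ensae_teaching_cs and a working browser driver.
    from ensae_teaching_cs.faq.faq_web import webhtml

    return html_by_url(webhtml(url, navigator=navigator))
```

Passing navigator explicitly avoids relying on the default value shown in the signature.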

ensae_teaching_cs.faq.faq_web.webshot(img, url, navigator='opera', add_date=False, module='selenium', size=None, fLOG=<function noLOG>)

Uses the module selenium to take a picture of a website (or the module splinter, which does not work with IE). The function was only tested with Firefox. If url and img are lists, the function goes through all the urls and saves one webshot per url.

Parameters:
  • img – image name or list of image names
  • url – url or list of urls
  • navigator – browser to use: firefox, chrome (ie does not work well)
  • add_date – add a date to the image filename
  • module – module to use (selenium or splinter, or None to use the first one available)
  • size – to resize the webshot (if not None)
  • fLOG – logging function
Returns:

list of [(url, image name)]

Check the list of available webdriver at selenium/webdriver and add one to the code if needed.

Chrome requires the chromedriver. See function install_chromedriver.
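A minimal usage sketch for several urls, assuming ensae_teaching_cs, selenium and a driver for the chosen browser are installed. The helpers image_name_for and take_webshots are hypothetical, and so is the naming scheme for the image files; only webshot and its documented parameters come from this page.

```python
from urllib.parse import urlparse


def image_name_for(url, ext=".png"):
    """Hypothetical helper: derive an image file name from the url host and path."""
    parsed = urlparse(url)
    stem = (parsed.netloc + parsed.path).strip("/")
    stem = stem.replace("/", "_").replace(".", "_")
    return (stem or "page") + ext


def take_webshots(urls, navigator="firefox"):
    """Save one webshot per url, with img and url given as lists of equal length."""
    # Imported lazily: requires ensae_teaching_cs and a working browser driver.
    from ensae_teaching_cs.faq.faq_web import webshot

    images = [image_name_for(u) for u in urls]
    return webshot(images, urls, navigator=navigator)
```

Building the image names from the urls keeps the two lists aligned, which is what the function expects when both arguments are lists.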
