.. _imagesgetsrst: =============================== Récupération d’images avec Bing =============================== .. only:: html **Links:** :download:`notebook `, :downloadlink:`html `, :download:`PDF `, :download:`python `, :downloadlink:`slides `, :githublink:`GitHub|_doc/notebooks/hackathon_2018/images_gets.ipynb|*` Material for the hackathon ENSAE / BRGM / 2018. Les images sont extraites de tweets mais sont retweetées sans être retweetées. .. code:: ipython3 %matplotlib inline import matplotlib.pyplot as plt .. code:: ipython3 from jyquickhelper import add_notebook_menu add_notebook_menu() Récupération d’images --------------------- Télécharger des images depuis ImageNet ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sources possibles: `ImageNet `__ .. code:: ipython3 from ensae_projects.hackathon.image_helper import stream_download_images dest_folder = "c:/temp/suricatenat_images/imagenet7" res = list(stream_download_images("c:/temp/suricatenat_images/imagenet.synset7.txt", dest_folder, fLOG=print, skip=100)) len(res) Télécharger des images depuis Bing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ On peut s’inscrire sur l’\ `API Bing `__ ou télécharger quelques images. Les exemples suivant ont été récupérant en sauvant la page `inondations 2016 `__ après avoir fait défiler les résultats pour en afficher beaucoup. .. code:: ipython3 from ensae_projects.hackathon.web_search_helper import extract_bing_result urls = extract_bing_result("c:/temp/suricatenat_clean/urls/small bridge france - Bing images.html") print(len(urls)) urls[:5] .. parsed-literal:: 735 .. parsed-literal:: ['https://d2v9y0dukr6mq2.cloudfront.net/video/thumbnail/NyjYWBnugijqylcxa/videoblocks-eiffel-tower-sunrise-timelapse-with-boats-on-seine-river-and-in-paris-france-view-from-grenelle-bridge_sohfmyy2w_thumbnail-small01.jpg', 'https://frank.itlab.us/france_2005/small_france/dsc_1739.jpg', 'http://www.nickbooth.id.au/Europe11/images/Honfleur/H-reflects.jpg', 'http://www.tauck.com/~/media/Images/Brand+Landing/Bridges/Bridges+Home+page+Banner/BridgesHome_ZS.ashx', 'https://frank.itlab.us/france_2005/small/jul_23_paris_eurosat.jpg'] Télécharger des images depuis une liste d’urls ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 from ensae_projects.hackathon.image_helper import stream_download_images got = list(stream_download_images(urls, dest_folder="c:/temp/suricatenat_clean/bing/", fLOG=print)) got[:5] .. parsed-literal:: [stream_download_images] ... 1/735: 'https://d2v9y0dukr6mq2.cloudfront.net/video/thumbnail/NyjYWBnugijqylcxa/videoblocks-eiffel-tower-sunrise-timelapse-with-boats-on-seine-river-and-in-paris-france-view-from-grenelle-bridge_sohfmyy2w_thumbnail-small01.jpg' [stream_download_images] error 403 for url 'http://www.nickbooth.id.au/Europe11/images/Honfleur/H-reflects.jpg'. [stream_download_images] error 404 for url 'http://www.tauck.com/~/media/Images/Brand+Landing/Bridges/Bridges+Home+page+Banner/BridgesHome_ZS.ashx'. [stream_download_images] wrong filename for url 'https://a0.muscache.com/im/pictures/0d274072-bb89-45d2-9188-20b9535a36c4.jpg?aki_policy=x_large'. [stream_download_images] wrong filename for url 'https://laughingsquid.com/wp-content/uploads/2017/09/a-logging-truck-turns-and-impressively-crosses-a-small-bridge-in-france-with-a-full-load.png?w=750'. [stream_download_images] error 503 for url 'http://stuffpoint.com/bridges-architectural-wonders-around-the-world/image/406101-bridges-architectural-wonders-around-the-world-lovely-bridge-reaching-a-small-landmass-in-the-bay-of.jpg'. [stream_download_images] wrong filename for url 'http://i41.photobucket.com/albums/e257/riruriru/Fronkreich%202013/DSC01594_small_zps6d259b99.jpg~original'. [stream_download_images] wrong filename for url 'https://a0.muscache.com/im/pictures/76fb1fb6-1b26-4c85-9e8e-af4f13bc15e1.jpg?aki_policy=x_large'. [stream_download_images] error 400 for url 'https://www.wallpaperup.com/uploads/wallpapers/2014/03/26/309890/0e773b5b7d8aa2f0a2bf1fb339ce61ee.jpg'. [stream_download_images] wrong filename for url 'http://i1.wp.com/romancolosseum.org/wp-content/uploads/2012/01/pont-du-gard.jpg?resize=500%2C375'. [stream_download_images] error 404 for url 'http://iphonewalls.net/wp-content/uploads/2016/03/Venice+Italy+Small+Bridge+iPhone+6+Wallpaper.jpg'. [stream_download_images] error 403 for url 'http://www.planetminecraft.com/files/resource_media/screenshot/1123/canyon-spawn_86247.jpg'. [stream_download_images] ... 101/735: 'https://i.dailymail.co.uk/i/pix/2015/10/13/18/2D5FF8F500000578-3270916-Originally_the_Carrick_a_Rede_Rope_Bridge_in_Northern_Ireland_on-m-83_1444757867849.jpg' [stream_download_images] cannot load image for url 'http://sca.salemsattic.com/lwblog/gallery/2/2007-02-09+Swiss+watching+construction+on+the+foot+bridge.jpg' due to cannot identify image file <_io.BytesIO object at 0x000001F8129F0FC0> [stream_download_images] fails for url 'https://pull-liveandinvestoverseas-liveandinvestove.netdna-ssl.com/wp-content/uploads/2017/03/paris-bridge-small.jpg' due to HTTPSConnectionPool(host='pull-liveandinvestoverseas-liveandinvestove.netdna-ssl.com', port=443): Max retries exceeded with url: /wp-content/uploads/2017/03/paris-bridge-small.jpg (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 11001] getaddrinfo failed')). [stream_download_images] wrong filename for url 'https://a0.muscache.com/im/pictures/51376541/33c949fc_original.jpg?aki_policy=x_large'. [stream_download_images] wrong filename for url 'http://i2.wp.com/traveluto.com/wp-content/uploads/2016/03/pont_du_gard.jpg?resize=810%2C536'. [stream_download_images] wrong filename for url 'http://media.gettyimages.com/photos/construction-of-a-boat-bridge-over-the-small-rhine-french-army-in-picture-id859849280?s=612x612'. [stream_download_images] wrong filename for url 'http://i41.photobucket.com/albums/e257/riruriru/Fronkreich%202013/DSC01604_small_zps0380043e.jpg~original'. [stream_download_images] wrong filename for url 'https://i2.wp.com/www.wayfaringwithwagner.com/wp-content/uploads/2017/07/colmar-france-wayfaring-with-wagner-8.jpg?resize=1000%2C667&ssl=1'. [stream_download_images] wrong filename for url 'https://render.fineartamerica.com/images/rendered/small/tote-bag/images/artworkimages/medium/1/bridge-of-alexandre-iii-in-paris-france-anastasy-yarmolovich.jpg?transparent=0&targetx=-181&targety=0&imagewidth=1126&imageheight=763&modelwidth=763&modelheight=763&backgroundcolor=9DA2AC&orientation=0&producttype=totebag-18-18&imageid=3275803'. [stream_download_images] ... 201/735: 'https://www.telegraph.co.uk/content/dam/Travel/2017/February/procida-italy.jpg?imwidth=450' [stream_download_images] wrong filename for url 'https://www.telegraph.co.uk/content/dam/Travel/2017/February/procida-italy.jpg?imwidth=450'. [stream_download_images] wrong filename for url 'http://images.adsttc.com/media/images/5327/bbb2/c07a/805c/d800/033c/large_jpg/IMG_9818-9.jpg?1395112873'. [stream_download_images] error 404 for url 'http://www.ventanasvoyage.com/images/France+selects/bridge_small.jpg'. [stream_download_images] wrong filename for url 'http://cdn.newsapi.com.au/image/v1/1cc6dfda05ccc935fdac0b9157038216?width=1024'. [stream_download_images] error 404 for url 'http://www.preservationpa.org/uploads/historic+properties+for+sale/Compass-Mill_Lititz_for-sale_R.Harris.jpg'. [stream_download_images] wrong filename for url 'https://images.adsttc.com/media/images/58ed/1d1e/e58e/ce8f/0300/030e/large_jpg/FRONT_RENDERING_2.jpg?1491934486'. [stream_download_images] error 404 for url 'https://www.orbitz.com/blog/wp-content/uploads/2015/08/Annecy.jpg'. [stream_download_images] wrong filename for url 'https://i1.wp.com/travelsfrance.com/wp-content/uploads/2016/05/Carbonnière-Tower-Camargue-B.jpg?fit=1064%2C710&ssl=1'. [stream_download_images] wrong filename for url 'https://media-cdn.sygictraveldata.com/media/800x600/612664395a40232133447d33247d3835333831393034'. [stream_download_images] cannot load image for url 'http://sca.salemsattic.com/lwblog/gallery/2/2007-02-09+foot+bridge+over+the+Rhein+from+Germany+to+France+shore.jpg' due to cannot identify image file <_io.BytesIO object at 0x000001F812D22CA8> [stream_download_images] wrong filename for url 'https://s-media-cache-ak0.pinimg.com/236x/79/b8/e6/79b8e64e1854dd69a04bb276fc0bc73f.jpg?noindex=1'. [stream_download_images] error 400 for url 'https://www.wallpaperup.com/uploads/wallpapers/2013/02/15/40252/c0906da8d3d30169c38c44c7cb07de2c.jpg'. [stream_download_images] wrong filename for url 'http://i41.photobucket.com/albums/e257/riruriru/Fronkreich%202013/DSC01596_small_zpsa8d0f4b3.jpg~original'. [stream_download_images] ... 301/735: 'https://render.fineartamerica.com/images/rendered/small/poster/images-square-real-5/1-valentre-bridge-in-cahors-france-elena-elisseeva.jpg' [stream_download_images] error 400 for url 'https://www.wallpaperup.com/uploads/wallpapers/2014/12/21/560794/1ac7056f6b90748b083fba5cbee8f601-700.jpg'. [stream_download_images] wrong filename for url 'https://cdn.theatlantic.com/assets/media/img/2015/07/29/RTR4ILKC/1920.jpg?1438183123'. [stream_download_images] wrong filename for url 'https://qph.fs.quoracdn.net/main-qimg-164aa27c9198a9eff4c036c56849340e-c'. [stream_download_images] wrong filename for url 'https://sheerluxe.com/sites/default/files/styles/sl_free_responsive/public/media/2018/11/provision.jpg?itok=ZfO_9gsd'. [stream_download_images] wrong filename for url 'https://www.vangoghmuseum.nl/download/c82c2fcd-90e8-4de6-b23d-f28c2af907d6.jpg?size=m'. [stream_download_images] wrong filename for url 'http://media.gettyimages.com/photos/small-boat-passes-under-pegasus-bridge-normandy-france-europe-site-of-picture-id599378690?s=612x612'. [stream_download_images] wrong filename for url 'https://surfaceview.s3.amazonaws.com/spree/products/187/small/TRU0016-AP-E.jpg?1408016791'. [stream_download_images] wrong filename for url 'http://media.gettyimages.com/photos/normandy-france-picture-id166836920?s=170667a'. [stream_download_images] wrong filename for url 'https://a0.muscache.com/im/pictures/e5a8895a-47c0-40eb-a8ae-94cfd5039607.jpg?aki_policy=x_large'. [stream_download_images] ... 401/735: 'http://www.eutouring.com/place_dauphine_m13_DSC01821FP_lrg.JPG' [stream_download_images] wrong filename for url 'https://qph.ec.quoracdn.net/main-qimg-ff7fe9b391213392474d394c3245f3ca-c?convert_to_webp=true'. [stream_download_images] wrong filename for url 'https://surfaceview.s3.amazonaws.com/spree/products/179/small/BLI0124-AP-E.jpg?1408016006'. [stream_download_images] fails for url 'https://frenchmoments.eu/wp-content/uploads/2015/05/Pont-de-la-Concorde-04-©-French-Moments.jpg' due to 'ascii' codec can't encode character '\xa9' in position 55: ordinal not in range(128). [stream_download_images] cannot load image for url 'http://www.mountain-lifestyle.com/images/graphics/small/ski-area-planards-pistes-winter-chamonix-01.jpg' due to cannot identify image file <_io.BytesIO object at 0x000001F812D22990> [stream_download_images] wrong filename for url 'http://il4.picdn.net/shutterstock/videos/4559468/thumb/1.jpg?i10c=img.resize(height:160)'. [stream_download_images] wrong filename for url 'https://hikeminded.files.wordpress.com/2018/11/dsc_0240.jpg?w=1200'. [stream_download_images] error 404 for url 'http://www.gagnac-sur-cere.com/photos/Millau+Viaduct.jpg'. [stream_download_images] ... 501/735: 'https://i.pinimg.com/736x/95/7f/d3/957fd3418f8bf8bfc09dd17eaac063c5--garden-ponds-koi-ponds.jpg' [stream_download_images] wrong filename for url 'https://render.fineartamerica.com/images/rendered/small/flat/weekender-tote-bag/images-medium-5/2-valentre-bridge-in-cahors-france-elena-elisseeva.jpg?transparent=0&targetx=0&targety=-328&imagewidth=779&imageheight=1163&modelwidth=779&modelheight=506&backgroundcolor=28190E&orientation=0&producttype=totebagweekender-24-16-white'. [stream_download_images] wrong filename for url 'https://render.fineartamerica.com/images/rendered/small/shower-curtain/images-medium-5/1-valentre-bridge-in-cahors-france-elena-elisseeva.jpg?transparent=0&targetx=-218&targety=0&imagewidth=1223&imageheight=819&modelwidth=787&modelheight=819&backgroundcolor=CADFF3&orientation=0&producttype=showercurtain'. [stream_download_images] wrong filename for url 'http://media.gettyimages.com/photos/small-bridge-over-a-canal-by-the-chateau-de-chenonceau-on-the-river-picture-id558031533?s=594x594'. [stream_download_images] wrong filename for url 'https://render.fineartamerica.com/images/rendered/small/greeting-card/images/artworkimages/medium/1/old-stone-bridge-in-france-menega-sabidussi.jpg?transparent=0&targetx=0&targety=-12&imagewidth=700&imageheight=525&modelwidth=700&modelheight=500&backgroundcolor=70500B&orientation=0&producttype=greetingcard&imageid=50379'. [stream_download_images] wrong filename for url 'https://smallhousebliss.files.wordpress.com/2015/03/covert-cabin-woodsman-exterior4-via-smallhousebliss.jpg?w=1200'. [stream_download_images] ... 601/735: 'http://n450v.alamy.com/450v/bgx4e4/woman-bicycle-small-bridge-canal-du-midi-trebes-by-carcassonne-aude-bgx4e4.jpg' [stream_download_images] wrong filename for url 'http://images.mentalfloss.com/sites/default/files/istock-496707858.jpg?resize=1100x740'. [stream_download_images] wrong filename for url 'https://render.fineartamerica.com/images/rendered/small/flat/pouch/images-medium-5/2-valentre-bridge-in-cahors-france-elena-elisseeva.jpg?transparent=0&targetx=0&targety=-343&imagewidth=777&imageheight=1160&modelwidth=777&modelheight=474&backgroundcolor=28190E&orientation=0&producttype=pouch-regularbottom-medium'. [stream_download_images] wrong filename for url 'https://render.fineartamerica.com/images/rendered/small/flat/weekender-tote-bag/images-medium-5/1-valentre-bridge-in-cahors-france-elena-elisseeva.jpg?transparent=0&targetx=0&targety=-7&imagewidth=779&imageheight=521&modelwidth=779&modelheight=506&backgroundcolor=CADFF3&orientation=0&producttype=totebagweekender-24-16-white'. [stream_download_images] wrong filename for url 'https://cdn-image.travelandleisure.com/sites/default/files/styles/1600x1000/public/1465330137/Infinity-Pool-Viking-CRUISE0516.jpg?itok=2TUiXPrV'. [stream_download_images] fails for url 'https://frenchmoments.eu/wp-content/uploads/2015/07/Quais-de-la-Seine-©-French-Moments-Paris-34.jpg' due to 'ascii' codec can't encode character '\xa9' in position 50: ordinal not in range(128). [stream_download_images] wrong filename for url 'https://assets.atlasobscura.com/media/W1siZiIsInVwbG9hZHMvcGxhY2VfaW1hZ2VzLzQ2ZjRjMzZkZjk1NWE1OTczOF80Mjg5NTUxMDE3XzQ3MjBlZTdmMTFfYi5qcGciXSxbInAiLCJ0aHVtYiIsIngzOTA-Il0sWyJwIiwiY29udmVydCIsIi1xdWFsaXR5IDgxIC1hdXRvLW9yaWVudCJdXQ'. [stream_download_images] error 400 for url 'https://www.wallpaperup.com/uploads/wallpapers/2014/12/21/560794/1ac7056f6b90748b083fba5cbee8f601.jpg'. [stream_download_images] wrong filename for url 'https://laughingsquid.com/wp-content/uploads/2017/09/a-logging-truck-turns-and-impressively-crosses-a-small-bridge-in-france-with-a-full-load.gif?w=750'. [stream_download_images] ... 701/735: 'https://2.bp.blogspot.com/-7wREwzl42q4/V9awba-EwFI/AAAAAAAAIAs/QhvzwWEyMFgb6_7NUtWjdNEZ3k2Csq8GwCEw/s1600/Spring%2Bon%2Bthe%2BRidge%2B1200.jpg' [stream_download_images] wrong filename for url 'http://newimages.yachtworld.com/resize/1/40/38/4904038_20141231070343240_1_XLARGE.jpg?f=/1/40/38/4904038_20141231070343240_1_XLARGE.jpg&w=5315&h=2762&t=1420009880000'. [stream_download_images] wrong filename for url 'https://frankeber.files.wordpress.com/2011/08/pont-des-arts-iledelacitc3a9-2011-by-frankeber.jpg?w=625'. [stream_download_images] error 403 for url 'http://www.infrancia.org/parigi/paris/paris/images/pontdelaseine_mirabeau.jpg'. [stream_download_images] wrong filename for url 'http://www.driverguidefrance.com/sites/default/files/styles/slide-tour_960x360/public/Chinon%20bridge%20Private%20guided%20tours%20to%20the%20Loire%20valley%20by%20car%20or%20minibus%20from%20Paris%20for%20individuals%20or%20small%20groups%20by%20Driver%20Guide%20France.jpg?itok=A5IveabR'. [stream_download_images] error 400 for url 'http://www.fastdates.com/PitBoardFeatures/2005EdelweissFrance/France06.050bridge864.jpg'. [stream_download_images] fails for url 'http://traveltalesoflife.com/wp-content/uploads/2015/04/img_6500-900x675.jpg' due to HTTPConnectionPool(host='127.0.0.1', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10061] Aucune connexion n’a pu être établie car l’ordinateur cible l’a expressément refusée')). [stream_download_images] error 400 for url 'https://www.wallpaperup.com/uploads/wallpapers/2014/12/21/560794/1ac7056f6b90748b083fba5cbee8f601-500.jpg'. [stream_download_images] wrong filename for url 'https://i0.wp.com/www.annees-de-pelerinage.com/wp-content/uploads/2017/07/colmar-france.jpg?fit=1200%2C800'. [stream_download_images] wrong filename for url 'https://i2.wp.com/www.wayfaringwithwagner.com/wp-content/uploads/2017/07/colmar-france-wayfaring-with-wagner-8.jpg?ssl=1'. .. parsed-literal:: ['videoblocks-eiffel-tower-sunrise-timelapse-with-boats-on-seine-river-and-in-paris-france-view-from-grenelle-bridge_sohfmyy2w_thumbnail-small01.jpg', 'dsc_1739.jpg', 'jul_23_paris_eurosat.jpg', 'Bridge_Stowe_Landscape_Gardens_BY_ROBERT_KILPIN.jpg', 'falais_03.jpg'] Echantillon aléatoire ~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 from ensae_projects.hackathon.image_helper import last_element, stream_random_sample rnd = last_element(stream_random_sample("c:/temp/suricatenat_images/")) rnd[:5] .. code:: ipython3 len(rnd) .. code:: ipython3 from ensae_projects.hackathon.image_helper import plot_gallery_random_images ax, rnd = plot_gallery_random_images("c:/temp/suricatenat_images/") ax; .. code:: ipython3 rnd .. code:: ipython3 ax, rnd = plot_gallery_random_images("c:/temp/suricatenat_images/imagenet4/") ax; .. code:: ipython3 rnd Renommer les images de l’échantillon ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 from ensae_projects.hackathon.image_helper import enumerate_image_class ech = list(enumerate_image_class("c:/temp/sample_labelled/")) .. code:: ipython3 len(ech) .. parsed-literal:: 309 .. code:: ipython3 import os echnames = set(os.path.split(n[0])[-1] for n in ech) list(sorted(echnames))[:5] .. parsed-literal:: ['1051720569702555648_Dph2SGiXUAAImpp.jpg', '1051787907109974016_Dpizg3kW4AAdpaM.jpg', '1051812613062098946_DpjJ5NWXUAAtfeR.jpg', '1051914579549282309_DpkmtpOX4AAunhI.jpg', '106994_5349_big_200907_voyager11.jpg'] .. code:: ipython3 images = list(enumerate_image_class("c:/temp/suricatenat_clean/")) .. code:: ipython3 rename = [] for img, sub in images: name = os.path.split(img)[-1] if name in echnames: rename.append(img) rename[:5] .. parsed-literal:: ['c:/temp/suricatenat_clean/bing\\1200px-Chateau_de_Chenonceau_2008E.jpg', 'c:/temp/suricatenat_clean/bing\\1970699-presse-papiers-1.jpg', 'c:/temp/suricatenat_clean/bing\\2243060331_6496d792a2_o_d.jpg', 'c:/temp/suricatenat_clean/bing\\28080031282_6d00c773d6_b.jpg', 'c:/temp/suricatenat_clean/bing\\28183949335_04578a6bdd_b.jpg'] .. code:: ipython3 len(rename) .. parsed-literal:: 306 .. code:: ipython3 for img in rename: os.rename(img + "~", img + ".ech") Split train test ~~~~~~~~~~~~~~~~ .. code:: ipython3 from ensae_projects.hackathon.image_helper import folder_split_train_test tr, te = folder_split_train_test("c:/temp/sample_labelled/", "c:/temp/sample_labelled_train/", "c:/temp/sample_labelled_test/", test_size=0.66) len(tr), len(te) .. parsed-literal:: (105, 204)