Hackathon Helpers#
2018#
Functions about images#
ensae_projects.hackathon.image_helper.enumerate_batch_features
(folder, batch_or_image = False)
Enumerates all batches saved in a folder.
ensae_projects.hackathon.image_helper.enumerate_image_class
(folder, abspath = True, ext = {'.png', '.jpg'})
Lists all images in a folder, assuming the name of each subfolder indicates the class its images belong to.
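The folder layout this entry assumes can be sketched as follows. This is an illustration using only the standard library, not the code of ensae_projects; the helper name `iter_images_with_class` and its return shape (path, class name) are assumptions.

```python
import os


def iter_images_with_class(folder, ext=(".png", ".jpg")):
    """Yield (path, class_name) for images stored as folder/<class>/<image>.

    Hypothetical helper illustrating the layout, not the ensae_projects code.
    """
    for class_name in sorted(os.listdir(folder)):
        sub = os.path.join(folder, class_name)
        if not os.path.isdir(sub):
            continue  # files at the top level carry no class label
        for name in sorted(os.listdir(sub)):
            # keep only files whose extension is in the accepted set
            if os.path.splitext(name)[1].lower() in ext:
                yield os.path.join(sub, name), class_name
```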
ensae_projects.hackathon.image_helper.folder_split_train_test
(src_folder, dest_train, dest_test, seed = None, ext = {'.png', '.jpg'}, test_size = 0.25)
Splits images from a folder into train and test. The function saves images into two separate folders.
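A minimal sketch of a seeded train/test split over file names, assuming each image is sent to the test set independently with probability `test_size`. How the library actually draws the split is not stated here, so the helper `split_train_test` below is hypothetical and leaves out the file copying.

```python
import random


def split_train_test(names, test_size=0.25, seed=None):
    """Split a list of file names into (train, test) lists.

    Hypothetical helper: every name goes to the test list with
    probability test_size, so the test share is only approximately
    test_size. A fixed seed makes the split reproducible.
    """
    rnd = random.Random(seed)
    train, test = [], []
    for name in names:
        if rnd.random() < test_size:
            test.append(name)
        else:
            train.append(name)
    return train, test
```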
ensae_projects.hackathon.image_helper.histogram_image_size
(folder, ext = {'.png', '.jpg'})
Computes the distribution of image sizes.
ensae_projects.hackathon.image_helper.img2gray
(img, mode = 'L')
Converts a PIL image to grayscale.
ensae_projects.hackathon.image_helper.image_zoom
(img, new_size, **kwargs)
Resizes an image (from PIL).
ensae_projects.hackathon.image_helper.last_element
(iter)
Returns the last element of a sequence produced by an iterator or a generator.
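The idea can be sketched in a few lines: iterate to exhaustion and keep the most recent item. The name `last_of` and the choice to raise on an empty input are assumptions, not the library's documented behaviour.

```python
def last_of(iterable):
    """Return the last item produced by an iterable, consuming it entirely.

    Hypothetical helper; raising ValueError on empty input is an assumption.
    """
    sentinel = object()
    last = sentinel
    for last in iterable:
        pass  # only the final value of the loop variable matters
    if last is sentinel:
        raise ValueError("empty iterable")
    return last
```

Because generators cannot be indexed, this costs one full pass but only constant memory.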
ensae_projects.hackathon.image_helper.load_batch_features
(batch_file)
Loads a batch file saved by stream_image2features.
ensae_projects.hackathon.image_helper.plot_gallery_random_images
(folder, n = 12, seed = None, ext = {'.png', '.jpg'}, **kwargs)
Plots a gallery of images using matplotlib. Extracts a random sample from a folder which contains many images. Relies on function enumerate_image_class. Calls plot_gallery_images to build the gallery.
ensae_projects.hackathon.image_helper.read_image
(filename_or_bytes)
Reads an image.
ensae_projects.hackathon.image_helper.stream_apply_image_transform
(src_folder, dest_folder, transform, ext = {'.png', '.jpg'}, fLOG = None)
Applies a transform to every image in a folder and saves the result in another folder, keeping the same subfolders.
ensae_projects.hackathon.image_helper.stream_copy_images
(src_folder, dest_folder, valid, ext = {'.png', '.jpg'}, fLOG = None)
Copies all images from src_folder to dest_folder if valid(name) is True.
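A minimal sketch of a filtered copy driven by a `valid(name)` predicate. The helper `copy_valid_images` is hypothetical; that it is a generator and that it preserves subfolders are assumptions made for the illustration.

```python
import os
import shutil


def copy_valid_images(src_folder, dest_folder, valid, ext=(".png", ".jpg")):
    """Copy images for which valid(name) is true and yield each destination path.

    Hypothetical helper: subfolders of src_folder are recreated in dest_folder.
    """
    for root, _, files in os.walk(src_folder):
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() not in ext:
                continue  # not an image
            if not valid(name):
                continue  # rejected by the caller's predicate
            rel = os.path.relpath(root, src_folder)
            target_dir = os.path.normpath(os.path.join(dest_folder, rel))
            os.makedirs(target_dir, exist_ok=True)
            yield shutil.copy(os.path.join(root, name),
                              os.path.join(target_dir, name))
```

Since the sketch is a generator, nothing is copied until it is consumed, for instance with `list(copy_valid_images(...))`.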
ensae_projects.hackathon.image_helper.stream_download_images
(urls, dest_folder, fLOG = None, use_request = None, skipif_done = True, dummys = None, skip = 0)
Downloads images based on their urls.
ensae_projects.hackathon.image_helper.stream_image2features
(src_folder, dest_folder, transform, batch_size = 1000, prefix = 'batch', ext = {'.png', '.jpg'}, fLOG = None)
ensae_projects.hackathon.image_helper.stream_random_sample
(folder, n = 1000, seed = None, abspath = True, ext = {'.png', '.jpg'})
Extracts a random sample from a folder which contains many images. Relies on function enumerate_image_class.
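One common way to sample from a stream too large to hold in memory is reservoir sampling, sketched below. Whether the library uses this technique is an assumption; the helper `reservoir_sample` is hypothetical and works on any iterable, such as the paths produced by an image enumerator.

```python
import random


def reservoir_sample(stream, n, seed=None):
    """Draw n items uniformly at random from an iterable of unknown length.

    Hypothetical helper implementing reservoir sampling (Algorithm R).
    If the stream holds fewer than n items, all of them are returned.
    """
    rnd = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < n:
            sample.append(item)  # fill the reservoir first
        else:
            j = rnd.randint(0, i)  # both bounds are inclusive
            if j < n:
                sample[j] = item  # replace with probability n / (i + 1)
    return sample
```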
Some of these functions are used in the notebook Image et doublons. Many examples can be found in the unit test test_image.py.
Functions or classes to analyse#
ensae_projects.hackathon.image_knn.ImageNearestNeighbors
(self, transform = 'gray', image_size = (10, 10), **kwargs)
Builds a model on the top of NearestNeighbors in order to find close images.
Functions about performance#
ensae_projects.hackathon.perf2018.MLStoragePerf2018
(self, storage, examples, cache_file = 'cache_file.csv')
Computes the performance of a hackathon.
ensae_projects.hackathon.perf2018.MLStoragePerf2018Image
(self, storage, examples, cache_file = 'cache_file.csv')
Overloads compute_perf for images. Example of use…
2017#
ensae_projects.hackathon.extract_images_from_json_2017
(filename, encoding = None, fLOG = noLOG)
Extracts fields, such as images, from a JSON file.
ensae_projects.hackathon.resize_image
(filename_or_bytes, maxdim = 512, dest = None, format = None)
Resizes an image by repeatedly dividing its dimensions by two until one of them becomes smaller than maxdim.
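The size computation described above can be sketched without any imaging library. The helper `halved_size` is hypothetical, and the use of integer division for rounding is an assumption.

```python
def halved_size(width, height, maxdim=512):
    """Halve both dimensions until at least one is smaller than maxdim.

    Hypothetical helper; rounding down with // is an assumption.
    """
    while width >= maxdim and height >= maxdim:
        width //= 2
        height //= 2
    return width, height


# 2048x1024 -> 1024x512 -> 512x256, and 256 < 512 stops the loop
print(halved_size(2048, 1024))  # (512, 256)
```

An image that already has one dimension below maxdim is left unchanged under this reading.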
2016#
ensae_projects.ml.competitions.AUC
(answers, scores)
Computes the AUC.
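For reference, the textbook AUC of a binary classifier is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below implements that generic definition; the helper `auc_pairwise` is hypothetical, and the input conventions of the library's `answers` and `scores` arguments may differ.

```python
def auc_pairwise(labels, scores):
    """Compute the AUC by comparing every (positive, negative) pair.

    Hypothetical helper: labels are 0 or 1, scores are floats.
    Quadratic in the number of examples, so meant for small inputs only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


print(auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```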
ensae_projects.ml.competitions.AUC_multi
(answers, scores, ignored = None)
Computes the AUC.
ensae_projects.ml.competitions.AUC_multi_multi
(nb, answers, scores, ignored = None)
Computes the AUC.
2015#
ensae_projects.datainc.change_encoding
(infile, outfile, enc1, enc2 = 'utf-8', process = None, fLOG = noLOG)
Changes the encoding of a text file and removes quotes. By default, process is process_line().
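A minimal sketch of line-by-line re-encoding with a pluggable `process` callback. The helper `reencode` is hypothetical, and its default callback, which simply strips double quotes, is a stand-in assumption for the library's process_line().

```python
def reencode(infile, outfile, enc1, enc2="utf-8", process=None):
    """Rewrite infile (encoded as enc1) into outfile (encoded as enc2).

    Hypothetical helper: process is applied to every line; the default
    below only removes double quotes and stands in for process_line().
    """
    if process is None:
        def process(line):
            return line.replace('"', "")
    with open(infile, "r", encoding=enc1) as src, \
            open(outfile, "w", encoding=enc2) as dst:
        for line in src:  # streaming keeps memory use constant
            dst.write(process(line))
```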
ensae_projects.datainc.change_encoding_improve
(infile, outfile, enc1, enc2 = 'utf-8', process = None, fLOG = noLOG)
Changes the encoding of a text file and removes quotes. By default, process is process_line(), but the function has access to the distribution of the number of columns in the previous lines.
ensae_projects.datainc.clean_column_name_sql_dump
(i, line, hist, sep = ';')
Removes quotes in a line which looks like…
ensae_projects.datainc.convert_dates
(sd, option = None, exc = False)
Converts a string into a date.
ensae_projects.datainc.enumerate_text_lines
(filename, sep = ' ', encoding = 'utf-8', quotes_as_str = False, header = True, clean_column_name = None, convert_float = False, option = None, skip = 0, take = -1, fLOG = noLOG)
Enumerates all lines from a text file and does some cleaning (see the list of parameters).