Data Manipulation
ensae_projects.datainc.data_bikes.add_missing_time(df, column, values, delay=10)
After aggregation, the series is usually sparse. This function adds rows for the missing times.
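A minimal sketch of the idea, not the library's implementation. It assumes the aggregated series is a list of (timestamp, value) pairs aligned on a fixed step and reads delay as a step in minutes (an assumption: the signature only gives the default 10). The name fill_missing_times and the fill parameter are hypothetical.

```python
from datetime import datetime, timedelta


def fill_missing_times(rows, delay=10, fill=0):
    """Hypothetical sketch: return (time, value) pairs covering every step of
    `delay` minutes between the first and last timestamp. Steps absent from
    `rows` get `fill`. Assumes timestamps are already aligned on the step."""
    if not rows:
        return []
    observed = dict(rows)  # timestamp -> aggregated value
    step = timedelta(minutes=delay)  # assumption: delay is expressed in minutes
    current, last = min(observed), max(observed)
    completed = []
    while current <= last:
        completed.append((current, observed.get(current, fill)))
        current += step
    return completed


# Usage: two observations 30 minutes apart become four 10-minute rows.
sparse = [(datetime(2024, 1, 1, 8, 0), 3), (datetime(2024, 1, 1, 8, 30), 5)]
dense = fill_missing_times(sparse)
```

With pandas, the same effect is usually obtained by reindexing on a pandas.date_range; the sketch avoids that dependency to keep the mechanism visible.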
ensae_projects.datainc.change_encoding_improve(infile, outfile, enc1, enc2='utf-8', process=None, fLOG=noLOG)
Changes the encoding of a text file and removes quotes. By default, process is process_line(); whichever function is used has access to the distribution of the number of columns in the previous lines.
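A minimal sketch of the mechanism described above, assuming a tab-separated file and a processor called as process(line, column_counts, sep), where column_counts is a collections.Counter over the column counts of the lines already written. The names change_encoding_sketch and strip_quotes, the sep parameter, and this calling convention are hypothetical; the real process_line() may behave differently.

```python
import os
import tempfile
from collections import Counter


def strip_quotes(line, column_counts, sep):
    """Hypothetical default processor: removes double quotes.
    `column_counts` (distribution of previous column counts) is unused here,
    but a custom processor could use it, e.g. to spot malformed lines."""
    return line.replace('"', "")


def change_encoding_sketch(infile, outfile, enc1, enc2="utf-8", process=None, sep="\t"):
    """Re-encode `infile` (read as enc1) into `outfile` (written as enc2), line by line."""
    process = process or strip_quotes
    column_counts = Counter()
    with open(infile, "r", encoding=enc1) as fin, open(outfile, "w", encoding=enc2) as fout:
        for line in fin:
            # The processor only sees statistics from the lines before this one.
            cleaned = process(line.rstrip("\n"), column_counts, sep)
            column_counts[len(cleaned.split(sep))] += 1
            fout.write(cleaned + "\n")
    return column_counts


# Usage: convert a small Latin-1 file to UTF-8.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "in.txt")
dst = os.path.join(tmpdir, "out.txt")
with open(src, "w", encoding="latin-1") as f:
    f.write('"café"\t"1"\n"thé"\t"2"\n')
counts = change_encoding_sketch(src, dst, "latin-1")
with open(dst, "r", encoding="utf-8") as f:
    converted = f.read()
```

Passing the running distribution to the processor lets it compare each line against the column count seen so far, which is one way to detect lines broken by stray separators.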
ensae_projects.datainc.data_bikes.df_crossjoin(df1, df2, **kwargs)
Makes a cross join (cartesian product) between two dataframes by using a constant temporary key. Also sets a MultiIndex which is the cartesian product of the indices of the input dataframes. Source: Cross join / cartesian product between pandas DataFrames.
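The constant-key technique can be sketched without pandas on lists of row dictionaries: every row gets the same temporary key, so an equi-join on that key pairs each left row with each right row, and the key is dropped afterwards. The function crossjoin_rows is hypothetical; its index of (i, j) positions plays the role of the MultiIndex built from the input indices. Unlike pandas.merge, which adds suffixes to overlapping columns, it lets the right row win on a column-name clash.

```python
from collections import defaultdict


def crossjoin_rows(left, right, key="_tmpkey"):
    """Cross join two lists of dicts via a constant temporary key.
    Returns (index, rows) where index[k] == (i, j) means rows[k]
    combines left[i] and right[j]."""
    # Tag every row with the same constant, so the join below matches all pairs.
    tagged_left = [{**row, key: 1} for row in left]
    tagged_right = [{**row, key: 1} for row in right]

    # Hash join on the temporary key: all right rows share one bucket.
    buckets = defaultdict(list)
    for j, row in enumerate(tagged_right):
        buckets[row[key]].append((j, row))

    index, rows = [], []
    for i, lrow in enumerate(tagged_left):
        for j, rrow in buckets[lrow[key]]:
            merged = {**lrow, **rrow}
            del merged[key]  # drop the temporary key from the result
            rows.append(merged)
            index.append((i, j))
    return index, rows


index, rows = crossjoin_rows([{"a": 1}, {"a": 2}], [{"b": "x"}, {"b": "y"}])
```

In pandas 1.2 and later, pandas.merge(df1, df2, how="cross") performs a cross join directly, without a temporary key.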
ensae_projects.hackathon.enumerate_json_items(filename, encoding=None, fLOG=noLOG)
Enumerates items from a JSON file or string.
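A minimal sketch of the "file or string" behaviour, assuming that items are the elements of a top-level JSON array and that an argument naming an existing file is read from disk while anything else is parsed as JSON text. The name iter_json_items is hypothetical; unlike a streaming parser, this sketch loads the whole document in memory.

```python
import json
import os


def iter_json_items(source, encoding=None):
    """Hypothetical sketch: yield the items of a JSON document.
    `source` is a path to an existing file or a JSON string."""
    if os.path.exists(source):
        with open(source, "r", encoding=encoding or "utf-8") as f:
            data = json.load(f)
    else:
        data = json.loads(source)  # not a file: treat the argument as JSON text
    if isinstance(data, list):
        yield from data  # a top-level array yields one item per element
    else:
        yield data  # any other document counts as a single item
```

Returning a generator keeps the calling code identical whether the source holds one object or many.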
ensae_projects.datainc.enumerate_text_lines(filename, sep=' ', encoding='utf-8', quotes_as_str=False, header=True, clean_column_name=None, convert_float=False, option=None, skip=0, take=-1, fLOG=noLOG)
Enumerates all lines of a text file and applies some cleaning, as configured by the parameters in the signature.
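A minimal sketch of this kind of enumeration, covering only sep, header, convert_float, skip and take. It assumes skip counts data rows after the header and that take=-1 means no limit; the name iter_text_rows and these semantics are hypothetical, and the other parameters (quotes_as_str, clean_column_name, option) are not modelled.

```python
import io


def _to_float(value):
    """Convert to float when possible, otherwise keep the string."""
    try:
        return float(value)
    except ValueError:
        return value


def iter_text_rows(lines, sep=" ", header=True, convert_float=False, skip=0, take=-1):
    """Hypothetical sketch: yield cleaned rows from an iterable of text lines.
    Rows are dicts keyed by the header when header=True, lists otherwise.
    The first `skip` data rows are dropped; take=-1 means no limit."""
    columns = None
    seen = 0      # data rows read so far (header excluded)
    produced = 0  # rows yielded so far
    for raw in lines:
        fields = [field.strip() for field in raw.rstrip("\r\n").split(sep)]
        if header and columns is None:
            columns = fields
            continue
        seen += 1
        if seen <= skip:
            continue
        if take >= 0 and produced >= take:
            break
        if convert_float:
            fields = [_to_float(field) for field in fields]
        produced += 1
        yield dict(zip(columns, fields)) if header else fields


data = io.StringIO("name value\na 1.5\nb 2\nc x\n")
rows = list(iter_text_rows(data, convert_float=True, skip=1))
```

Because the function is a generator, take stops reading as soon as enough rows are produced, so large files are not read to the end.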
ensae_projects.datainc.data_geo_streets.shapely_records(filename, **kwargs)
Uses pyshp to return shapes and records from shapefiles.