module datainc.data_cresus

Short summary

module ensae_projects.datainc.data_cresus

Script to process the data from Cresus for the 2016 hackathon.

source on GitHub

Functions

function                             truncated documentation
cresus_dummy_file
prepare_cresus_data                  Prepares the data for the challenge.
process_cresus_sql                   Processes the database sent by Cresus and produces a list of flat files.
process_cresus_whole_process         Processes the database from Cresus until it splits the data into two sets of files.
split_train_test_cresus_data         Splits the tables into two sets (based on users).
split_XY_bind_dataset_cresus_data    Splits XY for the blind set.

Documentation

Script to process the data from Cresus for the 2016 hackathon.

ensae_projects.datainc.data_cresus.cresus_dummy_file()
Returns: local filename

ensae_projects.datainc.data_cresus.prepare_cresus_data(dbfile, outfold=None, fLOG=<function fLOG>)

Prepares the data for the challenge.

Parameters:
  • dbfile – database file
  • outfold – output folder
  • fLOG – logging function
Returns: dictionary of table files
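
As an illustration of what "dictionary of table files" can look like, here is a minimal sketch that writes every table of a SQLite database to a flat file. It is not the library's implementation: the helper name export_tables, the tab-separated format, and the .txt extension are assumptions made for this example.

```python
import csv
import os
import sqlite3


def export_tables(dbfile, outfold):
    """Write every table of a SQLite database to a tab-separated file.

    Hypothetical helper (not part of ensae_projects).
    Returns a dictionary {table name: file name}.
    """
    os.makedirs(outfold, exist_ok=True)
    files = {}
    con = sqlite3.connect(dbfile)
    try:
        # sqlite_master lists every object stored in the database.
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        for name in names:
            cur = con.execute('SELECT * FROM "%s"' % name)
            header = [col[0] for col in cur.description]
            filename = os.path.join(outfold, name + ".txt")
            with open(filename, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f, delimiter="\t")
                writer.writerow(header)
                writer.writerows(cur)
            files[name] = filename
    finally:
        con.close()
    return files
```

The returned dictionary maps each table name to the file that holds its rows, which is a convenient shape to hand to a later splitting step.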

ensae_projects.datainc.data_cresus.process_cresus_sql(infile, out_clean_sql=None, outdb=None, fLOG=<function fLOG>)

Processes the database sent by Cresus and produces a list of flat files.

Parameters:
  • infile – dump of a sql database
  • out_clean_sql – filename which contains the cleaned sql
  • outdb – sqlite3 file (removed if it exists)
  • fLOG – logging function
Returns: dataframe with a list
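
The step of turning a SQL dump into a sqlite3 file can be sketched as follows. This is only an illustration under strong assumptions: the helper name load_sql_dump is hypothetical, the dump is assumed to be already valid SQLite SQL (the cleaning step that out_clean_sql refers to is not reproduced), and the function returns a plain list of table names rather than a dataframe.

```python
import os
import sqlite3


def load_sql_dump(infile, outdb):
    """Load a SQL dump into a fresh SQLite file and list its tables.

    Hypothetical helper (not part of ensae_projects); it assumes the
    dump only contains statements SQLite understands.
    """
    if os.path.exists(outdb):
        os.remove(outdb)  # the target file is removed if it exists
    with open(infile, "r", encoding="utf-8") as f:
        script = f.read()
    con = sqlite3.connect(outdb)
    try:
        con.executescript(script)  # runs every statement of the dump
        con.commit()
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]
```

Removing the existing file first means the function can be called repeatedly on the same dump without hitting "table already exists" errors.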

ensae_projects.datainc.data_cresus.process_cresus_whole_process(infile, outfold, ratio=0.2, fLOG=<function fLOG>)

Processes the database from Cresus until it splits the data into two sets of files.

ensae_projects.datainc.data_cresus.split_XY_bind_dataset_cresus_data(filename, fLOG=<function fLOG>)

Splits XY for the blind set.

Parameters:
  • filename – table to split
  • fLOG – logging function
Returns: dictionary of created files

It assumes the target columns are orientation and nature.
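
The X/Y split itself can be sketched on in-memory rows. This is an illustration only: the helper name split_xy is hypothetical, and unlike the documented function it works on a list of dictionaries instead of reading a file and writing new ones. The only detail taken from the documentation is the pair of target columns, orientation and nature.

```python
def split_xy(rows, targets=("orientation", "nature")):
    """Split rows into features X and targets Y.

    Hypothetical helper (not part of ensae_projects).
    rows is a list of dictionaries, one per record.
    """
    X, Y = [], []
    for row in rows:
        missing = [t for t in targets if t not in row]
        if missing:
            raise KeyError("missing target columns: %r" % missing)
        # X keeps every column except the targets, Y keeps only the targets.
        X.append({k: v for k, v in row.items() if k not in targets})
        Y.append({t: row[t] for t in targets})
    return X, Y
```

For a blind set, only X would be distributed to participants while Y is kept aside for scoring.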

ensae_projects.datainc.data_cresus.split_train_test_cresus_data(tables, outfold, ratio=0.2, fLOG=<function fLOG>)

Splits the tables into two sets (based on users).

Parameters:
  • tables – dictionary of tables (see prepare_cresus_data)
  • outfold – if not None, output all tables in this folder
  • fLOG – logging function
Returns: couple of dictionaries of table files
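
Splitting "based on users" means that all rows of a given user land on the same side, in every table. A minimal sketch of that idea follows. The helper name split_by_user, the user_key column name, the seed parameter, and the reading of ratio as the fraction of users sent to the test set are all assumptions made for this example; the sketch also works on in-memory rows rather than on table files.

```python
import random


def split_by_user(tables, ratio=0.2, user_key="user_id", seed=0):
    """Split several tables into train and test sets by user.

    Hypothetical helper (not part of ensae_projects).
    tables maps a table name to a list of row dictionaries, and every
    row is assumed to carry the user identifier under user_key.
    """
    users = sorted({row[user_key] for rows in tables.values() for row in rows})
    rnd = random.Random(seed)  # local generator, reproducible split
    rnd.shuffle(users)
    n_test = int(round(len(users) * ratio))
    test_users = set(users[:n_test])
    train, test = {}, {}
    for name, rows in tables.items():
        train[name] = [r for r in rows if r[user_key] not in test_users]
        test[name] = [r for r in rows if r[user_key] in test_users]
    return train, test
```

Drawing the users first and filtering each table afterwards avoids leaking one user's records across both sets, which a row-level random split would not guarantee.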