module homeblog.table_formula#

Inheritance diagram of ensae_teaching_cs.homeblog.table_formula

Short summary#

module ensae_teaching_cs.homeblog.table_formula

Implements TableFormula.

source on GitHub

Classes#

class

truncated documentation

TableFormula

This class represents a table and provides some SQL-like functionalities such as groupby, innerjoin, select, …

Properties#

property

truncated documentation

dataframe

creates a pandas dataframe

json

returns a json format

np_array

returns the values as a numpy array

np_matrix

returns the values as a numpy matrix

shape

returns the size (number of rows, number of columns)

size

returns the size (number of rows, number of columns)

Static Methods#

staticmethod

truncated documentation

add_header_if_not_present

the function checks whether the first line contains the columns given in header; otherwise, it modifies the file and adds them …

bootstrap

returns a confidence interval for a statistic

correlation_bicolumn

assumes values is a matrix with two columns

random_split_file

splits a file into nb buckets at random (lines are sent to a random file as they come)

ratio

returns the ratio between two real values, or an empty string if the denominator is zero

save_multiple_as_excel

saves multiple tables in one Excel file

Methods#

method

truncated documentation

__add__

does an addition, adds values if types match

__copy__

operator copy

__deepcopy__

operator deepcopy

__eq__

checks whether two tables are equal by value

__getitem__

operator [], accepts slices

__html__

Converts the table into a html string.

__init__

It can take a filename, a TableFormula object, or a list of columns and values.

__iter__

iterator on all rows, it returns a dictionary { column:value }

__len__

returns the number of rows

__mul__

does a multiplication (by a number)

__rst__

converts the table into RST format

__setitem__

operator [], only accepts a tuple (to change a value)

__str__

converts the table into a string

_auto_conversion

private: sets up the column type based on the column name

_fill_by_row

fill the table

_interpret_row

private: returns each row as a dictionary { column_name: value }

_private_getclass

the class often creates another class of the same type, this function returns the class object

_read_file

private

_readlines

private

add_column

Adds a column.

add_column_cumulative

also called the Gini function Example:

add_column_index

Example:

add_column_recursive

Example:

add_column_recursive_row

Example:

add_column_smooth

Example:

add_column_vector

adds a column defined by a vector (list of values for each row)

addc

@see me add_column

aggregate

Example:

aggregate_column

Example:

avg_std

returns the average and standard deviation

center_reduce

centers and reduces a set of columns (or all if columnsSet is None)

change_header

change the column names

concatenate

concatenates two tables by columns

copy

call copy.deepcopy(self)

correlation

Computes the correlation matrix; the first column contains the column names.

correlation_col

Computes the correlation between two columns.

correlation_row

computes the correlation between two rows

covariance

Computes the covariance matrix; the first column contains the column names.

covariance_col

Computes the covariance between two columns.

covariance_row

computes the covariance between two rows

create_index

this method creates an index; to get an indexed row, use the method get. Example:

dcast

@see me multiply_column_by_row_instance

delta

returns a list of differences between self and other

extract_columns

extract some columns

fill_sql_table

returns a Database object, creates the database if it does not exist, same for the table

filter

Example:

filter_quantile

sorts all rows using the criterion defined by function and removes rows at the extremes

get

uses the index created by the method create_index. Example:

get_column_values

private: returns all values for one column

get_distinct_values

private

groupby

Example:

groupby_implicit

uses the prefix of a column name to know which aggregation function to use (sum, avg, len, key, none, max, min) …

histogram

computes a histogram on one vector

histograms

computes a common histogram on all columns

innerjoin

modify_all

apply the same modification to every number

mu_sigma

computes the average and the standard deviation of a vector of values

mu_sigma_each_column

returns a table with the average and the standard deviation for each column

multiplication_term_term

does a multiplication term by term (similar to an addition), adds values if types match

multiply_column_by_row_instance

random

selects n random rows from the table, returns a table

reduce_dict

@see me todict

remove_columns

remove some columns

rename_column

rename a column

replicate

replicates all rows a given number of times

save

saves the table in a text file; the first row is the column names

save_as_excel

saves the table as a new Excel file (.xls or .xlsx); if filename is None, the function returns …

schema_database

returns the schema for a database which would contain this table

select

sort

Example:

strtype

displays the type of the values (not the values themselves)

todict

converts the table into a dictionary { key: value }, each of them defined by a function.

transpose

Computes the transpose.

union

union_columns

computes the union of all values from all columns present in columnSet

values_to_date

converts all values into dates

values_to_float

converts all values into float

values_to_str

converts all values into str

where

@see me filter

Documentation#

@file @brief Implements TableFormula.

class ensae_teaching_cs.homeblog.table_formula.TableFormula(file, numeric_column=None, sep='\t', encoding=None, read_n_lines=-1, sheet=0, **options)#

Bases: _TableFormulaStat

This class represents a table and provides some SQL-like functionalities such as groupby, innerjoin, select, where… It was a custom implementation of a DataFrame before I discovered pandas.

The class provides an easy way to go through the rows of the table by converting each row into a dictionary { column_name: value } on the fly. Example:

tbl = TableFormula(...)
newtbl = tbl.filter(lambda v: v["criteria"] == 5)

See __init__ for other ways to create a table.

@var header list of column names
@var values list of rows (each row contains as many values as the number of columns)
@var index dictionary { column name: position }; changing header also means updating this dictionary

Example:

table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))
print(table)
print("one value 0,1:", table[0,1])

print("---------")
dist = table.get_distinct_values("name")
for k in sorted(dist): print("*%d: %s"%(int(dist[k]),k))

print("---------")
table.add_column("has_A", lambda v: 1. if "A" in v["name"] else 0.)
print(table)

x = 1./3
print("------------- smoothing", x)
table.add_column_smooth("has_A_smooth", lambda v: v["has_A"], [-1,0,1], [x,x,x])
print(table)

print("--------- filter")
fil = table.filter(lambda v: v["d_b"] == 2)
print(fil)

print("--------- random")
rnd = table.random(5)
print(rnd)

print("--------- random unique")
rnd = table.random(1, True)
print(rnd)

print("--------- filter quantile")
fil = table.filter_quantile(lambda v: v["d_b"], 0, 0.4)
print(fil)

print("--------- aggregate_column")
total = table.aggregate(lambda v: v["d_c"])
print(total)

print("--------- sort")
table.sort(lambda v: v["d_b"] + v["d_c"])
print(table)

print("--------- union")
union = table.union(table)
print(union)

print("--------- group sum")
group = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "sum_d_a", "sum_d_b"])
print(group)

print("--------- group max")
groupmax = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "max_d_a", "max_d_b"],
                        [max, max])
print(groupmax)

print("--------- group sum with weights")
group = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"]],
                        ["name", "weight", "sum_d_a"],
                        [lambda vec,w:  sum(vec) / w],
                        lambda v: v ["d_b"])

print("--------- innerjoin")
innerjoin = table.innerjoin(group, lambda v: v["name"],
                                   lambda v: v["name"], "group")
print(innerjoin)

print("------------- extraction")
ext = table.extract_columns(["name", "d_a"])
print(ext)

print("------------- remove")
ext = table.remove_columns(["d_a"])
print(ext)

print("------------- todict")
d = table.todict(lambda v: v["name"], lambda v: v["d_b"], True)
print(d)

print("------------- select")
d = table.select(lambda v:(v["name"], v["d_b"]))
print(list(d))

print("------------- use of an index")
table.create_index(lambda v:(v["name"], v["d_a"]))
row = table.get(('A', 1.1))
print(row)
value = table.get(('A', 1.1), 2)
print(value)

print("------------- multiply_column_by_row_instance ")
table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))
table.add_column("key_add", lambda v:"unique")
print(table)
mul = table.multiply_column_by_row_instance(
                    lambda v: v["key_add"],
                    lambda v: v["name"])
print(mul)

if os.path.exists("BNP.PA.txt"):
    print("--------------- financial stock")
    table = TableFormula("BNP.PA.txt", sep=",")
    table.sort(lambda v: v["Date"])
    print(table[:10])

print("--------------- groupby_implicit")
table = TableFormula("key_name sum_a len_b avg_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))
print(table)
gr = table.groupby_implicit(lambda v: v ["key_name"])
print(gr)

print("--------------- covariance")
values = [random.random() for i in range(0,100)]
values = [[x, x + random.random()/2] for x in values]
tbl = TableFormula(["x", "y"], values).values_to_float()
cov = tbl.covariance()
print(cov)

print("--------------- histogram")
hist = tbl.histogram(lambda v:(v["x"],1), 10)
print(hist)

print("--------------- histogram")
hist = tbl.values_to_float().histograms(["x", "y"], 10)
print(hist)

print("--------------- unions of columns")
hist = tbl.values_to_float().union_columns(["x", "y"])
print(hist)

It can take a filename, a TableFormula object, or a list of columns and values.

Parameters:
  • file – filename or a list of column names or a dictionary; file can also be a pandas DataFrame.

  • numeric_column – depends on the file type (see the examples below)

  • sep – column separator if file is a filename

  • read_n_lines – read the first n lines (or all if -1)

  • sheet – in case the file is an Excel file, this parameter specifies the sheet number or name

  • suffix_nb – if True, adds an integer to the column name if it is a duplicate

Example:

table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))

or

table = TableFormula("file.txt", ["nb"])

or

table  = TableFormula(["date", "Y", "Y2", "xl"], values)

or

data = [{ "one":1, "two":2 }, {"two":2.1, "three":3 }]
table = TableFormula(data)

or

data = { 1:{ "one":2.3, "two":2.2 }, 2:{"one":2.1, "two":3 } }
table = TableFormula("__byrow__", data)

or

table = TableFormula(numpy.matrix(...))

or

table = TableFormula(numpy.array(...))

@warning In this second case, rows and header are not copied.

__add__(other)#

does an addition, adds values if types match

:param other: matrix, float or string
:return: new matrix, keeps the header of the first matrix
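
A minimal sketch of how the addition might behave; the small table below is illustrative and values_to_float is used so that every value is numeric:

t1 = TableFormula("d_a d_b#1 2#3 4".replace(" ", "\t").replace("#", "\n")).values_to_float()
t2 = t1 + 10.0   # adds 10 to every value, keeps the header
t3 = t1 + t1     # adds the two tables term by term
print(t2)
print(t3)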

__copy__()#

operator copy

__deepcopy__(memo)#

operator deepcopy

__eq__(other)#

checks whether two tables are equal by value

:param other: other table
:return: boolean

__getitem__(irow)#

operator [], accepts slices; the result depends on the type of irow, as illustrated after the list below

:param irow: integer, tuple, slice or list
:return: depends on irow

  • int –> a table with one row

  • slice –> a table with several rows

  • list –> a table with the selected rows

  • tuple –> a value
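
A short sketch of each access mode, using the table built in the class example above:

one_row   = table[0]        # int: a table with one row
two_rows  = table[0:2]      # slice: a table with several rows
some_rows = table[[0, 2]]   # list: a table with the selected rows
one_value = table[0, 1]     # tuple: a single value (row 0, column 1)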

__hash__ = None#
__html__(class_table=None, class_td=None, class_tr=None, class_th=None)#

Converts the table into a html string.

Paramètres:
  • class_table – adds a class to the tag table (None for none)

  • class_td – adds a class to the tag td (None for none)

  • class_tr – adds a class to the tag tr (None for none)

  • class_th – adds a class to the tag th (None for none)

__init__(file, numeric_column=None, sep='\t', encoding=None, read_n_lines=-1, sheet=0, **options)#

It can take a filename, a TableFormula object, or a list of columns and values.

Parameters:
  • file – filename or a list of column names or a dictionary; file can also be a pandas DataFrame.

  • numeric_column – depends on the file type (see the examples below)

  • sep – column separator if file is a filename

  • read_n_lines – read the first n lines (or all if -1)

  • sheet – in case the file is an Excel file, this parameter specifies the sheet number or name

  • suffix_nb – if True, adds an integer to the column name if it is a duplicate

Example:

table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))

or

table = TableFormula("file.txt", ["nb"])

or

table  = TableFormula(["date", "Y", "Y2", "xl"], values)

or

data = [{ "one":1, "two":2 }, {"two":2.1, "three":3 }]
table = TableFormula(data)

or

data = { 1:{ "one":2.3, "two":2.2 }, 2:{"one":2.1, "two":3 } }
table = TableFormula("__byrow__", data)

or

table = TableFormula(numpy.matrix(...))

or

table = TableFormula(numpy.array(...))

@warning In this second case, rows and header are not copied.

__iter__()#

iterator over all rows; it returns a dictionary { column: value }

@return dictionary

__len__()#

returns the number of rows

__mul__(other)#

does a multiplication (by a number)

:param other: matrix, float or string
:return: new matrix, keeps the header of the first matrix

__rst__(add_line=True)#

converts the table into RST format

+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
| (header rows optional) |            |          |          |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | ...        | ...      |          |
+------------------------+------------+----------+----------+
Parameters:

add_line – add a line separator between each row

__setitem__(irow, value)#

operator [], only accepts a tuple (to change a value)

:param irow: 2-tuple
:param value: new value

__str__()#

converts the table into a string

:return: string

_auto_conversion(others_columns)#

private: sets up the column type based on the column name

_fill_by_row(values)#

fills the table

:param values: dictionary { <int_row_index>: { <column name>: value } }

_interpret_row(row)#

private: returns each row as a dictionary { column_name: value }

_private_getclass()#

the class often creates another class of the same type, this function returns the class object

_read_file(file, numeric_column, sep, encoding, read_n_lines, sheet=0)#

private

_readlines(lines, numeric_column, sep)#

private

add_column(colname, function, position=-1)#

Adds a column.

:param colname: column name, or column names if it is a list or a tuple
:param function: function which gives the values (or a list of functions, or a function which returns a tuple)
:param position: where to insert the column, -1 for the end

Example:

table.add_column("has_A", lambda v: 1 if "A" in v["name"] else 0, 0)

table.add_column(("has_A", "has_B"),(lambda v: 1 if "A" in v["name"] else 0,
                                        lambda v: 1 if "B" in v["name"] else 0))

table.add_column(("has_A", "has_B"),(lambda v:(1 if "A" in v["name"] else 0, 1 if "B" in v["name"] else 0))
add_column_cumulative(column_index, column_name, functionIndex, functionValue, normalize=False, reverse=False, cumulative=True, functionSort=None)#

also called the Gini function. Example:

table.add_column_cumulative("index_%s" % col, "dist_%s" % col,
                            lambda v: v["sum_nbclient"], lambda v: v[col],
                            functionSort = lambda v: v [col] / v["sum_nbclient"],
                            normalize=True)
add_column_index(colname='index', start=0)#

Example:

table.add_column("index_row")
add_column_recursive(colname, functionValue, functionAgg)#

Example:

table.add_column_recursive(lambda v: v ["norm_%s" % loi],
                           lambda li, v: li[-1] + v)
add_column_recursive_row(colname, functionAgg)#

Example:

table.add_column_recursive_row("w_%s" % loi,
                lambda li, v: li[-1] + v["norm_%s" % loi]
                              if len(li) > 0 else v["norm_%s" % loi])
add_column_smooth(colname, function, position, weights)#

Example:

x = 1./3
table.add_column_smooth("has_A_smooth", lambda v: v["has_A"], [-1,0,1], [x,x,x])
add_column_vector(colname, vector)#

adds a column defined by a vector (list of values for each row)

Parameters:
  • colname – column to add

  • vector – (list) list of values to add to each row

:return: self

static add_header_if_not_present(filename, header, encoding=None, logFunction=<function noLOG>)#

the function checks whether the first line contains the columns given in header; otherwise, it modifies the file and adds them on the first line

@param filename filename
@param header list of column names (all strings)
@param encoding encoding
@param logFunction use this function to log information about what is happening

addc(colname, function, position=-1)#

@see me add_column

aggregate(function, aggregated_function=<built-in function sum>)#

Example:

total = table.aggregate(lambda v: v["d_c"], len)
aggregate_column(colname, aggregated_function=<built-in function sum>)#

Example:

total = table.aggregate_column("d_c", sum)
avg_std(functionValue, functionWeight=<function TableFormula.<lambda>>)#

returns the average and standard deviation

static bootstrap(values, function, nbdraws=-1, alpha=0.05)#

returns a confidence interval for a statistic

@param values values
@param function produces the statistic over a random set of observations chosen in values
@param nbdraws number of draws; if -1, it is equal to len(values)
@param alpha confidence level
@return average, min, lower bound, higher bound, max
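
A minimal sketch, assuming the statistic is the mean of the drawn observations and the return value unpacks as documented (average, min, lower bound, higher bound, max):

import random
values = [random.random() for _ in range(1000)]
avg, mini, low, high, maxi = TableFormula.bootstrap(
    values, lambda obs: sum(obs) / len(obs), nbdraws=200, alpha=0.05)
print(avg, low, high)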

center_reduce(columnsSet=None, op=None, removeExtreme=None, mu_sigma=None)#

centers and reduces a set of columns (or all if columnsSet is None)

@param removeExtreme remove extreme values on both sides (0.05 means 0.025 on each side)
@param columnsSet set of columns to deal with
@param op can be:

  • None: subtract the mean and normalize,

  • "mean": subtract the mean only,

  • "norm": normalize only

@param mu_sigma matrix with two rows (one for the mean, the second for sigma); if None, it is computed from the matrix itself; columns must be in the same order as columnsSet
@return the same table (with only the considered columns)
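
A sketch of the three op modes on the numeric columns of the class example above (assuming those columns hold float values; use values_to_float otherwise):

centered  = table.center_reduce(["d_a", "d_b"])              # subtract the mean and normalize
mean_only = table.center_reduce(["d_a", "d_b"], op="mean")   # subtract the mean only
norm_only = table.center_reduce(["d_a", "d_b"], op="norm")   # normalize only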

change_header(new_header)#

change the column names

Parameters:

new_header – a list or a function which modifies the header

Example:

tbl.change_header(lambda h: h if h != "column" else "new_name")

Warning

Do not do that yourself, the class holds a dictionary up to date with the column index.

concatenate(table, addPrefix='')#

concatenates two tables by columns

@param table table
@param addPrefix add a prefix to each column coming from table
@return table (with the same number of rows as the longest one)
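
A sketch using extract_columns to build two tables and glue them back together (column names taken from the class example above):

left = table.extract_columns(["name", "d_a"])
right = table.extract_columns(["d_b", "d_c"])
both = left.concatenate(right, addPrefix="r_")   # columns: name, d_a, r_d_b, r_d_c
print(both)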

copy()#

call copy.deepcopy(self)

correlation(useBootstrap=False, collapseFormat=True, nbdraws=-1, alpha=0.05, functionKeepValue=<function TableFormula.<lambda>>)#

Computes the correlation matrix; the first column contains the column names.

@param useBootstrap if True, use a bootstrap method to estimate the correlation
@param collapseFormat if True and useBootstrap is True, produces the format average|lower bound|higher bound (at a given confidence level)
@param nbdraws number of draws (if -1, it is equal to the number of observations)
@param alpha confidence level
@param functionKeepValue if collapseFormat is True, this function is used to collapse val,low,high into a single string
@return new table
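
A sketch reusing the two-column table tbl built in the covariance example of the class documentation above:

cor = tbl.correlation()
print(cor)
# with bootstrap, each cell is collapsed into average|lower bound|higher bound
cor_boot = tbl.correlation(useBootstrap=True, nbdraws=100, alpha=0.05)
print(cor_boot)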

static correlation_bicolumn(values, deviations=False, noCenter=False)#

assumes values is a matrix with two columns

@param values two-column matrix
@param deviations if True, returns cor, sigma1, sigma2
@param noCenter if True, do not remove the average before computing the covariance, which means the variables are assumed to be already centered
@return correlation factor, or (correlation, sigma1, sigma2) if deviations is True

correlation_col(col1, col2, noCenter=False)#

Computes the correlation between two columns.

@param col1 column 1
@param col2 column 2
@param noCenter does the computation without removing the average
@return float (correlation)

correlation_row(row1, row2, noCenter=False)#

computes the correlation between two rows

@param row1 row 1 (integer)
@param row2 row 2 (integer)
@param noCenter does the computation without removing the average
@return float (correlation)

covariance()#

Computes the covariance matrix; the first column contains the column names.

@return new table

covariance_col(col1, col2, noCenter=False)#

Computes the covariance between two columns.

@param col1 column 1
@param col2 column 2
@param noCenter does the computation without removing the average
@return float (covariance)

covariance_row(row1, row2, noCenter=False)#

computes the covariance between two rows

@param row1 row 1 (integer)
@param row2 row 2 (integer)
@param noCenter does the computation without removing the average
@return float (covariance)

create_index(functionIndex)#

this method creates an index; to get an indexed row, use the method get. Example:

table.create_index(lambda v:(v["name"], v["d_a"]))
row = table.get(('A', 1.1))
value = table.get(('A', 1.1), 2)
property dataframe#

creates a pandas DataFrame

@return pandas.DataFrame

dcast(functionKey, functionInstance, full=True)#

@see me multiply_column_by_row_instance

delta(other)#

returns a list of differences between self and other

Parameters:

other – TableFormula

Returns:

list of differences (first one)

extract_columns(listColumns)#

extracts some columns

@param listColumns list of columns to extract, or a function which returns True if the column has to be extracted based on its name
@return table

Example:

ext = table.extract_columns(["name", "d_a"])
fill_sql_table(filename_or_database, tablename, add_id='idr')#

returns a Database object, creates the database if it does not exist, same for the table

@param filename_or_database filename or Database object; in the second case, we assume the method connect was called before
@param tablename table name
@param add_id if not None, the function adds an id column: it first takes max(id) and keeps incrementing it
@return Database object (new or the one given as parameter); in both cases, the database is not disconnected

filter(condition_function)#

Example:

fil = table.filter(lambda v: v["d_b"] == 2)

@warning Rows are not copied.

filter_quantile(function, alpha_min=0.025, alpha_max=0.025)#

sorts all rows using the criterion defined by function and removes rows at the extremes

@param function values used to estimate the quantiles
@param alpha_min lower quantile
@param alpha_max higher quantile
@return a table containing all the rows for which the criterion falls within the two quantiles

Example:

fil = table.filter_quantile(lambda v: v["d_b"], 0, 0.4)

@warning Rows are not copied.

get(rowIndex, column=None)#

uses the index created by the method create_index. Example:

table.create_index(lambda v:(v["name"], v["d_a"]))
row = table.get(('A', 1.1))
value = table.get(('A', 1.1), 2)
get_column_values(col)#

private: returns all values for one column

get_distinct_values(col)#

private

groupby(functionKey, functionsValue, columns=None, functionsAgg=None, functionWeight=None)#

Example:

group = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "sum_d_a", "sum_d_b"])

or

groupmax = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "max_d_a", "max_d_b"],
                        [max, max])
groupby_implicit(functionKey, functionWeight=None, logging=None)#

uses the prefix of a column name to know which aggregation function to use (sum, avg, len, key, none, max, min). Example:

group = table.groupby_implicit(lambda v: v["name"])
histogram(functionValue, nbDiv=100, secondColumnIsWeight=False, normalize=True, removeExtreme=0.05)#

computes a histogram on one vector

@param functionValue function which produces the value to histogram
@param nbDiv number of divisions for this histogram (boundaries are min and max)
@param secondColumnIsWeight if True, the second column is the weight
@param normalize if True, normalize by the sum of weights
@param removeExtreme remove extreme values on both sides (0.05 means 0.025 on each side)
@return table with two columns
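
The same call as in the class example above: a histogram of column x with a constant weight of 1 and 10 divisions:

hist = tbl.histogram(lambda v: (v["x"], 1), 10)
print(hist)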

histograms(columnsSet, nbDiv=100, secondColumnIsWeight=False, normalize=True, removeExtreme=0.05, histxName='histKey')#

computes a common histogram on all columns

@param columnsSet set of columns
@param nbDiv number of divisions for the histograms (boundaries are min and max)
@param secondColumnIsWeight if True, the second column is the weight
@param normalize if True, normalize by the sum of weights
@param removeExtreme remove extreme values on both sides (0.05 means 0.025 on each side)
@param histxName column name given to the x axis shared by every histogram
@return table with two columns

@warning The function skips any NaN or Inf value.

innerjoin(table, functionKey1, functionKey2, nameKey='key', addSuffixAnyWay=False, prefixToAdd=None, full=False, keepKey=True, putKeyInColumn=None, missingValue=None, uniqueKey=False)#

@param table other table to join with
@param functionKey1 key for the first table (a function)
@param functionKey2 key for the second table (a function); innerjoin … ON …
@param addSuffixAnyWay add a suffix to every column from the second table even if names are different (suffix is "+")
@param prefixToAdd prefix to add to the columns of the second table
@param full add all items even if there is no common key (FULL OUTER JOIN), otherwise keep only common keys
@param keepKey keep the key as a column in the result (column is key), otherwise not
@param putKeyInColumn private parameter: keepKey has to be True and, in this case, put the key in this column (integer)
@param missingValue when a key is missing on one side, this default value is put in place
@param uniqueKey if True, the function assumes there is a bijection between rows and keys (one row <–> one key) on both tables, otherwise it does not
@return a table

Example:

innerjoin = table.innerjoin(group, lambda v: v["name"],
                                   lambda v: v["name"], "group")
property json#

returns a json format @return string

modify_all(modification_function)#

applies the same modification to every number

@param modification_function modification to apply to every number
@return new table

The signature of the function is the following one:

def function(value, column_name):
    # ....
    return new_value

Example:

tbl = tbl.modify_all(lambda v,c: {"string":"", "numerical":0}.get(c,None) if v is None else v)
mu_sigma(functionValues, removeExtreme=None)#

computes the average and the standard deviation of a vector of values

@param functionValues function which produces the vector of values
@param removeExtreme remove extreme values on both sides (0.05 means 0.025 on each side)
@return (average, standard deviation)

mu_sigma_each_column(columnsSet=None, removeExtreme=None)#

returns a table with the average and the standard deviation for each column

@param removeExtreme remove extreme values on both sides (0.05 means 0.025 on each side)
@param columnsSet set of columns to deal with
@return table with two rows: average and standard deviation
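
A sketch on the two-column table tbl used in the examples above:

stats = tbl.mu_sigma_each_column(["x", "y"], removeExtreme=0.05)
print(stats)   # two rows: average and standard deviation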

multiplication_term_term(other)#

does a multiplication term by term (similar to an addition), adds values if types match

Parameters:

other – matrix or float or string

Returns:

new matrix, keeps the header of the first matrix

multiply_column_by_row_instance(functionKey, functionInstance, full=True)#

@param functionKey defines a key (function)
@param functionInstance defines a second key (it will be moved to the columns dimension)
@param full introduces missing values for not-found combinations
@return a table

Suppose a column contains a finite set of values; for example, we have the temperature for several cities organized as if it were a table from a database: city, date, temperature. We would like to get another table where we have: date, temperature_city1, temperature_city2…

Then we would type: Example:

mul = table.multiply_column_by_row_instance(
                    lambda v: v["date"],
                    lambda v: v["city"])

The input table would be like:

city   date
A      jan
A      feb
B      feb

It returns:

KEY    A|city    A|date    B|city    B|date
feb    A         feb       B         feb
jan    A         jan       None      None
property np_array#

returns the values as a numpy array @return numpy array

property np_matrix#

returns the values as a numpy matrix @return numpy matrix

random(n, unique=False)#

selects n random rows from the table, returns a table

@param n number of desired random rows
@param unique draws unique rows or non-unique rows (sampling without or with replacement)
@return a table

Example:

rnd = table.random(10)
static random_split_file(filename, outfileprefix, nb, has_header=True, encoding=None, logFunction=<function noLOG>)#

splits a file into nb buckets at random (lines are sent to a random file as they come)

@param filename filename to split
@param nb number of buckets
@param outfileprefix output files will start with outfileprefix + "%04d.txt" % i
@param encoding encoding
@param has_header the header will be replicated in each created file
@param logFunction to display information
@return list of created files
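
A minimal sketch; the file names are hypothetical:

# splits data.txt into 4 files named data_0000.txt ... data_0003.txt
files = TableFormula.random_split_file("data.txt", "data_", 4, has_header=True)
print(files)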

static ratio(x, y)#

returns the ratio between two real values, or an empty string if the denominator is zero

@return a real or an empty string
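
A minimal sketch, assuming ratio(x, y) computes x / y with y as the denominator:

print(TableFormula.ratio(1.0, 4.0))   # ratio of the two values
print(TableFormula.ratio(1.0, 0.0))   # empty string, the denominator is zero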

reduce_dict(functionKey, functionValue, uselist=False)#

@see me todict

remove_columns(listColumns)#

removes some columns

@param listColumns list of columns to remove, or a function which returns True if the column has to be removed based on its name
@return table

Example:

rem = table.remove_columns(["d_a"])
rename_column(old_name, new_name)#

renames a column

Parameters:
  • old_name – old name

  • new_name – new name

replicate(times)#

replicates all rows a given number of times

:param times: number of replications
:return: new matrix, keeps the header of the first matrix

save(filename, sep='\t', encoding=None, newline='\n')#

saves the table in a text file; the first row is the column names

Parameters:
  • filename – filename

  • sep – column separator

  • encoding – encoding

  • newline – line separator
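
A round-trip sketch; the file name is hypothetical:

table.save("table_dump.txt", sep="\t", encoding="utf8")
reloaded = TableFormula("table_dump.txt", sep="\t", encoding="utf8")
print(reloaded)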

save_as_excel(filename, font='Calibri', sheet_name='sheet0', close=True, encoding=None)#

saves the table as a new Excel file; you can use .xls or .xlsx. If filename is None, the function returns an object (xlsx) and does not save it.

@param filename Excel filename
@param sheet_name name of the sheet to add
@param font font name
@param close if True, close the file, otherwise the user will have to
@param encoding encoding
@return Workbook object

static save_multiple_as_excel(filename, list_table, font='Calibri', close=True, encoding=None)#

saves multiple tables in one Excel file

@param filename filename (can be None)
@param list_table list of 2-tuples ("name", tbl)
@param font font name
@param close if True, close the file, otherwise the user will have to
@param encoding encoding
@return Workbook object
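
A sketch writing two of the tables built in the class example above into one workbook; the file name is hypothetical:

TableFormula.save_multiple_as_excel(
    "tables.xlsx", [("raw", table), ("grouped", group)])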

schema_database(add_id=True)#

returns the schema for a database which would contain this table

@param add_id if True, adds an index "PRIMARYKEY"
@return dictionary { index_column: (name, type) }
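
A sketch printing the inferred schema for the table of the class example above:

schema = table.schema_database(add_id=True)
for position in sorted(schema):
    print(position, schema[position])   # (column name, type)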

select(functionRow)#

@param functionRow function
@return table

Example:

d = table.select(lambda v:(v["name"], v["d_b"]))
print(list(d))
property shape#

returns the size (number of rows, number of columns)

property size#

returns the size (number of rows, number of columns)

sort(functionValue, reverse=False)#

Example:

table.sort(lambda v: v["d_b"] + v["d_c"])
strtype()#

displays the type of the values (not the values themselves)

todict(functionKey, functionValue, useList=False)#

converts the table into a dictionary { key: value }, each of them defined by a function.

@param functionKey defines the key
@param functionValue defines the value
@param useList should be True if several rows share the same key; all values are then stored in a list
@return a dictionary { key: row } or { key: [row1, row2, …] }

Example:

d = table.todict(lambda v: v["name"], lambda v: v["d_b"], True)
transpose(labelC=None, labelAsRow=True)#

Computes the transpose.

@param labelC proposes labels for the columns; if None, takes "r%d" % i; if it is a string, the function assumes it is a column name
@param labelAsRow add the label as a row
@return new table
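
A sketch using the column name as the label column, following the class example above:

tr = table.transpose(labelC="name", labelAsRow=True)
print(tr)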

union(table)#

@param table table
@return table (with the same number of columns)

concatenates two tables by rows; they must have the same header; rows of both tables are merged into a single matrix. Example:

union = table.union(table2)
union_columns(columnsSet)#

computes the union of all values from all columns present in columnsSet

@param columnsSet set of columns
@return table

values_to_date(format=None, only_if_possible=False, subset_columns=None)#

converts all values into dates

@param only_if_possible if True, converts all possible values and catches exceptions; if False, converts everything and raises an exception when not possible
@param format date format, see fn str_to_datetime
@param subset_columns if None, takes all columns; otherwise, tries to convert only the listed columns
@return table

values_to_float(only_if_possible=False, subset_columns=None)#

converts all values into float

@param only_if_possible if True, converts all possible values and catches exceptions; if False, converts everything and raises an exception when not possible
@param subset_columns if None, takes all columns; otherwise, tries to convert only the listed columns
@return table

values_to_str(subset_columns=None, format=None)#

converts all values into str

@param subset_columns if None, takes all columns; otherwise, tries to convert only the listed columns
@param format format for the conversion, None by default, but it could be for example %1.2f
@return table

where(condition_function)#

@see me filter