module homeblog.table_formula

Inheritance diagram of ensae_teaching_cs.homeblog.table_formula

Short summary

module ensae_teaching_cs.homeblog.table_formula

Implements TableFormula.

source on GitHub

Classes

class truncated documentation
TableFormula This class aims at representing a table, it provides some « SQL like » functionalities such as groupby or innerjoin, select, …

Properties

property truncated documentation
dataframe creates a pandas dataframe
json returns a json format
np_array returns the values as a numpy array
np_matrix returns the values as a numpy matrix
shape returns the size (nb rows, nb columns)
size returns the size (nb rows, nb columns)

Static Methods

staticmethod truncated documentation
add_header_if_not_present the function checks whether the first line contains the columns in header; otherwise, it modifies the file and adds them …
bootstrap returns a confidence interval for a statistic
correlation_bicolumn assumes values is a matrix with two columns
random_split_file splits a file into nb buckets at random (lines are sent to a random file as they come)
ratio returns the ratio of two real values, or an empty string if the denominator is zero
save_multiple_as_excel saves multiple tables in one Excel file

Methods

method truncated documentation
__add__ do an addition, add values if types are matching
__copy__ operator copy
__deepcopy__ operator deepcopy
__eq__ check if two tables are equal by value
__getitem__ operator [], accepts slices
__html__ convert the table into a html string
__init__ constructor, it can either take a filename, an object TableFormula, a list of columns and values.
__iter__ iterator on all rows, it returns a dictionary { column:value }
__len__ returns the number of rows
__mul__ do a multiplication(by a number)
__rst__ convert the table into rst format
__setitem__ operator [], just accepts tuple(to change a value)
__str__ convert the table into a string
_auto_conversion private set up the column type based on the column name
_fill_by_row fill the table
_interpret_row private returns each row as a dictionary { column_name:value }
_private_getclass the class often creates another class of the same type, this function returns the class object
_read_file private
_readlines private
add_column add a column
add_column_cumulative also called the Gini function Example:
add_column_index Example:
add_column_recursive Example:
add_column_recursive_row Example:
add_column_smooth Example:
add_column_vector add a column defined by vector(list of values for each row)
addc @see me add_column
aggregate Example:
aggregate_column Example:
avg_std returns the average and standard deviation
center_reduce center and reduce a set of columns(or all if columnsSet is None)
change_header change the column names
concatenate concatenates two tables by columns
copy call copy.deepcopy(self)
correlation computes the correlation matrix, the first column will contain the column names
correlation_col computes the correlation between two columns
correlation_row computes the correlation between two rows
covariance computes the covariance matrix, the first column will contain the column names
covariance_col computes the covariance between two columns
covariance_row computes the covariance between two rows
create_index this method creates an index, to get an indexes row, use method get Example:
dcast @see me multiply_column_by_row_instance
delta returns a list of differences between self and others
extract_columns extract some columns
fill_sql_table returns a Database object, creates the database if it does not exist, same for the table
filter Example:
filter_quantile sort all rows using criteria defined by function and remove rows at the extremes
get use the index created by method create_index Example:
get_column_values private returns all values for one column
get_distinct_values private
groupby Example:
groupby_implicit uses the prefix of a column name to determine which aggregation function to use (sum, avg, len, key, none, max, min) …
histogram computes a histogram on one vector
histograms computes a common histogram on all columns
innerjoin  
modify_all apply the same modification to every number
mu_sigma computes the average and the standard deviation of a vector of values
mu_sigma_each_column returns a table with the average and the standard deviation for each column
multiplication_term_term does a multiplication term by term (similar to an addition), multiplies values if types are matching
multiply_column_by_row_instance  
random selects n random rows from the table, returns a table
reduce_dict @see me todict
remove_columns remove some columns
rename_column rename a column
replicate replicates all rows a given number of times
save saves the tables in a text file, first row is the column names
save_as_excel saves the table as a new Excel file, you can use .xls or .xlsx if filename is None, the function returns …
schema_database returns the schema for a database which would contain this table
select  
sort Example:
strtype displays the type of values(not the values)
todict convert the table as a dictionary { key:value } each of them is defined by functions.
transpose computes the transpose
union  
union_columns computes the union of all values from all columns present in columnSet
values_to_date converts all values into dates
values_to_float converts all values into float
values_to_str converts all values into str
where @see me filter

Documentation

@file @brief Implements TableFormula.

class ensae_teaching_cs.homeblog.table_formula.TableFormula(file, numeric_column=[], sep='\t', encoding=None, read_n_lines=-1, sheet=0, **options)[source]

Bases : ensae_teaching_cs.homeblog.table_formula_stat._TableFormulaStat

This class aims at representing a table, it provides some « SQL like » functionalities such as groupby or innerjoin, select, where… This was a custom implementation of a DataFrame before I discovered pandas.

The class provides an easy way to go through the rows of the table by converting each row into a dictionary { column_name: value } on the fly. Example:

tbl = TableFormula(...)
newtbl = tbl.filter(lambda v: v["criteria"] == 5)

See @see op __init__ for other ways to create a table.

@var header list of column names @var values list of rows (each row contains as many values as the number of columns) @var index dictionary { column name: position }; changing header also means changing index.

Example:

import os
import random

table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))
print(table)
print("one value 0,1:", table[0,1])

print("---------")
dist = table.get_distinct_values("name")
for k in sorted(dist): print("*%d: %s"%(int(dist[k]),k))

print("---------")
table.add_column("has_A", lambda v: 1. if "A" in v["name"] else 0.)
print(table)

x = 1./3
print("------------- smoothing", x)
table.add_column_smooth("has_A_smooth", lambda v: v["has_A"], [-1,0,1], [x,x,x])
print(table)

print("--------- filter")
fil = table.filter(lambda v: v["d_b"] == 2)
print(fil)

print("--------- random")
rnd = table.random(5)
print(rnd)

print("--------- random unique")
rnd = table.random(1, True)
print(rnd)

print("--------- filter quantile")
fil = table.filter_quantile(lambda v: v["d_b"], 0, 0.4)
print(fil)

print("--------- aggregate_column")
total = table.aggregate(lambda v: v["d_c"])
print(total)

print("--------- sort")
table.sort(lambda v: v["d_b"] + v["d_c"])
print(table)

print("--------- union")
union = table.union(table)
print(union)

print("--------- group sum")
group = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "sum_d_a", "sum_d_b"])
print(group)

print("--------- group max")
groupmax = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "max_d_a", "max_d_b"],
                        [max, max])
print(groupmax)

print("--------- group sum with weights")
group = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"]],
                        ["name", "weight", "sum_d_a"],
                        [lambda vec,w:  sum(vec) / w],
                        lambda v: v ["d_b"])

print("--------- innerjoin")
innerjoin = table.innerjoin(group, lambda v: v["name"],
                                   lambda v: v["name"], "group")
print(innerjoin)

print("------------- extraction")
ext = table.extract_columns(["name", "d_a"])
print(ext)

print("------------- remove")
ext = table.remove_columns(["d_a"])
print(ext)

print("------------- todict")
d = table.todict(lambda v: v["name"], lambda v: v["d_b"], True)
print(d)

print("------------- select")
d = table.select(lambda v:(v["name"], v["d_b"]))
print(list(d))

print("------------- use of an index")
table.create_index(lambda v:(v["name"], v["d_a"]))
row = table.get(('A', 1.1))
print(row)
value = table.get(('A', 1.1), 2)
print(value)

print("------------- multiply_column_by_row_instance ")
table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))
table.add_column("key_add", lambda v:"unique")
print(table)
mul = table.multiply_column_by_row_instance(
                    lambda v: v["key_add"],
                    lambda v: v["name"])
print(mul)

if os.path.exists("BNP.PA.txt"):
    print("--------------- financial stock")
    table = TableFormula("BNP.PA.txt", sep=",")
    table.sort(lambda v: v["Date"])
    print(table[:10])

print("--------------- groupby_implicit")
table = TableFormula("key_name sum_a len_b avg_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))
print(table)
gr = table.groupby_implicit(lambda v: v ["key_name"])
print(gr)

print("--------------- covariance")
values = [random.random() for i in range(0,100)]
values = [[x, x + random.random()/2] for x in values]
tbl = TableFormula(["x", "y"], values).values_to_float()
cov = tbl.covariance()
print(cov)

print("--------------- histogram")
hist = tbl.histogram(lambda v:(v["x"],1), 10)
print(hist)

print("--------------- histogram")
hist = tbl.values_to_float().histograms(["x", "y"], 10)
print(hist)

print("--------------- unions of columns")
hist = tbl.values_to_float().union_columns(["x", "y"])
print(hist)

constructor; it can take a filename, an object TableFormula, or a list of columns and values.

@param file filename or a list of column names or a dictionary,
file can also be a pandas DataFrame.

@param numeric_column depends on file types (see examples below) @param sep column separator if file is a filename @param read_n_lines read the first n lines (or all of them if -1) @param sheet in case the file is an Excel file, this parameter specifies the sheet number or name @param suffix_nb if True, adds an integer to the column name if it is a duplicate

Example:

table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))

or

table = TableFormula("file.txt", ["nb"])

or

table  = TableFormula(["date", "Y", "Y2", "xl"], values)

or

data = [{ "one":1, "two":2 }, {"two":2.1, "three":3 }]
table = TableFormula(data)

or

data = { 1:{ "one":2.3, "two":2.2 }, 2:{"one":2.1, "two":3 } }
table = TableFormula("__byrow__", data)

or

table = TableFormula(numpy.matrix(…))

or

table = TableFormula(numpy.array(...))

@warning When a numpy matrix or array is given, rows and header are not copied.

__add__(other)[source]

does an addition, adds values if types are matching @param other matrix or float or string @return new matrix, keeps the header of the first matrix

__copy__()[source]

operator copy

__deepcopy__(memo)[source]

operator deepcopy

__eq__(other)[source]

check if two tables are equal by value @param other other table @return boolean

__getitem__(irow)[source]

operator [], accepts slices @param irow integer, tuple, slice or list @return depends on irow

  • int –> a table with one row
  • slice –> a table with several rows
  • list –> a table with the selected rows
  • tuple –> a value
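The dispatch on the index type described above can be sketched with a minimal toy class (`MiniTable` is a hypothetical stand-in, not the library's implementation):

```python
# Minimal sketch of type-based dispatch in __getitem__: int and slice
# select rows, a list selects several rows, a tuple returns one cell.
class MiniTable:
    def __init__(self, header, values):
        self.header = header          # list of column names
        self.values = values          # list of rows

    def __getitem__(self, irow):
        if isinstance(irow, tuple):   # tuple -> a single value
            r, c = irow
            return self.values[r][c]
        if isinstance(irow, slice):   # slice -> several rows
            return self.values[irow]
        if isinstance(irow, list):    # list -> the selected rows
            return [self.values[i] for i in irow]
        return [self.values[irow]]    # int -> a one-row table

tbl = MiniTable(["name", "d_a"], [["A", 1], ["A", 1.1], ["B", 3]])
print(tbl[0, 1])    # -> 1
print(tbl[1:3])     # -> [['A', 1.1], ['B', 3]]
```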
__hash__ = None
__html__(class_table=None, class_td=None, class_tr=None, class_th=None)[source]

convert the table into a html string

@param class_table adds a class to the tag ``table``(None for none) @param class_td adds a class to the tag ``td``(None for none) @param class_tr adds a class to the tag ``tr``(None for none) @param class_th adds a class to the tag ``th``(None for none)

__init__(file, numeric_column=[], sep='\t', encoding=None, read_n_lines=-1, sheet=0, **options)[source]

constructor; it can take a filename, an object TableFormula, or a list of columns and values.

@param file filename or a list of column names or a dictionary,
file can also be a pandas DataFrame.

@param numeric_column depends on file types (see examples below) @param sep column separator if file is a filename @param read_n_lines read the first n lines (or all of them if -1) @param sheet in case the file is an Excel file, this parameter specifies the sheet number or name @param suffix_nb if True, adds an integer to the column name if it is a duplicate

Example:

table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))

or

table = TableFormula("file.txt", ["nb"])

or

table  = TableFormula(["date", "Y", "Y2", "xl"], values)

or

data = [{ "one":1, "two":2 }, {"two":2.1, "three":3 }]
table = TableFormula(data)

or

data = { 1:{ "one":2.3, "two":2.2 }, 2:{"one":2.1, "two":3 } }
table = TableFormula("__byrow__", data)

or

table = TableFormula(numpy.matrix(…))

or

table = TableFormula(numpy.array(...))

@warning When a numpy matrix or array is given, rows and header are not copied.

__iter__()[source]

iterator on all rows, it returns a dictionary { column:value } @return dictionary

__len__()[source]

returns the number of rows

__mul__(other)[source]

does a multiplication (by a number) @param other matrix or float or string @return new matrix, keeps the header of the first matrix

__rst__(add_line=True)[source]

convert the table into rst format

+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
| (header rows optional) |            |          |          |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | ...        | ...      |          |
+------------------------+------------+----------+----------+

@param add_line add a line separator between each row

__setitem__(irow, value)[source]

operator [], just accepts tuples (to change a value) @param irow 2-tuple @param value new value

__str__()[source]

convert the table into a string @return string

_auto_conversion(others_columns)[source]

private set up the column type based on the column name

_fill_by_row(values)[source]

fill the table @param values dictionary { <int_row_index>: { <column name>: value} }

_interpret_row(row)[source]

private returns each row as a dictionary { column_name:value }

_private_getclass()[source]

the class often creates another class of the same type, this function returns the class object

_read_file(file, numeric_column, sep, encoding, read_n_lines, sheet=0)[source]

private

_readlines(lines, numeric_column, sep)[source]

private

add_column(colname, function, position=-1)[source]

add a column @param colname column name, or columns names if it is a list or a tuple @param function function which gives the values (or a list of functions, or a function which returns a tuple) @param position where to insert the column, -1 for the end

Example:

table.add_column("has_A", lambda v: 1 if "A" in v["name"] else 0, 0)

table.add_column(("has_A", "has_B"),(lambda v: 1 if "A" in v["name"] else 0,
                                        lambda v: 1 if "B" in v["name"] else 0))

table.add_column(("has_A", "has_B"), (lambda v: (1 if "A" in v["name"] else 0, 1 if "B" in v["name"] else 0)))
add_column_cumulative(column_index, column_name, functionIndex, functionValue, normalize=False, reverse=False, cumulative=True, functionSort=None)[source]

also called the Gini function Example:

table.add_column_cumulative("index_%s" % col, "dist_%s" % col,
                            lambda v: v["sum_nbclient"], lambda v: v[col],
                            functionSort=lambda v: v[col] / v["sum_nbclient"],
                            normalize=True)
add_column_index(colname='index', start=0)[source]

Example:

table.add_column_index("index_row")
add_column_recursive(colname, functionValue, functionAgg)[source]

Example:

table.add_column_recursive(lambda v: v["norm_%s" % loi],
                           lambda li, v: li[-1] + v)
add_column_recursive_row(colname, functionAgg)[source]

Example:

table.add_column_recursive_row("w_%s" % loi,
                lambda li, v: li[-1] + v["norm_%s" % loi]
                              if len(li) > 0 else v["norm_%s" % loi])
add_column_smooth(colname, function, position, weights)[source]

Example:

x = 1./3
table.add_column_smooth("has_A_smooth", lambda v: v["has_A"], [-1,0,1], [x,x,x])
add_column_vector(colname, vector)[source]

add a column defined by vector(list of values for each row)

@param colname column to add @param vector (list) list of values to add to each row @return self

static add_header_if_not_present(filename, header, encoding=None, logFunction=<function noLOG>)[source]

the function checks whether the first line contains the columns listed in header; otherwise, it modifies the file and adds them on the first line

@param filename filename @param header list of column names (all strings) @param encoding encoding @param logFunction use this function to log information about what is happening

addc(colname, function, position=-1)[source]

@see me add_column

aggregate(function, aggregated_function=<built-in function sum>)[source]

Example:

total = table.aggregate(lambda v: v["d_c"], len)
aggregate_column(colname, aggregated_function=<built-in function sum>)[source]

Example:

total = table.aggregate_column("d_c", sum)
avg_std(functionValue, functionWeight=<function TableFormula.<lambda>>)[source]

returns the average and standard deviation

static bootstrap(values, function, nbdraws=-1, alpha=0.05)[source]

returns a confidence interval for a statistic @param values values @param function produces the statistic over a random set of observations chosen in values @param nbdraws number of draws, if it is equal to -1, it is set to len(values) @param alpha confidence level @return average, min, lower bound, higher bound, max
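The draw-and-measure procedure described above can be sketched in plain Python (an illustrative re-implementation of the documented semantics, not the library's code; `bootstrap_ci` is a hypothetical name):

```python
# Sketch of the documented bootstrap: resample with replacement,
# evaluate the statistic on each draw, and return the five summary
# values (average, min, lower bound, higher bound, max).
import random

def bootstrap_ci(values, statistic, nbdraws=-1, alpha=0.05):
    if nbdraws == -1:
        nbdraws = len(values)
    stats = sorted(
        statistic([random.choice(values) for _ in range(len(values))])
        for _ in range(nbdraws))
    avg = sum(stats) / len(stats)
    lower = stats[int(alpha / 2 * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return avg, stats[0], lower, upper, stats[-1]

random.seed(0)
avg, mn, lo, hi, mx = bootstrap_ci(
    [1.0, 2.0, 3.0, 4.0], lambda vs: sum(vs) / len(vs), nbdraws=200)
```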

center_reduce(columnsSet=None, op=None, removeExtreme=None, mu_sigma=None)[source]

center and reduce a set of columns (or all if columnsSet is None) @param removeExtreme remove extreme values at both sides (0.05 means 0.025 on each side) @param columnsSet set of columns to deal with @param op it can be:

  • None: subtract mean and normalize,
  • « mean »: subtract mean only,
  • « norm »: normalize only
@param mu_sigma matrix with two rows (one for mean, second for sigma); if None,
it is computed from the matrix itself, columns must be in the same order as columnsSet

@return the same table(with only the considered columns)
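Under those definitions, centre-reduce amounts to subtracting the column mean and dividing by the standard deviation; a minimal sketch (not the library's implementation) operating on rows stored as dictionaries:

```python
# Sketch of center_reduce on dict rows: subtract the column mean and
# divide by the population standard deviation (0 sigma left as 0.0).
# Illustrative only, not the library code.
from statistics import mean, pstdev

def center_reduce(rows, columns):
    out = [dict(r) for r in rows]
    for col in columns:
        vals = [r[col] for r in rows]
        mu, sigma = mean(vals), pstdev(vals)
        for r in out:
            r[col] = (r[col] - mu) / sigma if sigma else 0.0
    return out

std = center_reduce([{"x": 1.0}, {"x": 2.0}, {"x": 3.0}], ["x"])
```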

change_header(new_header)[source]

change the column names

@param new_header a list or a function which modifies the header

Example:

tbl.change_header(lambda h: h if h != "column" else "new_name")

@warning Do not do that yourself, the class holds a dictionary up to date with the column index.

concatenate(table, addPrefix='')[source]

concatenates two tables by columns @param table table @param addPrefix add a prefix to each column from table @return table (with the same number of rows as the longest one)

copy()[source]

call copy.deepcopy(self)

correlation(useBootstrap=False, collapseFormat=True, nbdraws=-1, alpha=0.05, functionKeepValue=<function TableFormula.<lambda>>)[source]

computes the correlation matrix, the first column will contain the column names

@param useBootstrap if True, use a bootstrap method to estimate the correlation @param collapseFormat if True and useBootstrap is True, produces a format average|lower bound|higher bound(at a definite confidence level) @param nbdraws number of draws(if -1, then it will be equal to the number of observations) @param alpha confidence level @param functionKeepValue if collapseFormat is True, this function is used to collapse val,low,high in a single string @return new table

static correlation_bicolumn(values, deviations=False, noCenter=False)[source]

assumes values is a matrix with two columns @param values 2 column matrix @param deviations if True, returns cor, sigma1, sigma2 @param noCenter if True, do not remove the average before computing the covariance,

it means we assume variables are already centered

@return correlation factor or correlation, sigma1, sigma2 if deviations is True
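That computation can be sketched in plain Python (Pearson correlation of a two-column matrix; illustrative, not the library's code):

```python
# Sketch of correlation_bicolumn: Pearson correlation of the two
# columns, optionally uncentered, optionally returning the deviations.
import math

def correlation_bicolumn(values, deviations=False, noCenter=False):
    xs = [row[0] for row in values]
    ys = [row[1] for row in values]
    n = len(values)
    mx = 0.0 if noCenter else sum(xs) / n
    my = 0.0 if noCenter else sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    s1 = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    s2 = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    cor = cov / (s1 * s2) if s1 and s2 else 0.0
    return (cor, s1, s2) if deviations else cor

print(correlation_bicolumn([[1, 2], [2, 4], [3, 6]]))  # -> 1.0
```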

correlation_col(col1, col2, noCenter=False)[source]

computes the correlation between two columns @param col1 column 1 @param col2 column 2 @param noCenter does the computation without removing the average @return float (correlation)

correlation_row(row1, row2, noCenter=False)[source]

computes the correlation between two rows @param row1 row 1 (integer) @param row2 row 2 (integer) @param noCenter does the computation without removing the average @return float (correlation)

covariance()[source]

computes the covariance matrix, the first column will contain the column names @return new table

covariance_col(col1, col2, noCenter=False)[source]

computes the covariance between two columns @param col1 column 1 @param col2 column 2 @param noCenter does the computation without removing the average @return float (covariance)

covariance_row(row1, row2, noCenter=False)[source]

computes the covariance between two rows @param row1 row 1 (integer) @param row2 row 2 (integer) @param noCenter does the computation without removing the average @return float (covariance)

create_index(functionIndex)[source]

this method creates an index; to get an indexed row, use method get. Example:

table.create_index(lambda v:(v["name"], v["d_a"]))
row = table.get(('A', 1.1))
value = table.get(('A', 1.1), 2)
dataframe

creates a pandas dataframe @return pandas.dataframe

dcast(functionKey, functionInstance, full=True)[source]

@see me multiply_column_by_row_instance

delta(other)[source]

returns a list of differences between self and other

@param other TableFormula @return list of differences(first one)

extract_columns(listColumns)[source]

extract some columns

@param listColumns list of columns to extract, or a function
which returns True if the column has to be extracted based on its name

@return table

Example:

ext = table.extract_columns(["name", "d_a"])
fill_sql_table(filename_or_database, tablename, add_id='idr')[source]

returns a Database object, creates the database if it does not exist, same for the table

@param filename_or_database filename or Database object,
in that second case, we assume method connect was called before

@param tablename table name @param add_id if it is not None, then the function adds an id, it first takes the

max(id) and goes on incrementing it;
@return Database object (new or the one from the parameters),
in both cases, the database is not disconnected
filter(condition_function)[source]

Example:

fil = table.filter(lambda v: v["d_b"] == 2)

@warning Rows are not copied.

filter_quantile(function, alpha_min=0.025, alpha_max=0.025)[source]

sort all rows using criteria defined by function and remove rows at the extremes

@param function values used to estimate the quantiles @param alpha_min lower quantile @param alpha_max higher quantile @return a table containing all the rows where the criterion

is within the two quantiles

Example:

fil = table.filter_quantile(lambda v: v["d_b"], 0, 0.4)

@warning Rows are not copied.

get(rowIndex, column=None)[source]

use the index created by method create_index Example:

table.create_index(lambda v:(v["name"], v["d_a"]))
row = table.get(('A', 1.1))
value = table.get(('A', 1.1), 2)
get_column_values(col)[source]

private returns all values for one column

get_distinct_values(col)[source]

private

groupby(functionKey, functionsValue, columns=None, functionsAgg=None, functionWeight=None)[source]

Example:

group = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "sum_d_a", "sum_d_b"])

or

groupmax = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "max_d_a", "max_d_b"],
                        [max, max])
groupby_implicit(functionKey, functionWeight=None, logging=None)[source]

uses the prefix of a column name to determine which aggregation function to use (sum, avg, len, key, none, max, min) Example:

group = table.groupby_implicit(lambda v: v["name"])
histogram(functionValue, nbDiv=100, secondColumnIsWeight=False, normalize=True, removeExtreme=0.05)[source]

computes a histogram on one vector @param functionValue function which produces the value to bin @param nbDiv number of divisions for this histogram (boundaries are min and max) @param secondColumnIsWeight if True, the second column is the weight @param normalize if True, normalize by the sum of weights @param removeExtreme remove extreme values at both sides (0.05 means 0.025 on each side) @return table with two columns
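The binning step can be sketched as follows (an unweighted illustration without the removeExtreme trimming; not the library's code):

```python
# Sketch of the documented histogram: bin values between min and max
# into nb_div divisions and normalize the counts so they sum to 1.
# Returns (bin lower bound, weight) pairs, like a two-column table.
def histogram(values, nb_div=10, normalize=True):
    lo, hi = min(values), max(values)
    width = (hi - lo) / nb_div or 1.0   # guard against a flat vector
    counts = [0.0] * nb_div
    for v in values:
        i = min(nb_div - 1, int((v - lo) / width))
        counts[i] += 1.0
    total = sum(counts)
    if normalize and total:
        counts = [c / total for c in counts]
    return [(lo + i * width, c) for i, c in enumerate(counts)]

hist = histogram([0.0, 0.1, 0.2, 0.9, 1.0], nb_div=2)
```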

histograms(columnsSet, nbDiv=100, secondColumnIsWeight=False, normalize=True, removeExtreme=0.05, histxName='histKey')[source]

computes a common histogram on all columns @param columnsSet set of columns @param nbDiv number of divisions for these histograms (boundaries are min and max) @param secondColumnIsWeight if True, the second column is the weight @param normalize if True, normalize by the sum of weights @param removeExtreme remove extreme values at both sides (0.05 means 0.025 on each side) @param histxName column name given to the x axis shared by every histogram @return table with two columns

@warning The function skips any NaN of Inf value.

innerjoin(table, functionKey1, functionKey2, nameKey='key', addSuffixAnyWay=False, prefixToAdd=None, full=False, keepKey=True, putKeyInColumn=None, missingValue=None, uniqueKey=False)[source]

@param table other table to join with @param functionKey1 key for the first table (a function) @param functionKey2 key for the second table (a function), innerjoin … ON … @param addSuffixAnyWay add a suffix to every column from the second table even if names are different (suffix is « + ») @param prefixToAdd prefix to add to the columns of the second table @param full add all items even if there are no common keys (FULL OUTER JOIN), otherwise keep only common keys @param keepKey keep the key as a column in the result (column is key), otherwise not @param putKeyInColumn private parameter: keepKey has to be True and in this case, put the key in this column (integer) @param missingValue when there is no key on one side, this default value will be used instead @param uniqueKey if True, the function assumes there is a bijection between rows and keys (one row <–> one key) on both tables,

otherwise, it will not.

@return a table

Example:

innerjoin = table.innerjoin(group, lambda v: v["name"],
                                   lambda v: v["name"], "group")
json

returns a json format @return string

modify_all(modification_function)[source]

apply the same modification to every number @param modification_function modification to apply to every number @return new table

The signature of the function is the following one:

def function(value, column_name):
    # ....
    return new_value

Example:

tbl = tbl.modify_all(lambda v,c: {"string":"", "numerical":0}.get(c,None) if v is None else v)
mu_sigma(functionValues, removeExtreme=None)[source]

computes the average and the standard deviation of a vector of values @param functionValues function which produces the vector of values @param removeExtreme remove extreme values at both sides (0.05 means 0.025 on each side) @return (average, standard deviation)
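The trimming-then-averaging can be sketched on a plain list of values (illustrative stand-in, not the method itself):

```python
# Sketch of mu_sigma: drop a fraction remove_extreme of the sorted
# values, half on each side, then compute mean and population stddev.
from statistics import mean, pstdev

def mu_sigma(values, remove_extreme=None):
    vals = sorted(values)
    if remove_extreme:
        k = int(len(vals) * remove_extreme / 2)
        if k:
            vals = vals[k:len(vals) - k]
    return mean(vals), pstdev(vals)

mu, sg = mu_sigma([1.0, 2.0, 3.0])
```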

mu_sigma_each_column(columnsSet=None, removeExtreme=None)[source]

returns a table with the average and the standard deviation for each column @param removeExtreme remove extreme values at both sides (0.05 means 0.025 on each side) @param columnsSet set of columns to deal with @return table with two rows: average and standard deviation

multiplication_term_term(other)[source]

does a multiplication term by term (similar to an addition), multiplies values if types are matching

@param other matrix or float or string @return new matrix, keep the header of the first matrix

multiply_column_by_row_instance(functionKey, functionInstance, full=True)[source]

@param functionKey defines a key(function) @param functionInstance defines a second key(will be moved to the columns dimension) @param full introduces missing values for not found combinations @return a table

Assume a column contains a finite set of values; for example, we have the temperature for several cities organized as if it were a table from a database: city, date, temperature. We would like to get another table where we have: date, temperature_city1, temperature_city2…

Then we would type: Example:

mul = table.multiply_column_by_row_instance(
                    lambda v: v["date"],
                    lambda v: v["city"])

The input table would be like:

city   date
A      jan
A      feb
B      feb

It returns:

KEY     A|city  A|date  B|city  B|date
feb     A       feb     B       feb
jan     A       jan     None    None
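The reshaping illustrated above can be sketched in plain Python (an illustrative pivot over dict rows, not the library's implementation; the `pivot` name is hypothetical):

```python
# Sketch of multiply_column_by_row_instance: group rows by key, move
# the instance key to the column dimension, and fill missing
# key/instance combinations with None.
def pivot(rows, key, instance):
    instances = sorted({r[instance] for r in rows})
    by_key = {r[key]: {} for r in rows}
    for r in rows:
        by_key[r[key]][r[instance]] = r
    result = []
    for k in sorted(by_key):
        row = {"KEY": k}
        for ins in instances:
            src = by_key[k].get(ins)
            for field in rows[0]:          # assumes homogeneous rows
                row["%s|%s" % (ins, field)] = src[field] if src else None
        result.append(row)
    return result

rows = [{"city": "A", "date": "jan"},
        {"city": "A", "date": "feb"},
        {"city": "B", "date": "feb"}]
piv = pivot(rows, key="date", instance="city")
```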
np_array

returns the values as a numpy array @return numpy array

np_matrix

returns the values as a numpy matrix @return numpy matrix

random(n, unique=False)[source]

selects n random rows from the table, returns a table

@param n number of desired random rows @param unique if True, draws unique rows (sampling without replacement),

otherwise draws with replacement

@return a table

Example:

rnd = table.random(10)
static random_split_file(filename, outfileprefix, nb, has_header=True, encoding=None, logFunction=<function noLOG>)[source]

splits a file into nb buckets at random (lines are sent to a random file as they come)

@param filename filename to split @param nb number of buckets @param outfileprefix output files will start with outfileprefix + "%04d.txt" % i @param encoding encoding @param has_header the header will be replicated in each created file @param logFunction to display information @return list of created files
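The documented behaviour can be sketched in plain Python (illustrative only; the real method also handles encoding and logging):

```python
# Sketch of random_split_file: each body line goes to one of nb output
# files chosen at random; the header is replicated in every bucket.
import os
import random
import tempfile

def random_split_file(filename, outfileprefix, nb, has_header=True):
    with open(filename, "r") as f:
        lines = f.readlines()
    header, body = (lines[:1], lines[1:]) if has_header else ([], lines)
    names = ["%s%04d.txt" % (outfileprefix, i) for i in range(nb)]
    handles = [open(n, "w") for n in names]
    for h in handles:
        h.writelines(header)
    for line in body:
        random.choice(handles).write(line)
    for h in handles:
        h.close()
    return names

# tiny demo on a temporary file
random.seed(0)
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "src.txt")
with open(src, "w") as f:
    f.write("col\n" + "".join("%d\n" % i for i in range(10)))
parts = random_split_file(src, os.path.join(tmpdir, "part"), 2)
```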

static ratio(x, y)[source]

returns the ratio of two real values, or an empty string if the denominator is zero @return a real or an empty string
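The documented behaviour fits in a couple of lines (a sketch, not the library's code):

```python
# Sketch of ratio: an empty string when the denominator is zero,
# the quotient otherwise.
def ratio(x, y):
    return "" if y == 0 else x / y
```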

reduce_dict(functionKey, functionValue, uselist=False)[source]

@see me todict

remove_columns(listColumns)[source]

remove some columns

@param listColumns list of columns to remove or a function
which returns True if the column has to be removed based on its name

@return table

Example:

rem = table.remove_columns(["d_a"])
rename_column(old_name, new_name)[source]

rename a column

@param old_name old name @param new_name new name

replicate(times)[source]

replicates all rows a given number of times @param times number of replications @return new matrix, keeps the header of the first matrix

save(filename, sep='\t', encoding=None, newline='\n')[source]

saves the table in a text file, the first row is the column names

@param filename filename @param sep column separator @param encoding encoding @param newline line separator

save_as_excel(filename, font='Calibri', sheetname='sheet0', close=True, encoding=None)[source]

saves the table as a new Excel file, you can use .xls or .xlsx; if filename is None, the function returns an object (a Workbook) and does not save it.

@param filename Excel filename @param sheetname name of the sheet to add @param font font name @param close if True, close the file, otherwise the user will have to do it @param encoding encoding @return object Workbook

static save_multiple_as_excel(filename, list_table, font='Calibri', close=True, encoding=None)[source]

saves multiple table in one Excel file

@param filename filename (can be None) @param list_table list of 2-tuples (« name », tbl) @param font font name @param close if True, close the file, otherwise the user will have to do it @param encoding encoding @return object Workbook

schema_database(add_id=True)[source]

returns the schema for a database which would contains this database

@param add_id if True, adds an index « PRIMARYKEY » @return dictionary { index_column:(name, type) }

select(functionRow)[source]

@param functionRow function @return table

Example:

d = table.select(lambda v:(v["name"], v["d_b"]))
print(list(d))
shape

returns the size (nb rows, nb columns)

size

returns the size (nb rows, nb columns)

sort(functionValue, reverse=False)[source]

Example:

table.sort(lambda v: v["d_b"] + v["d_c"])
strtype()[source]

displays the type of values(not the values)

todict(functionKey, functionValue, useList=False)[source]

converts the table into a dictionary { key:value }, each of them defined by functions.

@param functionKey defines the key @param functionValue defines the value @param useList if there are multiple rows sharing the same key, it should be true,

all values are stored in a list

@return a dictionary { key:row } or { key: [row1, row2, …] }

Example:

d = table.todict(lambda v: v["name"], lambda v: v["d_b"], True)
transpose(labelC=None, labelAsRow=True)[source]

computes the transpose @param labelC proposes labels for the column,

if None, takes « r%d » % i; if it is a string, the function assumes it is a column name

@param labelAsRow add the label as a row @return new table
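The transposition can be sketched with zip over header and rows (illustrative, not the library's implementation); the default « r%d » labels name the new columns:

```python
# Sketch of transpose: each original column becomes a row, labelled by
# its former column name; new columns are named r0, r1, ...
def transpose(header, values):
    new_header = ["label"] + ["r%d" % i for i in range(len(values))]
    new_values = [[h] + list(col) for h, col in zip(header, zip(*values))]
    return new_header, new_values

hdr, vals = transpose(["name", "d_a"], [["A", 1], ["B", 3]])
```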

union(table)[source]

@param table table @return table(with the same number of columns)

concatenates two tables by rows, they must have the same header, rows of both tables are merged into a single matrix Example:

union = table.union(table2)
union_columns(columnsSet)[source]

computes the union of all values from all columns present in columnSet @param columnsSet set of columns @return table

values_to_date(format=None, only_if_possible=False, subset_columns=None)[source]

converts all values into dates @param only_if_possible if True, converts all possible values and catches exceptions,

if False, converts everything and raises an exception when not possible

@param format date format, see fn str_to_datetime @param subset_columns if None, takes all columns, otherwise tries to convert

only the listed columns

@return table

values_to_float(only_if_possible=False, subset_columns=None)[source]

converts all values into float @param only_if_possible if True, converts all possible values and catches exceptions,

if False, converts everything and raises an exception when not possible
@param subset_columns if None, takes all columns, otherwise tries to convert
only the listed columns

@return table

values_to_str(subset_columns=None, format=None)[source]

converts all values into str @param subset_columns if None, takes all columns, otherwise tries to convert

only the listed columns

@param format format for the conversion, None by default but it could be for example %1.2f @return table

where(condition_function)[source]

@see me filter