module homeblog.table_formula

Inheritance diagram of ensae_teaching_cs.homeblog.table_formula

Short summary

module ensae_teaching_cs.homeblog.table_formula

Implements TableFormula.

source on GitHub

Classes

class truncated documentation
TableFormula This class aims at representing a table, it provides some « SQL like » functionalities such as groupby or innerjoin, select, …

Properties

property truncated documentation
dataframe creates a pandas dataframe
json returns a json format
np_array returns the values as a numpy array
np_matrix returns the values as a numpy matrix
shape returns the size (nb rows, nb columns)
size returns the size (nb rows, nb columns)

Static Methods

staticmethod truncated documentation
add_header_if_not_present the function checks whether the first line contains the columns in header; otherwise, it modifies the file and adds them …
bootstrap returns a confidence interval for a statistic
correlation_bicolumn assumes values is a matrix with two columns
random_split_file splits a file into nb buckets at random (lines are sent to a random file as they come)
ratio returns the ratio of two real values, or an empty string if the denominator is zero
save_multiple_as_excel saves multiple tables in one Excel file

Methods

method truncated documentation
__add__ do an addition, add values if types are matching
__copy__ operator copy
__deepcopy__ operator deepcopy
__eq__ check if two tables are equal by value
__getitem__ operator [], accepts slices
__html__ convert the table into a html string
__init__ constructor, it can either take a filename, an object TableFormula, a list of columns and values.
__iter__ iterator on all rows, it returns a dictionary { column:value }
__len__ returns the number of rows
__mul__ do a multiplication(by a number)
__rst__ convert the table into rst format
__setitem__ operator [], just accepts tuple(to change a value)
__str__ convert the table into a string
_auto_conversion private set up the column type based on the column name
_fill_by_row fill the table
_interpret_row private returns each row as a dictionary { column_name:value }
_private_getclass the class often creates another class of the same type, this function returns the class object
_read_file private
_readlines private
add_column add a column
add_column_cumulative also called the Gini function Example:
add_column_index Example:
add_column_recursive Example:
add_column_recursive_row Example:
add_column_smooth Example:
add_column_vector add a column defined by vector(list of values for each row)
addc @see me add_column
aggregate Example:
aggregate_column Example:
avg_std returns the average and standard deviation
center_reduce center and reduce a set of columns(or all if columnsSet is None)
change_header change the column names
concatenate concatenates two tables by columns
copy call copy.deepcopy(self)
correlation computes the correlation matrix, the first column will contain the column names
correlation_col computes the correlation between two columns
correlation_row computes the correlation between two rows
covariance computes the covariance matrix, the first column will contain the column names
covariance_col computes the covariance between two columns
covariance_row computes the covariance between two rows
create_index this method creates an index, to get an indexes row, use method get Example:
dcast @see me multiply_column_by_row_instance
delta returns a list of differences between self and others
extract_columns extract some columns
fill_sql_table returns a Database object, creates the database if it does not exist, same for the table
filter Example:
filter_quantile sort all rows using criteria defined by function and remove rows at the extremes
get use the index created by method create_index Example:
get_column_values private returns all values for one column
get_distinct_values private
groupby Example:
groupby_implicit uses the prefix of a column name to determine which aggregation function to use (sum, avg, len, key, none, max, min) …
histogram computes a histogram on one vector
histograms computes a common histogram on all columns
innerjoin  
modify_all apply the same modification to every number
mu_sigma computes the average and the standard deviation of a vector of values
mu_sigma_each_column returns a table with the average and the standard deviation for each column
multiplication_term_term does a multiplication term by term (similar to an addition), multiplies values if types are matching
multiply_column_by_row_instance  
random selects n random rows from the table, returns a table
reduce_dict @see me todict
remove_columns remove some columns
rename_column rename a column
replicate replicates all rows a given number of times
save saves the tables in a text file, first row is the column names
save_as_excel saves the table as a new Excel file, you can use .xls or .xlsx if filename is None, the function returns …
schema_database returns the schema for a database which would contain this table
select  
sort Example:
strtype displays the type of values(not the values)
todict convert the table as a dictionary { key:value } each of them is defined by functions.
transpose computes the transpose
union  
union_columns computes the union of all values from all columns present in columnSet
values_to_date converts all values into dates
values_to_float converts all values into float
values_to_str converts all values into str
where @see me filter

Documentation

@file @brief Implements TableFormula.

class ensae_teaching_cs.homeblog.table_formula.TableFormula(file, numeric_column=[], sep='\t', encoding=None, read_n_lines=-1, sheet=0, **options)[source]

Bases : ensae_teaching_cs.homeblog.table_formula_stat._TableFormulaStat

This class aims at representing a table, it provides some « SQL like » functionalities such as groupby or innerjoin, select, where… This was a custom implementation of a DataFrame before I discovered pandas.

The class provides an easy way to go through the rows of the table by converting each row into a dictionary { column_name: value } on the fly. Example:

tbl = TableFormula(...)
newtbl = tbl.filter(lambda v: v["criteria"] == 5)

See @see op __init__ for other ways to create a table.

@var header list of column names @var values list of rows (each row contains as many values as the number of columns) @var index dictionary { column name: position }; changing header also means changing index.

Example:

import os
import random

table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))
print(table)
print("one value 0,1:", table[0,1])

print("---------")
dist = table.get_distinct_values("name")
for k in sorted(dist): print("*%d: %s"%(int(dist[k]),k))

print("---------")
table.add_column("has_A", lambda v: 1. if "A" in v["name"] else 0.)
print(table)

x = 1./3
print("------------- smoothing", x)
table.add_column_smooth("has_A_smooth", lambda v: v["has_A"], [-1,0,1], [x,x,x])
print(table)

print("--------- filter")
fil = table.filter(lambda v: v["d_b"] == 2)
print(fil)

print("--------- random")
rnd = table.random(5)
print(rnd)

print("--------- random unique")
rnd = table.random(1, True)
print(rnd)

print("--------- filter quantile")
fil = table.filter_quantile(lambda v: v["d_b"], 0, 0.4)
print(fil)

print("--------- aggregate_column")
total = table.aggregate(lambda v: v["d_c"])
print(total)

print("--------- sort")
table.sort(lambda v: v["d_b"] + v["d_c"])
print(table)

print("--------- union")
union = table.union(table)
print(union)

print("--------- group sum")
group = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "sum_d_a", "sum_d_b"])
print(group)

print("--------- group max")
groupmax = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "max_d_a", "max_d_b"],
                        [max, max])
print(groupmax)

print("--------- group sum with weights")
group = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"]],
                        ["name", "weight", "sum_d_a"],
                        [lambda vec,w:  sum(vec) / w],
                        lambda v: v ["d_b"])

print("--------- innerjoin")
innerjoin = table.innerjoin(group, lambda v: v["name"],
                                   lambda v: v["name"], "group")
print(innerjoin)

print("------------- extraction")
ext = table.extract_columns(["name", "d_a"])
print(ext)

print("------------- remove")
ext = table.remove_columns(["d_a"])
print(ext)

print("------------- todict")
d = table.todict(lambda v: v["name"], lambda v: v["d_b"], True)
print(d)

print("------------- select")
d = table.select(lambda v:(v["name"], v["d_b"]))
print(list(d))

print("------------- use of an index")
table.create_index(lambda v:(v["name"], v["d_a"]))
row = table.get(('A', 1.1))
print(row)
value = table.get(('A', 1.1), 2)
print(value)

print("------------- multiply_column_by_row_instance ")
table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))
table.add_column("key_add", lambda v:"unique")
print(table)
mul = table.multiply_column_by_row_instance(
                    lambda v: v["key_add"],
                    lambda v: v["name"])
print(mul)

if os.path.exists("BNP.PA.txt"):
    print("--------------- financial stock")
    table = TableFormula("BNP.PA.txt", sep=",")
    table.sort(lambda v: v["Date"])
    print(table[:10])

print("--------------- groupby_implicit")
table = TableFormula("key_name sum_a len_b avg_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))
print(table)
gr = table.groupby_implicit(lambda v: v ["key_name"])
print(gr)

print("--------------- covariance")
values = [random.random() for i in range(0,100)]
values = [[x, x + random.random()/2] for x in values]
tbl = TableFormula(["x", "y"], values).values_to_float()
cov = tbl.covariance()
print(cov)

print("--------------- histogram")
hist = tbl.histogram(lambda v:(v["x"],1), 10)
print(hist)

print("--------------- histogram")
hist = tbl.values_to_float().histograms(["x", "y"], 10)
print(hist)

print("--------------- unions of columns")
hist = tbl.values_to_float().union_columns(["x", "y"])
print(hist)

constructor; it can take a filename, an object TableFormula, or a list of columns and values.

@param file filename or a list of column names or a dictionary,
file can also be a pandas DataFrame.

@param numeric_column depends on file types (see examples below) @param sep column separator if file is a filename @param read_n_lines read the first n lines (or all of them if -1) @param sheet in case the file is an Excel file, this parameter specifies the sheet number or name @param suffix_nb if True, adds an integer to the column name if it is a duplicate

Example:

table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))

or

table = TableFormula("file.txt", ["nb"])

or

table  = TableFormula(["date", "Y", "Y2", "xl"], values)

or

data = [{ "one":1, "two":2 }, {"two":2.1, "three":3 }]
table = TableFormula(data)

or

data = { 1:{ "one":2.3, "two":2.2 }, 2:{"one":2.1, "two":3 } }
table = TableFormula("__byrow__", data)

or

table = TableFormula(numpy.matrix(…))

or

table = TableFormula(numpy.array(...))

@warning When a numpy matrix or array is given, rows and header are not copied.

__add__(other)[source]

does an addition, adds values if types are matching @param other matrix or float or string @return new matrix, keeps the header of the first matrix

__copy__()[source]

operator copy

__deepcopy__(memo)[source]

operator deepcopy

__eq__(other)[source]

check if two tables are equal by value @param other other table @return boolean

__getitem__(irow)[source]

operator [], accepts slices @param irow integer, tuple, slice or list @return depends on irow

  • int –> a table with one row
  • slice –> a table with several rows
  • list –> a table with the selected rows
  • tuple –> a value
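The dispatch on the index type described above can be sketched with a minimal toy class (`MiniTable` is a hypothetical stand-in, not the library's implementation):

```python
# Minimal sketch of type-based dispatch in __getitem__: int and slice
# select rows, a list selects several rows, a tuple returns one cell.
class MiniTable:
    def __init__(self, header, values):
        self.header = header          # list of column names
        self.values = values          # list of rows

    def __getitem__(self, irow):
        if isinstance(irow, tuple):   # tuple -> a single value
            r, c = irow
            return self.values[r][c]
        if isinstance(irow, slice):   # slice -> several rows
            return self.values[irow]
        if isinstance(irow, list):    # list -> the selected rows
            return [self.values[i] for i in irow]
        return [self.values[irow]]    # int -> a one-row table

tbl = MiniTable(["name", "d_a"], [["A", 1], ["A", 1.1], ["B", 3]])
print(tbl[0, 1])    # -> 1
print(tbl[1:3])     # -> [['A', 1.1], ['B', 3]]
```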
__hash__ = None
__html__(class_table=None, class_td=None, class_tr=None, class_th=None)[source]

convert the table into a html string

@param class_table adds a class to the tag ``table``(None for none) @param class_td adds a class to the tag ``td``(None for none) @param class_tr adds a class to the tag ``tr``(None for none) @param class_th adds a class to the tag ``th``(None for none)

__init__(file, numeric_column=[], sep='\t', encoding=None, read_n_lines=-1, sheet=0, **options)[source]

constructor; it can take a filename, an object TableFormula, or a list of columns and values.

@param file filename or a list of column names or a dictionary,
file can also be a pandas DataFrame.

@param numeric_column depends on file types (see examples below) @param sep column separator if file is a filename @param read_n_lines read the first n lines (or all of them if -1) @param sheet in case the file is an Excel file, this parameter specifies the sheet number or name @param suffix_nb if True, adds an integer to the column name if it is a duplicate

Example:

table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))

or

table = TableFormula("file.txt", ["nb"])

or

table  = TableFormula(["date", "Y", "Y2", "xl"], values)

or

data = [{ "one":1, "two":2 }, {"two":2.1, "three":3 }]
table = TableFormula(data)

or

data = { 1:{ "one":2.3, "two":2.2 }, 2:{"one":2.1, "two":3 } }
table = TableFormula("__byrow__", data)

or

table = TableFormula(numpy.matrix(…))

or

table = TableFormula(numpy.array(...))

@warning When a numpy matrix or array is given, rows and header are not copied.

__iter__()[source]

iterator on all rows, it returns a dictionary { column:value } @return dictionary

__len__()[source]

returns the number of rows

__mul__(other)[source]

does a multiplication (by a number) @param other matrix or float or string @return new matrix, keeps the header of the first matrix

__rst__(add_line=True)[source]

convert the table into rst format

+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
| (header rows optional) |            |          |          |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | ...        | ...      |          |
+------------------------+------------+----------+----------+

@param add_line add a line separator between each row

__setitem__(irow, value)[source]

operator [], just accepts tuples (to change a value) @param irow 2-tuple @param value new value

__str__()[source]

convert the table into a string @return string

_auto_conversion(others_columns)[source]

private set up the column type based on the column name

_fill_by_row(values)[source]

fill the table @param values dictionary { <int_row_index>: { <column name>: value} }

_interpret_row(row)[source]

private returns each row as a dictionary { column_name:value }

_private_getclass()[source]

the class often creates another class of the same type, this function returns the class object

_read_file(file, numeric_column, sep, encoding, read_n_lines, sheet=0)[source]

private

_readlines(lines, numeric_column, sep)[source]

private

add_column(colname, function, position=-1)[source]

add a column @param colname column name, or columns names if it is a list or a tuple @param function function which gives the values (or a list of functions, or a function which returns a tuple) @param position where to insert the column, -1 for the end

Example:

table.add_column("has_A", lambda v: 1 if "A" in v["name"] else 0, 0)

table.add_column(("has_A", "has_B"),(lambda v: 1 if "A" in v["name"] else 0,
                                        lambda v: 1 if "B" in v["name"] else 0))

table.add_column(("has_A", "has_B"), (lambda v: (1 if "A" in v["name"] else 0, 1 if "B" in v["name"] else 0)))
add_column_cumulative(column_index, column_name, functionIndex, functionValue, normalize=False, reverse=False, cumulative=True, functionSort=None)[source]

also called the Gini function Example:

table.add_column_cumulative("index_%s" % col, "dist_%s" % col,
                            lambda v: v["sum_nbclient"], lambda v: v[col],
                            functionSort=lambda v: v[col] / v["sum_nbclient"],
                            normalize=True)
add_column_index(colname='index', start=0)[source]

Example:

table.add_column_index("index_row")
add_column_recursive(colname, functionValue, functionAgg)[source]

Example:

table.add_column_recursive(lambda v: v["norm_%s" % loi],
                           lambda li, v: li[-1] + v)
add_column_recursive_row(colname, functionAgg)[source]

Example:

table.add_column_recursive_row("w_%s" % loi,
                lambda li, v: li[-1] + v["norm_%s" % loi]
                              if len(li) > 0 else v["norm_%s" % loi])
add_column_smooth(colname, function, position, weights)[source]

Example:

x = 1./3
table.add_column_smooth("has_A_smooth", lambda v: v["has_A"], [-1,0,1], [x,x,x])
add_column_vector(colname, vector)[source]

add a column defined by vector(list of values for each row)

@param colname column to add @param vector (list) list of values to add to each row @return self

static add_header_if_not_present(filename, header, encoding=None, logFunction=<function noLOG>)[source]

the function checks whether the first line contains the columns listed in header; otherwise, it modifies the file and adds them on the first line

@param filename filename @param header list of column names (all strings) @param encoding encoding @param logFunction use this function to log information about what is happening

addc(colname, function, position=-1)[source]

@see me add_column

aggregate(function, aggregated_function=<built-in function sum>)[source]

Example:

total = table.aggregate(lambda v: v["d_c"], len)
aggregate_column(colname, aggregated_function=<built-in function sum>)[source]

Example:

total = table.aggregate_column("d_c", sum)
avg_std(functionValue, functionWeight=<function TableFormula.<lambda>>)[source]

returns the average and standard deviation

static bootstrap(values, function, nbdraws=-1, alpha=0.05)[source]

returns a confidence interval for a statistic @param values values @param function produces the statistic over a random set of observations chosen in values @param nbdraws number of draws, if it is equal to -1, it is set to len(values) @param alpha confidence level @return average, min, lower bound, higher bound, max
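The draw-and-measure procedure described above can be sketched in plain Python (an illustrative re-implementation of the documented semantics, not the library's code; `bootstrap_ci` is a hypothetical name):

```python
# Sketch of the documented bootstrap: resample with replacement,
# evaluate the statistic on each draw, and return the five summary
# values (average, min, lower bound, higher bound, max).
import random

def bootstrap_ci(values, statistic, nbdraws=-1, alpha=0.05):
    if nbdraws == -1:
        nbdraws = len(values)
    stats = sorted(
        statistic([random.choice(values) for _ in range(len(values))])
        for _ in range(nbdraws))
    avg = sum(stats) / len(stats)
    lower = stats[int(alpha / 2 * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return avg, stats[0], lower, upper, stats[-1]

random.seed(0)
avg, mn, lo, hi, mx = bootstrap_ci(
    [1.0, 2.0, 3.0, 4.0], lambda vs: sum(vs) / len(vs), nbdraws=200)
```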

center_reduce(columnsSet=None, op=None, removeExtreme=None, mu_sigma=None)[source]

center and reduce a set of columns (or all if columnsSet is None) @param removeExtreme remove extreme values at both sides (0.05 means 0.025 on each side) @param columnsSet set of columns to deal with @param op it can be:

  • None: subtract mean and normalize,
  • « mean »: subtract mean only,
  • « norm »: normalize only
@param mu_sigma matrix with two rows (one for mean, second for sigma); if None,
it is computed from the matrix itself, columns must be in the same order as columnsSet

@return the same table(with only the considered columns)
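Under those definitions, centre-reduce amounts to subtracting the column mean and dividing by the standard deviation; a minimal sketch (not the library's implementation) operating on rows stored as dictionaries:

```python
# Sketch of center_reduce on dict rows: subtract the column mean and
# divide by the population standard deviation (0 sigma left as 0.0).
# Illustrative only, not the library code.
from statistics import mean, pstdev

def center_reduce(rows, columns):
    out = [dict(r) for r in rows]
    for col in columns:
        vals = [r[col] for r in rows]
        mu, sigma = mean(vals), pstdev(vals)
        for r in out:
            r[col] = (r[col] - mu) / sigma if sigma else 0.0
    return out

std = center_reduce([{"x": 1.0}, {"x": 2.0}, {"x": 3.0}], ["x"])
```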

change_header(new_header)[source]

change the column names

@param new_header a list or a function which modifies the header

Example:

tbl.change_header(lambda h: h if h != "column" else "new_name")

@warning Do not do that yourself, the class holds a dictionary up to date with the column index.

concatenate(table, addPrefix='')[source]

concatenates two tables by columns @param table table @param addPrefix add a prefix to each column from table @return table (with the same number of rows as the longest one)

copy()[source]

call copy.deepcopy(self)

correlation(useBootstrap=False, collapseFormat=True, nbdraws=-1, alpha=0.05, functionKeepValue=<function TableFormula.<lambda>>)[source]

computes the correlation matrix, the first column will contain the column names

@param useBootstrap if True, use a bootstrap method to estimate the correlation @param collapseFormat if True and useBootstrap is True, produces a format average|lower bound|higher bound(at a definite confidence level) @param nbdraws number of draws(if -1, then it will be equal to the number of observations) @param alpha confidence level @param functionKeepValue if collapseFormat is True, this function is used to collapse val,low,high in a single string @return new table

static correlation_bicolumn(values, deviations=False, noCenter=False)[source]

assumes values is a matrix with two columns @param values 2 column matrix @param deviations if True, returns cor, sigma1, sigma2 @param noCenter if True, do not remove the average before computing the covariance,

it means we assume variables are already centered

@return correlation factor or correlation, sigma1, sigma2 if deviations is True
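That computation can be sketched in plain Python (Pearson correlation of a two-column matrix; illustrative, not the library's code):

```python
# Sketch of correlation_bicolumn: Pearson correlation of the two
# columns, optionally uncentered, optionally returning the deviations.
import math

def correlation_bicolumn(values, deviations=False, noCenter=False):
    xs = [row[0] for row in values]
    ys = [row[1] for row in values]
    n = len(values)
    mx = 0.0 if noCenter else sum(xs) / n
    my = 0.0 if noCenter else sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    s1 = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    s2 = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    cor = cov / (s1 * s2) if s1 and s2 else 0.0
    return (cor, s1, s2) if deviations else cor

print(correlation_bicolumn([[1, 2], [2, 4], [3, 6]]))  # -> 1.0
```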

correlation_col(col1, col2, noCenter=False)[source]

computes the correlation between two columns @param col1 column 1 @param col2 column 2 @param noCenter does the computation without removing the average @return float (correlation)

correlation_row(row1, row2, noCenter=False)[source]

computes the correlation between two rows @param row1 row 1 (integer) @param row2 row 2 (integer) @param noCenter does the computation without removing the average @return float (correlation)

covariance()[source]

computes the covariance matrix, the first column will contain the column names @return new table

covariance_col(col1, col2, noCenter=False)[source]

computes the covariance between two columns @param col1 column 1 @param col2 column 2 @param noCenter does the computation without removing the average @return float (covariance)

covariance_row(row1, row2, noCenter=False)[source]

computes the covariance between two rows @param row1 row 1 (integer) @param row2 row 2 (integer) @param noCenter does the computation without removing the average @return float (covariance)

create_index(functionIndex)[source]

this method creates an index; to get an indexed row, use method get. Example:

table.create_index(lambda v:(v["name"], v["d_a"]))
row = table.get(('A', 1.1))
value = table.get(('A', 1.1), 2)
dataframe

creates a pandas dataframe @return pandas.dataframe

dcast(functionKey, functionInstance, full=True)[source]

@see me multiply_column_by_row_instance

delta(other)[source]

returns a list of differences between self and other

@param other TableFormula @return list of differences(first one)

extract_columns(listColumns)[source]

extract some columns

@param listColumns list of columns to extract, or a function
which returns True if the column has to be extracted based on its name

@return table

Example:

ext = table.extract_columns(["name", "d_a"])
fill_sql_table(filename_or_database, tablename, add_id='idr')[source]

returns a Database object, creates the database if it does not exist, same for the table

@param filename_or_database filename or Database object,
in that second case, we assume method connect was called before

@param tablename table name @param add_id if it is not None, then the function adds an id, it first takes the

max(id) and goes on incrementing it;
@return Database object (new or the one from the parameters),
in both cases, the database is not disconnected
filter(condition_function)[source]

Example:

fil = table.filter(lambda v: v["d_b"] == 2)

@warning Rows are not copied.

filter_quantile(function, alpha_min=0.025, alpha_max=0.025)[source]

sort all rows using criteria defined by function and remove rows at the extremes

@param function values used to estimate the quantiles @param alpha_min lower quantile @param alpha_max higher quantile @return a table containing all the rows where the criterion

is within the two quantiles

Example:

fil = table.filter_quantile(lambda v: v["d_b"], 0, 0.4)

@warning Rows are not copied.

get(rowIndex, column=None)[source]

use the index created by method create_index Example:

table.create_index(lambda v:(v["name"], v["d_a"]))
row = table.get(('A', 1.1))
value = table.get(('A', 1.1), 2)
get_column_values(col)[source]

private returns all values for one column

get_distinct_values(col)[source]

private

groupby(functionKey, functionsValue, columns=None, functionsAgg=None, functionWeight=None)[source]

Example:

group = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "sum_d_a", "sum_d_b"])

or

groupmax = table.groupby(lambda v: v["name"],
                        [lambda v: v["d_a"],
                          lambda v: v["d_b"]],
                        ["name", "max_d_a", "max_d_b"],
                        [max, max])
groupby_implicit(functionKey, functionWeight=None, logging=None)[source]

uses the prefix of a column name to determine which aggregation function to use (sum, avg, len, key, none, max, min) Example:

group = table.groupby_implicit(lambda v: v["name"])
histogram(functionValue, nbDiv=100, secondColumnIsWeight=False, normalize=True, removeExtreme=0.05)[source]

computes a histogram on one vector @param functionValue function which produces the value to bin @param nbDiv number of divisions for this histogram (boundaries are min and max) @param secondColumnIsWeight if True, the second column is the weight @param normalize if True, normalize by the sum of weights @param removeExtreme remove extreme values at both sides (0.05 means 0.025 on each side) @return table with two columns
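The binning step can be sketched as follows (an unweighted illustration without the removeExtreme trimming; not the library's code):

```python
# Sketch of the documented histogram: bin values between min and max
# into nb_div divisions and normalize the counts so they sum to 1.
# Returns (bin lower bound, weight) pairs, like a two-column table.
def histogram(values, nb_div=10, normalize=True):
    lo, hi = min(values), max(values)
    width = (hi - lo) / nb_div or 1.0   # guard against a flat vector
    counts = [0.0] * nb_div
    for v in values:
        i = min(nb_div - 1, int((v - lo) / width))
        counts[i] += 1.0
    total = sum(counts)
    if normalize and total:
        counts = [c / total for c in counts]
    return [(lo + i * width, c) for i, c in enumerate(counts)]

hist = histogram([0.0, 0.1, 0.2, 0.9, 1.0], nb_div=2)
```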

histograms(columnsSet, nbDiv=100, secondColumnIsWeight=False, normalize=True, removeExtreme=0.05, histxName='histKey')[source]

computes a common histogram on all columns @param columnsSet set of columns @param nbDiv number of divisions for these histograms (boundaries are min and max) @param secondColumnIsWeight if True, the second column is the weight @param normalize if True, normalize by the sum of weights @param removeExtreme remove extreme values at both sides (0.05 means 0.025 on each side) @param histxName column name given to the x axis shared by every histogram @return table with two columns

@warning The function skips any NaN of Inf value.

innerjoin(table, functionKey1, functionKey2, nameKey='key', addSuffixAnyWay=False, prefixToAdd=None, full=False, keepKey=True, putKeyInColumn=None, missingValue=None, uniqueKey=False)[source]

@param table other table to join with @param functionKey1 key for the first table (a function) @param functionKey2 key for the second table (a function), innerjoin … ON … @param addSuffixAnyWay add a suffix to every column from the second table even if names are different (suffix is « + ») @param prefixToAdd prefix to add to the columns of the second table @param full add all items even if there are no common keys (FULL OUTER JOIN), otherwise keep only common keys @param keepKey keep the key as a column in the result (column is key), otherwise not @param putKeyInColumn private parameter: keepKey has to be True and in this case, put the key in this column (integer) @param missingValue when there is no key on one side, this default value will be used instead @param uniqueKey if True, the function assumes there is a bijection between rows and keys (one row <–> one key) on both tables,

otherwise, it will not.

@return a table

Example:

innerjoin = table.innerjoin(group, lambda v: v["name"],
                                   lambda v: v["name"], "group")
json

returns a json format @return string

modify_all(modification_function)[source]

apply the same modification to every number @param modification_function modification to apply to every number @return new table

The signature of the function is the following one:

def function(value, column_name):
    # ....
    return new_value

Example:

tbl = tbl.modify_all(lambda v,c: {"string":"", "numerical":0}.get(c,None) if v is None else v)
mu_sigma(functionValues, removeExtreme=None)[source]

computes the average and the standard deviation of a vector of values @param functionValues function which produces the vector of values @param removeExtreme remove extreme values at both sides (0.05 means 0.025 on each side) @return (average, standard deviation)
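The trimming-then-averaging can be sketched on a plain list of values (illustrative stand-in, not the method itself):

```python
# Sketch of mu_sigma: drop a fraction remove_extreme of the sorted
# values, half on each side, then compute mean and population stddev.
from statistics import mean, pstdev

def mu_sigma(values, remove_extreme=None):
    vals = sorted(values)
    if remove_extreme:
        k = int(len(vals) * remove_extreme / 2)
        if k:
            vals = vals[k:len(vals) - k]
    return mean(vals), pstdev(vals)

mu, sg = mu_sigma([1.0, 2.0, 3.0])
```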

mu_sigma_each_column(columnsSet=None, removeExtreme=None)[source]

returns a table with the average and the standard deviation for each column @param removeExtreme remove extreme values at both sides (0.05 means 0.025 on each side) @param columnsSet set of columns to deal with @return table with two rows: average and standard deviation

multiplication_term_term(other)[source]

does a multiplication term by term (similar to an addition), multiplies values if types are matching

@param other matrix or float or string @return new matrix, keep the header of the first matrix

multiply_column_by_row_instance(functionKey, functionInstance, full=True)[source]

@param functionKey defines a key(function) @param functionInstance defines a second key(will be moved to the columns dimension) @param full introduces missing values for not found combinations @return a table

Assume a column contains a finite set of values; for example, we have the temperature for several cities organized as if it were a table from a database: city, date, temperature. We would like to get another table where we have: date, temperature_city1, temperature_city2…

Then we would type: Example:

mul = table.multiply_column_by_row_instance(
                    lambda v: v["date"],
                    lambda v: v["city"])

The input table would be like:

city   date
A      jan
A      feb
B      feb

It returns:

KEY     A|city  A|date  B|city  B|date
feb     A       feb     B       feb
jan     A       jan     None    None
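The reshaping illustrated above can be sketched in plain Python (an illustrative pivot over dict rows, not the library's implementation; the `pivot` name is hypothetical):

```python
# Sketch of multiply_column_by_row_instance: group rows by key, move
# the instance key to the column dimension, and fill missing
# key/instance combinations with None.
def pivot(rows, key, instance):
    instances = sorted({r[instance] for r in rows})
    by_key = {r[key]: {} for r in rows}
    for r in rows:
        by_key[r[key]][r[instance]] = r
    result = []
    for k in sorted(by_key):
        row = {"KEY": k}
        for ins in instances:
            src = by_key[k].get(ins)
            for field in rows[0]:          # assumes homogeneous rows
                row["%s|%s" % (ins, field)] = src[field] if src else None
        result.append(row)
    return result

rows = [{"city": "A", "date": "jan"},
        {"city": "A", "date": "feb"},
        {"city": "B", "date": "feb"}]
piv = pivot(rows, key="date", instance="city")
```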
np_array

returns the values as a numpy array @return numpy array

np_matrix

returns the values as a numpy matrix @return numpy matrix

random(n, unique=False)[source]

selects n random rows from the table, returns a table

@param n number of desired random rows @param unique if True, draws unique rows (sampling without replacement),

otherwise draws with replacement

@return a table

Example:

rnd = table.random(10)
static random_split_file(filename, outfileprefix, nb, has_header=True, encoding=None, logFunction=<function noLOG>)[source]

splits a file into nb buckets at random (lines are sent to a random file as they come)

@param filename filename to split @param nb number of buckets @param outfileprefix output files will start with outfileprefix + "%04d.txt" % i @param encoding encoding @param has_header the header will be replicated in each created file @param logFunction to display information @return list of created files
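The documented behaviour can be sketched in plain Python (illustrative only; the real method also handles encoding and logging):

```python
# Sketch of random_split_file: each body line goes to one of nb output
# files chosen at random; the header is replicated in every bucket.
import os
import random
import tempfile

def random_split_file(filename, outfileprefix, nb, has_header=True):
    with open(filename, "r") as f:
        lines = f.readlines()
    header, body = (lines[:1], lines[1:]) if has_header else ([], lines)
    names = ["%s%04d.txt" % (outfileprefix, i) for i in range(nb)]
    handles = [open(n, "w") for n in names]
    for h in handles:
        h.writelines(header)
    for line in body:
        random.choice(handles).write(line)
    for h in handles:
        h.close()
    return names

# tiny demo on a temporary file
random.seed(0)
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "src.txt")
with open(src, "w") as f:
    f.write("col\n" + "".join("%d\n" % i for i in range(10)))
parts = random_split_file(src, os.path.join(tmpdir, "part"), 2)
```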

static ratio(x, y)[source]

returns the ratio of two real values, or an empty string if the denominator is zero @return a real or an empty string
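The documented behaviour fits in a couple of lines (a sketch, not the library's code):

```python
# Sketch of ratio: an empty string when the denominator is zero,
# the quotient otherwise.
def ratio(x, y):
    return "" if y == 0 else x / y
```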

reduce_dict(functionKey, functionValue, uselist=False)[source]

@see me todict

remove_columns(listColumns)[source]

remove some columns

@param listColumns list of columns to remove or a function
which returns True if the column has to be removed based on its name

@return table

Example:

rem = table.remove_columns(["d_a"])
rename_column(old_name, new_name)[source]

rename a column

@param old_name old name @param new_name new name

replicate(times)[source]

replicates all rows a given number of times @param times number of replications @return new matrix, keeps the header of the first matrix

save(filename, sep='\t', encoding=None, newline='\n')[source]

saves the table in a text file, the first row is the column names

@param filename filename @param sep column separator @param encoding encoding @param newline line separator

save_as_excel(filename, font='Calibri', sheetname='sheet0', close=True, encoding=None)[source]

saves the table as a new Excel file, you can use .xls or .xlsx; if filename is None, the function returns an object (a Workbook) and does not save it.

@param filename Excel filename @param sheetname name of the sheet to add @param font font name @param close if True, close the file, otherwise the user will have to do it @param encoding encoding @return object Workbook

static save_multiple_as_excel(filename, list_table, font='Calibri', close=True, encoding=None)[source]

saves multiple table in one Excel file

@param filename filename (can be None) @param list_table list of 2-tuples (« name », tbl) @param font font name @param close if True, close the file, otherwise the user will have to do it @param encoding encoding @return object Workbook

schema_database(add_id=True)[source]

returns the schema for a database which would contains this database

@param add_id if True, adds an index « PRIMARYKEY » @return dictionary { index_column:(name, type) }

select(functionRow)[source]

@param functionRow function @return table

Example:

d = table.select(lambda v:(v["name"], v["d_b"]))
print(list(d))
shape

returns the size (nb rows, nb columns)

size

returns the size (nb rows, nb columns)

sort(functionValue, reverse=False)[source]

Example:

table.sort(lambda v: v["d_b"] + v["d_c"])
strtype()[source]

displays the type of values(not the values)

todict(functionKey, functionValue, useList=False)[source]

converts the table into a dictionary { key:value }, each of them defined by functions.

@param functionKey defines the key @param functionValue defines the value @param useList if there are multiple rows sharing the same key, it should be true,

all values are stored in a list

@return a dictionary { key:row } or { key: [row1, row2, …] }

Example:

d = table.todict(lambda v: v["name"], lambda v: v["d_b"], True)
transpose(labelC=None, labelAsRow=True)[source]

computes the transpose @param labelC proposes labels for the column,

if None, takes « r%d » % i; if it is a string, the function assumes it is a column name

@param labelAsRow add the label as a row @return new table
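The transposition can be sketched with zip over header and rows (illustrative, not the library's implementation); the default « r%d » labels name the new columns:

```python
# Sketch of transpose: each original column becomes a row, labelled by
# its former column name; new columns are named r0, r1, ...
def transpose(header, values):
    new_header = ["label"] + ["r%d" % i for i in range(len(values))]
    new_values = [[h] + list(col) for h, col in zip(header, zip(*values))]
    return new_header, new_values

hdr, vals = transpose(["name", "d_a"], [["A", 1], ["B", 3]])
```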

union(table)[source]

@param table table @return table(with the same number of columns)

concatenates two tables by rows, they must have the same header, rows of both tables are merged into a single matrix Example:

union = table.union(table2)
union_columns(columnsSet)[source]

computes the union of all values from all columns present in columnSet @param columnsSet set of columns @return table

values_to_date(format=None, only_if_possible=False, subset_columns=None)[source]

converts all values into dates @param only_if_possible if True, converts all possible values and catches exceptions,

if False, converts everything and raises an exception when not possible

@param format date format, see fn str_to_datetime @param subset_columns if None, takes all columns, otherwise tries to convert

only the listed columns

@return table

values_to_float(only_if_possible=False, subset_columns=None)[source]

converts all values into float @param only_if_possible if True, converts all possible values and catches exceptions,

if False, converts everything and raises an exception when not possible
@param subset_columns if None, takes all columns, otherwise tries to convert
only the listed columns

@return table

values_to_str(subset_columns=None, format=None)[source]

converts all values into str @param subset_columns if None, takes all columns, otherwise tries to convert

only the listed columns

@param format format for the conversion, None by default but it could be for example %1.2f @return table

where(condition_function)[source]

@see me filter