module homeblog.table_formula
#
Short summary#
module ensae_teaching_cs.homeblog.table_formula
Implements TableFormula.
Classes#
class |
truncated documentation |
---|---|
This class aims at representating a table, it provides some « SQL like » functionalities such groupby or innerjoin, select, … |
Properties#
property |
truncated documentation |
---|---|
creates a pandas dataframe |
|
returns a json format |
|
returns the values as a numpy array |
|
returns the values as a numpy matrix |
|
returns the size(nb rows, nb columns) |
|
returns the size(nb rows, nb columns) |
Static Methods#
staticmethod |
truncated documentation |
---|---|
the function checks if the first line contains the column in header otherwise, it modifies the file and add them … |
|
return a confidence interval for a statistics |
|
assume values is a matrix with two columns |
|
split a file in nb buckets by random(lines are sent to a random file as they come) |
|
return a ratio between two real values or an empty string if the denominateur is null |
|
saves multiple table in one Excel file |
Methods#
method |
truncated documentation |
---|---|
do an addition, add values if types are matching |
|
operator copy |
|
operator |
|
check if two tables are equal by value |
|
operator [], accepts slices |
|
Converts the table into a html string. |
|
It can either take a filename, an object TableFormula, a list of columns and values. |
|
iterator on all rows, it returns a dictionary { column:value } |
|
returns the number of rows |
|
do a multiplication(by a number) |
|
convert the table into rst format |
|
operator [], just accepts tuple(to change a value) |
|
convert the table into a string |
|
private set up the column type based on the column name |
|
fill the table |
|
private returns each row as a dictionary { column_name:value } |
|
the class often creates another class of the same type, this function returns the class object |
|
private |
|
private |
|
Adds a column. |
|
also called the Gini function Example: |
|
Example: |
|
Example: |
|
Example: |
|
Example: |
|
add a column defined by vector(list of values for each row) |
|
@see me add_column |
|
Example: |
|
Example: |
|
returns the average and standard deviation |
|
center and reduce a set of columns(or all if columnsSet is None) |
|
change the column names |
|
concatenates two tables by columns |
|
call |
|
Computes the correlation matrix, the first column will contains the column names. |
|
Computes the correlation between two columns. |
|
computes the correlation between two columns |
|
Computes the covariance matrix, the first column will contains the column names. |
|
Computes the correlation between two columns. |
|
computes the correlation between two columns |
|
this method creates an index, to get an indexes row, use method get Example: |
|
@see me multiply_column_by_row_instance |
|
returns a list of differences between self and others |
|
extract some columns |
|
returns a Database object, creates the database if it does not exists, same for the table |
|
Example: |
|
sort all rows using criteria defined by function and remove rows at the extremes |
|
use the index created by method create_index Example: |
|
private returns all values for one column |
|
private |
|
Example: |
|
use prefix of a column name to know which function to use as an aggregated(sum, avg, len, key, none, max, min) … |
|
computes an histograms on one vector |
|
computes a common histograms on all columns |
|
apply the same modification to every number |
|
computes the average and the standard deviation a vector of values |
|
returns a table with the average and the standard deviation for each columns |
|
do a multiplication term by term(similar to an addition), add values if types are matching |
|
select n random row from the table, returns a table |
|
@see me todict |
|
remove some columns |
|
rename a column |
|
replicates all rows a given number of times |
|
saves the tables in a text file, first row is the column names |
|
saves the table as a new Excel file, you can use |
|
returns the schema for a database which would contains this database |
|
Example: |
|
displays the type of values(not the values) |
|
convert the table as a dictionary { key:value } each of them is defined by functions. |
|
Computes the transpose. |
|
computes the union of all values from all columns present in columnSet |
|
converts all values into dates |
|
converts all values into float |
|
converts all values into str |
|
@see me filter |
Documentation#
@file @brief Implements TableFormula.
- class ensae_teaching_cs.homeblog.table_formula.TableFormula(file, numeric_column=None, sep='\t', encoding=None, read_n_lines=-1, sheet=0, **options)#
Bases :
_TableFormulaStat
This class aims at representating a table, it provides some « SQL like » functionalities such groupby or innerjoin, select, where… This was a custom implementation of a DataFrame before I discover pandas.
The class provides an easy to go through the row table by converting each row in a dictionary
{ column_name: value }
on the run. Example:tbl = TableFormula(...) newtbl = tbl.filter(lambda v: v["criteria"] == 5)
See @see op __init__ for others ways to create a table.
@var header list of column names @var values list of rows(each row contains as many value as the number of columns) @var index dictionary { column name: position }, changing
header
means also changingheader
.Example:
table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n")) print(table) print("one value 0,1:", table[0,1]) print("---------") dist = table.get_distinct_values("name") for k in sorted(dist): print("*%d: %s"%(int(dist[k]),k)) print("---------") table.add_column("has_A", lambda v: 1. if "A" in v["name"] else 0.) print(table) x = 1./3 print("------------- smoothing", x) table.add_column_smooth("has_A_smooth", lambda v: v["has_A"], [-1,0,1], [x,x,x]) print(table) print("--------- filter") fil = table.filter(lambda v: v["d_b"] == 2) print(fil) print("--------- random") rnd = table.random(5) print(rnd) print("--------- random unique") rnd = table.random(1, True) print(rnd) print("--------- filter quantile") fil = table.filter_quantile(lambda v: v["d_b"], 0, 0.4) print(fil) print("--------- aggregate_column") total = table.aggregate(lambda v: v["d_c"]) print(total) print("--------- sort") table.sort(lambda v: v["d_b"] + v["d_c"]) print(table) print("--------- union") union = table.union(table) print(union) print("--------- group sum") group = table.groupby(lambda v: v["name"], [lambda v: v["d_a"], lambda v: v["d_b"]], ["name", "sum_d_a", "sum_d_b"]) print(group) print("--------- group max") groupmax = table.groupby(lambda v: v["name"], [lambda v: v["d_a"], lambda v: v["d_b"]], ["name", "max_d_a", "max_d_b"], [max, max]) print(groupmax) print("--------- group sum with weights") group = table.groupby(lambda v: v["name"], [lambda v: v["d_a"]], ["name", "weight", "sum_d_a"], [lambda vec,w: sum(vec) / w], lambda v: v ["d_b"]) print("--------- innerjoin") innerjoin = table.innerjoin(group, lambda v: v["name"], lambda v: v["name"], "group") print(innerjoin) print("------------- extraction") ext = table.extract_columns(["name", "d_a"]) print(ext) print("------------- remove") ext = table.remove_columns(["d_a"]) print(ext) print("------------- todict") d = table.todict(lambda v: v["name"], lambda v: v["d_b"], True) print(d) print("------------- select") d = table.select(lambda v:(v["name"], v["d_b"])) print(list(d)) print("------------- use of an index") table.create_index(lambda v:(v["name"], v["d_a"])) row = table.get(('A', 1.1)) print(row) value = table.get(('A', 1.1), 2) print(value) print("------------- multiply_column_by_row_instance ") table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n")) table.add_column("key_add", lambda v:"unique") print(table) mul = table.multiply_column_by_row_instance( lambda v: v["key_add"], lambda v: v["name"]) print(mul) if os.path.exists("BNP.PA.txt"): print("--------------- financial stock") table = TableFormula("BNP.PA.txt", sep=",") table.sort(lambda v: v["Date"]) print(table[:10]) print("--------------- groupby_implicit") table = TableFormula("key_name sum_a len_b avg_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n")) print(table) gr = table.groupby_implicit(lambda v: v ["key_name"]) print(gr) print("--------------- covariance") values = [random.random() for i in range(0,100)] values = [[x, x + random.random()/2] for x in values] tbl = TableFormula(["x", "y"], values).values_to_float() cov = tbl.covariance() print(cov) print("--------------- histogram") hist = tbl.histogram(lambda v:(v["x"],1), 10) print(hist) print("--------------- histogram") hist = tbl.values_to_float().histograms(["x", "y"], 10) print(hist) print("--------------- unions of columns") hist = tbl.values_to_float().union_columns(["x", "y"]) print(hist)
It can either take a filename, an object TableFormula, a list of columns and values.
- Paramètres:
file – filename or a list of column names or a dictionary, file can also be a pandas DataFrame.
numeric_column – depends on file types(see below examples)
sep – column separator if file is a filename
read_n_lines – read the first n lines(or all if it is -1)
sheet – in case the file is an Excel file, this parameter precises the sheet number or name
suffix_nb – if True, adds an integer to the column name if it is a duplicate
Example:
table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))
or
table = TableFormula("file.txt", ["nb"])
or
table = TableFormula(["date", "Y", "Y2", "xl"], values)
or
data = [{ "one":1, "two":2 }, {"two":2.1, "three":3 }] table = TableFormula(data)
or
data = { 1:{ "one":2.3, "two":2.2 }, 2:{"one":2.1, "two":3 } table = TableFormula("__byrow__", data)
or
table = TableFormula(numpy.matrix(…))
or
table = TableFormula(numpy.array(...))
@warning In this second case, rows and header are not copied.
- __add__(other)#
do an addition, add values if types are matching :param other: matrix or float or string :return: new matrix, keep the header of the first matrix
- __copy__()#
operator copy
- __deepcopy__(memo)#
operator
deepcopy
- __eq__(other)#
check if two tables are equal by value :param other: other table :return: boolean
- __getitem__(irow)#
operator [], accepts slices :param irow: integer, tuple, slice or list :return: depends on irow
int –> a table with one row
slice –> a table with several rows
list –> a table with the selected rows
tuple –> a value
- __hash__ = None#
- __html__(class_table=None, class_td=None, class_tr=None, class_th=None)#
Converts the table into a html string.
- Paramètres:
class_table – adds a class to the tag
table
(None for none)class_td – adds a class to the tag
td
(None for none)class_tr – adds a class to the tag
tr
(None for none)class_th – adds a class to the tag
th
(None for none)
- __init__(file, numeric_column=None, sep='\t', encoding=None, read_n_lines=-1, sheet=0, **options)#
It can either take a filename, an object TableFormula, a list of columns and values.
- Paramètres:
file –
filename or a list of column names or a dictionary, file can also be a pandas DataFrame.
numeric_column – depends on file types(see below examples)
sep – column separator if file is a filename
read_n_lines – read the first n lines(or all if it is -1)
sheet – in case the file is an Excel file, this parameter precises the sheet number or name
suffix_nb – if True, adds an integer to the column name if it is a duplicate
Example:
table = TableFormula("name d_a d_b d_c#A 1 2 3#A 1.1 2.1 3.1#B 3 4 5".replace(" ", "\t").replace("#","\n"))
or
table = TableFormula("file.txt", ["nb"])
or
table = TableFormula(["date", "Y", "Y2", "xl"], values)
or
data = [{ "one":1, "two":2 }, {"two":2.1, "three":3 }] table = TableFormula(data)
or
data = { 1:{ "one":2.3, "two":2.2 }, 2:{"one":2.1, "two":3 } table = TableFormula("__byrow__", data)
or
table = TableFormula(numpy.matrix(…))
or
table = TableFormula(numpy.array(...))
@warning In this second case, rows and header are not copied.
- __iter__()#
iterator on all rows, it returns a dictionary { column:value } @return dictionary
- __len__()#
returns the number of rows
- __mul__(other)#
do a multiplication(by a number) :param other: matrix or float or string :return: new matrix, keep the header of the first matrix
- __rst__(add_line=True)#
convert the table into rst format
+------------------------+------------+----------+----------+ | Header row, column 1 | Header 2 | Header 3 | Header 4 | | (header rows optional) | | | | +========================+============+==========+==========+ | body row 1, column 1 | column 2 | column 3 | column 4 | +------------------------+------------+----------+----------+ | body row 2 | ... | ... | | +------------------------+------------+----------+----------+
- Paramètres:
add_line – add a line separator between each row
- __setitem__(irow, value)#
operator [], just accepts tuple(to change a value) :param irow: 2-uple :param value: new value
- __str__()#
convert the table into a string :return: string
- _auto_conversion(others_columns)#
private set up the column type based on the column name
- _fill_by_row(values)#
fill the table :param values: dictionary { <int_row_index>: { <column name>: value} }
- _interpret_row(row)#
private returns each row as a dictionary { column_name:value }
- _private_getclass()#
the class often creates another class of the same type, this function returns the class object
- _read_file(file, numeric_column, sep, encoding, read_n_lines, sheet=0)#
private
- _readlines(lines, numeric_column, sep)#
private
- add_column(colname, function, position=-1)#
Adds a column. :param colname: column name or columns name if it is a list or a tuple :param function: function which will gives the values(or a list of functions, or a function which return a tuple) :param position: where to insert the column, -1 for the end
Example:
table.add_column("has_A", lambda v: 1 if "A" in v["name"] else 0, 0) table.add_column(("has_A", "has_B"),(lambda v: 1 if "A" in v["name"] else 0, lambda v: 1 if "B" in v["name"] else 0)) table.add_column(("has_A", "has_B"),(lambda v:(1 if "A" in v["name"] else 0, 1 if "B" in v["name"] else 0))
- add_column_cumulative(column_index, column_name, functionIndex, functionValue, normalize=False, reverse=False, cumulative=True, functionSort=None)#
also called the Gini function Example:
table.add_column_cumulative("index_%s" % col, "dist_%s" % col, lambda v: v["sum_nbclient"], lambda v: v[col], functionSort = lambda v: v [col] / v["sum_nbclient"], normalize=True)
- add_column_index(colname='index', start=0)#
Example:
table.add_column("index_row")
- add_column_recursive(colname, functionValue, functionAgg)#
Example:
table.add_column_recursive(lambda v: v ["norm_%s" % loi], lambda li, v: li[-1] + v)
- add_column_recursive_row(colname, functionAgg)#
Example:
table.add_column_recursive_row("w_%s" % loi, lambda li, v: li[-1] + v ["norm_%s" % loi] if len(li)> 0 else v ["norm_%s" % loi])
- add_column_smooth(colname, function, position, weights)#
Example:
x = 1./3 table.add_column_smooth("has_A_smooth", lambda v: v["has_A"], [-1,0,1], [x,x,x])
- add_column_vector(colname, vector)#
add a column defined by vector(list of values for each row)
- Paramètres:
colname – column to add
vector – (list) list of values to add to each row
:return self
- static add_header_if_not_present(filename, header, encoding=None, logFunction=<function noLOG>)#
the function checks if the first line contains the column in header otherwise, it modifies the file and add them on the first line
@param filename filename @param header list of column name(all strings) @param encoding encoding @param logFunction use this function to log information about what is happening
- addc(colname, function, position=-1)#
@see me add_column
- aggregate(function, aggregated_function=<built-in function sum>)#
Example:
total = table.aggregate_column(lambda v: v["d_c"], len)
- aggregate_column(colname, aggregated_function=<built-in function sum>)#
Example:
total = table.aggregate_column("d_c", sum)
- avg_std(functionValue, functionWeight=<function TableFormula.<lambda>>)#
returns the average and standard deviation
- static bootstrap(values, function, nbdraws=-1, alpha=0.05)#
return a confidence interval for a statistics @param values values @param function produces the statistics over a random set of observations chosen in values @param nbdraws number of draws, if it is equal to -1, is equal to len(values) @param alpha confidence level @return average, min, lower bound, higher bound, max
- center_reduce(columnsSet=None, op=None, removeExtreme=None, mu_sigma=None)#
center and reduce a set of columns(or all if columnsSet is None) @param removeExtreme remove extreme values at both sides(0.05 means 0.025 on each side) @param columnsSet set of column to deal with @param op if can be:
None: substract mean and normalize,
« mean »: substract mean only,
« norm »: normalize only
- @param mu_sigma matrix with two rows(one for mean, second for sigma), if None,
if computes that from the matrix self, columns must have the same order that columnSet
@return the same table(with only the considered columns)
- change_header(new_header)#
change the column names
- Paramètres:
new_header – a list or a function which modifies the header
Example:
tbl.change_header(lambda h: h if h != "column" else "new_name")
Avertissement
Do not do that yourself, the class holds a dictionary up to date with the column index.
- concatenate(table, addPrefix='')#
concatenates two tables by columns @param table table @param addPrefix add a prefix to each column from table @return table (with the same number of rows as the longest one)
- copy()#
call
copy.deepcopy(self)
- correlation(useBootstrap=False, collapseFormat=True, nbdraws=-1, alpha=0.05, functionKeepValue=<function TableFormula.<lambda>>)#
Computes the correlation matrix, the first column will contains the column names.
@param useBootstrap if True, use a bootstrap method to estimate the correlation @param collapseFormat if True and useBootstrap is True, produces a format
average|lower bound|higher bound(at a definite confidence level)
@param nbdraws number of draws(if -1, then it will be equal to the number of observations) @param alpha confidence level @param functionKeepValue if collapseFormat is True, this function is used to collapse val,low,high in a single string @return new table
- static correlation_bicolumn(values, deviations=False, noCenter=False)#
assume values is a matrix with two columns @param values 2 column matrix @param deviations if True, returns cor, sigma1, sigma2 @param noCenter if True, do not remove the average before computing the covariance,
it means we assume variables are already centered
@return correlation factor or correlation, sigma1, sigma2 if deviations is True
- correlation_col(col1, col2, noCenter=False)#
Computes the correlation between two columns. @param col1 column 1 @param col2 column 2 @param noCenter does the computation without removing the average @return float(covariance)
- correlation_row(row1, row2, noCenter=False)#
computes the correlation between two columns @param row1 row 1(integer) @param row2 row 2(integer) @param noCenter does the computation without removing the average @return float(covariance)
- covariance()#
Computes the covariance matrix, the first column will contains the column names. @return new table
- covariance_col(col1, col2, noCenter=False)#
Computes the correlation between two columns. @param col1 column 1 @param col2 column 2 @param noCenter does the computation without removing the average @return float(covariance)
- covariance_row(row1, row2, noCenter=False)#
computes the correlation between two columns @param row1 row 1(integer) @param row2 row 2(integer) @param noCenter does the computation without removing the average @return float(covariance)
- create_index(functionIndex)#
this method creates an index, to get an indexes row, use method get Example:
table.create_index(lambda v:(v["name"], v["d_a"])) row = table.get(('A', 1.1)) value = table.get(('A', 1.1), 2)
- property dataframe#
creates a pandas dataframe @return pandas.dataframe
- dcast(functionKey, functionInstance, full=True)#
@see me multiply_column_by_row_instance
- delta(other)#
returns a list of differences between self and others
- Paramètres:
other – TableFormula
- Renvoie:
list of differences(first one)
- extract_columns(listColumns)#
extract some columns
- @param listColumns list of columns to remove or a function
which returns True if the column has to be extracted based on its name
@return table
Example:
ext = table.extract_columns(["name", "d_a"])
- fill_sql_table(filename_or_database, tablename, add_id='idr')#
returns a Database object, creates the database if it does not exists, same for the table
- @param filename_or_database filename or Database object,
in that second case, we assume method connect was called before
@param tablename table name @param add_id if is not None, then the function adds an id, it first takes the
max(id) and goes on incrementing it;
- @return Database object(new or the one from the parameters),
in both case, the database is not disconnected
- filter(condition_function)#
Example:
fil = table.filter(lambda v: v["d_b"] == 2)
@warning Rows are not copied.
- filter_quantile(function, alpha_min=0.025, alpha_max=0.025)#
sort all rows using criteria defined by function and remove rows at the extremes
@param function values used to estimate the quantiles @param alpha_min lower quantile @param alpha_max higher quantile @return a table containing all the rows where the criterium
is within the two quantiles
Example:
fil = table.filter_quantile(lambda v: v["d_b"], 0, 0.4)
@warning Rows are not copied.
- get(rowIndex, column=None)#
use the index created by method create_index Example:
table.create_index(lambda v:(v["name"], v["d_a"])) row = table.get(('A', 1.1)) value = table.get(('A', 1.1), 2)
- get_column_values(col)#
private returns all values for one column
- get_distinct_values(col)#
private
- groupby(functionKey, functionsValue, columns=None, functionsAgg=None, functionWeight=None)#
Example:
group = table.groupby(lambda v: v["name"], [lambda v: v["d_a"], lambda v: v["d_b"]], ["name", "sum_d_a", "sum_d_b"])
or
groupmax = table.groupby(lambda v: v["name"], [lambda v: v["d_a"], lambda v: v["d_b"]], ["name", "max_d_a", "max_d_b"], [max, max])
- groupby_implicit(functionKey, functionWeight=None, logging=None)#
use prefix of a column name to know which function to use as an aggregated(sum, avg, len, key, none, max, min) Example:
group = table.groupby_implicit(lambda v: v["name"])
- histogram(functionValue, nbDiv=100, secondColumnIsWeight=False, normalize=True, removeExtreme=0.05)#
computes an histograms on one vector @param functionValue function which produces the value to histogram @param nbDiv number of divisions for this histograms(boundaries are min and max) @param secondColumnIsWeight if True, the second column is the weight @param normalize if True, normalize by the sum of weights @param removeExtreme remove extreme values at both sides(0.05 means 0.025 on each side) @return table with two columns
- histograms(columnsSet, nbDiv=100, secondColumnIsWeight=False, normalize=True, removeExtreme=0.05, histxName='histKey')#
computes a common histograms on all columns @param columnsSet set of columns @param nbDiv number of divisions for this histograms(boundaries are min and max) @param secondColumnIsWeight if True, the second column is the weight @param normalize if True, normalize by the sum of weights @param removeExtreme remove extreme values at both sides(0.05 means 0.025 on each side) @param histxName column name given to the x axis shared by every histogram @return table with two columns
@warning The function skips any NaN of Inf value.
- innerjoin(table, functionKey1, functionKey2, nameKey='key', addSuffixAnyWay=False, prefixToAdd=None, full=False, keepKey=True, putKeyInColumn=None, missingValue=None, uniqueKey=False)#
@param table other table to join with @param functionKey1 key for the first table(a function) @param functionKey2 key for the second table(a function) innerjoin …. ON … @param addSuffixAnyWay add a suffix to every column from the second table even
if names are different(suffix is « + »)
@param prefixToAdd prefix to add the the columns of the second table @param full add all items even if there is no common keys(
FULL OUTER JOIN
),otherwise keep only common keys
@param keepKey keep the key as a column in the result(column is key), otherwise not @param putKeyInColumn private parameter: keepKey has to be true and in this case,
put the key in this column(integer)
@param missingValue when there is not key on one side, this default value will be put in place @param uniqueKey if True, the function assumes there is a bijection between rows
and keys(one row <–> one key) on both tables, otherwise, it will not.
@return a table
Example:
innerjoin = table.innerjoin(group, lambda v: v["name"], lambda v: v["name"], "group")
- property json#
returns a json format @return string
- modify_all(modification_function)#
apply the same modification to every number @param modification_function modification to apply to every number @return new table
The signature of the function is the following one:
def function(value, column_name): # .... return new_value
Example:
tbl = tbl.modify_all(lambda v,c: {"string":"", "numerical":0}.get(c,None) if v is None else v)
- mu_sigma(functionValues, removeExtreme=None)#
computes the average and the standard deviation a vector of values @param functionValues function produces the vector of values @param removeExtreme remove extreme values at both sides(0.05 means 0.025 on each side) @return (average, standard deviation)
- mu_sigma_each_column(columnsSet=None, removeExtreme=None)#
returns a table with the average and the standard deviation for each columns @param removeExtreme remove extreme values at both sides(0.05 means 0.025 on each side) @param columnsSet set of column to deal with @return table with two rows: average and standard deviation
- multiplication_term_term(other)#
do a multiplication term by term(similar to an addition), add values if types are matching
- Paramètres:
other – matrix or float or string
- Renvoie:
new matrix, keep the header of the first matrix
- multiply_column_by_row_instance(functionKey, functionInstance, full=True)#
@param functionKey defines a key(function) @param functionInstance defines a second key(will be moved to the columns dimension) @param full introduces missing values for not found combinations @return a table
If a column contains a finite set of value, for example, we have the temperature for several cities organized like if it were a table from a database: city, date, temperatue. We would like to get another table where we have: date temparature_city1 temperature_city2…
Then we would type: Example:
mul = table.multiply_column_by_row_instance( lambda v: v["date"], lambda v: v["city"])
The input table would be like:
city date A jan A feb B feb
It returns:
KEY A|city A|date B|city B|date feb A feb B feb jan A jan None None
- property np_array#
returns the values as a numpy array @return numpy array
- property np_matrix#
returns the values as a numpy matrix @return numpy matrix
- random(n, unique=False)#
select n random row from the table, returns a table
@param n number of desired random rows @param unique draws unique rows or non unique rows
(tirage sans remise ou avec remise)
@return a table
Example:
rnd = table.random(10)
- static random_split_file(filename, outfileprefix, nb, has_header=True, encoding=None, logFunction=<function noLOG>)#
split a file in nb buckets by random(lines are sent to a random file as they come)
@param filename filename to split @param nb number of buckets @param outfileprefix output files will start with outfileprefix + “%04d.txt” % i @param encoding encoding @param has_header the header will be replicated in each created file @param logFunction to display information @return list of created files
- static ratio(x, y)#
return a ratio between two real values or an empty string if the denominateur is null @return a real of an empty string
- reduce_dict(functionKey, functionValue, uselist=False)#
@see me todict
- remove_columns(listColumns)#
remove some columns
- @param listColumns list of columns to remove or a function
which returns True if the column has to be removed based on its name
@return table
Example:
rem = table.remove("d_a")
- rename_column(old_name, new_name)#
rename a column
- Paramètres:
old_name – old name
new_name – new name
- replicate(times)#
replicates all rows a given number of times :param times: number of multiplication :return: new matrix, keep the header of the first matrix
- save(filename, sep='\t', encoding=None, newline='\n')#
saves the tables in a text file, first row is the column names
- Paramètres:
filename – filename
sep – column separator
encoding – encoding
newline – line separator
- save_as_excel(filename, font='Calibri', sheet_name='sheet0', close=True, encoding=None)#
saves the table as a new Excel file, you can use
.xls
or.xlsx
if filename is None, the function returns an object(xslx) and does not save it.@param filename Excel filename @param sheet_name name of the sheet to add @param font font name @param close if True, close the file, otherwise, the user will have to @param encoding encoding @return object Workbook
- static save_multiple_as_excel(filename, list_table, font='Calibri', close=True, encoding=None)#
saves multiple table in one Excel file
@param filename filename(can be None) @param list_table list of 2uple(« name », tbl) @param font font name @param close if True, close the file, otherwise, the user will have to @param encoding encoding @return object Workbook
- schema_database(add_id=True)#
returns the schema for a database which would contains this database
@param add_id if True, adds an index « PRIMARYKEY » @return dictionary { index_column:(name, type) }
- select(functionRow)#
@param functionRow fonction @return table
Example:
d = table.select(lambda v:(v["name"], v["d_b"])) print(list(d))
- property shape#
returns the size(nb rows, nb columns)
- property size#
returns the size(nb rows, nb columns)
- sort(functionValue, reverse=False)#
Example:
table.sort(lambda v: v["d_b"] + v["d_c"])
- strtype()#
displays the type of values(not the values)
- todict(functionKey, functionValue, useList=False)#
convert the table as a dictionary { key:value } each of them is defined by functions.
@param functionKey defines the key @param functionValue defines the value @param useList if there are multiple rows sharing the same key, it should be true,
all values are stored in a list
@return a dictionary { key:row } or { key: [row1, row2, …] }
Example:
d = table.todict(lambda v: v["name"], lambda v: v["d_b"], True)
- transpose(labelC=None, labelAsRow=True)#
Computes the transpose. @param labelC proposes labels for the column,
if None, take « r%d » % i, if it is a string, the function assumes it is a column name
@param labelAsRow add the label as a row @return new table
- union(table)#
@param table table @return table(with the same number of columns)
concatenates two tables by rows, they must have the same header, rows of both tables are merged into a single matrix Example:
union = table.union(table2)
- union_columns(columnsSet)#
computes the union of all values from all columns present in columnSet @param columnsSet set of columns @return table
- values_to_date(format=None, only_if_possible=False, subset_columns=None)#
converts all values into dates @param only_if_possible if True, converts all possible values and catches exception,
if False, converts everything, raises an exception when not possible
@param format date format see fn str_to_datetime @param subset_columns if None, takes all of them, otherwise, try to convert
only for the listed columns
@return table
- values_to_float(only_if_possible=False, subset_columns=None)#
converts all values into float @param only_if_possible if True, converts all possible values and catches exception,
if False, converts everything, raises an exception when not possible
- @param subset_columns if None, takes all of them, otherwise, try to convert
only for the listed columns
@return table
- values_to_str(subset_columns=None, format=None)#
converts all values into str @param subset_columns if None, takes all of them, otherwise, try to convert
only for the listed columns
@param format format for the conversion, by None by default but it could be for exemple %1.2f. @return table
- where(condition_function)#
@see me filter