module pandashelper.tblformat
Short summary
module pyquickhelper.pandashelper.tblformat
To format a pandas dataframe
Functions
function | truncated documentation
---|---
df2html | Converts the table into an HTML string.
df2rst | Builds a string in RST format from a dataframe.
enumerate_split_df | Splits a dataframe by columns to display shorter dataframes.
Documentation
To format a pandas dataframe
- pyquickhelper.pandashelper.tblformat.df2html(self, class_table=None, class_td=None, class_tr=None, class_th=None)
Converts the table into an HTML string.
- Parameters:
self – dataframe (to be added as a class method)
class_table – adds a class to the tag table (None for none)
class_td – adds a class to the tag td (None for none)
class_tr – adds a class to the tag tr (None for none)
class_th – adds a class to the tag th (None for none)
- Returns:
HTML
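The effect of the class_* parameters can be illustrated with a small standard-library sketch. This is not pyquickhelper's implementation: the helper name rows_to_html and its list-based input are hypothetical, and only the four class parameters mirror the signature above.

```python
from html import escape


def rows_to_html(header, rows, class_table=None, class_td=None,
                 class_tr=None, class_th=None):
    """Render a header and rows as an HTML table (illustrative sketch only)."""

    def tag(name, css_class):
        # Emit the class attribute only when a class name is given (None for none).
        if css_class:
            return f'<{name} class="{escape(css_class)}">'
        return f"<{name}>"

    lines = [tag("table", class_table)]
    cells = "".join(f"{tag('th', class_th)}{escape(str(h))}</th>" for h in header)
    lines.append(f"{tag('tr', class_tr)}{cells}</tr>")
    for row in rows:
        cells = "".join(f"{tag('td', class_td)}{escape(str(v))}</td>" for v in row)
        lines.append(f"{tag('tr', class_tr)}{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


html_text = rows_to_html(["A", "B"], [[0, "text"], [1e-05, ""]], class_table="small")
print(html_text)
```

Leaving a class parameter at None produces the bare tag, which is how the "(None for none)" wording above is interpreted here.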
- pyquickhelper.pandashelper.tblformat.df2rst(df, add_line=True, align='l', column_size=None, index=False, list_table=False, title=None, header=True, sep=',', number_format=None, replacements=None, split_row=None, split_row_level='+', split_col_common=None, split_col_subsets=None, filter_rows=None, label_pattern=None)
Builds a string in RST format from a dataframe.
- Parameters:
df – dataframe
add_line – (bool) add a line separator between each row
align – r or l or c
column_size – something like [1, 2, 5] to multiply the column size, or a dictionary (if list_table is False) to overwrite a column size, like {'col_name1': 20} or {3: 20}
index – add the index
list_table – use the list_table
title – used only if list_table is True
header – add one header
sep – separator if df is a string and is a filename to load
number_format – formats numbers in a specific way; if number_format is an integer, the pattern is replaced by {numpy.float64: '{:.2g}'} (if number_format is 2), see also pyformat.info
replacements – replacements applied just before converting into RST (dictionary)
split_row – displays several tables, one column is used as the name of each section
split_row_level – title level if option split_row is used
split_col_common – splits the dataframe by columns, see enumerate_split_df
split_col_subsets – splits the dataframe by columns, see enumerate_split_df
filter_rows – None or a function to remove rows, signature def filter_rows(df: DataFrame) -> DataFrame
label_pattern – if split_row is used, the function may insert a label in front of every section, example: ".. _lpy-{section}:"
- Returns:
string
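To make the integer form of number_format concrete, here is a minimal sketch of what a '{:.2g}' pattern does to a few values. The helper format_number is hypothetical and uses plain Python floats instead of numpy.float64; it only illustrates the pattern named above, not how df2rst applies it internally.

```python
def format_number(value, number_format=2):
    """Format floats with a '{:.<n>g}' pattern; leave other values unchanged."""
    if isinstance(number_format, int):
        # Assumption drawn from the parameter description: 2 -> '{:.2g}'.
        pattern = "{:.%dg}" % number_format
    else:
        # Otherwise number_format is taken to be a ready-made format pattern.
        pattern = number_format
    if isinstance(value, float):
        return pattern.format(value)
    return str(value)


for value in (2.96, 19.29, 1e-05, "text"):
    print(repr(value), "->", format_number(value))
```

The 'g' presentation type keeps a number of significant digits, not a number of decimals, so 19.29 becomes 19 with two significant digits.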
If list_table is False, the format is the following.
None values are replaced by empty string (4 spaces). It produces the following results:
+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
| (header rows optional) |            |          |          |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | ...        | ...      |          |
+------------------------+------------+----------+----------+
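The grid layout can be reproduced by hand, which helps when checking what the function is expected to emit. The following is a minimal sketch that assumes single-line cells and left alignment; grid_table is a hypothetical helper, not the code behind df2rst.

```python
def grid_table(header, rows):
    """Build an RST grid table from a header and rows (illustrative sketch)."""
    # None becomes an empty cell, mirroring the behaviour described above.
    table = [["" if v is None else str(v) for v in row] for row in [header] + rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]

    def rule(char):
        # A horizontal rule: '-' between body rows, '=' under the header.
        return "+" + "+".join(char * (w + 2) for w in widths) + "+"

    def line(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    out = [rule("-"), line(table[0]), rule("=")]
    for row in table[1:]:
        out.append(line(row))
        out.append(rule("-"))
    return "\n".join(out)


print(grid_table(["A", "B", "C"], [[0.0, "text", None], [1e-05, None, "longer text"]]))
```

Each column is as wide as its longest cell plus one space of padding on each side, and the header is separated from the body by a rule made of '=' characters.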
If list_table is True, the format is the following:
.. list-table:: title
    :widths: 15 10 30
    :header-rows: 1

    * - Treat
      - Quantity
      - Description
    * - Albatross
      - 2.99
      - anythings
    ...
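For comparison with the grid format, the list-table layout can also be sketched in a few lines. The helper list_table_rst and its widths argument are hypothetical names chosen for this illustration; the four-space indentation is one common choice and is not confirmed to be what df2rst emits.

```python
def list_table_rst(header, rows, title="", widths=None):
    """Build an RST list-table directive from a header and rows (sketch)."""
    lines = [f".. list-table:: {title}".rstrip()]
    if widths:
        lines.append("    :widths: " + " ".join(str(w) for w in widths))
    lines.append("    :header-rows: 1")
    lines.append("")
    for row in [header] + rows:
        for i, value in enumerate(row):
            # The first cell of a row opens a new list item with '* -'.
            prefix = "    * - " if i == 0 else "      - "
            cell = "" if value is None else str(value)
            lines.append((prefix + cell).rstrip())
    return "\n".join(lines)


print(list_table_rst(["Treat", "Quantity"], [["Albatross", 2.99]],
                     title="title", widths=[15, 10]))
```

Unlike the grid format, no column width has to be computed from the cell contents, which is why this layout tolerates long cells better.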
Convert a dataframe into RST
<<<
from pandas import DataFrame
from pyquickhelper.pandashelper import df2rst

df = DataFrame([{'A': 0, 'B': 'text'}, {'A': 1e-5, 'C': 'longer text'}])
print(df2rst(df))
>>>
+-------+------+-------------+
| A     | B    | C           |
+=======+======+=============+
| 0.0   | text |             |
+-------+------+-------------+
| 1e-05 |      | longer text |
+-------+------+-------------+
Convert a dataframe into markdown
<<<
from io import StringIO
from textwrap import dedent
import pandas

from_excel = dedent('''
    Op;axes;shape;SpeedUp
    ReduceMax;(3,);(8, 24, 48, 8);2.96
    ReduceMax;(3,);(8, 24, 48, 16);2.57
    ReduceMax;(3,);(8, 24, 48, 32);2.95
    ReduceMax;(3,);(8, 24, 48, 64);3.28
    ReduceMax;(3,);(8, 24, 48, 100);3.05
    ReduceMax;(3,);(8, 24, 48, 128);3.11
    ReduceMax;(3,);(8, 24, 48, 200);2.86
    ReduceMax;(3,);(8, 24, 48, 256);2.50
    ReduceMax;(3,);(8, 24, 48, 400);2.48
    ReduceMax;(3,);(8, 24, 48, 512);2.90
    ReduceMax;(3,);(8, 24, 48, 1024);2.76
    ReduceMax;(0,);(8, 24, 48, 8);19.29
    ReduceMax;(0,);(8, 24, 48, 16);11.83
    ReduceMax;(0,);(8, 24, 48, 32);5.69
    ReduceMax;(0,);(8, 24, 48, 64);5.49
    ReduceMax;(0,);(8, 24, 48, 100);6.13
    ReduceMax;(0,);(8, 24, 48, 128);6.27
    ReduceMax;(0,);(8, 24, 48, 200);5.46
    ReduceMax;(0,);(8, 24, 48, 256);4.76
    ReduceMax;(0,);(8, 24, 48, 400);2.21
    ReduceMax;(0,);(8, 24, 48, 512);4.52
    ReduceMax;(0,);(8, 24, 48, 1024);4.38
    ReduceSum;(3,);(8, 24, 48, 8);1.79
    ReduceSum;(3,);(8, 24, 48, 16);0.79
    ReduceSum;(3,);(8, 24, 48, 32);1.67
    ReduceSum;(3,);(8, 24, 48, 64);1.19
    ReduceSum;(3,);(8, 24, 48, 100);2.08
    ReduceSum;(3,);(8, 24, 48, 128);2.96
    ReduceSum;(3,);(8, 24, 48, 200);1.66
    ReduceSum;(3,);(8, 24, 48, 256);2.26
    ReduceSum;(3,);(8, 24, 48, 400);1.76
    ReduceSum;(3,);(8, 24, 48, 512);2.61
    ReduceSum;(3,);(8, 24, 48, 1024);2.21
    ReduceSum;(0,);(8, 24, 48, 8);2.56
    ReduceSum;(0,);(8, 24, 48, 16);2.05
    ReduceSum;(0,);(8, 24, 48, 32);3.04
    ReduceSum;(0,);(8, 24, 48, 64);2.57
    ReduceSum;(0,);(8, 24, 48, 100);2.41
    ReduceSum;(0,);(8, 24, 48, 128);2.77
    ReduceSum;(0,);(8, 24, 48, 200);2.02
    ReduceSum;(0,);(8, 24, 48, 256);1.61
    ReduceSum;(0,);(8, 24, 48, 400);1.59
    ReduceSum;(0,);(8, 24, 48, 512);1.48
    ReduceSum;(0,);(8, 24, 48, 1024);1.50
    ''')

df = pandas.read_csv(StringIO(from_excel), sep=";")
print(df.columns)

sub = df[["Op", "axes", "shape", "SpeedUp"]]
piv = df.pivot_table(values="SpeedUp", index=['axes', "shape"], columns="Op")
piv = piv.reset_index(drop=False)
print(piv.to_markdown(index=False))
>>>
Index(['Op', 'axes', 'shape', 'SpeedUp'], dtype='object')
| axes   | shape             |   ReduceMax |   ReduceSum |
|:-------|:------------------|------------:|------------:|
| (0,)   | (8, 24, 48, 100)  |        6.13 |        2.41 |
| (0,)   | (8, 24, 48, 1024) |        4.38 |        1.5  |
| (0,)   | (8, 24, 48, 128)  |        6.27 |        2.77 |
| (0,)   | (8, 24, 48, 16)   |       11.83 |        2.05 |
| (0,)   | (8, 24, 48, 200)  |        5.46 |        2.02 |
| (0,)   | (8, 24, 48, 256)  |        4.76 |        1.61 |
| (0,)   | (8, 24, 48, 32)   |        5.69 |        3.04 |
| (0,)   | (8, 24, 48, 400)  |        2.21 |        1.59 |
| (0,)   | (8, 24, 48, 512)  |        4.52 |        1.48 |
| (0,)   | (8, 24, 48, 64)   |        5.49 |        2.57 |
| (0,)   | (8, 24, 48, 8)    |       19.29 |        2.56 |
| (3,)   | (8, 24, 48, 100)  |        3.05 |        2.08 |
| (3,)   | (8, 24, 48, 1024) |        2.76 |        2.21 |
| (3,)   | (8, 24, 48, 128)  |        3.11 |        2.96 |
| (3,)   | (8, 24, 48, 16)   |        2.57 |        0.79 |
| (3,)   | (8, 24, 48, 200)  |        2.86 |        1.66 |
| (3,)   | (8, 24, 48, 256)  |        2.5  |        2.26 |
| (3,)   | (8, 24, 48, 32)   |        2.95 |        1.67 |
| (3,)   | (8, 24, 48, 400)  |        2.48 |        1.76 |
| (3,)   | (8, 24, 48, 512)  |        2.9  |        2.61 |
| (3,)   | (8, 24, 48, 64)   |        3.28 |        1.19 |
| (3,)   | (8, 24, 48, 8)    |        2.96 |        1.79 |
NaN values are replaced by an empty string even if number_format is not None.
- pyquickhelper.pandashelper.tblformat.enumerate_split_df(df, common, subsets)
Splits a dataframe by columns to display shorter dataframes.
- Parameters:
df – dataframe
common – common columns
subsets – subsets of columns
- Returns:
split dataframes
<<<
from pandas import DataFrame
from pyquickhelper.pandashelper.tblformat import enumerate_split_df

df = DataFrame([{'A': 0, 'B': 'text'}, {'A': 1e-5, 'C': 'longer text'}])
res = list(enumerate_split_df(df, ['A'], [['B'], ['C']]))
print(res[0])
print('-----')
print(res[1])
>>>
         A     B
0  0.00000  text
1  0.00001   NaN
-----
         A            C
0  0.00000          NaN
1  0.00001  longer text
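The splitting rule suggested by this output (each piece keeps the common columns plus one subset) can be sketched without pandas. The generator split_columns below is hypothetical and works on a list of dictionaries; it shows the idea only and is not the library's implementation.

```python
def split_columns(rows, common, subsets):
    """Yield one list of row dicts per subset, each keeping the common columns."""
    for subset in subsets:
        columns = list(common) + list(subset)
        # Missing values become None, where pandas would show NaN.
        yield [{c: row.get(c) for c in columns} for row in rows]


rows = [{"A": 0, "B": "text"}, {"A": 1e-05, "C": "longer text"}]
parts = list(split_columns(rows, ["A"], [["B"], ["C"]]))
for part in parts:
    print(part)
```

Repeating the common columns in every piece keeps each shorter table readable on its own, at the cost of some duplication.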