2018-08-03 Exploration with pybind11 and ExtensionArray
I tried pybind11 to expose a dummy C++ object, WeightedDouble, to implement a couple of operators, and to see how it behaves in a dataframe.
from pandas import DataFrame, Series
from cpyquickhelper.numbers.weighted_number import WeightedDouble
from cpyquickhelper.numbers.weighted_dataframe import WeightedSeries
n1 = WeightedDouble(1, 1)
n2 = WeightedDouble(3, 2)
ser = Series([n1, n2])
df = DataFrame(data=dict(wd=ser, x=[6., 7.]))
df["A"] = df.wd + df.x
# Whole dataframe.
print(df)
# Show only the values for column 'wd'.
print(df.wd.wdouble.value)
# About types
print(df.dtypes)
Output:
             wd    x              A
0  1.000000 (1)  6.0   7.000000 (2)
1  3.000000 (2)  7.0  10.000000 (3)
<WeightedArray>
[1.0, 3.0]
Length: 2, dtype: float64
wd     object
x     float64
A      object
dtype: object
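The arithmetic visible in this output can be mimicked in pure Python. The sketch below is only an illustration: WeightedSketch is a hypothetical stand-in, not the pybind11 class, and the rule that a plain float counts as a value of weight 1 is an assumption inferred from the printed results (1 (1) + 6.0 gives 7 (2)).

```python
class WeightedSketch:
    """Hypothetical pure-Python stand-in for the C++ WeightedDouble."""

    def __init__(self, value, weight=1.0):
        self.value = float(value)
        self.weight = float(weight)

    def __add__(self, other):
        if isinstance(other, WeightedSketch):
            # Both values and weights are summed.
            return WeightedSketch(self.value + other.value,
                                  self.weight + other.weight)
        # Assumption: a plain number behaves like a value of weight 1.
        return WeightedSketch(self.value + float(other), self.weight + 1.0)

    # Addition is symmetric, so float + WeightedSketch reuses the same code.
    __radd__ = __add__

    def __repr__(self):
        # Same layout as the output above: value, then weight in parentheses.
        return "%f (%g)" % (self.value, self.weight)


n1 = WeightedSketch(1, 1)
n2 = WeightedSketch(3, 2)
print(n1 + 6.0)  # 7.000000 (2)
print(n2 + 7.0)  # 10.000000 (3)
```

Because pandas stores such objects in a column of type object, an expression like df.wd + df.x ends up calling these Python-level operators once per row.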
The latest version of pandas (0.23) introduced ExtensionArray to define arrays of custom types and get rid of the type object. The current implementation does not use a true C++ array but a series of WeightedDouble underneath. It still shows what the result looks like, despite the potential speed issue.
<<<
from pandas import DataFrame, Series
from cpyquickhelper.numbers.weighted_number import WeightedDouble
from cpyquickhelper.numbers.weighted_dataframe import WeightedArray
n1 = WeightedDouble(1, 1)
n2 = WeightedDouble(3, 2)
ser = WeightedArray([n1, n2])
df = DataFrame(data=dict(wd=ser, x=[6., 7.]))
df["A"] = df.wd + df.x
# Whole dataframe.
print(df)
# Show only the values for column 'wd'.
print(df.wd.wdouble.value)
# About types
print(df.dtypes)
>>>
             wd    x              A
0  1.000000 (1)  6.0   7.000000 (2)
1  3.000000 (2)  7.0  10.000000 (3)
<WeightedArray>
[1.0, 3.0]
Length: 2, dtype: float64
wd     object
x     float64
A      object
dtype: object
The property wdouble should not be necessary, but the type of a column is still a Series: the new array is just a container.
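One way to attach such a property to a Series is the accessor mechanism pandas added in the same release (pandas.api.extensions.register_series_accessor). The sketch below is a guess at the general idea, not the code of cpyquickhelper: the class WeightedSketch and the accessor name wsketch are hypothetical, and the accessor returns a plain Series of floats rather than a WeightedArray.

```python
import pandas as pd
from pandas.api.extensions import register_series_accessor


class WeightedSketch:
    """Hypothetical pure-Python stand-in for WeightedDouble."""

    def __init__(self, value, weight=1.0):
        self.value = float(value)
        self.weight = float(weight)


# 'wsketch' is a made-up name, chosen so it cannot clash with 'wdouble'.
@register_series_accessor("wsketch")
class WeightedSketchAccessor:
    def __init__(self, series):
        # pandas passes the Series the accessor is called on.
        self._series = series

    @property
    def value(self):
        # Extract the value of every element into a float64 Series.
        return pd.Series([w.value for w in self._series],
                         index=self._series.index, dtype="float64")

    @property
    def weight(self):
        # Extract the weight of every element into a float64 Series.
        return pd.Series([w.weight for w in self._series],
                         index=self._series.index, dtype="float64")


ser = pd.Series([WeightedSketch(1, 1), WeightedSketch(3, 2)])
print(ser.wsketch.value.tolist())   # [1.0, 3.0]
print(ser.wsketch.weight.tolist())  # [1.0, 2.0]
```

The accessor lives on the Series, not on the stored objects, which is why the extra attribute is needed as long as a dataframe column is exposed as a Series.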