2018-08-03 Exploration with pybind11 and ExtensionArray

I used pybind11 to expose a dummy C++ object, WeightedDouble, to implement a couple of operators on it, and to see how it behaves in a dataframe.

from pandas import DataFrame, Series
from cpyquickhelper.numbers.weighted_number import WeightedDouble
from cpyquickhelper.numbers.weighted_dataframe import WeightedSeries

n1 = WeightedDouble(1, 1)
n2 = WeightedDouble(3, 2)
ser = Series([n1, n2])
df = DataFrame(data=dict(wd=ser, x=[6., 7.]))
df["A"] = df.wd + df.x
# Whole dataframe.
print(df)
# Show only the values for column 'wd'.
print(df.wd.wdouble.value)
# About types
print(df.dtypes)

Output:

            wd    x              A
0  1.000000 (1)  6.0   7.000000 (2)
1  3.000000 (2)  7.0  10.000000 (3)
<WeightedArray>
[1.0, 3.0]
Length: 2, dtype: float64
wd     object
x     float64
A      object
dtype: object
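
The arithmetic visible in column A (values are added, weights are added, and a plain float seems to count with weight 1) can be mimicked in pure Python. The sketch below is only a stand-in inferred from the printed output; the class name WeightedSketch and the weight-1 rule for floats are assumptions, not the actual pybind11 binding.

```python
class WeightedSketch:
    """Hypothetical pure-Python stand-in for the C++ WeightedDouble."""

    def __init__(self, value, weight=1.0):
        self.value = float(value)
        self.weight = float(weight)

    def __add__(self, other):
        if isinstance(other, WeightedSketch):
            return WeightedSketch(self.value + other.value,
                                  self.weight + other.weight)
        # Assumption inferred from column A: a plain number carries weight 1.
        return WeightedSketch(self.value + other, self.weight + 1.0)

    # Addition is symmetric here, so float + WeightedSketch reuses __add__.
    __radd__ = __add__

    def __repr__(self):
        return "%f (%g)" % (self.value, self.weight)


a = WeightedSketch(1, 1) + 6.0
b = WeightedSketch(3, 2) + 7.0
print(a)  # 7.000000 (2)
print(b)  # 10.000000 (3)
```

Because pandas applies `+` element by element on an object column, defining `__add__` and `__radd__` on the scalar type is enough for an expression such as `df.wd + df.x` to work.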

The latest version of pandas (0.23) introduced ExtensionArray to define arrays of custom types and get rid of the type object. The current implementation does not use a true C++ array but a series of WeightedDouble underneath. It still shows what the result looks like, despite the potential speed issue.

<<<

from pandas import DataFrame, Series
from cpyquickhelper.numbers.weighted_number import WeightedDouble
from cpyquickhelper.numbers.weighted_dataframe import WeightedArray

n1 = WeightedDouble(1, 1)
n2 = WeightedDouble(3, 2)
ser = WeightedArray([n1, n2])
df = DataFrame(data=dict(wd=ser, x=[6., 7.]))
df["A"] = df.wd + df.x
# Whole dataframe.
print(df)
# Show only the values for column 'wd'.
print(df.wd.wdouble.value)
# About types
print(df.dtypes)

>>>

             wd    x              A
0  1.000000 (1)  6.0   7.000000 (2)
1  3.000000 (2)  7.0  10.000000 (3)
<WeightedArray>
[1.0, 3.0]
Length: 2, dtype: float64
wd     object
x     float64
A      object
dtype: object
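
For readers unfamiliar with the ExtensionArray interface, here is a minimal sketch of what such a container has to provide. It is not the code of cpyquickhelper's WeightedArray: the names Weighted, WeightedSketchDtype and WeightedSketchArray are hypothetical, the values are stored in a numpy object array, and the signatures follow recent pandas releases, which may differ from 0.23.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray, ExtensionDtype, take


class Weighted:
    """Hypothetical pure-Python stand-in for WeightedDouble."""

    def __init__(self, value, weight=1.0):
        self.value = float(value)
        self.weight = float(weight)

    def __repr__(self):
        return "%f (%g)" % (self.value, self.weight)


class WeightedSketchDtype(ExtensionDtype):
    name = "weighted"   # label shown by df.dtypes
    type = Weighted     # scalar type stored in the array
    na_value = None

    @classmethod
    def construct_array_type(cls):
        return WeightedSketchArray


class WeightedSketchArray(ExtensionArray):
    def __init__(self, values):
        # Plain object array underneath, not a true C++ array.
        self._data = np.empty(len(values), dtype=object)
        for i, v in enumerate(values):
            self._data[i] = v

    @classmethod
    def _from_sequence(cls, scalars, *, dtype=None, copy=False):
        return cls(scalars)

    @classmethod
    def _from_factorized(cls, values, original):
        return cls(values)

    @classmethod
    def _concat_same_type(cls, to_concat):
        return cls(np.concatenate([a._data for a in to_concat]))

    @property
    def dtype(self):
        return WeightedSketchDtype()

    @property
    def nbytes(self):
        return self._data.nbytes

    def __len__(self):
        return len(self._data)

    def __getitem__(self, item):
        if isinstance(item, (int, np.integer)):
            return self._data[item]          # scalar access
        return type(self)(self._data[item])  # slice or mask

    def isna(self):
        return np.array([v is None for v in self._data], dtype=bool)

    def take(self, indices, allow_fill=False, fill_value=None):
        result = take(self._data, indices,
                      allow_fill=allow_fill, fill_value=fill_value)
        return type(self)(result)

    def copy(self):
        return type(self)(self._data.copy())


arr = WeightedSketchArray([Weighted(1, 1), Weighted(3, 2)])
df = pd.DataFrame({"wd": arr, "x": [6.0, 7.0]})
print(df.dtypes)
```

The sketch leaves out arithmetic and custom formatting; it only shows the storage and indexing hooks pandas relies on to hold the column.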

The property wdouble should not be necessary, but a column is still a Series: the new array is just a container for its values.
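
One way to attach such a property to a Series is pandas' accessor mechanism. The sketch below is an assumption about how an accessor like wdouble could be built, not the implementation in cpyquickhelper; the accessor name wsketch and the class Weighted are hypothetical.

```python
import pandas as pd
from pandas.api.extensions import register_series_accessor


class Weighted:
    """Hypothetical pure-Python stand-in for WeightedDouble."""

    def __init__(self, value, weight=1.0):
        self.value = float(value)
        self.weight = float(weight)


@register_series_accessor("wsketch")
class WeightedAccessor:
    """Exposes the fields of the stored objects as float Series."""

    def __init__(self, series):
        self._series = series

    @property
    def value(self):
        return pd.Series([w.value for w in self._series],
                         index=self._series.index)

    @property
    def weight(self):
        return pd.Series([w.weight for w in self._series],
                         index=self._series.index)


ser = pd.Series([Weighted(1, 1), Weighted(3, 2)])
values = ser.wsketch.value
weights = ser.wsketch.weight
print(values)
```

The accessor is needed because attribute access on a column goes through the Series, which knows nothing about the fields of the objects it holds.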