Convert an R script into Python


This notebook introduces the function r2python, which converts R code into Python. It does not work for everything; it is improved whenever the need arises. This notebook was executed with the following versions:

import sys
print(sys.version)
3.10.5 (tags/v3.10.5:f377153, Jun  6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)]
text = !python -m pip freeze antlr4-python3-runtime
[t for t in text if "antlr" in t]
['antlr4-python3-runtime==4.10']

A script as an example:

rscript = """
nb=function(y=1930){
debut=1816
MatDFemale=matrix(D$Female,nrow=111)
colnames(MatDFemale)=(debut+0):198
cly=(y-debut+1):111
deces=diag(MatDFemale[:,cly[cly%in%1:199]])
return(c(B$Female[B$Year==y],deces))}
"""
from pyensae.languages.rconverter import r2python
print(r2python(rscript, pep8=True))
from python2r_helper import make_tuple

def nb(y=1930):
    debut = 1816
    MatDFemale = matrix(D . Female, nrow=111)
    colnames(MatDFemale) .set(range((debut + 0), 198))
    cly = range((y - debut + 1), 111)
    deces = diag(MatDFemale[:, cly[set(cly) & set(range(1, 199))]])
    return make_tuple(B . Female[B . Year == y], deces)

The converter emits calls to functions that are not implemented, such as colnames(MatDFemale) .set(range((debut + 0), 198)), because the original syntax colnames(MatDFemale)=debut+0:198 does not work in Python. The conversion does not fix indices (the first position is 0 in Python and 1 in R). The parentheses in (debut+0):198 are needed to tell the converter where the expression begins. The operator %in% is converted into a set intersection.
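The differences listed above can be checked by hand. The sketch below is not what r2python generates: the helpers r_seq, r_in and r_keep are hypothetical names introduced here to mimic R's a:b, %in% and logical indexing in plain Python, and the numbers are made up for illustration.

```python
def r_seq(start, stop):
    """Mimic R's start:stop, which includes both ends and may count down."""
    step = 1 if stop >= start else -1
    return list(range(start, stop + step, step))


def r_in(values, table):
    """Mimic R's values %in% table: one boolean per element of values."""
    lookup = set(table)
    return [v in lookup for v in values]


def r_keep(values, mask):
    """Mimic R's values[mask] for a logical mask."""
    return [v for v, keep in zip(values, mask) if keep]


cly = r_seq(108, 111)                # R: 108:111
mask = r_in(cly, r_seq(1, 110))      # R: cly %in% 1:110
kept = r_keep(cly, mask)             # R: cly[cly %in% 1:110]
zero_based = [i - 1 for i in kept]   # shift to Python's 0-based positions

print(kept, zero_based)
```

Unlike a set intersection, this filtering keeps the order and the duplicates of cly, as R does.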

The unit tests check that the function works on the examples listed in unittests/ut_languages/data. Anything not covered by that list may require a few code changes. Some instructions, such as colnames(MatDFemale) .set(range((debut + 0), 198)), should probably be rewritten.
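One possible manual rewrite of such an instruction, sketched here under the assumption that the matrix is held in a pandas DataFrame, assigns the labels directly to DataFrame.columns. The data and the column range below are made up for illustration and do not come from the original script.

```python
import numpy
import pandas

# Hypothetical data standing in for MatDFemale: 2 rows, 3 columns.
values = numpy.arange(6).reshape((2, 3))
mat = pandas.DataFrame(values)

# R: colnames(mat) = first:last, where both ends are included.
first = 1816
last = first + mat.shape[1] - 1
mat.columns = list(range(first, last + 1))  # Python's range excludes its end

print(list(mat.columns))
```

The number of labels must match the number of columns, otherwise pandas raises an error.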

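import numpy
import pandas

def matrix(array, nrow=None):
    # Minimal stand-in for R's matrix(); numpy.resize fills row by row.
    arr = numpy.array(array)
    if nrow is not None:
        ncol = len(arr) // nrow
        arr = numpy.resize(arr, new_shape=(nrow, ncol))
    return arr

def colnames(df):
    # Only pandas DataFrames are supported here.
    if isinstance(df, pandas.DataFrame):
        return list(df.columns)
    raise TypeError(type(df))

def make_tuple(*el, aslist=True):
    # Stand-in for R's c(): returns a list by default, a tuple otherwise.
    if aslist:
        return list(el)
    return tuple(el)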
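
R's matrix() fills its result column by column, whereas numpy.resize fills row by row, so the matrix helper above lays values out differently from R. A possible column-major variant is sketched below; matrix_colmajor is a hypothetical name, not part of pyensae, and it assumes nrow is always given.

```python
import numpy


def matrix_colmajor(array, nrow):
    """Reshape a flat sequence into nrow rows, filling column by column."""
    arr = numpy.asarray(array)
    ncol = len(arr) // nrow
    # Drop any trailing values that do not fill a complete column.
    return arr[: nrow * ncol].reshape((nrow, ncol), order="F")


data = [1, 2, 3, 4, 5, 6]
by_column = matrix_colmajor(data, nrow=2)   # same layout as R's matrix(1:6, nrow=2)
by_row = numpy.resize(numpy.array(data), (2, 3))

print(by_column)
print(by_row)
```

The two layouts agree only when the matrix has a single row or a single column.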