XD blog

2018-07-10 DataFrame for C Sharp

Microsoft recently released an open source machine learning library called ML.NET. Unlike with scikit-learn in Python, there is no dataframe in C#: the data is described as an array of instances of a class specific to the data the learning pipeline has to handle (see Get started with ML.NET in 10 minutes). I was wondering whether there could be a way to skip that step, even if it means being a little slower. I ended up implementing something similar to a pandas dataframe in Python, which I called Scikit.ML.DataFrame. I modified the initial example:

var iris = "iris.txt";

// We read the text data and create a dataframe / dataview.
var df = DataFrame.ReadCsv(iris, sep: '\t',
                           dtypes: new DataKind?[] { DataKind.R4 });

var importData = df.EPTextLoader(iris, sep: '\t', header: true);
var learningPipeline = new GenericLearningPipeline();
learningPipeline.Add(importData);
learningPipeline.Add(new ColumnConcatenator("Features", "Sepal_length", "Sepal_width"));
learningPipeline.Add(new StochasticDualCoordinateAscentClassifier());
var predictor = learningPipeline.Train();
var predictions = predictor.Predict(df);

var dfout = DataFrame.ReadView(predictions);

// And access one value...
var v = dfout.iloc[0, 7];
Console.WriteLine("{0}: {1}", dfout.Schema.GetColumnName(7), v);
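
To make the difference between the two styles concrete, here is a small library-free sketch. It is not the Scikit.ML.DataFrame implementation: the names IrisRow and TinyFrame are hypothetical and only illustrate the idea. In the first style the schema is fixed at compile time by a class written for each dataset. In the second, columns are added at run time and values are read by position, in the spirit of iloc.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row type: the typed style needs one such class per dataset.
public class IrisRow
{
    public float SepalLength;
    public float SepalWidth;
    public string Label;
}

// Hypothetical minimal column store (NOT the Scikit.ML.DataFrame code):
// column names and types are only known at run time.
public class TinyFrame
{
    private readonly List<string> _names = new List<string>();
    private readonly List<Array> _columns = new List<Array>();

    public int ColumnCount => _names.Count;

    public void AddColumn<T>(string name, T[] values)
    {
        if (_columns.Count > 0 && _columns[0].Length != values.Length)
            throw new ArgumentException("All columns must have the same length.");
        _names.Add(name);
        _columns.Add(values);
    }

    public string GetColumnName(int col) => _names[col];

    // Positional access by (row, column), similar in spirit to iloc[row, col].
    public object this[int row, int col] => _columns[col].GetValue(row);
}

public static class Demo
{
    public static void Main()
    {
        // Typed rows: the schema is fixed at compile time.
        var rows = new[]
        {
            new IrisRow { SepalLength = 5.1f, SepalWidth = 3.5f, Label = "setosa" }
        };
        Console.WriteLine(rows[0].Label);

        // Column store: the schema is built at run time.
        var frame = new TinyFrame();
        frame.AddColumn("Sepal_length", new[] { 5.1f, 4.9f });
        frame.AddColumn("Label", new[] { "setosa", "setosa" });
        Console.WriteLine("{0}: {1}", frame.GetColumnName(1), frame[1, 1]);
    }
}
```

The price of the second style is that values come back as object and must be cast, which is one reason a dataframe can be a little slower than typed instances.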

Xavier Dupré