Introduction

ML.NET is a machine learning library implemented in C# by Microsoft. This project aims at showing how to extend it with custom transforms or learners. It implements standard abstractions in C#, such as dataframes and pipelines following the scikit-learn API. ML.NET implements two APIs. The first one, structured as a streaming API, merges every experiment into a single sequence of transforms and learners, possibly handling one out-of-memory dataset. The second API is built on top of the first one and offers an easier way to build pipelines with multiple datasets. This second API is also used by wrappers to other languages such as NimbusML. Let’s first see how this library can be used without any addition.

Command line

ML.NET provides a simple command-line language to define a machine learning pipeline. We use it on the Iris data to train a logistic regression.

Label	Sepal_length	Sepal_width	Petal_length	Petal_width
0	3.5	1.4	0.2	5.1
0	3.0	1.4	0.2	4.9
0	3.2	1.3	0.2	4.7
0	3.1	1.5	0.2	4.6
0	3.6	1.4	0.2	5.0

The pipeline is simply defined by a logistic regression named mlr, for MultiLogisticRegression. Options are defined inside {...}. The parameter data= specifies the data file; loader= specifies the format and the column names.


data=iris.txt
loader=text{col=Label:R4:0 col=Features:R4:1-4 header=+}
tr=mlr{maxiter=5}
out=
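
To make the loader specification more concrete, here is a minimal Python sketch of what the two col= entries describe: each entry reads as name:type:column range, so Label takes column 0 and Features groups columns 1 to 4 into one vector. The functions parse_col_spec and load are hypothetical helpers written for this illustration; they are not part of the library and only mimic the behaviour on the sample rows shown earlier.

```python
import csv
import io

# Two sample rows copied from the Iris extract above (tab-separated, with header).
IRIS_TSV = (
    "Label\tSepal_length\tSepal_width\tPetal_length\tPetal_width\n"
    "0\t3.5\t1.4\t0.2\t5.1\n"
    "0\t3.0\t1.4\t0.2\t4.9\n"
)


def parse_col_spec(spec):
    """Parse 'Name:Type:Range' into (name, type, list of column indices)."""
    name, kind, rng = spec.split(":")
    if "-" in rng:
        first, last = rng.split("-")
        indices = list(range(int(first), int(last) + 1))
    else:
        indices = [int(rng)]
    return name, kind, indices


def load(text, col_specs, header=True):
    """Hypothetical reader: returns one dict per row, keyed by column name."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    if header:
        next(reader)  # header=+ : the first line holds names, not data
    specs = [parse_col_spec(s) for s in col_specs]
    rows = []
    for fields in reader:
        row = {}
        for name, _kind, indices in specs:
            # R4 is a 32-bit float; plain Python floats are used here instead.
            values = [float(fields[i]) for i in indices]
            row[name] = values[0] if len(values) == 1 else values
        rows.append(row)
    return rows


rows = load(IRIS_TSV, ["Label:R4:0", "Features:R4:1-4"])
print(rows[0])  # {'Label': 0.0, 'Features': [3.5, 1.4, 0.2, 5.1]}
```

The four measurements end up in a single Features vector, which is the column a trainer such as mlr consumes.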

The documentation of every component is available through the command line. An example for Multi-class Logistic Regression:


? mlr


    Help for MultiClassClassifierTrainer, Trainer: 'MultiClassLogisticRegression'
      Aliases: MulticlassLogisticRegressionPredictorNew, mlr, multilr
    showTrainingStats=[+|-]             Show statistics of training examples.
                                        Default value:'-' (short form stat)
    l2Weight=<float>                    L2 regularization weight Default value:'1'
                                        (short form l2)
    l1Weight=<float>                    L1 regularization weight Default value:'1'
                                        (short form l1)
    optTol=<float>                      Tolerance parameter for optimization
                                        convergence. Lower = slower, more accurate
                                        Default value:'1E-07' (short form ot)
    memorySize=<int>                    Memory size for L-BFGS. Lower=faster, less
                                        accurate Default value:'20' (short form m)
    maxIterations=<int>                 Maximum iterations. Default
                                        value:'2147483647' (short form maxiter)
    sgdInitializationTolerance=<float>  Run SGD to initialize LR weights,
                                        converging to this tolerance Default
                                        value:'0' (short form sgd)
    quiet=[+|-]                         If set to true, produce no output during
                                        training. Default value:'-' (short form q)
    initWtsDiameter=<float>             Init weights diameter Default value:'0'
                                        (short form initwts)
    numThreads=<int>                    Number of threads (short form nt)
    denseOptimizer=[+|-]                Force densification of the internal
                                        optimization vectors Default value:'-'
                                        (short form do)
    enforceNonNegativity=[+|-]          Enforce non-negative weights Default
                                        value:'-' (short form nn)
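
The short forms listed in this help are the names used inside {...} on the command line: maxiter in mlr{maxiter=5} stands for maxIterations. The following Python sketch illustrates that mapping. The function parse_component is a hypothetical helper written for this example, not the library's own parser, and it does not handle nested braces; the table of short forms is copied from the help output above.

```python
import re

# Short form -> full option name, copied from the help output above.
SHORT_FORMS = {
    "stat": "showTrainingStats",
    "l2": "l2Weight",
    "l1": "l1Weight",
    "ot": "optTol",
    "m": "memorySize",
    "maxiter": "maxIterations",
    "sgd": "sgdInitializationTolerance",
    "q": "quiet",
    "initwts": "initWtsDiameter",
    "nt": "numThreads",
    "do": "denseOptimizer",
    "nn": "enforceNonNegativity",
}


def parse_component(text):
    """Split 'name{key=value ...}' into (name, {full_option_name: value})."""
    match = re.fullmatch(r"\s*(\w+)\s*(?:\{(.*)\})?\s*", text)
    if match is None:
        raise ValueError(f"not a component specification: {text!r}")
    name, body = match.group(1), match.group(2) or ""
    options = {}
    for item in body.split():
        key, _, value = item.partition("=")
        # Unknown keys are kept as-is so that full names also work.
        options[SHORT_FORMS.get(key, key)] = value
    return name, options


print(parse_component("mlr{maxiter=5}"))   # ('mlr', {'maxIterations': '5'})
print(parse_component("mlr{l2=0.1 q=+}"))  # ('mlr', {'l2Weight': '0.1', 'quiet': '+'})
```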

More examples can be found at Command Line. The command line is usually the preferred way to use the library. It does not require a heavy setup and makes training easier. Online predictions require C#, but the command Generate Sample Prediction Code may help in that regard.