
I'm looking at the C# example here: https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet/get-started/windows and everything works well.

Now I'd like to extend the example: I want to predict a numeric label instead of a string label, for example predicting the output of a seven-segment display.

Here is my very simple dataset. The last column is the integer I want to predict:

1,0,1,1,1,1,1,0
0,0,0,0,0,1,1,1
1,1,1,0,1,1,0,2
1,1,1,0,0,1,1,3
0,1,0,1,0,1,1,4
1,1,1,1,0,0,1,5
1,1,1,1,1,0,1,6
1,0,0,0,0,1,1,7
1,1,1,1,1,1,1,8
1,1,1,1,0,1,1,9

And here is my test code:

public class Digit
{
    [Column("0")] public float Up;

    [Column("1")] public float Middle;

    [Column("2")] public float Bottom;

    [Column("3")] public float UpLeft;
    [Column("4")] public float BottomLeft;
    [Column("5")] public float TopRight;
    [Column("6")] public float BottomRight;

    [Column("7")] [ColumnName("DigitValue")]
    public float DigitValue;
}

public class DigitPrediction
{
    [ColumnName("PredictedDigits")] public float PredictedDigits;
}

public static void PredictDigit()
{
    var pipeline = new LearningPipeline();
    var dataPath = Path.Combine("Segmenti", "segments.txt");
    pipeline.Add(new TextLoader<Digit>(dataPath, false, ","));
    pipeline.Add(new ColumnConcatenator("Label", "DigitValue"));
    pipeline.Add(new ColumnConcatenator("Features", "Up", "Middle", "Bottom", "UpLeft", "BottomLeft", "TopRight", "BottomRight"));
    pipeline.Add(new StochasticDualCoordinateAscentClassifier());
    var model = pipeline.Train<Digit, DigitPrediction>();
    var prediction = model.Predict(new Digit
    {
        Up = 1,
        Middle = 1,
        Bottom = 1,
        UpLeft = 1,
        BottomLeft = 1,
        TopRight = 1,
        BottomRight = 1,
    });

    Console.WriteLine($"Predicted digit is: {prediction.PredictedDigits}");
    Console.ReadLine();
}

As you can see, it is very similar to the provided example except for the handling of the last column ("Label"), because I need to predict a number and not a string. I tried:

pipeline.Add(new ColumnConcatenator("Label", "DigitValue"));

but it doesn't work; it throws this exception:

Training label column 'Label' type is not valid for multi-class: Vec<R4, 1>. Type must be R4 or R8.

I'm sure I'm missing something, but I can't find anything online that helps me solve this problem.

UPDATE

I found that the input class has to have a Label column like this:

[Column("7")] [ColumnName("Label")] public float Label;

and DigitPrediction has to have a Score column like this:

public class DigitPrediction
{
    [ColumnName("Score")] public float[] Score;
}

Now the system "works": prediction.Score is a Single[] (float[]) in which the index of the highest value is the predicted digit. Is this the right approach?
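A minimal sketch of that index-of-maximum (argmax) lookup in plain C#, with no ML.NET calls. `ScoreHelper` and `ArgMax` are hypothetical names, and mapping the index straight to a digit assumes the entries of Score are ordered by label value 0 to 9; that ordering is an assumption, not something verified against ML.NET here. The score values in `Main` are made up for illustration.

```csharp
using System;

public static class ScoreHelper
{
    // Hypothetical helper: returns the index of the largest value in a score array.
    // Ties resolve to the first (lowest) index.
    public static int ArgMax(float[] scores)
    {
        if (scores == null || scores.Length == 0)
            throw new ArgumentException("Score array must not be empty.", nameof(scores));

        int best = 0;
        for (int i = 1; i < scores.Length; i++)
        {
            if (scores[i] > scores[best])
                best = i;
        }
        return best;
    }

    public static void Main()
    {
        // Made-up score vector (illustration only, not real model output).
        // Assumption: index i corresponds to digit i.
        float[] scores = { 0.01f, 0.02f, 0.05f, 0.01f, 0.03f, 0.02f, 0.04f, 0.02f, 0.75f, 0.05f };
        Console.WriteLine($"Predicted digit is: {ArgMax(scores)}"); // prints "Predicted digit is: 8"
    }
}
```

With the DigitPrediction class from the update, the call would be `ScoreHelper.ArgMax(prediction.Score)`.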

UPDATE 2 - Complete code example

Following the answer and other suggestions, I got the right result. If you need it, you can find the complete code here.

Rowandish
  • Because ML.NET is so new, I would post this as an issue on the ML.NET GitHub repository: https://github.com/dotnet/machinelearning/issues. Tag it with "question" if you can, and when you get an answer, post it here as an answer to your own question. I'd like to know the answer too. Does it predict correctly for you? – Kyle B May 24 '18 at 01:03
  • @KyleB Thanks for the suggestion, I made it here: https://github.com/dotnet/machinelearning/issues/226 – Rowandish May 24 '18 at 07:37

3 Answers


It looks like you need to add these fields to your DigitPrediction class:

public class DigitPrediction
{
    [ColumnName("PredictedLabel")]
    public uint ExpectedDigit; // <-- This is the predicted value

    [ColumnName("Score")]
    public float[] Score; // <-- One score per class; the highest score corresponds to the predicted label
}

And I think you would need to change the line where it writes the result to:

    Console.WriteLine($"Predicted digit is: {prediction.ExpectedDigit}");

One more thing: there appears to be a bug in the API where the predicted digit is off by one, but if you add 1 to the predicted value you get the correct digit. I expect this to be fixed in the future; there is an issue for it: (https://github.com/dotnet/machinelearning/issues/235)

Kyle B

You could also try replacing ColumnConcatenator with ColumnCopier in the pipeline for the Label column:

pipeline.Add(new ColumnCopier ("Label", "DigitValue"));

That tells the pipeline which column is the Label. Unlike the output of ColumnConcatenator, the output of ColumnCopier is not a vector, so the label is no longer the Vec<R4, 1> type rejected in the exception above.

You could similarly add a Score column as well.

amy8374

It is essential to follow this pattern:

  • Features column (all the features; they must all have the same type)

  • Label column (your "answers")

If the original dataset has a differently named answer column, use:

pipeline.Add(new ColumnCopier(("DigitValue", "Label")));

The first element is the source and the second is the destination. Note that the double parentheses are required: the argument is a C# tuple.

Arhisan