Understanding Binary Classification Using ML.NET (Part 3 of 5)

This is the third part of a five-part blog series on ML.NET. Here are the first and second parts.

First blog post, an introduction to Machine Learning.NET: https://cloudandmobileblog.com/2018/07/09/introduction-of-machine-learning-net-part-1-of-5/

Second blog post, on clustering in Machine Learning.NET: https://cloudandmobileblog.com/2018/07/15/clustering-in-machinelearning-net/


Binary or binomial classification is the task of classifying the elements of a given set into two groups (predicting which group each one belongs to) on the basis of a classification rule. Binary classification generally falls into the domain of supervised learning, since the training dataset is labeled. As the name suggests, it is simply the special case of classification in which there are only two classes.

Some typical examples include:

  1. Credit Card Fraudulent Transaction detection
  2. Medical Diagnosis
  3. Spam Detection

Several paradigms are used for learning binary classifiers, including:

  1. Decision Trees
  2. Neural Networks
  3. Bayesian Classification
  4. Support Vector Machines

The actual output of many binary classification algorithms is a prediction score that indicates the system’s certainty that a given observation belongs to the positive class. As a consumer of this score, you decide whether an observation is positive or negative by picking a classification threshold (cut-off) and comparing the score against it. Observations with scores higher than the threshold are predicted as the positive class, and observations with scores lower than the threshold are predicted as the negative class.
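To make the thresholding step concrete, here is a minimal sketch in plain Python (not ML.NET code). The function name `classify`, the sample scores, and the default cut-off of 0.5 are all made up for illustration, and treating a score exactly equal to the threshold as negative is an assumption.

```python
def classify(scores, threshold=0.5):
    """Label a score 1 (positive) if it is above the threshold, else 0 (negative).

    Assumption: a score exactly equal to the threshold counts as negative.
    """
    return [1 if score > threshold else 0 for score in scores]


# Hypothetical prediction scores for four observations.
scores = [0.91, 0.42, 0.67, 0.08]

print(classify(scores))        # cut-off of 0.5
print(classify(scores, 0.7))   # stricter cut-off: fewer positives
```

Raising the cut-off moves borderline observations (here the one scored 0.67) from the positive class to the negative class.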

Depending on your business problem, you might be more interested in a model that performs well on a specific subset of evaluation metrics, such as precision or recall. For example, two business applications might have very different requirements for their ML models:

  • One application might need to be extremely sure about the positive predictions actually being positive (high precision) and be able to afford to misclassify some positive examples as negative (moderate recall).
  • Another application might need to correctly predict as many positive examples as possible (high recall) and will accept some negative examples being misclassified as positive (moderate precision).
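The trade-off between these two requirements can be sketched with the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The plain-Python example below is only an illustration: the helper `precision_recall`, the labels, and the scores are made up, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels, where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical true labels and prediction scores for six observations.
actual = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1]

for threshold in (0.5, 0.8):
    predicted = [1 if s > threshold else 0 for s in scores]
    precision, recall = precision_recall(actual, predicted)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

With this toy data, the higher cut-off removes the one false positive (precision rises to 1.00) but also misses a true positive (recall falls from 0.67 to 0.33), which is the kind of exchange the first application would accept and the second would not.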

Problem

This problem is centered on predicting whether a passenger aboard the Titanic survived. We will use the data provided in the repo Real-World Machine Learning, in which each passenger has been assigned a label:

  • 0 – did not survive
  • 1 – survived

Using those datasets, we will build a model that analyzes a passenger's record and predicts whether that passenger survived.


Step 1: Create a new .NET Core Console App. I am using Visual Studio for Mac, as shown below; you can also use Visual Studio Code on Linux or Visual Studio 2017 on Windows.

1-NewConsoleApp.png

I named my application TitanicSurvivalClassifier.

2-ProjectCreated.png


Step 2: Add the Microsoft.ML NuGet package and import the following two CSV files, which are used for training and evaluating our model. Add these files to the project and set their properties to “Copy to output directory”.

https://github.com/abhiongithub/ML-for-Dot-Net-developers/blob/master/3-BinaryClassification/TitanicSurvivalClassifier/TitanicSurvivalClassifier/titanic-train.csv

https://github.com/abhiongithub/ML-for-Dot-Net-developers/blob/master/3-BinaryClassification/TitanicSurvivalClassifier/TitanicSurvivalClassifier/titanic-test.csv

 


Step 3: Add a TitanicData.cs file. The source for this file and for the files in the following steps is in the GitHub repository linked at the end of this post.


Step 4: Add a TitanicPrediction.cs file.


Step 5: Add a TestTitanicData.cs file.


Step 6: Modify Program.cs.


Step 7: Run the program. You should see the following output.

TitanicOutput.png

You can download the source code of this application from the following GitHub repository:

https://github.com/abhiongithub/ML-for-Dot-Net-developers

Here is the link to the next blog post in this series:

https://cloudandmobileblog.com/2018/07/28/sentiment-analysis-using-machine-learning-net-part-4-of-5/
