This is the third part of a five-part blog series on Machine Learning .NET (ML.NET). Here are the first and second parts:
- Part 1, an introduction to Machine Learning .NET: https://cloudandmobileblog.com/2018/07/09/introduction-of-machine-learning-net-part-1-of-5/
- Part 2, on clustering in Machine Learning .NET: https://cloudandmobileblog.com/2018/07/15/clustering-in-machinelearning-net/
Binary or binomial classification is the task of classifying the elements of a given set into two groups (predicting which group each one belongs to) on the basis of a classification rule. Binary classification generally falls under supervised learning, because the training dataset is labeled. As the name suggests, it is the special case of classification in which there are only two classes.
Some typical examples include:
- Credit card fraud detection
- Medical diagnosis
- Spam detection
Various paradigms are used for learning binary classifiers, including:
- Decision Trees
- Neural Networks
- Bayesian Classification
- Support Vector Machines
The actual output of many binary classification algorithms is a prediction score. The score indicates the system’s certainty that the given observation belongs to the positive class. To decide whether an observation should be classified as positive or negative, you, as the consumer of this score, pick a classification threshold (cut-off) and compare the score against it. Observations with scores higher than the threshold are predicted as the positive class, and observations with scores lower than the threshold are predicted as the negative class.
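A minimal sketch of this thresholding step in plain C#. No ML.NET types are involved, and the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns prediction scores into class labels using a cut-off.
public static class ScoreThreshold
{
    // true = positive class. A score exactly equal to the threshold is treated as
    // negative here; that tie-breaking choice is arbitrary.
    public static bool Classify(float score, float threshold) => score > threshold;

    // Applies the same threshold to every score in a batch.
    public static bool[] ClassifyAll(IEnumerable<float> scores, float threshold) =>
        scores.Select(s => Classify(s, threshold)).ToArray();
}
```

For example, `ScoreThreshold.ClassifyAll(new[] { 0.9f, 0.4f, 0.55f }, 0.5f)` returns `{ true, false, true }`; raising the threshold to `0.6f` turns the third observation negative.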
The choice of threshold trades off two metrics: precision (the fraction of predicted positives that are actually positive) and recall (the fraction of actual positives that are predicted as positive). Depending on your business problem, you might care more about one of these metrics than the other. For example, two business applications might have very different requirements for their ML models:
- One application might need to be very sure that its positive predictions are actually positive (high precision), and can afford to misclassify some positive examples as negative (moderate recall).
- Another application might need to correctly predict as many positive examples as possible (high recall), and will accept some negative examples being misclassified as positive (moderate precision).
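The trade-off between these two applications can be made concrete by computing precision and recall from actual and predicted labels. This is a plain C# sketch with hypothetical names, not an ML.NET API (ML.NET's binary classification evaluator reports comparable metrics for you):

```csharp
using System;

// Hypothetical helper: precision and recall for a binary classifier.
public static class BinaryMetrics
{
    public static (double Precision, double Recall) PrecisionRecall(bool[] actual, bool[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("actual and predicted must have the same length");

        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            if (predicted[i] && actual[i]) tp++;        // true positive
            else if (predicted[i] && !actual[i]) fp++;  // negative misclassified as positive
            else if (!predicted[i] && actual[i]) fn++;  // positive misclassified as negative
        }

        // Precision: of everything predicted positive, the share that really is positive.
        double precision = tp + fp == 0 ? 0.0 : (double)tp / (tp + fp);
        // Recall: of everything that really is positive, the share predicted positive.
        double recall = tp + fn == 0 ? 0.0 : (double)tp / (tp + fn);
        return (precision, recall);
    }
}
```

With actual labels `{ true, true, false, true, false }` and scores `{ 0.9, 0.8, 0.6, 0.4, 0.3 }`, a threshold of 0.7 predicts `{ true, true, false, false, false }` (precision 1.0, recall about 0.67), while a threshold of 0.35 predicts `{ true, true, true, true, false }` (precision 0.75, recall 1.0). Lowering the threshold raises recall at the cost of precision.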
The problem in this post is to predict whether a passenger aboard the Titanic survived. We will use the data provided in the Real-World Machine Learning repository, in which each passenger has been assigned a label:
- 0 – did not survive
- 1 – survived
Using this data, we will build a model that analyzes a passenger's details and predicts whether that passenger survived.
Step 1: Create a new .NET Core console app. I am using Visual Studio for Mac; you can also use Visual Studio Code on Linux or Visual Studio 2017 on Windows.
I named my application TitanicSurvivalClassifier.
Step 2: Add the Microsoft.ML NuGet package, then add the two CSV files used for training and evaluating the model. Set each file's “Copy to Output Directory” property (for example, to “Copy if newer”) so that the files are copied next to the built application.
Step 3: Add a TitanicData.cs file.
Step 4: Add a TitanicPrediction.cs file.
Step 5: Add a TestTitanicData.cs file.
Step 6: Modify Program.cs.
Step 7: Once you run this program, you should see the following output.
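For reference, here is a minimal sketch of what the four files from Steps 3 to 6 could contain, combined into one listing. It is not the author's code. It assumes the ML.NET 1.x API (which may differ from the ML.NET version this series was written against), a Kaggle-style Titanic CSV with a header row and the columns PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked, and the hypothetical file names `titanic-train.csv` and `titanic-test.csv`. The feature selection, the sample passenger and the `SdcaLogisticRegression` trainer are also assumptions.

```csharp
// Sketch only: ML.NET 1.x, hypothetical file names and column layout.
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

namespace TitanicSurvivalClassifier
{
    // TitanicData.cs: one CSV row (assumed Kaggle-style column indices).
    public class TitanicData
    {
        [LoadColumn(1), ColumnName("Label")] public bool Survived; // 0 or 1 in the file
        [LoadColumn(2)] public float Pclass;
        [LoadColumn(4)] public string Sex;
        [LoadColumn(5)] public float Age;
        [LoadColumn(6)] public float SibSp;
        [LoadColumn(7)] public float Parch;
        [LoadColumn(9)] public float Fare;
    }

    // TitanicPrediction.cs: the columns the trained model adds.
    public class TitanicPrediction
    {
        [ColumnName("PredictedLabel")] public bool Survived;
        public float Probability;
        public float Score;
    }

    // TestTitanicData.cs: an illustrative passenger, not taken from the dataset.
    public static class TestTitanicData
    {
        public static readonly TitanicData Passenger = new TitanicData
        {
            Pclass = 1, Sex = "female", Age = 30, SibSp = 0, Parch = 0, Fare = 80f
        };
    }

    // Program.cs: load the data, train, evaluate, then score one passenger.
    public static class Program
    {
        const string TrainPath = "titanic-train.csv"; // hypothetical name
        const string TestPath = "titanic-test.csv";   // hypothetical name

        public static void Main()
        {
            var ml = new MLContext(seed: 0);

            // allowQuoting handles the commas inside quoted Name values.
            IDataView train = ml.Data.LoadFromTextFile<TitanicData>(
                TrainPath, separatorChar: ',', hasHeader: true, allowQuoting: true);
            IDataView test = ml.Data.LoadFromTextFile<TitanicData>(
                TestPath, separatorChar: ',', hasHeader: true, allowQuoting: true);

            var pipeline = ml.Transforms.ReplaceMissingValues("Age",
                    replacementMode: MissingValueReplacingEstimator.ReplacementMode.Mean) // fills NaN ages
                .Append(ml.Transforms.Categorical.OneHotEncoding("Sex"))                   // text -> indicator vector
                .Append(ml.Transforms.Concatenate("Features",
                    "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"))
                .Append(ml.BinaryClassification.Trainers.SdcaLogisticRegression());

            ITransformer model = pipeline.Fit(train);

            // Metrics are computed with the trainer's default decision threshold.
            var metrics = ml.BinaryClassification.Evaluate(model.Transform(test));
            Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
            Console.WriteLine($"AUC:      {metrics.AreaUnderRocCurve:P2}");
            Console.WriteLine($"F1 score: {metrics.F1Score:P2}");

            var engine = ml.Model.CreatePredictionEngine<TitanicData, TitanicPrediction>(model);
            TitanicPrediction p = engine.Predict(TestTitanicData.Passenger);
            Console.WriteLine($"Survived: {p.Survived} (probability {p.Probability:P1})");
        }
    }
}
```

Keeping the input class, the prediction class and the test sample as separate types mirrors the file layout of Steps 3 to 5; in a real project each class would live in its own file.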
You can download the source code of this application from the following GitHub repository.
Here is the link to the next blog post in this series.