Clustering in Machine Learning.Net (Part 2 of 5)

This is the second part of a five-part blog series on ML.NET (Machine Learning.NET). The first part is here:
https://cloudandmobileblog.com/2018/07/09/introduction-of-machine-learning-net-part-1-of-5/


Clustering is the task of dividing a population of data points into groups such that the points in one group are more similar to each other than to the points in other groups. In other words, it groups objects on the basis of the similarity and dissimilarity between them.
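To make "similar" concrete, here is a minimal sketch that measures similarity as Euclidean distance and assigns a point to the closest group centre. The class and method names (ClusteringSketch, Distance, NearestCentroid) are hypothetical and only for illustration; this is not how ML.NET implements clustering internally.

```csharp
using System;

public static class ClusteringSketch
{
    // Euclidean distance between two feature vectors of equal length.
    public static double Distance(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.Sqrt(sum);
    }

    // Index of the centroid (group centre) closest to the given point.
    public static int NearestCentroid(double[] point, double[][] centroids)
    {
        int best = 0;
        for (int i = 1; i < centroids.Length; i++)
        {
            if (Distance(point, centroids[i]) < Distance(point, centroids[best]))
                best = i;
        }
        return best;
    }
}
```

With two made-up centroids at (1, 1) and (5, 5), the point (1.2, 0.9) is assigned to the first group because it lies much closer to (1, 1).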

[Image: clustering.jpg]

As a first step, let us understand the clustering problem.

The problem is to divide a set of iris flowers into different groups based on the features of each flower. Those features are the length and width of the sepal and the length and width of the petal. For this tutorial, assume that the type of each flower is unknown. You want to learn the structure of the dataset from the features and predict how a data instance fits into this structure.

[Image: sepal]

As we don’t know which group each flower belongs to, we need an unsupervised machine learning task. To divide a data set into groups in such a way that elements in the same group are more similar to each other than to those in other groups, use a clustering machine learning task.

Now let us create a new .NET Core console application in Visual Studio, just like we did in the previous blog post. In Solution Explorer, right-click the project and select Add > New Folder. Type “Data” and hit Enter. Then install the Microsoft.ML NuGet package.

Download the iris.data dataset and save it to the Data folder you created in the previous step. In Solution Explorer, right-click the iris.data file and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.

The iris.data file contains five columns that represent:

  • sepal length in centimetres
  • sepal width in centimetres
  • petal length in centimetres
  • petal width in centimetres
  • type of iris flower

For the sake of the clustering example, we are ignoring the last column.
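To illustrate the column layout, here is a small sketch that reads one line of the file and keeps only the four numeric features. It assumes the file is comma-separated, as in the commonly distributed UCI version of iris.data; the IrisLineParser class is hypothetical and is not needed for the ML.NET pipeline, which loads the file itself.

```csharp
using System;
using System.Globalization;

public static class IrisLineParser
{
    // Parses one comma-separated line and returns the four numeric features.
    // The fifth column (type of iris flower) is ignored.
    public static float[] ParseFeatures(string line)
    {
        string[] parts = line.Split(',');
        if (parts.Length < 4)
            throw new FormatException("Expected at least four comma-separated values.");

        var features = new float[4];
        for (int i = 0; i < 4; i++)
        {
            // InvariantCulture so that "5.1" parses the same on every machine.
            features[i] = float.Parse(parts[i], CultureInfo.InvariantCulture);
        }
        return features;
    }
}
```

For example, ParseFeatures("5.1,3.5,1.4,0.2,Iris-setosa") yields the sepal length, sepal width, petal length and petal width, and drops the flower type.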

First, add the required namespace.

using Microsoft.ML.Runtime.Api;

Now create the data classes.

Here IrisData is the input data class and has definitions for each feature from the data set. Use the Column attribute to specify the indices of the source columns in the dataset file.
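The original listing is not reproduced here, so the following is only a sketch of what the data classes could look like with the early (0.x) ML.NET API that the Microsoft.ML.Runtime.Api namespace belongs to. The field names and the ClusterPrediction output class are assumptions made for illustration; they require the Microsoft.ML NuGet package of that era.

```csharp
using Microsoft.ML.Runtime.Api;

// Input data class: one field per feature column in iris.data.
// The fifth column (type of iris flower) is deliberately not mapped.
public class IrisData
{
    [Column("0")]
    public float SepalLength;

    [Column("1")]
    public float SepalWidth;

    [Column("2")]
    public float PetalLength;

    [Column("3")]
    public float PetalWidth;
}

// Output data class (assumed shape): what the trained clustering model returns.
public class ClusterPrediction
{
    // Id of the cluster the flower was assigned to.
    [ColumnName("PredictedLabel")]
    public uint PredictedClusterId;

    // Distances from the flower to each cluster centre.
    [ColumnName("Score")]
    public float[] Distances;
}
```

The string passed to Column is the zero-based index of the source column, which is why the four features map to "0" through "3".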

Now, in Program.cs, add two fields to hold the path to the dataset file and the path to the file where the model is saved:

  • _dataPath contains the path to the file with the data set used to train the model.
  • _modelPath contains the path to the file where the trained model is stored.

Our Program.cs (the main file) will now look like this.

The screenshot shows the solution structure and the program output. When we executed this code, it generated IrisClusteringModel.zip in the Data folder.
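As a rough sketch of such a Program.cs, assuming the LearningPipeline API of ML.NET 0.3 (the version current when this post was written), the file might be structured as follows. The choice of k-means with three clusters, the sample flower values and the compact data classes are assumptions for illustration and may differ from the code in the repository; newer ML.NET versions use a different API.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;

// Compact data classes (assumed shape) so the sketch is self-contained.
public class IrisData
{
    [Column("0")] public float SepalLength;
    [Column("1")] public float SepalWidth;
    [Column("2")] public float PetalLength;
    [Column("3")] public float PetalWidth;
}

public class ClusterPrediction
{
    [ColumnName("PredictedLabel")] public uint PredictedClusterId;
    [ColumnName("Score")] public float[] Distances;
}

class Program
{
    // Path to the data set used to train the model.
    static readonly string _dataPath =
        Path.Combine(Environment.CurrentDirectory, "Data", "iris.data");

    // Path to the file where the trained model is stored.
    static readonly string _modelPath =
        Path.Combine(Environment.CurrentDirectory, "Data", "IrisClusteringModel.zip");

    static void Main(string[] args)
    {
        var pipeline = new LearningPipeline();

        // Load the comma-separated file into IrisData rows.
        pipeline.Add(new TextLoader(_dataPath).CreateFrom<IrisData>(separator: ','));

        // Combine the four feature columns into a single "Features" column.
        pipeline.Add(new ColumnConcatenator(
            "Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"));

        // k-means clustering; K = 3 is an assumption (three iris species).
        pipeline.Add(new KMeansPlusPlusClusterer() { K = 3 });

        PredictionModel<IrisData, ClusterPrediction> model =
            pipeline.Train<IrisData, ClusterPrediction>();

        // Save the model and wait so the file is written before the program exits.
        model.WriteAsync(_modelPath).Wait();

        // Predict the cluster for one sample flower (illustrative values).
        var sample = new IrisData
        {
            SepalLength = 5.1f, SepalWidth = 3.5f, PetalLength = 1.4f, PetalWidth = 0.2f
        };
        ClusterPrediction prediction = model.Predict(sample);
        Console.WriteLine($"Cluster: {prediction.PredictedClusterId}");
        Console.WriteLine($"Distances: {string.Join(" ", prediction.Distances)}");
    }
}
```

Because the paths are built from Environment.CurrentDirectory, the Copy if newer setting on iris.data matters: it is what places the file in the Data folder next to the compiled program.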

[Image: Screen Shot 2018-07-15 at 8.04.21 PM.png]

Here is the GitHub repository:

https://github.com/abhiongithub/ML-for-Dot-Net-developers

Here is the link to the next blog post in this series:

https://cloudandmobileblog.com/2018/07/28/understanding-binary-classification-using-sentiment-analysis-through-ml-net-part-3-of-5/
