Clustering in Machine Learning.Net (Part 2 of 5)

This is the second part of a five-part blog series; here is the first part.

Clustering is the task of dividing a population of data points into groups such that points in the same group are more similar to each other than to points in other groups. In short, it groups objects according to how similar or dissimilar they are.
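To make "similar" concrete, here is a minimal sketch that assumes Euclidean distance between feature vectors as the similarity measure (the post itself does not fix one). The `Similarity` class is a hypothetical helper written for illustration, not part of ML.NET.

```csharp
using System;

// Hypothetical helper, not part of ML.NET: the smaller the distance,
// the more similar two data points are considered to be.
public static class Similarity
{
    // Euclidean distance between two feature vectors of equal length.
    public static double Distance(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double sumOfSquares = 0.0;
        for (int i = 0; i < a.Length; i++)
        {
            double diff = a[i] - b[i];
            sumOfSquares += diff * diff;
        }
        return Math.Sqrt(sumOfSquares);
    }
}
```

For example, `Similarity.Distance(new[] { 5.1, 3.5, 1.4, 0.2 }, new[] { 4.9, 3.0, 1.4, 0.2 })` is roughly 0.54, while the distance from the first point to `{ 6.3, 3.3, 6.0, 2.5 }` is roughly 5.28, so a clustering algorithm using this measure would tend to put the first two points in the same group.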


As a first step, let us understand the clustering problem.

This problem is about dividing a set of iris flowers into different groups based on the features of each flower. Those features are the length and width of a sepal and the length and width of a petal. For this tutorial, assume that the type of each flower is unknown. You want to learn the structure of the dataset from the features and predict how a data instance fits this structure.


As we don’t know which group each flower belongs to, we need an unsupervised machine learning task. To divide a dataset into groups in such a way that elements in the same group are more similar to each other than to those in other groups, use a clustering machine learning task.
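To see what a clustering task does without any labels, here is a minimal sketch of k-means, one widely used clustering algorithm. It is named here only as an illustration: `KMeansSketch` is a hypothetical class, the starting centres are supplied by the caller, and this is not the ML.NET implementation.

```csharp
using System;
using System.Linq;

// Hypothetical illustration of k-means; not the ML.NET implementation.
public static class KMeansSketch
{
    // Returns the cluster index assigned to each point.
    public static int[] Cluster(double[][] points, double[][] initialCentres, int iterations)
    {
        int k = initialCentres.Length;
        int dims = points[0].Length;
        double[][] centres = initialCentres.Select(c => (double[])c.Clone()).ToArray();
        int[] assignment = new int[points.Length];

        for (int iter = 0; iter < iterations; iter++)
        {
            // Assignment step: attach every point to its closest centre.
            for (int p = 0; p < points.Length; p++)
            {
                int best = 0;
                double bestDist = double.MaxValue;
                for (int c = 0; c < k; c++)
                {
                    double dist = 0.0; // squared Euclidean distance
                    for (int d = 0; d < dims; d++)
                    {
                        double diff = points[p][d] - centres[c][d];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) { bestDist = dist; best = c; }
                }
                assignment[p] = best;
            }

            // Update step: move each centre to the mean of its assigned points.
            for (int c = 0; c < k; c++)
            {
                double[] sum = new double[dims];
                int count = 0;
                for (int p = 0; p < points.Length; p++)
                {
                    if (assignment[p] != c) continue;
                    for (int d = 0; d < dims; d++) sum[d] += points[p][d];
                    count++;
                }
                if (count == 0) continue; // leave the centre of an empty cluster where it is
                for (int d = 0; d < dims; d++) centres[c][d] = sum[d] / count;
            }
        }
        return assignment;
    }
}
```

No flower type is ever passed in: the groups emerge only from the measurements, which is what makes the task unsupervised.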

Now let us create a new console application in .NET Core using Visual Studio, just like we did in the previous blog post. In Solution Explorer, right-click the project and select Add > New Folder. Type “Data” and hit Enter. Then install the Microsoft.ML NuGet package.

Download the dataset and save it to the Data folder you created in the previous step. In Solution Explorer, right-click the file and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.

The file contains five columns that represent:

  • sepal length in centimetres
  • sepal width in centimetres
  • petal length in centimetres
  • petal width in centimetres
  • type of iris flower

For the sake of the clustering example, we are ignoring the last column.
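As a rough illustration of what "ignoring the last column" means, the sketch below parses one line of the file and keeps only the four measurements. It assumes the file is comma-separated, with lines such as `5.1,3.5,1.4,0.2,Iris-setosa`. `IrisLine` is a hypothetical helper; in the tutorial itself the ML.NET loader reads the file.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class IrisLine
{
    // Parses one comma-separated line and returns the four measurements,
    // ignoring the fifth column (the type of iris flower).
    public static float[] ParseFeatures(string line)
    {
        string[] parts = line.Split(',');
        if (parts.Length < 4)
            throw new FormatException("Expected at least four comma-separated values.");

        var features = new float[4];
        for (int i = 0; i < 4; i++)
        {
            // Invariant culture so that "5.1" parses the same on every machine.
            features[i] = float.Parse(parts[i], CultureInfo.InvariantCulture);
        }
        return features;
    }
}
```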

First, add the required namespace.

using Microsoft.ML.Runtime.Api;

Now create the data classes.

Here IrisData is the input data class and has definitions for each feature from the data set. Use the Column attribute to specify the indices of the source columns in the dataset file.
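The original code listing is not reproduced here, so the following is only a sketch of what such data classes could look like. It assumes the early 0.x Microsoft.ML package (around 0.3) whose `Microsoft.ML.Runtime.Api` namespace provides the `Column` and `ColumnName` attributes; later ML.NET releases use a different API. The `ClusterPrediction` class and its member names are assumptions for illustration.

```csharp
using Microsoft.ML.Runtime.Api;

// Input class: one field per measurement. Column("n") gives the zero-based
// index of the source column in the dataset file. The fifth column
// (flower type) is deliberately not mapped.
public class IrisData
{
    [Column("0")]
    public float SepalLength;

    [Column("1")]
    public float SepalWidth;

    [Column("2")]
    public float PetalLength;

    [Column("3")]
    public float PetalWidth;
}

// Assumed output class; the name and member names are illustrative.
public class ClusterPrediction
{
    // Id of the cluster the flower was assigned to.
    [ColumnName("PredictedLabel")]
    public uint PredictedClusterId;

    // Per-cluster scores reported by the trainer (distances to the cluster centres).
    [ColumnName("Score")]
    public float[] Distances;
}
```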

Now, in Program.cs, add two fields to hold the path to the dataset file and the path to the file where the model is saved:

  • _dataPath contains the path to the file with the data set used to train the model.
  • _modelPath contains the path to the file where the trained model is stored.
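A minimal sketch of the two fields follows. The file names `iris.data` and `IrisClusteringModel.zip` are assumptions, so substitute the names you actually use. `AppContext.BaseDirectory` is used so that the paths point at the output directory, where the Copy if newer setting places the dataset.

```csharp
using System;
using System.IO;

class Program
{
    // Path to the dataset used to train the model (file name is an assumption).
    static readonly string _dataPath =
        Path.Combine(AppContext.BaseDirectory, "Data", "iris.data");

    // Path where the trained model is stored (file name is an assumption).
    static readonly string _modelPath =
        Path.Combine(AppContext.BaseDirectory, "Data", "IrisClusteringModel.zip");

    static void Main(string[] args)
    {
        // Print the resolved paths to check that they point where you expect.
        Console.WriteLine(_dataPath);
        Console.WriteLine(_modelPath);
    }
}
```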

Now our Program.cs (the main file) will look like this.
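The original listing is not reproduced here, so what follows is a hedged sketch rather than the author's exact file. It assumes the legacy `LearningPipeline` API of the early 0.x Microsoft.ML package (around 0.3), a comma-separated dataset, three clusters, and the file names shown. The data classes are repeated in compact form only so that the file compiles on its own, and `ClusterPrediction` is an assumed output class.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;

// Compact copies of the data classes so this sketch is self-contained.
public class IrisData
{
    [Column("0")] public float SepalLength;
    [Column("1")] public float SepalWidth;
    [Column("2")] public float PetalLength;
    [Column("3")] public float PetalWidth;
}

public class ClusterPrediction
{
    [ColumnName("PredictedLabel")] public uint PredictedClusterId;
    [ColumnName("Score")] public float[] Distances;
}

class Program
{
    // Assumed file names inside the Data folder of the output directory.
    static readonly string _dataPath =
        Path.Combine(AppContext.BaseDirectory, "Data", "iris.data");
    static readonly string _modelPath =
        Path.Combine(AppContext.BaseDirectory, "Data", "IrisClusteringModel.zip");

    static void Main(string[] args)
    {
        // Legacy pipeline: load the data, build one feature vector, train.
        var pipeline = new LearningPipeline();
        pipeline.Add(new TextLoader(_dataPath).CreateFrom<IrisData>(separator: ','));
        pipeline.Add(new ColumnConcatenator("Features",
            "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"));
        pipeline.Add(new KMeansPlusPlusClusterer() { K = 3 }); // assumption: three clusters

        PredictionModel<IrisData, ClusterPrediction> model =
            pipeline.Train<IrisData, ClusterPrediction>();

        // Save the trained model to the Data folder.
        model.WriteAsync(_modelPath).GetAwaiter().GetResult();

        // Ask which cluster an example flower falls into.
        var sample = new IrisData
        {
            SepalLength = 5.1f, SepalWidth = 3.5f, PetalLength = 1.4f, PetalWidth = 0.2f
        };
        ClusterPrediction prediction = model.Predict(sample);
        Console.WriteLine($"Cluster: {prediction.PredictedClusterId}");
        Console.WriteLine($"Distances: {string.Join(" ", prediction.Distances)}");
    }
}
```

The cluster ids are arbitrary labels chosen by the trainer, so they will not necessarily line up with the flower types in the ignored fifth column.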

The screenshot below shows the solution structure and the output. When we executed this code, it generated the trained model file in the Data folder.

[Screenshot: solution structure and program output]

Here is the GitHub repository.

Here is the link to the next blog post in this series.
