# Clustering in ML.NET (Part 2 of 5)

This is the second part of a five-part blog series on ML.NET. Here is the first part:
https://cloudandmobileblog.com/2018/07/09/introduction-of-machine-learning-net-part-1-of-5/

Clustering is the task of dividing a population or set of data points into groups such that data points in the same group are more similar to each other than to data points in other groups. In other words, it groups objects on the basis of their similarity and dissimilarity. As a first step, let us understand the clustering problem.

This problem is about dividing a set of iris flowers into different groups based on the features of each flower. Those features are the length and width of a sepal and the length and width of a petal. For this tutorial, assume that the type of each flower is unknown. You want to learn the structure of a dataset from the features and predict how a data instance fits this structure. Because we don’t know which group each flower belongs to, we need an unsupervised machine learning task. To divide a dataset into groups in such a way that elements in the same group are more similar to each other than to those in other groups, use a clustering machine learning task.
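To make the idea of "similar points end up in the same group" concrete, here is a minimal sketch of the k-means idea in plain C#, independent of ML.NET. The class name `TinyKMeans`, the starting centroids, and the four sample points are all made up for illustration; this is not how ML.NET implements its trainer, only the basic assign-and-update loop.

```csharp
using System;
using System.Linq;

public static class TinyKMeans
{
    // Squared Euclidean distance between two points of equal dimension.
    public static double SquaredDistance(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    public static int Nearest(double[] point, double[][] centroids)
    {
        int best = 0;
        for (int c = 1; c < centroids.Length; c++)
        {
            if (SquaredDistance(point, centroids[c]) < SquaredDistance(point, centroids[best]))
                best = c;
        }
        return best;
    }

    // Runs a fixed number of assign/update rounds and returns the cluster
    // index of each point. The centroids array is updated in place.
    public static int[] Cluster(double[][] points, double[][] centroids, int iterations)
    {
        var assignment = new int[points.Length];
        for (int it = 0; it < iterations; it++)
        {
            // Assignment step: attach every point to its nearest centroid.
            for (int p = 0; p < points.Length; p++)
                assignment[p] = Nearest(points[p], centroids);

            // Update step: move each centroid to the mean of its members.
            for (int c = 0; c < centroids.Length; c++)
            {
                var members = points.Where((pt, p) => assignment[p] == c).ToArray();
                if (members.Length == 0) continue; // leave an empty cluster's centroid unchanged
                centroids[c] = Enumerable.Range(0, centroids[c].Length)
                    .Select(d => members.Average(m => m[d]))
                    .ToArray();
            }
        }
        return assignment;
    }

    public static void Main()
    {
        // Made-up (petal length, petal width) pairs, not taken from the dataset file.
        double[][] points =
        {
            new[] { 1.4, 0.2 }, new[] { 1.5, 0.3 },
            new[] { 4.5, 1.5 }, new[] { 4.7, 1.4 },
        };
        // Arbitrary starting centroids chosen for this illustration.
        double[][] centroids = { new[] { 1.0, 0.0 }, new[] { 5.0, 2.0 } };

        int[] clusters = Cluster(points, centroids, iterations: 5);
        Console.WriteLine(string.Join(" ", clusters)); // prints "0 0 1 1"
    }
}
```

The two short-petal points land in one group and the two long-petal points in the other, without any label ever being supplied.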

Now create a new .NET Core console application in Visual Studio, just as we did in the previous blog post. In Solution Explorer, right-click the project and select Add > New Folder. Type “Data” and press Enter. Then install the Microsoft.ML NuGet package.

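Download the iris.data dataset and save it to the Data folder you created in the previous step. Note that the code later in this post reads the file as iris-data.txt, so either save it under that name or adjust `_dataPath` accordingly. In Solution Explorer, right-click the dataset file and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.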

The iris.data file contains five columns that represent:

• sepal length in centimetres
• sepal width in centimetres
• petal length in centimetres
• petal width in centimetres
• type of iris flower

For the sake of the clustering example, we are ignoring the last column.
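As an illustration of the file layout, the sketch below parses a single comma-separated line and keeps only the four numeric columns. `IrisLineParser` and `ParseFeatures` are hypothetical names used here for illustration; in the actual program ML.NET's loader does this work, so the snippet only shows what "ignoring the last column" amounts to.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class IrisLineParser
{
    // Parses one comma-separated line and returns the four numeric features,
    // ignoring the fifth column (the type of iris flower).
    public static float[] ParseFeatures(string line)
    {
        string[] parts = line.Split(',');
        if (parts.Length < 4)
            throw new FormatException($"Expected at least 4 columns, got {parts.Length}.");

        return parts
            .Take(4)
            // Invariant culture so that "5.1" parses the same on every machine.
            .Select(p => float.Parse(p, CultureInfo.InvariantCulture))
            .ToArray();
    }

    public static void Main()
    {
        // A sample line in the five-column format described above.
        float[] features = ParseFeatures("5.1,3.5,1.4,0.2,Iris-setosa");
        Console.WriteLine(string.Join(" ",
            features.Select(f => f.ToString(CultureInfo.InvariantCulture))));
    }
}
```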

Add the following `using` directive to the code file:

`using Microsoft.ML.Runtime.Api;`

Then create the data classes:

    public class IrisData
    {
        [Column("0")]
        public float SepalLength;

        [Column("1")]
        public float SepalWidth;

        [Column("2")]
        public float PetalLength;

        [Column("3")]
        public float PetalWidth;
    }

    public class ClusterPrediction
    {
        [ColumnName("PredictedLabel")]
        public uint PredictedClusterId;

        [ColumnName("Score")]
        public float[] Distances;
    }


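Here `IrisData` is the input data class and has a definition for each feature in the dataset. Use the `Column` attribute to specify the indices of the source columns in the dataset file. `ClusterPrediction` represents the output of the model: the `ColumnName` attribute binds `PredictedClusterId` to the `PredictedLabel` column (the id of the predicted cluster) and `Distances` to the `Score` column (an array of squared Euclidean distances to the cluster centroids, one entry per cluster).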
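To show how the two fields of `ClusterPrediction` relate to each other, here is a small sketch in plain C#. It assumes that the distances array holds one distance per cluster, in cluster order, and that cluster ids start at 1; both are assumptions worth checking against the output of your own run. `PredictionReader` and the distance values are hypothetical and are not real model output.

```csharp
using System;

public static class PredictionReader
{
    // Returns the 1-based position of the smallest value in the distances array,
    // i.e. the cluster whose centroid is closest (assuming 1-based cluster ids).
    public static uint ClosestCluster(float[] distances)
    {
        if (distances == null || distances.Length == 0)
            throw new ArgumentException("At least one distance is required.", nameof(distances));

        int best = 0;
        for (int i = 1; i < distances.Length; i++)
        {
            if (distances[i] < distances[best])
                best = i;
        }
        return (uint)(best + 1);
    }

    public static void Main()
    {
        // Hypothetical distances for K = 3 clusters; not real model output.
        float[] distances = { 12.5f, 0.25f, 30.0f };
        Console.WriteLine($"Closest cluster: {ClosestCluster(distances)}"); // prints "Closest cluster: 2"
    }
}
```

If the assumptions hold, the value computed this way should match `PredictedClusterId` for the same prediction.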

Now, in Program.cs, add two fields to hold the path to the dataset file and the path of the file in which to save the model:

• `_dataPath` contains the path to the file with the data set used to train the model.
• `_modelPath` contains the path to the file where the trained model is stored.

Our Program.cs (the main file) will now look like this:

    using System;
    using System.IO;
    using Microsoft.ML;
    using Microsoft.ML.Data;
    using Microsoft.ML.Trainers;
    using Microsoft.ML.Transforms;

    namespace ClusteringInML
    {
        public static class Program
        {
            // Path to the dataset used to train the model.
            static readonly string _dataPath = Path.Combine(Environment.CurrentDirectory, "Data", "iris-data.txt");
            // Path of the file where the trained model is stored.
            static readonly string _modelPath = Path.Combine(Environment.CurrentDirectory, "Data", "IrisClusteringModel.zip");

            private static void Main(string[] args)
            {
                PredictionModel<IrisData, ClusterPrediction> model = Train();

                // Wait for the model file to be written before continuing.
                model.WriteAsync(_modelPath).Wait();

                // TestIrisData.Setosa is a sample IrisData instance that is not shown in this post.
                var prediction = model.Predict(TestIrisData.Setosa);
                Console.WriteLine($"Cluster: {prediction.PredictedClusterId}");
                Console.WriteLine($"Distances: {string.Join(" ", prediction.Distances)}");
            }

            private static PredictionModel<IrisData, ClusterPrediction> Train()
            {
                var pipeline = new LearningPipeline();

                // Load the comma-separated dataset.
                pipeline.Add(new TextLoader(_dataPath).CreateFrom<IrisData>(separator: ','));

                // Combine the four feature columns into a single "Features" column.
                pipeline.Add(new ColumnConcatenator(
                    "Features",
                    "SepalLength",
                    "SepalWidth",
                    "PetalLength",
                    "PetalWidth"));

                // Cluster the data into three groups.
                pipeline.Add(new KMeansPlusPlusClusterer() { K = 3 });

                var model = pipeline.Train<IrisData, ClusterPrediction>();
                return model;
            }
        }
    }

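The listing above calls `TestIrisData.Setosa`, which is not defined anywhere in this post. A minimal sketch of what such a helper might look like is shown below; the measurements are an assumption (typical values for an Iris setosa sample), and the `IrisData` class is repeated without its ML.NET attributes only so that the snippet compiles on its own.

```csharp
using System;

// Mirrors the input class from the post, without the ML.NET attributes,
// so that this sketch compiles without the Microsoft.ML package.
public class IrisData
{
    public float SepalLength;
    public float SepalWidth;
    public float PetalLength;
    public float PetalWidth;
}

// Hypothetical helper: one hard-coded flower to pass to model.Predict.
public static class TestIrisData
{
    public static readonly IrisData Setosa = new IrisData
    {
        SepalLength = 5.1f,
        SepalWidth = 3.5f,
        PetalLength = 1.4f,
        PetalWidth = 0.2f
    };
}

public static class Demo
{
    public static void Main()
    {
        IrisData s = TestIrisData.Setosa;
        Console.WriteLine($"{s.SepalLength} {s.SepalWidth} {s.PetalLength} {s.PetalWidth}");
    }
}
```

In the real project you would keep the attributed `IrisData` class from earlier and add only the `TestIrisData` part.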

The screenshot shows the solution structure and the program output. When we ran this code, it generated IrisClusteringModel.zip in the Data folder. Here is the GitHub repository:

https://github.com/abhiongithub/ML-for-Dot-Net-developers

Here is the link to the next blog post in this series:

https://cloudandmobileblog.com/2018/07/28/understanding-binary-classification-using-sentiment-analysis-through-ml-net-part-3-of-5/
