Building Fare Predictor using Regression Evaluator of Machine Learning .Net(Part 5 of 5)

It is the fifth and final part of a 5 part blog series of, here are the links of previous blogs of this series.

First Blog Post on the introduction of Machine Learning.NET

Second Blog Post on Clustering in Machine Learning .NET

Third Blog Post on Understanding Binary Classification

Fourth Blog Post on Sentiment Analysis using FastTreeBinaryClassifier

Linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models.

Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.

Now we will solve a problem through Regression and to be specific FastTreeRegressor.

FastTrees is an efficient implementation of the MART gradient boosting algorithm. Gradient boosting is a machine learning technique for regression problems. It builds each regression tree in a step-wise fashion, using a predefined loss function to measure the error for each step and corrects for it in the next.

So this prediction model is actually an ensemble of weaker prediction models. In regression problems, boosting builds a series of such trees in a step-wise fashion and then selects the optimal tree using an arbitrary differentiable loss function.

This problem is about predicting the fare of a taxi trip in New York City. At first glance, it may seem to depend simply on the distance traveled. However, taxi vendors in New York charge varying amounts for other factors such as additional passengers or paying with a credit card instead of cash.

CSV Files are here :

The provided dataset contains the following columns:

  • vendor_id: The ID of the taxi vendor is a feature.
  • rate_code: The rate type of the taxi trip is a feature.
  • passenger_count: The number of passengers on the trip is a feature.
  • trip_time_in_secs: The amount of time the trip took. You want to predict the fare of the trip before the trip is completed. At that moment you don’t know how long the trip would take. Thus, the trip time is not a feature and you’ll exclude this column from the model.
  • trip_distance: The distance of the trip is a feature.
  • payment_type: The payment method (cash or credit card) is a feature.
  • fare_amount: The total taxi fare paid is the label.

Step 1: Create a new Dot Net Core Console App, I am using Visual Studio for Mac as shown below,  you can also use Visual Studio code on Linux or Visual Studio 2017 for Windows.


Step 2:  Add Microsoft.ML Nuget package and import the CSV files , set the properties of CSV files as “Copy to Output directory” as shown below.


Step 3: Create a new file TaxiTrip.cs as shown below

public class TaxiTrip
public string VendorId;
public string RateCode;
public float PassengerCount;
public float TripTime;
public float TripDistance;
public string PaymentType;
public float FareAmount;

view raw
hosted with ❤ by GitHub

Step 4: Create a new file TaxiTripFarePrediction.cs as shown below

public class TaxiTripFarePrediction
public float FareAmount;

Step 5: Update Program.cs as shown below

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Models;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
namespace FarePredictor
class Program
static readonly string _datapath = Path.Combine(Environment.CurrentDirectory, "taxi-fare-train.csv");
static readonly string _testdatapath = Path.Combine(Environment.CurrentDirectory, "taxi-fare-test.csv");
static readonly string _modelpath = Path.Combine(Environment.CurrentDirectory, "");
static async Task Main(string[] args)
Console.WriteLine("Building Fare Predictor!");
PredictionModel<TaxiTrip, TaxiTripFarePrediction> model = await TrainAsync();
TaxiTripFarePrediction prediction = model.Predict(TestTrips.Trip1);
Console.WriteLine("Predicted fare: {0}, actual fare: 29.5", prediction.FareAmount);
public static async Task<PredictionModel<TaxiTrip, TaxiTripFarePrediction>> TrainAsync()
var pipeline = new LearningPipeline
new TextLoader(_datapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ','),
new ColumnCopier(("FareAmount", "Label")),
new CategoricalOneHotVectorizer(
new ColumnConcatenator(
new FastTreeRegressor()
PredictionModel<TaxiTrip, TaxiTripFarePrediction> model = pipeline.Train<TaxiTrip, TaxiTripFarePrediction>();
await model.WriteAsync(_modelpath);
return model;
private static void Evaluate(PredictionModel<TaxiTrip, TaxiTripFarePrediction> model)
var testData = new TextLoader(_testdatapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ',');
var evaluator = new RegressionEvaluator();
RegressionMetrics metrics = evaluator.Evaluate(model, testData);
Console.WriteLine($"Rms = {metrics.Rms}");
Console.WriteLine($"RSquared = {metrics.RSquared}");

Step 6:  Now When you will run this app, you can see the predicted fare as 31.4 while the actual fare is 29.5 which is very close to what we have predicted.


You can download this application code from here



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.