Solving Exploration-Exploitation trade-off with Quantile Regression

By: Dr. Yan Xu – Technical Specialist, Data & Optimization

The exploration-exploitation trade-off is a fundamental dilemma whenever you make decisions among options. The dilemma is between choosing what you know and getting something close to what you expect (‘exploitation’) and choosing something you aren’t sure about and possibly learning more (‘exploration’).

At Rokt, we adopt sampling-based techniques (e.g. Thompson Sampling) to deal with the trade-off in a number of scenarios:

  • Creative selection (creative text selection, creative layout selection, etc)
  • Widget selection (layout, design, etc)
  • Email follow-up optimisation (email subject selection, sending hour selection, etc)
  • New client and campaign onboarding

Sampling-based techniques not only require point estimation but also the probability distribution (or variance estimation). Unfortunately, many main-stream machine learning models (e.g. Gradient Boosting Trees, Deep Learning Models) do not naturally support variance estimation. In this blog, I’ll describe how we solved that problem using quantile regression.


Quantiles are points in a distribution that relate to the rank order of values in that distribution. For example, we say that a student scores at the 25th quantile of a standardized exam if he performs better than 25%, and worse than 75%, of the reference group of students. The 50th quantile is also named as the median value.

More formally, given a random variable Y, whose cumulative distribution function is:

while for any 0 < ? < 1,

is called the ?th quantile. Like the distribution function, quantile function provides a complete characterization of the random variable Y.

Quantile Regression

Just as linear regression for conditional mean estimation by minimizing sums of squared residuals, quantile regression offers a mechanism for conditional quantile estimation.

The cost function (or quantile loss) for linear conditional quantile estimation is as follows:

where, pt(u) the piecewise linear check function is illustrated as Figure 1:

Figure 1: quantile regression piecewise linear check function (source: [1])

Given any quantile ?, we could obtain a regression model for the quantile by minimizing the cost function above. Note that the general formulation is equivalent with pre-selecting quantile values from sorted rearrangement of the original samples. Given a list of quantile values, we could obtain a list of regression models, which, when used together, could predict the entire probability distribution.

An Example

In “Quantile Regression” [1], the author used quantile regression to build a simple first order autoregressive model for maximum daily temperature in Melbourne, Australia. The scatter plot of 10 years of daily temperature data is shown in Figure 2: today’s maximum daily temperature against yesterday’s maximum. The data shows that:

  1. There is a strong correlation between today’s maximum daily temperature and yesterday’s maximum daily temperature.
  2. However, the variance is growing high as yesterday’s maximum daily temperature increases.

Figure 2: today’s maximum daily temperature against yesterday’s maximum in Melbourne (Source: [1])

Quantile regression could capture the variance dynamics, which could not be captured by the simple linear regression model. Besides, the predicted maximum daily temperature is not required to follow Normal Distribution. Figure 3 shows example predicted probability distribution, given yesterday’s maximum daily temperature.

Figure 3: Example predicted probability distribution (Source: [1])

Quantile Regression with LightGBM

Gradient boosting is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models (typically decision trees). Gradient boosting gained massive popularity on tabular datasets, because most of the Kaggle competition winners have used some variant of Gradient Boosting algorithm in recent years. At Rokt, we adopted LightGBM for both classification problems and regression problems. So, in this blog, we focus on the combination of quantile regression and LightGBM.

An example is illustrated here. The function to learn is a scaled sin() function.

The generated data with random noises is shown as follows:

The training data is plotted as follows:

The code to train LightGBM to learn the 5th, 50th, 95th quantile is shown as follows:

The predicted confidence interval is plotted as follows:

The predicted 5th quantile value and 95th quantile value could be used as variance estimation to deal with the exploration-exploitation trade-off.

For the details of “how quantile loss was implemented in LightGBM?”, please check J. Mark Hou’s blog [3] or LightGBM code [4].

[1] Koenker, Roger and Kevin F. Hallock. “Quantile Regression”. Journal of Economic Perspectives, Volume 15, Number 4, Fall 2001, Pages 143–156


Dr. Yan Xu is currently a technical specialist on the Machine Learning team at Rokt. His work focuses on auction system, click/conversion prediction, strategic bidding, price modelling and email optimisation. Before Rokt, he was working in Amazon AWS Networking, where his work focused on network operation automation and network safety modelling. Dr. Yan Xu holds a PhD in computer science from National University of Singapore, where his thesis was in scalable short-term traffic prediction. He has authored/co-authored several publications and patents in leading conferences and journals.