When a single line is not enough to fit our data, piecewise linear regression can come to our rescue.
The Problem
A few years ago I was working on a problem that at first seemed simple and not “fancy” enough compared to other ML models, but over time I learned to appreciate it and to draw great insights from it. As the title says, we’ll be talking about piecewise linear regression, but before diving deep into it let’s take a look at the problem I faced.
The graph above shows the variation of temperature with height (dT/dz). We can see that the variation of the temperature isn't a nice straight line. This has consequences when analyzing the data. For instance, if we try to calculate a single dT/dz for the whole profile, we get a number that doesn't accurately represent the change in the different layers (or pieces) of the atmosphere.
In fact, we can visually divide it into 3 different layers (or pieces). Above we can see the representation of these layers and their approximate heights (x-axis). The break points are the points where the data completely changes its behaviour. For instance, break point 1 (let’s denote it b1) is our start point; the layer between b1 and b2 has a similar dT/dz throughout. However, by the time it gets to b2, dT/dz changes to a smoother trend that no longer increases as steeply as before. What we are trying to understand is how dT/dz changes inside each of these layers and where the break points are.
If we plot our data and try to fit a single linear regression to it, we fail miserably. However, that doesn't mean linear regression isn't the best choice for our problem; we just need to adapt it a little bit.
When I was brainstorming the problem, one of the questions I asked myself was: why don’t I try to fit a polynomial regression? It’s simple and we can find implementations all over the internet. Above I plotted a 3rd-order polynomial regression fitted to the data. The problem with polynomial regression is that you lose the interpretability of the model when adding the polynomial terms (quadratic, cubic, etc.). So if you don't care so much about interpretability, you can stop reading here. However, if you need interpretability to deeply understand the problem, piecewise linear regression is your buddy.
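To make the comparison concrete, here is a minimal sketch using NumPy on synthetic data. The three-layer profile, its slopes, the layer boundaries and the noise level are all made up for illustration; they are not the data from the graphs above.

```python
import numpy as np

# Synthetic three-layer profile (illustrative assumption, not the article's data).
rng = np.random.default_rng(0)
z = np.linspace(0.0, 30.0, 301)            # height, arbitrary units
T = (20.0 - 2.0 * z                        # layer 1: dT/dz = -2.0
     + 1.0 * np.maximum(z - 10.0, 0.0)     # layer 2: dT/dz = -1.0
     + 2.5 * np.maximum(z - 20.0, 0.0)     # layer 3: dT/dz = +1.5
     + rng.normal(0.0, 0.3, z.size))       # measurement noise

def fit_polynomial(degree):
    """Least-squares polynomial fit; returns coefficients and sum of squared residuals."""
    coeffs = np.polyfit(z, T, degree)
    residuals = T - np.polyval(coeffs, z)
    return coeffs, float(np.sum(residuals ** 2))

line_coeffs, line_ssr = fit_polynomial(1)
cubic_coeffs, cubic_ssr = fit_polynomial(3)

print("single dT/dz from one straight line:", line_coeffs[0])
print("SSR, straight line:", line_ssr)
print("SSR, cubic:", cubic_ssr)
print("cubic coefficients (highest power first):", cubic_coeffs)
```

The cubic reaches a lower sum of squared residuals than the straight line, but its four coefficients do not translate into one dT/dz per layer, which is the interpretability problem described above.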
Piecewise Linear Regression: Solution of Our Problems
The idea behind piecewise linear regression is that if the data follows different linear trends over different regions, as shown before, then we should model the regression function in “pieces”. Below we have the system of equations that defines our problem:
where b1 is the x location of the first break point, b2 is the x location of the second break point, and so forth until the last break point b_nb, with nb being the number of break points.
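In symbols, a system of this kind can be written as shown here. Only the break points b1, ..., b_nb come from the text above; the slope and offset names ($m_i$ and $\eta_i$) are assumed notation for this sketch, with one line per segment between consecutive break points.

```latex
\[
y(x) =
\begin{cases}
\eta_1 + m_1 (x - b_1), & b_1 \le x \le b_2 \\
\eta_2 + m_2 (x - b_2), & b_2 < x \le b_3 \\
\quad \vdots \\
\eta_{n_b - 1} + m_{n_b - 1} (x - b_{n_b - 1}), & b_{n_b - 1} < x \le b_{n_b}
\end{cases}
\]
```

With $n_b$ break points there are $n_b - 1$ segments; in the temperature example, 3 layers correspond to 4 break points, counting the start and end of the profile.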
Although it's fairly simple to solve linear regressions, one question still remains: how do we find the break points? For me, that's where I got stuck. You can try to solve it by brute force using a bunch of for loops, but this won't be efficient. Thankfully, there’s a relatively simple and straightforward library called PWLF created by Charles Jekel (many kudos to him!).
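For the curious, the brute-force idea can be sketched roughly as follows. This is an illustration under several assumptions, not how PWLF works: it uses synthetic data, assumes exactly two interior break points, restricts them to a fixed grid of candidates, and fits a continuous piecewise line by ordinary least squares for every candidate pair.

```python
import itertools
import numpy as np

# Synthetic three-layer profile (illustrative assumption, not the article's data).
rng = np.random.default_rng(0)
z = np.linspace(0.0, 30.0, 301)
T = (20.0 - 2.0 * z
     + 1.0 * np.maximum(z - 10.0, 0.0)
     + 2.5 * np.maximum(z - 20.0, 0.0)
     + rng.normal(0.0, 0.3, z.size))

def design_matrix(x, inner_breaks):
    """Columns: intercept, base slope, and one hinge term per interior break point."""
    cols = [np.ones_like(x), x - x.min()]
    cols += [np.maximum(x - b, 0.0) for b in inner_breaks]
    return np.column_stack(cols)

def ssr_for_breaks(x, y, inner_breaks):
    """Sum of squared residuals of the best continuous fit for fixed break points."""
    A = design_matrix(x, inner_breaks)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

def brute_force_breaks(x, y, n_candidates):
    """Try every pair of candidate break points and keep the pair with the lowest SSR."""
    candidates = np.linspace(x.min(), x.max(), n_candidates + 2)[1:-1]
    best = None
    for b1, b2 in itertools.combinations(candidates, 2):
        score = ssr_for_breaks(x, y, (b1, b2))
        if best is None or score < best[0]:
            best = (score, (float(b1), float(b2)))
    return best

best_ssr, (b1_hat, b2_hat) = brute_force_breaks(z, T, n_candidates=29)
print("estimated interior break points:", b1_hat, b2_hat)
```

The number of combinations grows quickly with the number of candidates and break points, and the answer is only as fine as the grid, which is why a dedicated optimizer is the more practical route.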
With a few lines of code, you can create a piecewise linear model as shown above. In my case, I knew I needed 3 regressions to fit my data, so I just passed that number and the break points were found by the algorithm. PWLF finds them using a differential evolution algorithm, which tries to find the global minimum of a given function; in our case, the sum of the squared residuals. For more info on differential evolution, I recommend checking this article.
Now with the break points known, we can fit our data. The most amazing thing about it is that you can still analyze each segment as a normal linear regression, calculate the same statistics as a linear regression, etc. We can see the PWLF result in the graph above.
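To illustrate the point about analyzing each segment as a normal linear regression, here is a minimal NumPy sketch. It assumes the break points are already known (hard-coded here for synthetic data) and fits each layer independently, which is a simplification: unlike the continuous piecewise model, separate fits are not forced to meet at the break points.

```python
import numpy as np

# Synthetic three-layer profile (illustrative assumption, not the article's data).
rng = np.random.default_rng(0)
z = np.linspace(0.0, 30.0, 301)
T = (20.0 - 2.0 * z
     + 1.0 * np.maximum(z - 10.0, 0.0)
     + 2.5 * np.maximum(z - 20.0, 0.0)
     + rng.normal(0.0, 0.3, z.size))

breaks = [0.0, 10.0, 20.0, 30.0]  # assumed known, e.g. taken from an earlier fit

def segment_stats(x, y, lo, hi):
    """Ordinary linear regression restricted to lo <= x <= hi."""
    mask = (x >= lo) & (x <= hi)
    xs, ys = x[mask], y[mask]
    slope, intercept = np.polyfit(xs, ys, 1)
    residuals = ys - (slope * xs + intercept)
    r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((ys - ys.mean()) ** 2)
    return float(slope), float(intercept), float(r_squared)

stats = [segment_stats(z, T, lo, hi) for lo, hi in zip(breaks[:-1], breaks[1:])]
for i, (slope, intercept, r_squared) in enumerate(stats, start=1):
    print(f"layer {i}: dT/dz = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.3f}")
```

Each layer now has its own dT/dz and goodness-of-fit, which is exactly the kind of per-layer reading a single straight line or a cubic cannot give.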
The Wrap Up
Piecewise linear regression takes the best aspects of linear regression and solves complex problems that we wouldn’t be able to solve with a simple linear regression. The most awesome part of this simple algorithm is that it allows you to easily understand your data by solving multiple linear regressions, so if you have data that doesn’t fit a single line, piecewise linear regression can help you. Furthermore, we saw that the PWLF library is a fairly simple way to implement piecewise linear regression, with an easy and understandable approach. Having it in your toolbox can help you get many insights into your complex data.
This is it for now. I hope you enjoyed the article and if you want to talk more about it, you’re welcome to connect with me on LinkedIn.