Polynomial Regression Calculator
- What is polynomial regression?
- Polynomial regression definition
- What is the difference between linear and polynomial regression?
- How to find the polynomial regression coefficients?
- How to use this polynomial regression calculator?
- Matrix formula for polynomial regression
- System of linear equations for a polynomial regression model
- FAQ
So you find yourself needing to fit a polynomial regression model to a dataset... Thankfully, Omni's polynomial regression calculator is here! With its help, you'll be able to quickly determine the polynomial that best fits your data.
If you're not yet familiar with this concept and want to learn what polynomial regression is, read the article below. It explains the definition of the polynomial regression model, provides all the necessary formulas, and describes in friendly terms the difference between linear and polynomial regression!
What is polynomial regression?
Regression is a statistical method that attempts to model the values of one variable (called the dependent variable) based on the values of one or more other variables (known as independent variables). For instance, we may want to find the relationship between people's weight and their height and sex, or between salaries and work experience and level of education.
In the polynomial regression model, we assume that the relationship between the dependent variable and a single independent variable is described by a polynomial of some arbitrary degree.
If you've already encountered the model of simple linear regression, where the relationship between the dependent and independent variables is modeled by a straight line of best fit, then you've seen the simplest example of polynomial regression, that is, where the polynomial has degree one! Now, imagine some data that you can't fit a straight line to, yet a parabola would be perfect. Since we can keep increasing the degree of the curve, we see why the polynomial regression model is so useful!
Polynomial regression definition
We now know what polynomial regression is, so it's time we discuss in more detail the mathematical side of the polynomial regression model. Here and henceforth, we will denote by $y$ the dependent variable and by $x$ the independent variable.
The polynomial regression equation reads:
$y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$,
where $a_0, a_1, \dots, a_n$ are called coefficients and $n$ is the degree of the polynomial regression model under consideration.
If you need a refresher on the topic of polynomials, check out the multiplying polynomials calculator and dividing polynomials calculator.
The equation with an arbitrary degree $n$ might look a bit scary, but don't worry! In most real-life applications, we use polynomial regression of rather low degrees:
- Degree 1: $y = a_0 + a_1 x$
  As we've already mentioned, this is simple linear regression, where we try to fit a straight line to the data points.
- Degree 2: $y = a_0 + a_1 x + a_2 x^2$
  Here we've got a quadratic regression, also known as second-order polynomial regression, where we fit parabolas.
- Degree 3: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$
  This is cubic regression, a.k.a. third-degree polynomial regression, and here we deal with cubic functions, that is, curves of degree 3.
In the same vein, the polynomial regression model of degree $n = 4$ is called a quartic regression (or fourth-order polynomial regression), $n = 5$ is quintic regression, $n = 6$ is called sextic regression, and so on.
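To make the low-degree cases concrete, here is a minimal sketch that fits linear, quadratic, and cubic models to a small dataset. It assumes NumPy is available; the data values and the helper name `fit_polynomial` are made up for illustration and are not part of the calculator.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical sample data, invented for this illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.4, 3.0, 2.4, 1.1, -1.6])

def fit_polynomial(x, y, degree):
    """Return least-squares coefficients [a0, a1, ..., an], lowest power first."""
    return P.polyfit(x, y, degree)

for degree, name in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    coeffs = fit_polynomial(x, y, degree)
    print(f"{name:9s} (n = {degree}):", np.round(coeffs, 3))
```

Note that `numpy.polynomial.polynomial.polyfit` returns the coefficients in the same order as the equations above, starting from $a_0$.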
What is the difference between linear and polynomial regression?
In many books, you can find a remark that polynomial regression is an example of linear regression. At the same time and on the same page, you see the parabolas and cubic curves generated by polynomial regression. And then your head explodes because the two claims seem impossible to reconcile.
Why is polynomial regression linear if all the world can see that it models non-linear relationships?
When we think of linear regression, we most often have in mind simple linear regression, which is the model where we fit a straight line to a dataset. We've already explained that simple linear regression is a particular case of polynomial regression, where we have polynomials of order 1.
However, in statistics, linear regression refers to the whole family of regression models where the dependent variable is given by a function of the independent variable(s) and this function is linear in the coefficients $a_0, a_1, \dots, a_n$. In other words, the model equation can contain all sorts of expressions like roots, logarithms, etc., and still be linear, on the condition that all that crazy stuff is applied to the independent variable(s) and not to the coefficients. For instance, the following model is an example of linear regression:
$y = a_0 \sin(x) + a_1 \ln(x) + a_2 x^{17} + a_3 \sqrt{x}$,
while this model is non-linear:
$y = a_0 \cdot x^{a_1}$
because the coefficient $a_1$ is in the exponent. To sum up, it doesn't matter what happens to $x$. What matters is that nothing non-linear happens to the coefficients: they appear in the first power, and we neither multiply them by each other nor act on them with any functions like roots, logs, trigonometric functions, etc.
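The point about linearity in the coefficients can be sketched in code: as long as the functions applied to the independent variable are fixed, the coefficients come out of an ordinary linear least-squares solve. The example below is only an illustration under assumed inputs; the helper name `fit_linear_in_coefficients` and the noise-free data are hypothetical.

```python
import numpy as np

def fit_linear_in_coefficients(x, y, basis_functions):
    """Least-squares fit of y = sum_k a_k * f_k(x) for fixed functions f_k."""
    # Each column of the design matrix is one basis function applied to x.
    design = np.column_stack([f(x) for f in basis_functions])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Made-up, noise-free data generated from y = 2 sin(x) + 3 ln(x) + 0.5 sqrt(x).
x = np.linspace(1.0, 10.0, 25)
y = 2.0 * np.sin(x) + 3.0 * np.log(x) + 0.5 * np.sqrt(x)

basis = [np.sin, np.log, np.sqrt]
coeffs = fit_linear_in_coefficients(x, y, basis)
print(np.round(coeffs, 6))  # should be close to the generating values 2, 3, 0.5
```

A model with a coefficient in the exponent cannot be written as such a design matrix times a coefficient vector, which is why it needs a non-linear fitting method instead.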
And so the mystery of why polynomial regression is linear is solved. Now go and spread the happy news among your peers!
How to find the polynomial regression coefficients?
As always with regression, the main challenge is to determine the values of the coefficients $a_0, a_1, \dots, a_n$ based on the values of the data sample $(x_1, y_1), \dots, (x_N, y_N)$. To find the coefficients of the polynomial regression model, we usually resort to the least-squares method, that is, we look for the values of $a_0, a_1, \dots, a_n$ that minimize the sum of squared distances between each data point:
$(x_i, y_i)$,
and the corresponding point predicted by the polynomial regression equation:
$(x_i, a_0 + a_1 x_i + \dots + a_n x_i^n)$.
In other words, we want to minimize the following function:
$(a_0, a_1, \dots, a_n) \mapsto \sum_{i=1}^{N} (a_0 + a_1 x_i + \dots + a_n x_i^n - y_i)^2$,
where $i$ goes from $1$ to $N$, i.e., we sum over the whole data set. If you think it's not at all obvious how to solve this problem, you're absolutely right. A quick solution is, of course, to use Omni's polynomial regression calculator, so we'll now discuss how to do it most efficiently. Then we will explain how to determine the coefficients of the polynomial regression function by hand.
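The least-squares objective described above can be written as a short function. This is a minimal sketch assuming NumPy; the data and the name `sum_of_squared_residuals` are made up for illustration. It compares the objective at the least-squares solution with its value at a slightly different coefficient vector.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def sum_of_squared_residuals(coeffs, x, y):
    """Evaluate sum_i (a0 + a1*x_i + ... + an*x_i^n - y_i)^2."""
    predictions = P.polyval(x, coeffs)  # coeffs ordered a0, a1, ..., an
    return float(np.sum((predictions - y) ** 2))

# Made-up data for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.7, 5.8, 11.1, 17.9])

best = P.polyfit(x, y, 2)                   # least-squares minimizer for degree 2
nudged = best + np.array([0.0, 0.1, 0.0])   # any other coefficient vector

print(sum_of_squared_residuals(best, x, y))
print(sum_of_squared_residuals(nudged, x, y))  # larger than the value above
```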
How to use this polynomial regression calculator?
Here's a short guide to using our polynomial regression calculator:
- Enter your data: you can enter up to 30 data points (new rows will appear as you go). Remember that we need at least $n+1$ points (both coordinates!) with distinct $x$-values to fit a polynomial regression model of order $n$, and with exactly $n+1$ such points, the fit is always perfect!
- The calculator will show you the scatter plot of your data along with the polynomial curve (of the degree you desired) fitted to your points.
- Below the scatter plot, you'll find the polynomial regression equation for your data.
- The coefficient of determination, $R^2$, measures how well the model fits your data points. It assumes values between $0$ and $1$, and the closer it is to $1$, the better your polynomial regression model fits the data.
- You can go to the Advanced mode if you need the polynomial regression calculator to perform calculations with a higher precision.
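For readers who want to check the coefficient of determination themselves, here is a minimal sketch. It assumes the standard definition $R^2 = 1 - SS_{res} / SS_{tot}$; the article does not state which formula the calculator uses, so treat this as an assumption. The data and the name `r_squared` are made up.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def r_squared(x, y, coeffs):
    """Coefficient of determination for a fitted polynomial (a0 first)."""
    residual_ss = np.sum((y - P.polyval(x, coeffs)) ** 2)  # unexplained variation
    total_ss = np.sum((y - np.mean(y)) ** 2)               # total variation
    return float(1.0 - residual_ss / total_ss)

# Made-up data that roughly follows a parabola.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.2, 0.9, 4.1, 9.3, 15.8, 25.1])

scores = {}
for degree in (1, 2):
    coeffs = P.polyfit(x, y, degree)
    scores[degree] = r_squared(x, y, coeffs)
    print(degree, round(scores[degree], 4))
```

On data like this, the quadratic fit scores at least as high as the straight line, since raising the degree can never increase the least-squares error on the same points.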
Matrix formula for polynomial regression
Let's briefly discuss how to calculate the coefficients of polynomial regression by hand. First, let's discuss the projection matrix approach. Let us introduce some necessary notation:
- Let $X$ be the model matrix. This is a matrix with $n+1$ columns and $N$ rows, where $n$ is the desired order of polynomial regression and $N$ is the number of data points, which we fill as follows:
  - The first column we fill with ones.
  - The second with the observed values $x_1, \dots, x_N$ of the independent variable.
  - The third with squares of these values.
  - And so on...
  - The last, $(n+1)$-th, column with the $n$-th powers of the observed values.
  $$X = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_N & x_N^2 & \cdots & x_N^n \end{pmatrix}$$
- Let $y$ be a column vector filled with the values $y_1, \dots, y_N$ of the dependent variable:
  $$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}$$
- Finally, $\beta$ is the column of the coefficients of the polynomial regression model:
  $$\beta = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix}$$
Now, to determine the coefficients, we use the following matrix equation (the so-called normal equation):
$\beta = (X^T X)^{-1} X^T y$,
where:
- $X^T$ is the transpose of $X$;
- $(X^T X)^{-1}$ is the inverse of $X^T X$; and
- The operation between every two matrices is matrix multiplication.
⚠️ For some very peculiar datasets, it may happen that the matrix $X^T X$ is singular, i.e., its inverse does not exist. This happens when the data contain fewer than $n+1$ distinct $x$-values. In such a case, the polynomial regression coefficients cannot be uniquely determined.
The normal equation is the method that our polynomial regression calculator uses. If you'd rather solve systems of linear equations than perform a bunch of matrix operations, you may benefit from the alternative method, which we provide in the following final section.
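The normal equation can be sketched in a few lines of NumPy. This is an illustration only, not the calculator's actual code, which the article does not show. The function name is hypothetical, and the sketch solves the linear system $(X^T X)\beta = X^T y$ with `np.linalg.solve` instead of forming the inverse explicitly, which gives the same coefficients.

```python
import numpy as np

def polynomial_regression_normal_equation(x, y, degree):
    """Solve (X^T X) beta = X^T y for beta = [a0, a1, ..., an]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Model matrix: columns are x^0, x^1, ..., x^degree.
    X = np.vander(x, degree + 1, increasing=True)
    gram = X.T @ X
    if np.linalg.matrix_rank(gram) < degree + 1:
        raise ValueError("X^T X is singular: need at least degree + 1 distinct x-values")
    # Solving the system avoids computing the inverse explicitly.
    return np.linalg.solve(gram, X.T @ y)

# Made-up points lying exactly on y = 1 + x + x^2.
beta = polynomial_regression_normal_equation([0, 1, 2, 3], [1, 3, 7, 13], degree=2)
print(np.round(beta, 6))
```

The rank check covers the singular case: if the data contain too few distinct $x$-values, the function raises an error instead of returning meaningless coefficients.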
System of linear equations for a polynomial regression model
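The coefficients of a polynomial regression model satisfy the following system of $n+1$ linear equations:
$$\sum_{j=0}^{n} a_j \sum_{i=1}^{N} x_i^{j+k} = \sum_{i=1}^{N} x_i^k y_i, \qquad k = 0, 1, \dots, n.$$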
You may use any method of solving systems of linear equations to deal with this system and work out the coefficients.
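As a rough sketch of this approach, assume the system is the normal equation $X^T X \beta = X^T y$ written out entry by entry, so that equation $k$ (for $k = 0, \dots, n$) reads $\sum_j a_j \sum_i x_i^{j+k} = \sum_i x_i^k y_i$. The helper name `build_normal_system` and the data are made up for illustration.

```python
import numpy as np

def build_normal_system(x, y, degree):
    """Return (A, b) with A[k, j] = sum_i x_i^(j+k) and b[k] = sum_i x_i^k * y_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    size = degree + 1
    # Power sums S_m = sum_i x_i^m for m = 0, ..., 2 * degree.
    power_sums = [np.sum(x ** m) for m in range(2 * degree + 1)]
    A = np.array([[power_sums[j + k] for j in range(size)] for k in range(size)])
    b = np.array([np.sum(x ** k * y) for k in range(size)])
    return A, b

# Made-up points lying exactly on y = 1 + x + x^2.
A, b = build_normal_system([0, 1, 2, 3], [1, 3, 7, 13], degree=2)
coeffs = np.linalg.solve(A, b)  # any linear-system solver would do here
print(np.round(coeffs, 6))
```

Only the $2n+1$ power sums of the $x$-values and the $n+1$ mixed sums with $y$ are needed, which is what makes this form convenient for hand calculation.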
FAQ
Why is polynomial regression linear?
Polynomial regression is a particular case of the linear regression model because its equation:
$y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$
is linear as a function of the regression coefficients $a_0, a_1, \dots, a_n$. However, polynomial regression can model all sorts of non-linear relationships!
How many points do I need to fit polynomial regression?
The number of data points needed to determine the polynomial regression model depends on the degree of the polynomial you want to fit. For degree $n$, you need at least $n+1$ data points. If you have exactly $n+1$ points with distinct $x$-values, then the fit will be perfect, i.e., the curve will go through every point. Remember, the model is more reliable when you build it on a larger sample!
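The perfect-fit claim is easy to check numerically. The sketch below uses three made-up points with distinct $x$-values and a degree-2 polynomial; it assumes NumPy and is only an illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Three made-up points with distinct x-values, so a degree-2 fit interpolates them.
x = np.array([1.0, 2.0, 4.0])
y = np.array([3.0, -1.0, 5.0])

coeffs = P.polyfit(x, y, 2)
fitted = P.polyval(x, coeffs)
max_error = float(np.max(np.abs(fitted - y)))
print(max_error)  # zero up to floating-point rounding
```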
Can I always calculate polynomial regression?
No, it may happen that the polynomial regression cannot be fitted. However, this occurs only for very peculiar data sets, so you have a very low chance of ever facing this problem with actual real-life data.