R squared of a linear regression (2024)

by Marco Taboga, PhD

How good is a linear regression model in predicting the output variable on the basis of the input variables?

How much of the variability in the output is explained by the variability in the inputs of a linear regression?

The R squared of a linear regression is a statistic that provides a quantitative answer to these questions.

Table of contents

  1. Caveat

  2. The linear regression model

  3. Sample variance of the outputs

  4. Sample variance of the residuals

  5. Definition of R squared

  6. Properties and interpretation

  7. Alternative definition

  8. Adjusted R squared

  9. Interpretation of the adjusted R squared

  10. More details about the degrees-of-freedom adjustment

Caveat

Before defining the R squared of a linear regression, we warn our readers that several slightly different definitions can be found in the literature.

Usually, these definitions are equivalent in the special but important case in which the linear regression includes a constant among its regressors.

We choose a definition that is easy to understand, and then we make some brief comments about other definitions.

The linear regression model

Consider the linear regression model
$$y_i = x_i\beta + \varepsilon_i$$
where $x_i$ is a $1\times K$ vector of inputs, $\beta$ is a $K\times 1$ vector of regression coefficients and $\varepsilon_i$ is the error term.

Suppose that we have a sample of $N$ observations $(y_i, x_i)$, for $i = 1, \ldots, N$.

Given an estimate $\hat{\beta}$ of $\beta$ (for example, an OLS estimate), we compute the residuals of the regression:
$$e_i = y_i - x_i\hat{\beta}$$
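As an illustration, here is a minimal NumPy sketch of this step. It assumes the inputs are stacked into an $N\times K$ matrix `X` (one row per $x_i$, with a column of ones playing the role of the constant) and that $\hat{\beta}$ is the OLS estimate obtained with `numpy.linalg.lstsq`. The data are simulated and the helper `ols_residuals` is a hypothetical name of our own, not part of the lecture.

```python
import numpy as np

def ols_residuals(X, y):
    """Return the OLS estimate beta_hat and the residuals e_i = y_i - x_i beta_hat."""
    # lstsq minimizes ||y - X b||^2; it returns (solution, ssr, rank, singular values)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_hat
    return beta_hat, residuals

# Simulated sample (illustrative only): N = 50 observations, K = 2 regressors
rng = np.random.default_rng(0)
N = 50
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])   # row i is the 1 x K vector x_i = [1, x_i]
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=N)

beta_hat, e = ols_residuals(X, y)
```

Because `beta_hat` is the OLS estimate, the residuals are orthogonal to every column of `X` (up to rounding), which is what makes several of the identities below hold.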

Sample variance of the outputs

Denote by $S_y^2$ the unadjusted sample variance of the outputs:
$$S_y^2 = \frac{1}{N}\sum_{i=1}^N (y_i - \bar{y})^2$$
where $\bar{y}$ is the sample mean
$$\bar{y} = \frac{1}{N}\sum_{i=1}^N y_i$$

The sample variance $S_y^2$ is a measure of the variability of the outputs, that is, of the variability that we are trying to explain with the regression model.
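The word "unadjusted" means that we divide by $N$ rather than $N-1$. The short sketch below (with a made-up sample and a hypothetical helper name) computes $S_y^2$ directly and checks it against NumPy's `np.var`, whose default `ddof=0` also divides by $N$.

```python
import numpy as np

def unadjusted_variance(v):
    """S^2 = (1/N) * sum_i (v_i - mean(v))^2, i.e. divide by N, not N - 1."""
    v = np.asarray(v, dtype=float)
    return np.mean((v - v.mean()) ** 2)

y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
S2_y = unadjusted_variance(y)   # sample mean 5, squared deviations sum to 32, so S2_y = 32 / 8 = 4
# np.var divides by N by default (ddof=0), so it returns the same quantity
assert np.isclose(S2_y, np.var(y))
```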

Sample variance of the residuals

Denote by $S_e^2$ the mean of the squared residuals:
$$S_e^2 = \frac{1}{N}\sum_{i=1}^N e_i^2$$
which coincides with the unadjusted sample variance of the residuals when the sample mean of the residuals
$$\bar{e} = \frac{1}{N}\sum_{i=1}^N e_i$$
is equal to zero.

Unless stated otherwise, we are going to maintain the assumption that $\bar{e} = 0$ in what follows.
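In general, the mean of the squared residuals equals the unadjusted sample variance of the residuals plus $\bar{e}^2$, so the two coincide only when $\bar{e} = 0$. The sketch below, on simulated data, fits OLS with and without a constant (an illustrative setup we chose, not one from the lecture) to show both situations.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 40
x = rng.normal(size=N)
y = 3.0 + 1.5 * x + rng.normal(size=N)

def residuals(X, y):
    """OLS residuals y - X beta_hat."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta_hat

# With a constant among the regressors, OLS residuals have zero sample mean
e_const = residuals(np.column_stack([np.ones(N), x]), y)
# Without a constant, the sample mean of the residuals is generally not zero
e_noconst = residuals(x[:, None], y)

for e in (e_const, e_noconst):
    S2_e = np.mean(e ** 2)   # mean of squared residuals
    var_e = np.var(e)        # unadjusted sample variance (ddof=0)
    print(f"mean(e) = {e.mean(): .4f}  S2_e = {S2_e:.4f}  var(e) = {var_e:.4f}")
```

In the first line of output `S2_e` and `var(e)` agree; in the second they differ by exactly `mean(e) ** 2`.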

The sample variance $S_e^2$ is a measure of the variability of the residuals, that is, of the part of the variability of the outputs that we are not able to explain with the regression model.

Intuitively, when the predictions of the linear regression model are perfect, the residuals are all equal to zero, and so is their sample variance.

On the contrary, the less accurate the predictions of the linear regression model are, the higher the variance of the residuals is.

Definition of R squared

We are now ready to give a definition of R squared.

Definition The R squared of the linear regression, denoted by $R^2$, is
$$R^2 = 1 - \frac{S_e^2}{S_y^2}$$
where $S_e^2$ is the sample variance of the residuals and $S_y^2$ is the sample variance of the outputs.

Thus, the R squared is a decreasing function of the sample variance of the residuals: the higher the sample variance of the residuals is, the smaller the R squared is.
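The definition translates directly into a few lines of NumPy. The function below is a sketch of our own (the name `r_squared` is hypothetical); it takes the outputs $y_i$ and the predictions $x_i\hat{\beta}$ and computes $1 - S_e^2/S_y^2$, using the mean of squared residuals for $S_e^2$ as in the text.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - S_e^2 / S_y^2, with S_e^2 the mean of squared residuals
    and S_y^2 the unadjusted sample variance of the outputs."""
    y = np.asarray(y, dtype=float)
    e = y - np.asarray(y_hat, dtype=float)   # residuals e_i = y_i - x_i beta_hat
    S2_e = np.mean(e ** 2)
    S2_y = np.mean((y - y.mean()) ** 2)
    return 1.0 - S2_e / S2_y

# Residuals [-0.1, 0.1, -0.2, 0.2] give S2_e = 0.025; S2_y = 1.25; R^2 is approximately 0.98
print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```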

Properties and interpretation

Note that the R squared cannot be larger than 1: it is equal to 1 when the sample variance of the residuals is zero, and it is smaller than 1 when the sample variance of the residuals is strictly positive.

The R squared is equal to 0 when the variance of the residuals is equal to the variance of the outputs, that is, when predicting the outputs with the regression model is no better than using the sample mean of the outputs as a prediction.

It is possible to prove that the R squared cannot be smaller than 0 if the regression includes a constant among its regressors and $\hat{\beta}$ is the OLS estimate of $\beta$ (in this case we also have that $\bar{e} = 0$). Outside this important special case, the R squared can take negative values.

In summary, the R squared is a measure of how well the linear regression fits the data (in more technical terms, it is a goodness-of-fit measure): when it is equal to 1 (and $\bar{e} = 0$), it indicates that the fit of the regression is perfect; and the smaller it is, the worse the fit of the regression is.
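These properties can be checked on a tiny hand-made data set (chosen by us for illustration). Predicting every output with $\bar{y}$ gives $R^2 = 0$; OLS with a constant fits these points exactly and gives $R^2 = 1$; OLS without a constant (a regression through the origin) gives a negative $R^2$.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - S_e^2 / S_y^2 (mean of squared residuals over unadjusted variance)."""
    y = np.asarray(y, dtype=float)
    e = y - np.asarray(y_hat, dtype=float)
    return 1.0 - np.mean(e ** 2) / np.mean((y - y.mean()) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 9.0, 8.0, 7.0, 6.0])   # y = 11 - x exactly

# Using the sample mean as the prediction: S_e^2 = S_y^2, so R^2 = 0
r2_mean = r_squared(y, np.full_like(y, y.mean()))

# OLS with a constant among the regressors (the fit is exact here): R^2 = 1
X_const = np.column_stack([np.ones_like(x), x])
b_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)
r2_const = r_squared(y, X_const @ b_const)

# OLS without a constant: slope = sum(x*y) / sum(x^2) = 110 / 55 = 2,
# residuals [8, 5, 2, -1, -4], S_e^2 = 22, S_y^2 = 2, so R^2 = 1 - 22/2 = -10
X_noconst = x[:, None]
b_noconst, *_ = np.linalg.lstsq(X_noconst, y, rcond=None)
r2_noconst = r_squared(y, X_noconst @ b_noconst)

print(r2_mean, r2_const, r2_noconst)
```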

Alternative definition

Another common definition of the R squared is
$$R^2 = \frac{\sum_{i=1}^N (x_i\hat{\beta} - \bar{y})^2}{\sum_{i=1}^N (y_i - \bar{y})^2}$$

This definition is equivalent to the previous definition in the case in which the sample mean of the residuals $\bar{e}$ is equal to zero (e.g., if the regression includes an intercept).
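The sketch below compares the residual-based definition $1 - \sum_i e_i^2 / \sum_i (y_i - \bar{y})^2$ with the explained-sum-of-squares form $\sum_i (x_i\hat{\beta} - \bar{y})^2 / \sum_i (y_i - \bar{y})^2$, which is the version of the alternative definition we assume here. Using sums instead of $1/N$ averages does not change either ratio. The data are simulated and the function names are our own.

```python
import numpy as np

def r2_residual(y, y_hat):
    """First definition: 1 - sum(e_i^2) / sum((y_i - ybar)^2)."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def r2_explained(y, y_hat):
    """Alternative definition: sum((y_hat_i - ybar)^2) / sum((y_i - ybar)^2)."""
    return np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

def ols_fit(X, y):
    """Fitted values x_i beta_hat from an OLS regression of y on X."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta_hat

rng = np.random.default_rng(2)
N = 30
x = rng.normal(size=N)
y = 2.0 + 0.8 * x + rng.normal(size=N)

# OLS with an intercept: residuals have zero mean and the two definitions agree
y_hat_const = ols_fit(np.column_stack([np.ones(N), x]), y)
# OLS without an intercept: residual mean is non-zero and the definitions differ
y_hat_noconst = ols_fit(x[:, None], y)

print(r2_residual(y, y_hat_const), r2_explained(y, y_hat_const))
print(r2_residual(y, y_hat_noconst), r2_explained(y, y_hat_noconst))
```

For OLS estimates the fitted values are orthogonal to the residuals, and a short calculation shows that the explained-sum-of-squares version exceeds the residual version by $2N\bar{y}\bar{e} / \sum_i (y_i - \bar{y})^2$, which vanishes exactly when $\bar{e} = 0$.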

Check the Wikipedia article for other definitions.

Adjusted R squared

The adjusted R squared is obtained by using the adjusted sample variances
$$s_y^2 = \frac{1}{N-1}\sum_{i=1}^N (y_i - \bar{y})^2$$
and
$$s_e^2 = \frac{1}{N-K}\sum_{i=1}^N e_i^2$$
instead of the unadjusted sample variances $S_y^2$ and $S_e^2$.

This is done because $s_e^2$ and $s_y^2$ are unbiased estimators of $\operatorname{Var}[\varepsilon_i]$ and $\operatorname{Var}[y_i]$ under certain assumptions (see the lectures on Variance estimation and The Normal Linear Regression Model).

Definition The adjusted R squared of the linear regression, denoted by $\bar{R}^2$, is
$$\bar{R}^2 = 1 - \frac{s_e^2}{s_y^2}$$
where $s_e^2$ is the adjusted sample variance of the residuals and $s_y^2$ is the adjusted sample variance of the outputs.

The adjusted R squared can also be written as a function of the unadjusted sample variances:
$$\bar{R}^2 = 1 - \frac{N-1}{N-K}\frac{S_e^2}{S_y^2}$$

Proof

This is an immediate consequence of the fact that
$$s_e^2 = \frac{N}{N-K}S_e^2$$
and
$$s_y^2 = \frac{N}{N-1}S_y^2$$
so that
$$\frac{s_e^2}{s_y^2} = \frac{N-1}{N-K}\frac{S_e^2}{S_y^2}$$

The ratio
$$\frac{N-1}{N-K}$$
used in the formula above is often called a degrees-of-freedom adjustment.
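A numerical check of the equivalence, on simulated data with $N = 25$ and $K = 4$ (values we picked for illustration): computing $\bar{R}^2$ from the adjusted variances and from the unadjusted variances times the degrees-of-freedom adjustment gives the same number.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 25, 4
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(size=N)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

# Unadjusted sample variances (divide by N)
S2_e = np.mean(e ** 2)
S2_y = np.var(y)
# Adjusted sample variances (divide by N - K and N - 1)
s2_e = np.sum(e ** 2) / (N - K)
s2_y = np.var(y, ddof=1)

r2 = 1 - S2_e / S2_y
r2_adj = 1 - s2_e / s2_y
# Same value, written with the degrees-of-freedom adjustment (N - 1) / (N - K)
r2_adj_alt = 1 - (N - 1) / (N - K) * S2_e / S2_y

print(r2, r2_adj, r2_adj_alt)
```

Since $(N-1)/(N-K) > 1$ whenever $K > 1$, the adjusted R squared is always below the unadjusted one in that case (as long as the residuals are not all zero).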

Interpretation of the adjusted R squared

The intuition behind the adjustment is as follows.

When the number $K$ of regressors is large, the mere fact of being able to adjust many regression coefficients allows us to significantly reduce the variance of the residuals. As a consequence, the R squared tends to be large.

This phenomenon is known as overfitting. The extreme case is when the number of regressors $K$ is equal to the number of observations $N$ and we can choose $\hat{\beta}$ so as to make all the residuals equal to $0$.

But being able to mechanically make the variance of the residuals small by adjusting $\hat{\beta}$ does not mean that the variance of the errors of the regression $\varepsilon_i$ is as small.

The degrees-of-freedom adjustment allows us to take this fact into consideration and to avoid underestimating the variance of the error terms.
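The overfitting effect can be simulated by appending regressors that are pure noise, unrelated to the outputs. In the sketch below (an illustrative setup of our own, with a hypothetical helper `fit_stats`), the R squared of the nested OLS fits never decreases and reaches 1 when $K = N$, while the adjusted R squared does not have to increase and is undefined at $K = N$ because the formula divides by $N - K$.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 20
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)
noise_regressors = rng.normal(size=(N, N - 2))   # columns unrelated to y

def fit_stats(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit of y on the columns of X."""
    n_obs, n_reg = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat
    ratio = np.mean(e ** 2) / np.var(y)          # S_e^2 / S_y^2
    r2 = 1 - ratio
    if n_reg < n_obs:
        r2_adj = 1 - (n_obs - 1) / (n_obs - n_reg) * ratio
    else:
        r2_adj = np.nan                          # N - K = 0: adjustment undefined
    return r2, r2_adj

base = np.column_stack([np.ones(N), x])          # constant and the relevant input
results = []
for extra in (0, 5, 10, 15, N - 2):
    X = np.column_stack([base, noise_regressors[:, :extra]])
    r2, r2_adj = fit_stats(X, y)
    results.append((X.shape[1], r2, r2_adj))
    print(f"K = {X.shape[1]:2d}  R^2 = {r2:.3f}  adjusted R^2 = {r2_adj:.3f}")
```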

More details about the degrees-of-freedom adjustment

In more technical terms, the idea behind the adjustment is that what we would really like to know is the quantity
$$1 - \frac{\operatorname{Var}[\varepsilon_i]}{\operatorname{Var}[y_i]}$$
but the unadjusted sample variances $S_e^2$ and $S_y^2$ are biased estimators of $\operatorname{Var}[\varepsilon_i]$ and $\operatorname{Var}[y_i]$.

The bias is downwards, that is, they tend to underestimate their population counterparts.

As a consequence, we estimate $\operatorname{Var}[\varepsilon_i]$ and $\operatorname{Var}[y_i]$ with the adjusted sample variances $s_e^2$ and $s_y^2$, which are unbiased estimators.
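The downward bias of $S_e^2$ can be seen in a small Monte Carlo experiment. The sketch assumes a textbook setting that is not spelled out in the lecture: fixed regressors, and independent, homoskedastic, zero-mean errors with $\operatorname{Var}[\varepsilon_i] = 4$. Under these assumptions the expected value of $\sum_i e_i^2$ is $(N-K)\operatorname{Var}[\varepsilon_i]$, so $S_e^2$ averages about $\frac{N-K}{N}\operatorname{Var}[\varepsilon_i]$ while $s_e^2$ averages about $\operatorname{Var}[\varepsilon_i]$.

```python
import numpy as np

rng = np.random.default_rng(5)
N, K, sigma2 = 15, 5, 4.0    # assumed design: 15 observations, 5 regressors, Var[eps_i] = 4
n_sim = 20000
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])   # regressors held fixed
beta = np.ones(K)

# Each column of Y is one simulated sample y = X beta + eps
Y = (X @ beta)[:, None] + rng.normal(scale=np.sqrt(sigma2), size=(N, n_sim))
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)    # OLS for every column at once
ssr = np.sum((Y - X @ B_hat) ** 2, axis=0)       # sum of squared residuals per sample

S2_e = ssr / N          # unadjusted: expected value (N - K) / N * sigma2 = 2.67
s2_e = ssr / (N - K)    # adjusted: expected value sigma2 = 4

print(S2_e.mean(), s2_e.mean())
```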

How to cite

Please cite as:

Taboga, Marco (2021). "R squared of a linear regression", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/R-squared-of-a-linear-regression.
