Forecasting With Regression Analysis

House Price Example

1 Indistinguishable and Distinguishable Data
 
Forecasting Based on "Indistinguishable" Data

 
Forecast the price of your house based on prices only of a sample of ten houses sold in your city in the past year.



Average and standard deviation of all ten houses
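A minimal Python sketch of the "indistinguishable" forecast. With no information beyond price, the forecast is the sample average and its uncertainty is the sample standard deviation. The ten prices below are hypothetical and made up for illustration. The names PRICES and indistinguishable_forecast are also illustrative.

```python
import statistics

# Hypothetical sale prices in $1000s for ten houses (illustrative only, not from the source).
PRICES = [250, 262, 270, 281, 300, 310, 318, 335, 390, 410]


def indistinguishable_forecast(prices):
    """Return (forecast, uncertainty) when every house is treated alike.

    The forecast is the sample mean; the uncertainty is the sample
    standard deviation (n - 1 denominator).
    """
    return statistics.mean(prices), statistics.stdev(prices)


mean, sd = indistinguishable_forecast(PRICES)
print(f"Forecast: {mean:.1f} +/- {sd:.1f} (thousand dollars)")
```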

Forecasting Based on "Distinguishable" Data

   
Forecast the price of your house based on prices and square footages of a sample of ten houses sold in your city in the past year.


 
Average and standard deviation of all ten houses
Average and standard deviation of the 4 smallest houses
Average and standard deviation of the 4 middle-sized houses
Average and standard deviation of the 2 large houses
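Once square footage is available, the same summary can be computed within each size category. The sketch below uses hypothetical (square footage, price) pairs, sorted by size and split 4 / 4 / 2 as in the list above. The data and the function name group_summary are illustrative only.

```python
import statistics

# Hypothetical (square footage, price in $1000s) pairs, sorted by size (illustrative only).
HOUSES = [(1200, 250), (1300, 262), (1350, 270), (1400, 281),
          (1800, 300), (1850, 310), (1900, 318), (2000, 335),
          (2600, 390), (2800, 410)]

# Size categories as in the list above: 4 smallest, 4 middle-sized, 2 large.
GROUPS = {"small": HOUSES[0:4], "middle": HOUSES[4:8], "large": HOUSES[8:10]}


def group_summary(groups):
    """Average and sample standard deviation of price within each category."""
    summary = {}
    for name, members in groups.items():
        prices = [price for _, price in members]
        summary[name] = (statistics.mean(prices), statistics.stdev(prices))
    return summary


for name, (avg, sd) in group_summary(GROUPS).items():
    print(f"{name:>6}: average {avg:.2f}, standard deviation {sd:.2f}")
```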

Sample Residuals and Their Standard Deviations



Residual: difference between each house's price and the average price of its size category
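A sketch of the residual calculation on the same hypothetical data. One common convention for the residual standard deviation pools the squared residuals and divides by n minus the number of categories, losing one degree of freedom per estimated category mean. That convention is an assumption here; the outline does not specify it. Function names are illustrative.

```python
import math
import statistics

# Hypothetical (square footage, price in $1000s) pairs, sorted by size (illustrative only).
HOUSES = [(1200, 250), (1300, 262), (1350, 270), (1400, 281),
          (1800, 300), (1850, 310), (1900, 318), (2000, 335),
          (2600, 390), (2800, 410)]
GROUPS = {"small": HOUSES[0:4], "middle": HOUSES[4:8], "large": HOUSES[8:10]}


def category_residuals(groups):
    """Residual = house price minus the average price of its size category."""
    residuals = []
    for name, members in groups.items():
        prices = [price for _, price in members]
        category_mean = statistics.mean(prices)
        residuals.extend((name, price - category_mean) for price in prices)
    return residuals


def pooled_residual_sd(residuals, n_categories):
    """Residual SD, losing one degree of freedom per estimated category mean."""
    sse = sum(r * r for _, r in residuals)
    return math.sqrt(sse / (len(residuals) - n_categories))


res = category_residuals(GROUPS)
print(f"Residual standard deviation: {pooled_residual_sd(res, len(GROUPS)):.2f}")
```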

Using the Data More Efficiently



Lookalike cells: houses "almost like" your house but not "exactly like" it
A Regression Model

Estimated price = b0 + b1 * square footage


b1 = increase or decrease in average selling price
corresponding to an increase or decrease of 1 square foot


b0 = intercept: a "fudge factor" that puts the estimates in the right ballpark



(A simple-minded mathematical interpretation: b0 = price of a zero-square-foot house)
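The outline does not say how b0 and b1 are estimated. The standard choice is ordinary least squares, which minimizes the sum of squared differences between actual and estimated prices. Below is a minimal sketch that assumes least squares and uses the same hypothetical ten houses. It computes b1 = Sxy / Sxx and b0 = (average price) - b1 * (average square footage), so the fitted line always passes through the point of averages. Because b0 is extrapolated far below the smallest house in the sample, the "zero-square-foot house" reading is simple-minded.

```python
# Hypothetical data for ten houses (illustrative only, not from the source).
SQFT = [1200, 1300, 1350, 1400, 1800, 1850, 1900, 2000, 2600, 2800]
PRICE = [250, 262, 270, 281, 300, 310, 318, 335, 390, 410]  # $1000s


def fit_simple_regression(x, y):
    """Least-squares (b0, b1) for: estimated y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx           # change in average price per additional square foot
    b0 = y_bar - b1 * x_bar  # makes the line pass through the averages
    return b0, b1


def estimated_price(b0, b1, square_footage):
    """Estimated price = b0 + b1 * square footage."""
    return b0 + b1 * square_footage


b0, b1 = fit_simple_regression(SQFT, PRICE)
print(f"Estimated price = {b0:.1f} + {b1:.4f} * square footage")
```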
Inputs to a Regression Analysis



Identify the Dependent Variable
Specify the Independent Variable or Variables
Specify the Relevant Data
Specify the Nature of the Relationship.  
LINEAR UNLESS YOU HAVE STRONG BUSINESS REASONS OTHERWISE
Provide Values of the Dependent Variable Paired With Values of the Independent Variable or Variables
Outputs From A Regression Analysis

6 Regression Coefficients


7 Multiple Regression: effect of size and age on price
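With two independent variables, the least-squares coefficients solve the normal equations (X'X) b = X'y, where X includes a column of 1s for b0. The sketch below assumes least squares and avoids external dependencies. The ages are hypothetical, and in practice a statistics library would do this work. Each slope coefficient is read as the effect of its variable with the other variable held fixed.

```python
# Hypothetical data for ten houses (illustrative only, not from the source).
SQFT = [1200, 1300, 1350, 1400, 1800, 1850, 1900, 2000, 2600, 2800]
AGE = [30, 25, 40, 20, 15, 35, 10, 12, 8, 5]                 # years
PRICE = [250, 262, 270, 281, 300, 310, 318, 335, 390, 410]   # $1000s


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple_regression(columns, y):
    """Least-squares [b0, b1, b2, ...] for y = b0 + b1*x1 + b2*x2 + ...

    `columns` holds one list per independent variable; an intercept is added.
    """
    rows = [[1.0, *values] for values in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


b0, b_size, b_age = fit_multiple_regression([SQFT, AGE], PRICE)
print(f"Estimated price = {b0:.1f} + {b_size:.4f} * sqft + {b_age:.3f} * age")
```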

8 Uncertainty in Regression Coefficients: standard error, t-statistic
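For the one-variable model, the standard errors have closed forms. With residual standard deviation s = sqrt(SSE / (n - 2)), SE(b1) = s / sqrt(Sxx) and SE(b0) = s * sqrt(1/n + x_bar^2 / Sxx). Each t-statistic is the coefficient divided by its standard error, i.e. how many standard errors the estimate lies from zero. The sketch runs on the same hypothetical data, and the function name is illustrative. Multiple regression uses the matrix generalization of these formulas.

```python
import math

# Hypothetical data for ten houses (illustrative only, not from the source).
SQFT = [1200, 1300, 1350, 1400, 1800, 1850, 1900, 2000, 2600, 2800]
PRICE = [250, 262, 270, 281, 300, 310, 318, 335, 390, 410]  # $1000s


def coefficient_uncertainty(x, y):
    """Return {name: (estimate, standard error, t-statistic)} for b0 and b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # two coefficients estimated -> n - 2 degrees of freedom
    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return {"b0": (b0, se_b0, b0 / se_b0), "b1": (b1, se_b1, b1 / se_b1)}


for name, (est, se, t) in coefficient_uncertainty(SQFT, PRICE).items():
    print(f"{name}: estimate {est:.4f}, standard error {se:.4f}, t = {t:.1f}")
```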

9 Proxy Effects (Multicollinearity)
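One quick diagnostic for proxy effects is the correlation between the independent variables themselves. With exactly two independent variables, the variance inflation factor (VIF) is 1 / (1 - r²). This is the factor by which the correlation inflates the variance of each coefficient estimate, relative to uncorrelated independent variables. The room counts below are hypothetical and chosen to track square footage closely, so each variable is largely a proxy for the other. The function names are illustrative.

```python
import math

# Hypothetical data (illustrative only): square footage and room counts that
# were chosen to move together, so each is largely a proxy for the other.
SQFT = [1200, 1300, 1350, 1400, 1800, 1850, 1900, 2000, 2600, 2800]
ROOMS = [5, 5, 6, 6, 7, 7, 8, 8, 10, 11]


def pearson_r(x, y):
    """Sample correlation coefficient between two variables."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two independent variables: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


r = pearson_r(SQFT, ROOMS)
print(f"r = {r:.3f}, VIF = {vif_two_predictors(SQFT, ROOMS):.1f}")
```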

10 Two Common Misconceptions


"It is often incorrectly assumed that variables used in a regression must be measured in comparable units"


"It is often incorrectly assumed that variables used in a regression must be uncorrelated"



Correlation among the independent variables does very little harm to the power to predict the dependent variable,
but it kills the power to explain or influence the dependent variable
Forecasts

Point Forecasts

Probabilistic Forecasts

Additional Sources of Uncertainty
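A point forecast plugs the new house's square footage into the fitted equation. A probabilistic forecast also needs a spread. For the one-variable model, the standard error of a single new observation is s * sqrt(1 + 1/n + (x_new - x_bar)^2 / Sxx). This combines the residual scatter with the uncertainty in b0 and b1, which is one source of uncertainty beyond the residual standard deviation. The sketch uses the same hypothetical data and a normal approximation for the interval. With only ten houses, a t-distribution would give a somewhat wider range. The function name forecast_price is illustrative.

```python
import math
from statistics import NormalDist

# Hypothetical data for ten houses (illustrative only, not from the source).
SQFT = [1200, 1300, 1350, 1400, 1800, 1850, 1900, 2000, 2600, 2800]
PRICE = [250, 262, 270, 281, 300, 310, 318, 335, 390, 410]  # $1000s


def forecast_price(x, y, x_new, level=0.95):
    """Return (point forecast, standard error, (low, high)) for one new house.

    The interval uses a normal approximation, not a t-distribution.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation

    point = b0 + b1 * x_new
    # Residual scatter (the 1) plus coefficient uncertainty, which grows
    # as x_new moves away from the sample average.
    se = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return point, se, (point - z * se, point + z * se)


point, se, (low, high) = forecast_price(SQFT, PRICE, 2200)
print(f"Point forecast {point:.1f}; approx. 95% range {low:.1f} to {high:.1f}")
```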
Measures of Goodness of Fit

Residual Standard Deviation - measured in units of y

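Coefficient of Determination (R²) - % improvement in fit over the average-only forecast: 0% = no better than treating the data as indistinguishable, 100% = perfect fit

"Adjusted" R² - Ockham's Razor: penalizes independent variables that add too little to the fit

Interpretation of R² - even a small R² can be useful if the sample is huge and forecasting is difficult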
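The three measures can be computed from actual and fitted values. R² = 1 - SSE / SST, where SST is the squared error of the "indistinguishable" forecast (the plain average). That is why 0% means no improvement over that forecast. Adjusted R² multiplies the unexplained share by (n - 1) / (n - k - 1) for k independent variables, so a variable that does not improve the fit enough lowers it. The sketch runs on the same hypothetical data, and the function names are illustrative.

```python
import math

# Hypothetical data for ten houses (illustrative only, not from the source).
SQFT = [1200, 1300, 1350, 1400, 1800, 1850, 1900, 2000, 2600, 2800]
PRICE = [250, 262, 270, 281, 300, 310, 318, 335, 390, 410]  # $1000s


def fitted_values(x, y):
    """Fitted values from a one-variable least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return [b0 + b1 * xi for xi in x]


def goodness_of_fit(y, fitted, n_predictors):
    """Return (residual SD in units of y, R^2, adjusted R^2)."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # Baseline: squared error of the average-only ("indistinguishable") forecast.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    dof = n - n_predictors - 1
    residual_sd = math.sqrt(sse / dof)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / dof
    return residual_sd, r2, adj_r2


sd, r2, adj = goodness_of_fit(PRICE, fitted_values(SQFT, PRICE), n_predictors=1)
print(f"Residual SD {sd:.2f}, R^2 {r2:.3f}, adjusted R^2 {adj:.3f}")
```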
Transformed Variables

Lagged Variables in Time Series for Modeling Non-Contemporaneous Effects
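A lagged variable pairs each period's dependent variable with an independent variable from an earlier period, e.g. this month's sales with last month's advertising. Here is a minimal sketch with hypothetical monthly figures and illustrative function names. The first lag periods are dropped because they have no earlier value to pair with.

```python
def lagged(series, lag=1):
    """Shift a time series by `lag` periods.

    Position t of the result holds series[t - lag]; the first `lag`
    positions are None because no earlier observation exists.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [None] * lag + list(series[:-lag])


def pair_with_lag(y, x, lag=1):
    """Rows (y[t], x[t - lag]) for a regression; incomplete rows are dropped."""
    return [(yt, xl) for yt, xl in zip(y, lagged(x, lag)) if xl is not None]


# Hypothetical monthly data (illustrative only): advertising spend and sales.
advertising = [10, 12, 9, 15, 14, 18]
sales = [100, 104, 110, 101, 118, 116]
rows = pair_with_lag(sales, advertising, lag=1)  # this month's sales vs. last month's ads
print(rows)
```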

Dummy Variables for Modeling Effects of Ordinal or Categorical Variables
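A categorical or ordinal variable with k levels is usually entered as k - 1 dummy (0/1) variables, with one level left out as the baseline. Each dummy's coefficient is then the estimated difference from that baseline. A column for every level would be perfectly collinear with the intercept. For an ordinal variable, dummies avoid assuming equal steps between levels. The sketch uses hypothetical condition ratings, and the function name dummy_encode is illustrative.

```python
def dummy_encode(values, categories):
    """Encode a categorical (or ordinal) variable as 0/1 dummy columns.

    categories[0] is the baseline and gets no column; each remaining
    category gets one column.
    """
    others = categories[1:]
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"values not in categories: {sorted(unknown)}")
    return [[1 if v == c else 0 for c in others] for v in values]


# Hypothetical condition ratings for four houses (an ordinal variable), illustrative only.
condition = ["fair", "good", "excellent", "good"]
rows = dummy_encode(condition, ["fair", "good", "excellent"])
print(rows)  # columns: good, excellent (baseline: fair)
```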

Detecting, Studying, and Interpreting Curvilinear Relationships: Exploratory Analysis



Do Not Use Curvilinear Relationships
Unless There Are Strong Business Reasons to Use a Particular Relationship