1  Indistinguishable and Distinguishable Data  

Forecasting Based on "Indistinguishable" Data


Forecast the price of your house based only on the prices of a sample of ten houses sold in your city in the past year.

Average and standard deviation of all ten houses
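
As a minimal sketch of this forecast: when the ten houses are indistinguishable, the sample average serves as the point forecast and the sample standard deviation indicates the typical forecast error. The prices below are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

# Hypothetical sale prices (dollars) of ten houses; illustrative only.
prices = [185_000, 198_000, 210_000, 225_000, 240_000,
          260_000, 275_000, 305_000, 330_000, 410_000]

point_forecast = mean(prices)  # best single guess when nothing distinguishes the houses
spread = stdev(prices)         # sample standard deviation (divides by n - 1)

print(f"Forecast: {point_forecast:,.0f} give or take {spread:,.0f}")
```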

Forecasting Based on "Distinguishable" Data


Forecast the price of your house based on prices and square footages of a sample of ten houses sold in your city in the past year.  

Average and standard deviation of all ten houses

Average and standard deviation of the 4 smallest houses

Average and standard deviation of the 4 middle-sized houses

Average and standard deviation of the 2 large houses

Sample Residuals and Their Standard Deviations

Difference between each house and its category mean  
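
The category means and residuals can be sketched as follows. The (square footage, price) pairs are hypothetical, and the 4/4/2 split simply follows the size grouping listed above. The point is that the residuals around category means are usually less spread out than the prices around the overall mean.

```python
from statistics import mean, stdev

# Hypothetical (square footage, price) pairs for ten houses; illustrative only.
houses = [(1100, 185_000), (1250, 198_000), (1400, 210_000), (1500, 225_000),
          (1750, 240_000), (1900, 260_000), (2050, 275_000), (2200, 305_000),
          (2800, 330_000), (3400, 410_000)]

houses.sort()  # order by square footage
categories = {"small": houses[:4], "middle": houses[4:8], "large": houses[8:]}

residuals = []
for name, group in categories.items():
    group_prices = [price for _, price in group]
    group_mean = mean(group_prices)
    # Residual = actual price minus the mean of the house's own size category
    residuals.extend(price - group_mean for price in group_prices)
    print(f"{name}: mean {group_mean:,.0f}, std dev {stdev(group_prices):,.0f}")

all_prices = [price for _, price in houses]
print(f"Std dev ignoring size: {stdev(all_prices):,.0f}")
print(f"Std dev of residuals:  {stdev(residuals):,.0f}")
```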

Using the Data More Efficiently

Lookalike cells: houses "almost like" your house but not "exactly like" it

4 A Regression Model

Estimated price = b_{0} + b_{1} * square footage

b_{1} = increase or decrease in average selling price corresponding to an increase or decrease of 1 square foot


b_{0} = fudge factor to put the numbers in the right ballpark

(A simple-minded mathematical interpretation: b_{0} = price of a zero-square-foot house)
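
A minimal sketch of estimating b_{0} and b_{1}, assuming ordinary least squares as the fitting method (the outline does not name one). The data and the helper `estimated_price` are hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical square footages and sale prices for ten houses; illustrative only.
sqft = [1100, 1250, 1400, 1500, 1750, 1900, 2050, 2200, 2800, 3400]
price = [185_000, 198_000, 210_000, 225_000, 240_000,
         260_000, 275_000, 305_000, 330_000, 410_000]

x_bar, y_bar = mean(sqft), mean(price)

# Least-squares slope: co-variation of size and price divided by variation in size
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
# Intercept chosen so the line passes through the point of averages
b0 = y_bar - b1 * x_bar


def estimated_price(square_footage):
    """Estimated price = b0 + b1 * square footage (hypothetical helper)."""
    return b0 + b1 * square_footage


print(f"b0 = {b0:,.0f} dollars, b1 = {b1:,.2f} dollars per square foot")
print(f"Estimated price of a 2,000 sq ft house: {estimated_price(2000):,.0f}")
```

Here b_{1} is in dollars per square foot, while b_{0} only positions the line; no house in the sample is anywhere near zero square feet, so b_{0} should not be read as a realistic price.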

5 Inputs to a Regression Analysis

Identify the Dependent Variable

Specify the Independent Variable or Variables

Specify the Relevant Data

Specify the Nature of the Relationship: LINEAR UNLESS YOU HAVE STRONG BUSINESS REASONS OTHERWISE

Provide Values of the Dependent Variable Paired With Values of the Independent Variable or Variables


6 Outputs From A Regression Analysis

6 Regression Coefficients

7 Multiple Regression: effect of size and age on price

8 Uncertainty in Regression Coefficients: standard error, t-statistic
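
A rough sketch of these two quantities for a one-variable regression, assuming an ordinary least-squares fit and hypothetical data. The standard error measures how much b_{1} would vary from sample to sample, and the t-statistic is the coefficient divided by its standard error.

```python
from math import sqrt
from statistics import mean

# Hypothetical square footages and sale prices for ten houses; illustrative only.
sqft = [1100, 1250, 1400, 1500, 1750, 1900, 2050, 2200, 2800, 3400]
price = [185_000, 198_000, 210_000, 225_000, 240_000,
         260_000, 275_000, 305_000, 330_000, 410_000]

n = len(price)
x_bar, y_bar = mean(sqft), mean(price)
sxx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / sxx
b0 = y_bar - b1 * x_bar

# Residual standard deviation: n - 2 because two coefficients were estimated
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
residual_sd = sqrt(sse / (n - 2))

se_b1 = residual_sd / sqrt(sxx)  # standard error of the slope
t_stat = b1 / se_b1              # how many standard errors b1 lies from zero

print(f"b1 = {b1:.2f}, standard error = {se_b1:.2f}, t-statistic = {t_stat:.1f}")
```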

9 Proxy Effects (Multicollinearity)

10 Two Common Misconceptions

"It is often incorrectly assumed that variables used in a regression must be measured in comparable units."

"It is often incorrectly assumed that variables used in a regression must be uncorrelated."

Correlation among the independent variables does very little harm to the power to predict the dependent variable, but it kills the power to explain or influence the dependent variable.


10 Forecasts

Point Forecasts

Probabilistic Forecasts 

Additional Sources of Uncertainty

13 Measures of Goodness of Fit

Residual Standard Deviation: measured in units of y

Coefficient of Determination (R^{2}): % improvement in fit; 0% = indistinguishable, 100% = perfect

"Adjusted" R^{2}: Ockham's Razor

Interpretation of R^{2}: even a small R^{2} can be useful if the sample is huge and forecasting is difficult
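
The three measures can be sketched for a one-variable regression as follows. The data are hypothetical, the fit is assumed to be ordinary least squares, and the adjusted R^{2} shown is the usual degrees-of-freedom correction, which may differ in detail from the one the course uses.

```python
from math import sqrt
from statistics import mean

# Hypothetical square footages and sale prices for ten houses; illustrative only.
sqft = [1100, 1250, 1400, 1500, 1750, 1900, 2050, 2200, 2800, 3400]
price = [185_000, 198_000, 210_000, 225_000, 240_000,
         260_000, 275_000, 305_000, 330_000, 410_000]

n, k = len(price), 1  # k = number of independent variables
x_bar, y_bar = mean(sqft), mean(price)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

# Squared error around the regression line versus around the plain average
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
sst = sum((y - y_bar) ** 2 for y in price)

residual_sd = sqrt(sse / (n - k - 1))          # in dollars, the units of y
r2 = 1 - sse / sst                             # 0 = no better than the average, 1 = perfect
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes each extra variable

print(f"Residual std dev: {residual_sd:,.0f}")
print(f"R^2: {r2:.1%}, adjusted R^2: {adj_r2:.1%}")
```

Adjusted R^{2} is never larger than R^{2}, which is the Ockham's Razor idea: an added variable has to earn its place.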

15 Transformed Variables

Lagged Variables in Time Series for Modeling Non-Contemporaneous Effects

Dummy Variables for Modeling Effects of Ordinal or Categorical Variables  
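
A small sketch of how lagged and dummy variables might be constructed before running a regression. The series, the one-month lag, and the region labels are all hypothetical.

```python
# Hypothetical monthly series; illustrative only.
ad_spend = [10, 12, 9, 15, 11, 14]
sales = [100, 104, 110, 101, 118, 108]

# Lagged variable: pair each month's sales with the PREVIOUS month's ad spend,
# so the regression can pick up an effect that shows up one period later.
lag = 1
lagged_rows = [(ad_spend[t - lag], sales[t]) for t in range(lag, len(sales))]

# Dummy variables: encode a three-level categorical variable as two 0/1 columns.
# The omitted level ("north") acts as the baseline for comparison.
regions = ["north", "south", "west", "south", "north", "west"]
levels = ["south", "west"]
dummies = [[int(region == level) for level in levels] for region in regions]

print(lagged_rows)
print(dummies)
```

Lagging by one period drops the first observation, since it has no earlier value to pair with.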

Detecting, Studying, and Interpreting Curvilinear Relationships: Exploratory Analysis

Do Not Use Curvilinear Relationships Unless There Are Strong Business Reasons to Use a Particular Relationship
