# Forecasting With Regression Analysis

House Price Example

## Indistinguishable and Distinguishable Data

Forecasting Based on "Indistinguishable" Data: forecast the price of your house based only on the prices of a sample of ten houses sold in your city in the past year.

- Average and standard deviation of all ten houses

Forecasting Based on "Distinguishable" Data: forecast the price of your house based on the prices and square footages of a sample of ten houses sold in your city in the past year.

- Average and standard deviation of all ten houses
- Average and standard deviation of the 4 smallest houses
- Average and standard deviation of the 4 middle-sized houses
- Average and standard deviation of the 2 large houses

Sample Residuals and Their Standard Deviations

- Difference between each house and its category mean

Using the Data More Efficiently

- Lookalike cells: houses "almost like" your house but not "exactly like" it

## A Regression Model

Estimated price = b0 + b1 * square footage

- b1 = increase or decrease in average selling price corresponding to an increase or decrease of 1 square foot
- b0 = fudge factor to put the numbers in the right ballpark (a simpleminded mathematical interpretation: b0 = price of a zero-square-foot house)

## Inputs to a Regression Analysis

- Identify the Dependent Variable
- Specify the Independent Variable or Variables
- Specify the Relevant Data
- Specify the Nature of the Relationship: LINEAR UNLESS YOU HAVE STRONG BUSINESS REASONS OTHERWISE
- Provide Values of the Dependent Variable Paired With Values of the Independent Variable or Variables

## Outputs From a Regression Analysis

## Regression Coefficients

## Multiple Regression

- Effect of size and age on price

## Uncertainty in Regression Coefficients

- Standard error, t-statistic

## Proxy Effects (Multicollinearity)

## Two Common Misconceptions

- "It is often incorrectly assumed that variables used in a regression must be measured in comparable units."
- "It is often incorrectly assumed that variables used in a regression must be uncorrelated."
- Correlation does very little harm to the power to predict the dependent variable, but it kills the power to explain or influence the dependent variable.

## Forecasts

- Point Forecasts
- Probabilistic Forecasts
- Additional Sources of Uncertainty

## Measures of Goodness of Fit

- Residual Standard Deviation: measured in units of y
- Coefficient of Determination (R²): % improvement in fit, where 0% = indistinguishable and 100% = perfect
- "Adjusted" R²: Ockham's Razor
- Interpretation of R²: even a small R² can be useful if the sample is huge and forecasting is difficult

## Transformed Variables

- Lagged Variables in Time Series for Modeling Non-Contemporaneous Effects
- Dummy Variables for Modeling Effects of Ordinal or Categorical Variables
- Detecting, Studying, and Interpreting Curvilinear Relationships: Exploratory Analysis
- Do Not Use Curvilinear Relationships Unless There Are Strong Business Reasons to Use a Particular Relationship
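
The house price example can be made concrete with a minimal Python sketch. It compares three ways of using a sample of ten houses: treating them as indistinguishable, splitting them into size categories (4 small, 4 middle-sized, 2 large), and fitting the line Estimated price = b0 + b1 * square footage. The ten data points, the 1,600 square foot house, the helper name `fit_line`, and the degrees-of-freedom divisors are all assumptions made for illustration; none of them come from the original material.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample: (square footage, selling price in $1000s) for ten houses,
# sorted by size. These numbers are invented for illustration only.
houses = [
    (1100, 152), (1250, 168), (1300, 171), (1400, 185),   # 4 smallest
    (1700, 214), (1800, 221), (1900, 239), (2000, 242),   # 4 middle-sized
    (2600, 305), (2900, 331),                             # 2 large
]
sqft = [s for s, _ in houses]
price = [p for _, p in houses]
n = len(houses)

# 1. "Indistinguishable" data: forecast = overall mean, uncertainty = overall SD.
indistinguishable_mean = mean(price)
indistinguishable_sd = stdev(price)

# 2. "Distinguishable" data by category: residual = price minus its category mean.
categories = [houses[:4], houses[4:8], houses[8:]]
category_residuals = []
for group in categories:
    group_mean = mean(p for _, p in group)
    category_residuals.extend(p - group_mean for _, p in group)
# Assumed divisor: n minus the number of estimated category means.
category_sd = sqrt(sum(r ** 2 for r in category_residuals) / (n - len(categories)))

# 3. Regression: least-squares fit of price = b0 + b1 * sqft.
def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_line(sqft, price)
residuals = [p - (b0 + b1 * s) for s, p in houses]

# Goodness of fit.
sse = sum(r ** 2 for r in residuals)
sst = sum((p - indistinguishable_mean) ** 2 for p in price)
residual_sd = sqrt(sse / (n - 2))          # two estimated coefficients: b0 and b1
r_squared = 1 - sse / sst                  # share of variation explained by the line
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)  # one independent variable

# Point forecast for a hypothetical 1,600 square foot house.
my_sqft = 1600
point_forecast = b0 + b1 * my_sqft

print(f"Overall mean and SD:      {indistinguishable_mean:.1f}, {indistinguishable_sd:.1f}")
print(f"Category residual SD:     {category_sd:.1f}")
print(f"b0, b1:                   {b0:.2f}, {b1:.4f}")
print(f"Regression residual SD:   {residual_sd:.1f}")
print(f"R^2, adjusted R^2:        {r_squared:.3f}, {adj_r_squared:.3f}")
print(f"Point forecast ($1000s):  {point_forecast:.1f}")
```

With made-up data this close to a straight line, the residual standard deviation shrinks at each step, which is the sense in which regression "uses the data more efficiently" than category means. Real samples will usually show a less dramatic improvement, and the residual standard deviation understates forecast uncertainty because the coefficients themselves are estimated.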