Problem Set 2
Spring 2000
Brian Schott

In my directory (class/dsc845/files) are two files: XYZ (the data set) and XYZ.READ (a SAS INPUT statement for the data). The data refer to XYZ University and consist of a sample of 202 MBA students who have completed their first year of coursework at XYZ U in the past two years. The objective of the study is to determine entrance requirements for future students.

The variables collected are defined here.

X1 ID Student identification number
X2 MBA GPA Grade point average for the first year of courses in the XYZ MBA
X3 GMAT Score on the GMAT test
X4 UGd GPA Undergraduate grade point average
X5 UG Major 1=BBA, 2=Sci, Engr, other technical, 3=other
X6 SchlRate UG School rating: 5= top 20%, 1=bottom 20%
X7 Age Age of student in years
X8 Foreign 1=foreign citizen, 0=US citizen
Success 1=MBA GPA>3.2, 0=MBA GPA<=3.2

1 The purpose of this problem is to perform a regression analysis that predicts or explains the variable X2 (MBA GPA) using PROC REG. At a minimum, do the following things:

1a) From your knowledge of regression analysis, discuss briefly how the results of the following regression model can be used to predict MBA GPA.

proc reg;
model x2 = x3 x4;

1b) Produce a partial regression plot corresponding to the variable GMAT in the above model to discuss influential observations.
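
As a hedged sketch of one way to request such a plot, the PARTIAL option on the MODEL statement of PROC REG produces a partial regression leverage plot for each regressor, including GMAT (x3). The data set name xyz is an assumption (it is whatever name you give the DATA step that reads XYZ with the INPUT statement in XYZ.READ), and the ID statement is optional.

```sas
/* Sketch only: assumes the data were already read into a data set named xyz */
proc reg data=xyz;
  id x1;                        /* label points by student ID to help spot influential cases */
  model x2 = x3 x4 / partial;   /* partial regression leverage plots, one per regressor */
run;
quit;
```

The plot for x3 is the one to discuss; points far from the bulk of the data in that plot are candidates for influential observations.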

1c) Construct a residual plot for the variate in the above model and discuss the corresponding relevant regression model assumptions.

1d) Using one of the sequential selection options of PROC REG and only the list of variables below, construct an alternative model (designed to predict the variable MBA GPA) to the one in question 1a).


1e) Compare the two models above with respect to predictive and explanatory power.

2 Perform a discriminant analysis to predict Success using PROC STEPDISC and PROC DISCRIM (with the CAN option). Start with all the variables, but never use X2 or X5. At a minimum, do the following things:

2a) Use the stepwise discriminant procedure PROC STEPDISC without a holdout sample and determine the "best" set of discriminating variables.
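
A minimal sketch of the stepwise run follows. The data set name xyz and the variable name success are assumptions (use the names defined in XYZ.READ), and leaving the ID variable x1 out of the candidate list is also an assumption, made because an identification number is not a substantive predictor.

```sas
/* Sketch only: data set name and the name of the Success variable are assumed */
proc stepdisc data=xyz method=stepwise;
  class success;            /* grouping variable: 1 = MBA GPA > 3.2, 0 = otherwise */
  var x3 x4 x6 x7 x8;       /* candidates; X2 and X5 are excluded as instructed */
run;
```

The variables remaining at the last step of the stepwise summary form the "best" set referred to here.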

2b) Using only the "best" set of discriminating variables, perform a linear discriminant analysis without a holdout sample using the linear discriminant function (i.e., PROC DISCRIM with the option POOL=YES). Request in the same SAS run a crossvalidation that holds out one observation at a time to test the classification accuracy of the model. (That is, use the SAS DISCRIM option CROSSVALIDATE.) Also request the option that prints out all of the cases and how they are classified.

2c) State and describe the face validity of the resulting linear discriminant function produced above. You may need to force a linear discriminant function by requesting the option POOL=YES.
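
The options named in 2b can be combined in a single PROC DISCRIM statement, roughly as sketched below. The data set name xyz, the variable name success, and the VAR list x3 x4 are placeholders only; substitute the names from XYZ.READ and the variables actually selected in 2a.

```sas
/* Sketch only: xyz, success, and the VAR list are hypothetical placeholders */
proc discrim data=xyz pool=yes crossvalidate list crosslist can;
  class success;     /* group membership to be predicted */
  var x3 x4;         /* replace with the "best" set found by PROC STEPDISC */
run;
```

Here POOL=YES forces the linear discriminant function, CROSSVALIDATE requests the leave-one-out classification, LIST and CROSSLIST print every case with its resubstitution and crossvalidated classification, and CAN adds the canonical discriminant analysis.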

2d) Assess the practical value of the analyses you have performed using and comparing the hit matrices.

Due Wednesday, April 4
