DSc8450

Problem Set 2

Spring 2000

Brian Schott

My directory (class/dsc845/files) contains two files: XYZ (the data set)
and XYZ.READ (a *SAS* INPUT statement for the data). The data refer
to XYZ University and include a sample of 202 MBA students who completed
their first year of coursework at XYZ U during the past two years. The objective
of the study is to determine entrance requirements for future students.

The variables collected are defined here.

| Variable | Name | Description |
|---|---|---|
| X1 | ID | Student identification number |
| X2 | MBA GPA | Grade point average for the first year of courses in the XYZ MBA |
| X3 | GMAT | Score on the GMAT |
| X4 | UGd GPA | Undergraduate grade point average |
| X5 | UG Major | 1 = BBA; 2 = science, engineering, or other technical; 3 = other |
| X6 | SchlRate | Undergraduate school rating: 5 = top 20%, 1 = bottom 20% |
| X7 | Age | Age of student in years |
| X8 | Foreign | 1 = foreign citizen, 0 = US citizen |
| Success | | 1 = MBA GPA > 3.2, 0 = MBA GPA ≤ 3.2 |
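A minimal sketch of reading the data, assuming that XYZ.READ holds only an INPUT statement naming the variables x1 through x8 (and possibly success) and that the paths below resolve from your SAS session. The data set name `xyz` is a hypothetical choice; the files do not define it. Success is recomputed from X2 in case the INPUT statement does not read it.

```sas
/* Sketch: read the raw file XYZ with the INPUT statement stored in XYZ.READ. */
/* Assumed: paths resolve from your session; variables are named x1-x8.       */
data xyz;                                   /* hypothetical data set name      */
   infile 'class/dsc845/files/XYZ';         /* raw data file                   */
   %include 'class/dsc845/files/XYZ.READ';  /* inserts the INPUT statement     */
   /* Success = 1 if MBA GPA > 3.2, else 0; stays missing when X2 is missing */
   if x2 ne . then success = (x2 > 3.2);
run;
```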

**1** The purpose of this problem is to use PROC REG to perform a regression
analysis that predicts or explains X2 (**MBA GPA**).
At a minimum, do the following:

1a) From your knowledge of regression analysis, discuss briefly how the
results of the following regression model can be used to predict **MBA GPA**.

```sas
proc reg;
   model x2 = x3 x4;
run;
```
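The fitted model is an equation of the form predicted MBA GPA = b0 + b3(GMAT) + b4(UGd GPA), where b0, b3, and b4 are the parameter estimates PROC REG prints. One way to apply it to new applicants, sketched here assuming the data are in a SAS data set named `xyz`, relies on the fact that PROC REG leaves observations with a missing response out of the fit but still computes their predicted values. The two applicant records are made-up values for illustration only.

```sas
/* Hypothetical applicants: only GMAT (x3) and UGd GPA (x4) are known. */
data applicants;
   input x3 x4;
   datalines;
600 3.2
680 3.6
;
run;

/* Stack the applicants under the students; their x2 is missing. */
data xyzplus;
   set xyz applicants;
run;

/* Rows with missing x2 do not enter the fit but are still scored. */
proc reg data=xyzplus;
   model x2 = x3 x4;
   output out=scored p=x2hat lcl=lo95 ucl=hi95;  /* 95% limits for an individual */
run;

proc print data=scored;
   where x2 = .;                 /* the applicants (and any student lacking X2) */
   var x3 x4 x2hat lo95 hi95;
run;
```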

1b) Produce the **partial regression plot** for **GMAT** (X3) in the above
model and use it to discuss influential observations.
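A sketch of requesting the partial regression plots together with case diagnostics, assuming a SAS data set named `xyz`. The cutoffs in the WHERE statement (|studentized residual| > 2, leverage > 2p/n, Cook's D > 4/n, with p = 3 parameters and n = 202) are common rules of thumb, not fixed thresholds; adjust n if some cases have missing values.

```sas
proc reg data=xyz;
   /* PARTIAL: partial regression plots for x3 and x4       */
   /* INFLUENCE: per-observation influence statistics       */
   model x2 = x3 x4 / partial influence;
   output out=diag rstudent=rstud h=lev cookd=cook;
run;

/* List cases flagged by the rule-of-thumb cutoffs, identified by student ID. */
proc print data=diag;
   where abs(rstud) > 2 or lev > 2*3/202 or cook > 4/202;
   var x1 x2 x3 x4 rstud lev cook;
run;
```

Cases flagged here can then be located on the GMAT partial plot to see whether they pull the slope.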

1c) Construct a **residual plot** against the variate (the predicted values)
of the above model and discuss the regression model **assumptions** it helps you check.
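A sketch, assuming a SAS data set named `xyz`, that saves predicted values and studentized residuals and plots them with line-printer PROC PLOT. A random band around zero is consistent with linearity and constant variance; a funnel suggests heteroscedasticity and a curve suggests nonlinearity. PROC UNIVARIATE with NORMAL and PLOT adds normality tests and a normal probability plot of the residuals.

```sas
proc reg data=xyz;
   model x2 = x3 x4;
   output out=resids p=yhat r=resid student=stdres;
run;

proc plot data=resids;
   plot stdres*yhat / vref=0;        /* residuals vs. the variate        */
   plot stdres*(x3 x4) / vref=0;     /* residuals vs. each predictor     */
run;

proc univariate data=resids normal plot;
   var stdres;                       /* normality tests and plots        */
run;
```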

1d) Using one of the **sequential selection options** of PROC REG (forward,
backward, or stepwise) and only the variables listed below, construct an
alternative to the model in 1a) for predicting **MBA GPA**.

X3

X4

X6

X7

X8
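A sketch of stepwise selection over the five allowed candidates, assuming a SAS data set named `xyz`. The 0.15 entry and stay levels are the PROC REG defaults for SELECTION=STEPWISE, written out so they are easy to change; SELECTION=FORWARD or SELECTION=BACKWARD can be substituted in the same statement.

```sas
proc reg data=xyz;
   /* candidates: GMAT, UGd GPA, SchlRate, Age, Foreign */
   model x2 = x3 x4 x6 x7 x8
         / selection=stepwise slentry=0.15 slstay=0.15;
run;
```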

1e) **Compare** the two models above with respect to predictive and
explanatory power.
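One way to put the two models side by side, sketched assuming a SAS data set named `xyz`. R-square and adjusted R-square speak to explanatory power; root MSE and the PRESS statistic (the sum of squared leave-one-out prediction errors, where smaller is better) speak to predictive power. The second MODEL statement is a hypothetical placeholder: list the variables your 1d) model actually kept.

```sas
proc reg data=xyz outest=est press rsquare;
   m1a: model x2 = x3 x4 / adjrsq;
   m1d: model x2 = x3 x4 x6 x7 x8 / adjrsq;  /* hypothetical: use your 1d) variables */
run;

/* One row per labeled model */
proc print data=est;
   var _model_ _rsq_ _adjrsq_ _rmse_ _press_;
run;
```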

**2** Perform a discriminant analysis to predict **Success** using PROC STEPDISC and
PROC DISCRIM (with the CAN option). Start with **all** the variables **except**
X2 and X5, which must never be used. At a minimum, do the following:

2a) Use the stepwise discriminant procedure PROC STEPDISC, without a holdout sample, to determine the "best" set of discriminating variables.
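A sketch of the STEPDISC run, assuming a SAS data set named `xyz` that contains `success`. Leaving out X1 is a judgment call (it is only an ID number); X2 and X5 are excluded as the problem requires. The 0.15 levels are the stepwise defaults, written out explicitly.

```sas
proc stepdisc data=xyz method=stepwise slentry=0.15 slstay=0.15;
   class success;
   var x3 x4 x6 x7 x8;   /* X1 is an ID; X2 and X5 are excluded */
run;
```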

2b) Using only the "best" set of discriminating variables, perform
a linear discriminant analysis without a holdout sample (i.e., PROC DISCRIM
with the option POOL=YES). In the same *SAS* run, request a cross-validation
that holds out one observation at a time to test the classification accuracy
of the model (that is, use the PROC DISCRIM option CROSSVALIDATE). Also request
the option that prints every case and how it is classified.
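A sketch of the DISCRIM run, assuming a SAS data set named `xyz`; the VAR list is a hypothetical placeholder for the set chosen in 2a). POOL=YES uses the pooled covariance matrix, giving a linear function. LIST prints each case's resubstitution classification and CROSSLIST its leave-one-out classification, and the output includes a hit matrix for each method.

```sas
proc discrim data=xyz pool=yes can crossvalidate list crosslist;
   class success;
   id x1;        /* label listed cases by student ID             */
   var x3 x4;    /* hypothetical placeholder: your 2a) variables */
run;
```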

2c) State the resulting **linear** discriminant function produced above and
describe its face validity. If necessary, force a **linear**
discriminant function by requesting the option POOL=YES.
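Face validity asks whether the function's coefficients point in directions that make substantive sense; for instance, one would expect successful students to have higher GMAT scores. A quick check, assuming a SAS data set named `xyz`, compares group means with the signs of the coefficients; the VAR list is a hypothetical placeholder. The CAN output's standardized canonical coefficients and pooled within canonical structure are also useful here.

```sas
proc means data=xyz n mean std;
   class success;
   var x3 x4;   /* hypothetical placeholder: variables in your function */
run;
```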

2d) Assess the practical value of your analyses by comparing the hit matrices (the resubstitution and the cross-validation classification tables).
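Two quantities commonly used to judge a hit matrix are sketched below. Here $n_{11}$ and $n_{00}$ are the correctly classified successes and non-successes, $n$ is the number of cases classified, and $p$ is the sample proportion of successes. A hit ratio well above the chance criteria suggests practical value, and the gap between the resubstitution and cross-validation hit ratios indicates how optimistic the resubstitution results are.

$$\text{hit ratio} = \frac{n_{11} + n_{00}}{n}$$

$$C_{\text{pro}} = p^2 + (1-p)^2, \qquad C_{\max} = \max(p,\ 1-p)$$

For a hypothetical $p = 0.40$, $C_{\text{pro}} = 0.16 + 0.36 = 0.52$ and $C_{\max} = 0.60$.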

Due Wednesday, April 4