Practice questions for Multivariate Statistics

Discuss the notion of p-value or prob-value as it is used in statistics.
Give an example.
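
Here is one possible worked example, sketched in Python under stated assumptions. It runs a two-sided one-sample z-test with a known population standard deviation. The scores, the hypothesized mean of 500, and sigma = 100 are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist, mean

def two_sided_z_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: population mean == mu0, assuming sigma is known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Probability under H0 of a z statistic at least this far from 0 in either tail
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical sample; mu0 = 500 and sigma = 100 are assumptions for illustration
scores = [560, 540, 610, 480, 590, 530, 570, 520, 600, 550]
z, p = two_sided_z_p_value(scores, mu0=500, sigma=100)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Here the sample mean is 555, z is about 1.74, and p is about 0.08. If the true mean were 500, a sample mean at least this far from 500 would occur roughly 8% of the time. H0 would therefore not be rejected at the 0.05 level.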

Describe how you could use a box and whisker plot to locate potential outliers.
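
The usual box-plot rule sets a fence 1.5 times the interquartile range (IQR) below the first quartile and another the same distance above the third quartile. Points beyond either fence are flagged. Below is a minimal Python sketch using hypothetical data. It assumes Python's default "exclusive" quartile method, and SAS and other packages may compute quartiles slightly differently.

```python
from statistics import quantiles

def tukey_outliers(data, k=1.5):
    """Flag points beyond the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(data, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper], (lower, upper)

# Hypothetical data with one suspicious value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flagged, fences = tukey_outliers(data)
print(flagged, fences)
```

A flagged point is only a *potential* outlier. Whether it should be retained is a separate, substantive judgment.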

What does validation mean and how can it be accomplished in multiple regression
analysis? In discriminant analysis? In cluster analysis? In factor analysis?
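
For multiple regression, one common approach is split-sample (holdout) validation. You estimate the model on one part of the data and then check how well it predicts the rest. The sketch below shows the idea with a single predictor and simulated data. The 50/50 split, the seed, and the data-generating line y = 3 + 2x plus noise are all assumptions. The same logic carries over to discriminant analysis, where you compare holdout hit ratios, and to replicating cluster or factor solutions on random subsamples.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    xbar, ybar = mean(xs), mean(ys)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b

def r_squared(ys, preds):
    """1 - SSE/SST computed on whatever sample is passed in."""
    ybar = mean(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - ybar) ** 2 for y in ys)
    return 1 - sse / sst

random.seed(1)  # simulated, hypothetical data
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [3 + 2 * x + random.gauss(0, 3) for x in xs]

idx = list(range(len(xs)))
random.shuffle(idx)
est, hold = idx[:50], idx[50:]  # estimation and holdout samples

a, b = fit_line([xs[i] for i in est], [ys[i] for i in est])
r2_est = r_squared([ys[i] for i in est], [a + b * xs[i] for i in est])
r2_hold = r_squared([ys[i] for i in hold], [a + b * xs[i] for i in hold])
print(f"estimation R^2 = {r2_est:.3f}, holdout R^2 = {r2_hold:.3f}")
```

A holdout R^2 far below the estimation R^2 would suggest that the model is fitting sample-specific noise.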

What graphic tool can an analyst best use to examine the shape of the distribution
of a metric variable? Sketch a simple example of the graphic tool and label
the key features.

Describe and explain the correct use of each of the following four devices
for determining whether a variable is normally distributed.

- frequency histogram
- normal probability plot
- Shapiro-Wilk (W) test
- skewness test
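
Of the four devices, the skewness test is the easiest to compute by hand. The minimal sketch below assumes the large-sample form z = skewness / sqrt(6/n), referred to the standard normal distribution. It omits the small-sample corrections that statistical packages apply, and the sample is hypothetical.

```python
import math
from statistics import NormalDist

def skewness_z_test(data):
    """Moment skewness g1 = m3 / m2**1.5 and its large-sample z test."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    z = g1 / math.sqrt(6 / n)                 # approximate standard error sqrt(6/n)
    p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
    return g1, z, p

# Hypothetical right-skewed sample
g1, z, p = skewness_z_test([1, 1, 2, 2, 2, 3, 3, 4, 6, 9])
print(f"skewness = {g1:.3f}, z = {z:.3f}, p = {p:.4f}")
```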

How does the sample size affect your application of the following four
devices for determining whether a variable is normally distributed?

- frequency histogram
- normal probability plot
- Shapiro-Wilk (W) test
- skewness test
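
For the skewness test, the effect of sample size shows up directly in its standard error, sqrt(6/n). The small sketch below assumes the same hypothetical sample skewness of 0.5 at every n.

```python
import math

SKEWNESS = 0.5  # hypothetical, held fixed across sample sizes

# z = skewness / sqrt(6/n): the standard error shrinks as n grows
results = {n: SKEWNESS / math.sqrt(6 / n) for n in (20, 50, 200, 1000)}
for n, z in results.items():
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"n = {n:4d}: z = {z:.2f} ({verdict} at the 0.05 level)")
```

The same mild skewness is not significant at n = 20 or 50, but it is significant at n = 200 and beyond. In small samples, the histogram and probability plot are hard to read and the formal tests are weak. In large samples, the formal tests flag departures too small to matter. This is why the plots and tests are usually read together.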

Sketch a hypothetical histogram and a boxplot of a data set which is
skewed to the right but has no outliers. Label the axes with hypothetical
but concrete names.

Why is it important to identify the process which produced missing values?

Argue for or against: "Outliers negatively affect data analysis and
should be removed from the results."

The next few questions refer to the situation involving MBA students at XYZ University.

Discuss the meaning of the coefficient of multiple determination, R^2 = 0.5664,
**in this problem**.
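
As a reminder of what the statistic measures, the standard definitions are given below. Here n (observations) and k (predictors) are generic symbols, not values from the problem. With R^2 = 0.5664, roughly 56.6% of the variation in the dependent variable around its mean is accounted for by the regression, and about 43.4% is left unexplained. The adjusted version penalizes additional predictors.

```latex
R^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}}
    = 1 - \frac{SS_{\text{error}}}{SS_{\text{total}}}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```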

Using observation number 27 as a randomly chosen example **in this problem**,
discuss the practical significance of the predicted y-value from the regression
analysis. What does practical significance mean in this context?

Which variable is most important in predicting a y-value **in this problem**?
Why?
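
One common way to compare predictors measured in different units is to use standardized (beta) coefficients, b_j * s_xj / s_y. In the minimal sketch below, the coefficients and standard deviations are hypothetical, not taken from the problem's output.

```python
def beta_weights(coefs, x_sds, y_sd):
    """Standardized coefficients: change in y (in SDs) per one-SD change in each predictor."""
    return {name: b * x_sds[name] / y_sd for name, b in coefs.items()}

# Hypothetical raw coefficients and standard deviations
coefs = {"X1": 0.004, "X3": 0.5}
x_sds = {"X1": 80.0, "X3": 0.4}
betas = beta_weights(coefs, x_sds, y_sd=0.6)
print(betas)
```

Here X1's raw coefficient is tiny, yet X1 has the larger beta weight, because raw coefficients depend on each variable's units. Beta weights are themselves unstable when predictors are highly collinear.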

Which applicants' information seems to be strongly influencing the estimated
regression equation **in this problem**? Why do you say so?
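
One piece of evidence is leverage (the hat value), which measures how unusual an observation's predictor values are. The minimal sketch below uses one predictor and hypothetical data. It assumes the common 2p/n rule of thumb for flagging, where p is the number of estimated parameters. In practice, leverage is read together with studentized residuals and Cook's distance, which also reflect how far the observation's y-value lies from the fitted line.

```python
from statistics import mean

def leverages(xs):
    """Hat values h_i = 1/n + (x_i - xbar)^2 / Sxx for a one-predictor regression."""
    n, xbar = len(xs), mean(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1 / n + (x - xbar) ** 2 / sxx for x in xs]

# Hypothetical predictor values; the last one is far from the rest
xs = [2.0, 2.5, 3.0, 3.1, 3.3, 3.4, 3.6, 9.0]
h = leverages(xs)
p = 2                      # intercept + slope
cutoff = 2 * p / len(xs)   # rule-of-thumb threshold (an assumption)
high = [i for i, hi in enumerate(h) if hi > cutoff]
print(high, [round(v, 3) for v in h])
```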

How would you assess the linearity of the relationships between the dependent
variable and the independent variables in the regression analysis **in this problem**?
Explain.

Assess the degree of collinearity **in this problem** and how you would
advise that the model be used in light of this assessment.
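
The variance inflation factor is VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. Equivalently, VIF_j is the j-th diagonal element of the inverse of the predictors' correlation matrix. The sketch below uses a hand-rolled Gauss-Jordan inverse and a hypothetical correlation matrix. Cutoffs such as VIF > 10 (tolerance < 0.10) are conventions, and some analysts use stricter ones.

```python
def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(corr):
    """VIF_j is the j-th diagonal element of the inverse predictor correlation matrix."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]

# Hypothetical correlations among three predictors
R = [[1.0, 0.9, 0.3],
     [0.9, 1.0, 0.2],
     [0.3, 0.2, 1.0]]
v = vifs(R)
tolerance = [1 / x for x in v]
print([round(x, 2) for x in v], [round(t, 3) for t in tolerance])
```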

a) Which of the statistical readouts of *SAS* could be used to assess
the normality of the variable X2 **in this problem**?

b) What is your conclusion regarding the normality of the variable X2 in
light of the *SAS* information **in this problem**? Explain your
answer.

c) Give an example from the readouts and explain your use of the p-value
**in this problem**.

d) Use the attached table taken from page 104 of your textbook to discuss
the adequacy of the sample size in relation to statistical power and effect
size **in this problem**.

Minimum R^2 that can be found statistically significant with a power of 0.80 for varying numbers of independent variables and sample sizes (all R^2 values in the table are multiplied by 100: 49 means 0.49)

| Sample size n | 0.01 level, 2 variables | 0.01 level, 5 variables | 0.01 level, 10 variables | 0.01 level, 20 variables | 0.05 level, 2 variables | 0.05 level, 5 variables | 0.05 level, 10 variables | 0.05 level, 20 variables |
|---|---|---|---|---|---|---|---|---|
| 20 | 45 | 56 | 71 | N/A | 39 | 48 | 64 | N/A |
| 50 | 23 | 29 | 36 | 49 | 19 | 23 | 29 | 42 |
| 100 | 13 | 16 | 20 | 26 | 10 | 12 | 15 | 21 |
| 250 | 5 | 7 | 8 | 11 | 4 | 5 | 6 | 8 |
| 500 | 3 | 3 | 4 | 6 | 3 | 4 | 5 | 7 |
| 1000 | 1 | 2 | 2 | 3 | 1 | 1 | 2 | 2 |

What does validation mean and how might it be accomplished?

What is the estimated prediction equation for predicting the dependent variable?

What is the meaning of the parameter estimate (coefficient) for X4?

Describe

Discuss technical issues regarding the use of the

Using observation number 89 as a randomly chosen example, explain and discuss the 95% prediction interval and the 95% confidence interval for that person's predicted value.
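
With a single predictor, the two intervals differ only by the extra "1 +" term inside the square root. That term accounts for the scatter of an individual y around the mean response. The sketch below uses hypothetical data. The t critical value (about 2.306 for 8 degrees of freedom) is supplied by hand because the Python standard library has no t distribution. In multiple regression, the quantity 1/n + (x0 - xbar)^2 / Sxx generalizes to x0'(X'X)^-1 x0.

```python
import math
from statistics import mean

def intervals_at(x0, xs, ys, t_crit):
    """CI for the mean response and PI for one new response at x0 (one-predictor OLS).
    t_crit is the t quantile for n - 2 degrees of freedom, supplied by the user."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y0 = a + b * x0
    se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)      # for the mean response
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)  # for an individual
    ci = (y0 - t_crit * se_mean, y0 + t_crit * se_mean)
    pi = (y0 - t_crit * se_pred, y0 + t_crit * se_pred)
    return y0, ci, pi

# Hypothetical data; 2.306 approximates t(0.975, 8 df)
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
y0, ci, pi = intervals_at(5.5, xs, ys, t_crit=2.306)
print(y0, ci, pi)
```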

a) What are the uses of a partial regression residual plot in multiple regression analysis?

b) Demonstrate these uses.
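
A partial regression (added-variable) plot for predictor X_j has two axes. The vertical axis shows the residuals of y regressed on the other predictors. The horizontal axis shows the residuals of X_j regressed on the other predictors. The plot's least-squares slope equals X_j's coefficient in the full multiple regression, so the plot displays X_j's unique contribution. Curvature or isolated points in the plot suggest nonlinearity or influential observations. The sketch below uses two hypothetical, noise-free predictors, so the recovered slope should equal the assumed coefficient of 3 up to rounding.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)

def residuals(xs, ys):
    """Residuals from regressing ys on xs (with intercept)."""
    b = ols_slope(xs, ys)
    a = mean(ys) - b * mean(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical correlated predictors and a noise-free response y = 1 + 2*x1 + 3*x2
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

# Plot coordinates for x2 with x1 "partialled out" of both axes
e_y = residuals(x1, y)     # vertical axis
e_x2 = residuals(x1, x2)   # horizontal axis
slope = ols_slope(e_x2, e_y)
print(round(slope, 6))
```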

What is collinearity in multiple regression analysis? Assess the degree of collinearity and how you would advise that the model be used in light of this assessment.

Management specifies a minimum X1 of 430 and a minimum X3 of 2.1. People are not considered unless they meet both minima. How would you advise management regarding this kind of criterion in light of the statistical analysis?

Why might one wish to perform a factor analysis and a cluster analysis in the same study?

Clearly explain the meaning of eigenvalues and factor loadings in the context of a principal components analysis.
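
For two standardized variables with correlation r, the eigenvalues and loadings have a closed form, which makes their meaning easy to see. Each eigenvalue is the amount of total standardized variance that a component accounts for. Each loading is the correlation between a variable and a component. The sketch below assumes r >= 0, so that the first component is the larger one, and uses a hypothetical r = 0.6.

```python
import math

def pca_two_variables(r):
    """Eigenvalues and loadings for the correlation matrix [[1, r], [r, 1]], assuming r >= 0.

    Eigenvalues are 1 + r and 1 - r, with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
    Each loading is eigenvector element * sqrt(eigenvalue), which equals the
    correlation between the variable and the component."""
    eigenvalues = [1 + r, 1 - r]
    half = 1 / math.sqrt(2)
    vectors = [(half, half), (half, -half)]
    # loadings[c][v]: loading of variable v on component c
    loadings = [[v * math.sqrt(lam) for v in vec] for lam, vec in zip(eigenvalues, vectors)]
    return eigenvalues, loadings

eig, load = pca_two_variables(0.6)  # hypothetical correlation
print(eig)
print(load)
```

With r = 0.6, the first component accounts for 1.6 of the 2 units of standardized variance (80%), and both variables load about 0.89 on it. The squared loadings on each component sum to that component's eigenvalue. Each variable's squared loadings across all components sum to 1.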

"If a principal components analysis is performed on an uncorrelated data set, the eigenvalues would all equal one." Please argue for or against this statement.

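
Whether the statement above holds depends on which matrix is analysed. The sketch below uses the closed-form eigenvalues of a 2 x 2 symmetric matrix. For the correlation matrix of uncorrelated variables (the identity), both eigenvalues are 1. For the covariance matrix, the eigenvalues equal the variances, which are hypothetically 4 and 1 here. With real sample data, correlations are rarely exactly zero, so sample eigenvalues would only scatter around 1.

```python
import math

def eig_2x2_symmetric(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mid + rad, mid - rad

# Uncorrelated variables analysed through their correlation matrix
print(eig_2x2_symmetric(1.0, 0.0, 1.0))  # (1.0, 1.0)
# The same variables analysed through a covariance matrix (hypothetical variances 4 and 1)
print(eig_2x2_symmetric(4.0, 0.0, 1.0))  # (4.0, 1.0)
```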
Argue for the best way to measure similarities among observations in a cluster analysis context.
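
Whatever measure is argued for, the scaling of the variables matters. The sketch below uses hypothetical data. On raw values, Euclidean distance is dominated by the variable with large units (income in dollars). Once every variable is converted to z-scores, observation 0's nearest neighbour changes. Mahalanobis distance would additionally adjust for correlation between the variables.

```python
import math
from statistics import mean, pstdev

def euclidean(p, q):
    """Straight-line distance between two observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def standardize(rows):
    """Convert each column to z-scores (population SD) so no variable dominates by scale alone."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

# Hypothetical observations: (income in dollars, satisfaction on a 1-7 scale)
obs = [(50000, 1), (50500, 7), (52000, 1), (80000, 4)]
raw = [euclidean(obs[0], obs[1]), euclidean(obs[0], obs[2])]
z = standardize(obs)
std = [euclidean(z[0], z[1]), euclidean(z[0], z[2])]
print(raw, std)
```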

Give a brief description of how a cluster analysis might be used in a functional area of business of interest to you.

"In a principal components analysis, the first component is generally a good overall measure of variance in the data set." Please comment.

Compare and contrast collinearity and correlation in multivariate analysis.

"A major purpose of principal components/factor analysis is to develop a set of new variables to be used in subsequent analyses." Please comment on this statement.

What are the ideals of simple structure and why would you wish to achieve them?

Distinguish between principal components factor analysis and common factor
analysis (principal factoring) and discuss why one might be preferred over
the other.

"In a decision-making, industrial-application context, linear statistical
models are used more often for prediction than for explanation." Develop
an argument to support or refute this statement.

Why is a separate holdout sample so important in discriminant analysis?

Compare and contrast discriminant analysis and regression analysis.

Calculate the squared distance between the following two centroids.
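
The two centroids referred to above are not reproduced here, so the sketch below uses hypothetical centroids for two variables. The squared Euclidean distance simply sums the squared differences. The squared Mahalanobis distance D^2, common in discriminant analysis, weights the differences by the inverse of the pooled within-group covariance matrix (also hypothetical here). As a result, correlated or high-variance variables count for less.

```python
def squared_euclidean(c1, c2):
    """Sum of squared coordinate differences between two centroids."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2))

def mahalanobis_sq_2d(c1, c2, cov):
    """D^2 = d' S^-1 d for two variables, using the closed-form inverse of the 2x2 pooled covariance S."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    d = [a - b for a, b in zip(c1, c2)]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))

# Hypothetical centroids and pooled within-group covariance matrix
c_a, c_b = (4.0, 6.0), (2.0, 3.0)
S = [[2.0, 0.5], [0.5, 1.0]]
print(squared_euclidean(c_a, c_b))     # 13.0
print(mahalanobis_sq_2d(c_a, c_b, S))  # about 9.14
```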

How would you determine whether or not the classification accuracy of your
discriminant function is sufficiently high relative to chance classification?
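
Two usual benchmarks are the maximum chance criterion (the largest group's share) and the proportional chance criterion (the sum of squared group shares). Press's Q statistic tests the hit ratio against chance directly. The sketch below uses hypothetical group sizes and hit counts, which would ideally come from the holdout sample. A commonly cited guideline asks for accuracy at least 25% higher than the chance benchmark.

```python
def chance_criteria(group_sizes):
    """Maximum chance (largest group's share) and proportional chance (sum of squared shares)."""
    n = sum(group_sizes)
    shares = [g / n for g in group_sizes]
    return max(shares), sum(p ** 2 for p in shares)

def press_q(n_total, n_correct, n_groups):
    """Press's Q = (N - n*K)^2 / (N*(K - 1)); compare with chi-square(1 df), e.g. 3.84 at 0.05."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))

# Hypothetical holdout sample: groups of 60 and 40, 75 of 100 classified correctly
c_max, c_pro = chance_criteria([60, 40])
q = press_q(n_total=100, n_correct=75, n_groups=2)
hit_ratio = 75 / 100
print(f"C_max = {c_max:.2f}, C_pro = {c_pro:.2f}, hit ratio = {hit_ratio:.2f}, Press's Q = {q:.1f}")
```

Here the hit ratio of 0.75 exceeds both benchmarks (0.60 and 0.52), and Q = 25 is well above 3.84.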

Compare and contrast discriminant analysis and cluster analysis.

Describe an effective quantitative way to help choose the number of clusters
in a hierarchical cluster analysis context.
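
A common quantitative aid is the agglomeration schedule (or the dendrogram heights). A large jump in the fusion coefficient means two quite dissimilar clusters are being forced together, so the stage just before the jump suggests a natural number of clusters. The sketch below applies that stopping rule to a hypothetical schedule. Variants of the rule look at percentage change rather than absolute change, and the resulting solution should still be checked for interpretability.

```python
def clusters_from_largest_jump(heights, n_obs):
    """Suggest a number of clusters from an agglomeration schedule.

    heights[k] is the fusion coefficient at merge stage k + 1, so after stage s
    there are n_obs - s clusters. The rule stops just before the merge that
    produces the largest increase in the coefficient."""
    jumps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    k = max(range(len(jumps)), key=lambda i: jumps[i])
    stages_completed = k + 1  # stop after this stage
    return n_obs - stages_completed

# Hypothetical schedule for 6 observations (5 merge stages)
schedule = [1.0, 1.2, 1.5, 1.8, 9.0]
print(clusters_from_largest_jump(schedule, n_obs=6))  # 2
```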

How are quantitative and qualitative factors involved in "naming"
hierarchically derived clusters?

Describe the purpose and use of Kaiser's measure of sampling adequacy (MSA).
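
The overall MSA compares the size of the ordinary correlations with the size of the partial correlations: MSA = sum(r^2) / (sum(r^2) + sum(partial^2)), taken over all pairs of distinct variables. When the partial correlations are small relative to the ordinary ones, the variables share common factors and factoring is more appropriate. The sketch below handles only the special case of three variables, where each partial correlation controls for just one other variable. The correlations are hypothetical, and commonly quoted guidance treats values below 0.50 as unacceptable.

```python
import math
from itertools import combinations

def kmo_three(r12, r13, r23):
    """Overall Kaiser-Meyer-Olkin MSA for exactly three variables.

    With three variables, the partial correlation of a pair controlling for the
    remaining one is r_ij.k = (r_ij - r_ik*r_jk) / sqrt((1 - r_ik^2)(1 - r_jk^2))."""
    r = {(1, 2): r12, (1, 3): r13, (2, 3): r23}

    def partial(i, j):
        k = ({1, 2, 3} - {i, j}).pop()
        rik = r[tuple(sorted((i, k)))]
        rjk = r[tuple(sorted((j, k)))]
        return (r[(i, j)] - rik * rjk) / math.sqrt((1 - rik ** 2) * (1 - rjk ** 2))

    sum_r2 = sum(v ** 2 for v in r.values())
    sum_p2 = sum(partial(i, j) ** 2 for i, j in combinations((1, 2, 3), 2))
    return sum_r2 / (sum_r2 + sum_p2)

kmo = kmo_three(0.6, 0.5, 0.4)  # hypothetical correlations
print(round(kmo, 3))
```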

Using the concept of variance, argue for a good way to determine the number
of factors to retain in an exploratory factor analysis.
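
Two variance-based rules are easy to state in code. The latent root rule keeps factors whose eigenvalue exceeds 1, since such a factor explains more variance than a single standardized variable. The percentage-of-variance rule keeps factors until a target share of total variance is reached. In the sketch below, the eigenvalues and the 60% target are hypothetical, and the sketch assumes a correlation-matrix analysis, so total variance equals the number of variables. A scree plot is often consulted alongside these rules.

```python
def factors_to_retain(eigenvalues, target_share=0.60):
    """Return (latent-root count, count needed to reach target_share of total variance)."""
    total = sum(eigenvalues)
    latent_root = sum(1 for lam in eigenvalues if lam > 1)
    cumulative, k = 0.0, 0
    for lam in sorted(eigenvalues, reverse=True):
        cumulative += lam
        k += 1
        if cumulative / total >= target_share:
            break
    return latent_root, k

# Hypothetical eigenvalues from five standardized variables (they sum to 5)
print(factors_to_retain([2.5, 1.2, 0.7, 0.4, 0.2]))
```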

Explain the meaning and use of factor loadings in the context of a principal
components analysis.

What is an orthogonal rotation of factor loadings and what is its purpose?
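
An orthogonal rotation turns the factor axes rigidly while keeping them at right angles. The loadings change, but each variable's communality (its sum of squared loadings) stays the same, and so does the total variance explained by the retained factors. The sketch below uses hypothetical two-factor loadings and a 45-degree angle chosen by hand. Methods such as varimax instead choose the angle that best approaches simple structure.

```python
import math

def rotate(loadings, degrees):
    """Rotate two-factor loadings (rows = variables) by an angle; the axes stay orthogonal."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def communalities(loadings):
    """Sum of squared loadings for each variable."""
    return [a ** 2 + b ** 2 for a, b in loadings]

# Hypothetical unrotated loadings: every variable loads noticeably on both factors
L = [(0.70, 0.50), (0.65, 0.55), (0.60, -0.50), (0.70, -0.45)]
R = rotate(L, 45)
print([tuple(round(v, 3) for v in row) for row in R])
```

After rotation, each variable loads mainly on one factor, which makes the factors easier to name.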

Provide a possible interpretation of the two retained factors **in this problem**
(and explain your answer).

Why might one wish to perform a factor analysis in a regression analysis
study? Describe the steps to such an approach.

In what situations might you consider transforming a variable?

How can *SAS* be used to check for *multi*collinearity in discriminant
analysis?

What does the adjective "partial" mean when it is used in statistics?

Go back to DSc8450 home