Practice questions for Multivariate Statistics
Discuss the notion of p-value or prob-value as it is used in statistics.
Give an example.
Describe how you could use a box and whisker plot to locate potential outliers.
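As a starting point for the question above: one common box-plot convention (Tukey's fences) flags any point lying more than 1.5 interquartile ranges below the first quartile or above the third quartile. The sketch below assumes that convention. The data values and the function name `boxplot_outliers` are hypothetical, and statistical packages differ slightly in how they compute quartiles.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    # Quartiles by linear interpolation; other packages may use other rules.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical sample with one suspiciously large value.
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]
lower_fence, upper_fence, outliers = boxplot_outliers(data)
print("fences:", lower_fence, upper_fence)
print("potential outliers:", outliers)
```

Points outside the fences are only candidates for closer inspection; the rule itself does not say whether they should be kept or removed.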
What does validation mean, and how can it be accomplished in each of the following:
multiple regression analysis, discriminant analysis, cluster analysis, and factor analysis?
What graphic tool can an analyst best use to examine the shape of the distribution
of a metric variable? Sketch a simple example of the graphic tool and label
the key features.
Describe and explain the correct use of each of the following four devices
for determining whether a variable is normally distributed.
How does the sample size affect your application of the following four
devices for determining whether a variable is normally distributed?
Sketch a hypothetical histogram and a boxplot of a data set which is
skewed to the right but has no outliers. Label the axes with hypothetical
but concrete names.
Why is it important to identify the process which produced missing values?
Argue for or against the statement, "Outliers negatively affect data analysis and
should be removed from the results."
Discuss the meaning of the index of multiple determination R^2 = 0.5664
in this problem.
Using observation number 27 as a randomly chosen example in this problem,
discuss the practical significance of the predicted y-value from the regression
analysis. What does practical significance mean in this context?
Which variable is most important in predicting a y-value in this problem?
Why?
Which applicants' information seems to be strongly influencing the predicted
regression equation in this problem? Why do you say so?
How would you assess the linearity of the relationship between the dependent
variable and the independent variables in the regression analysis in this problem?
Explain.
Assess the degree of collinearity in this problem and how you would
advise that the model be used in light of this assessment.
a) Which of the statistical readouts of SAS could be used to assess
the normality of the variable X2 in this problem?
b) What is your conclusion regarding the normality of the variable X2 in
light of the SAS information in this problem? Explain your
answer.
b) Give an example from the readouts and explain your use of the p-value
in this problem.
b) Use the attached table taken from page 104 of your textbook to discuss
adequacy of the sample size in relation to the statistical power and effect
size in this problem.
Minimum R^2 that can be found statistically significant with a power of 0.80 for varying numbers of independent variables and sample sizes. All R^2 values in the table are multiplied by 100 (49 means 0.49); NA means no value is given.
| | Significance level = 0.01 | | | | Significance level = 0.05 | | | |
| | Number of independent variables | | | | Number of independent variables | | | |
| Sample size n | 2 | 5 | 10 | 20 | 2 | 5 | 10 | 20 |
| 20 | 45 | 56 | 71 | NA | 39 | 48 | 64 | NA |
| 50 | 23 | 29 | 36 | 49 | 19 | 23 | 29 | 42 |
| 100 | 13 | 16 | 20 | 26 | 10 | 12 | 15 | 21 |
| 250 | 5 | 7 | 8 | 11 | 4 | 5 | 6 | 8 |
| 500 | 3 | 3 | 4 | 6 | 3 | 4 | 5 | 7 |
| 1000 | 1 | 2 | 2 | 3 | 1 | 1 | 2 | 2 |
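One way to read the table above is as a lookup: for a given significance level, sample size, and number of independent variables, it gives the smallest R^2 that can be detected with power 0.80. The sketch below transcribes the table and compares an observed R^2 against the tabled minimum. The function names, and the sample size and number of variables used in the example call, are hypothetical; only the tabled values come from the table itself.

```python
# Tabled minimum R^2 values (multiplied by 100), transcribed from the table above.
# Keys: significance level -> sample size -> number of independent variables.
# None marks the cells for which the table gives no value.
MIN_R2_X100 = {
    0.01: {
        20: {2: 45, 5: 56, 10: 71, 20: None},
        50: {2: 23, 5: 29, 10: 36, 20: 49},
        100: {2: 13, 5: 16, 10: 20, 20: 26},
        250: {2: 5, 5: 7, 10: 8, 20: 11},
        500: {2: 3, 5: 3, 10: 4, 20: 6},
        1000: {2: 1, 5: 2, 10: 2, 20: 3},
    },
    0.05: {
        20: {2: 39, 5: 48, 10: 64, 20: None},
        50: {2: 19, 5: 23, 10: 29, 20: 42},
        100: {2: 10, 5: 12, 10: 15, 20: 21},
        250: {2: 4, 5: 5, 10: 6, 20: 8},
        500: {2: 3, 5: 4, 10: 5, 20: 7},
        1000: {2: 1, 5: 1, 10: 2, 20: 2},
    },
}

def min_detectable_r2(alpha, n, n_vars):
    """Return the tabled minimum R^2 as a proportion, or None if not tabled."""
    value = MIN_R2_X100[alpha][n][n_vars]
    return None if value is None else value / 100

def is_detectable(observed_r2, alpha, n, n_vars):
    """True if the observed R^2 is at least the tabled minimum for this design."""
    minimum = min_detectable_r2(alpha, n, n_vars)
    if minimum is None:
        raise ValueError("the table gives no value for this combination")
    return observed_r2 >= minimum

# Hypothetical design: n = 100 observations and 5 independent variables.
print(min_detectable_r2(0.05, 100, 5))
print(is_detectable(0.5664, 0.05, 100, 5))
```

Only the tabled sample sizes and variable counts are supported; intermediate designs would need interpolation or a separate power calculation.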
Distinguish between principal components factor analysis and common factor
analysis (principal factoring) and discuss why one might be preferred over
the other.
"In a decision-making, industrial-application context, linear statistical
models are used more often for prediction than for explanation." Develop
an argument to support or refute this statement.
Why is a separate holdout sample so important in discriminant analysis?
Compare and contrast discriminant analysis and regression analysis.
Calculate the squared distance between the following two centroids.
How would you determine whether or not the classification accuracy of your
discriminant function is sufficiently high relative to chance classification?
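Two chance benchmarks are often used for the question above: the maximum chance criterion (the share of the largest group) and the proportional chance criterion (the sum of squared group shares). A commonly cited rule of thumb asks the hit ratio to exceed chance by at least one quarter. The sketch below assumes that rule of thumb; the group sizes, the hit ratio, and the function names are hypothetical.

```python
def maximum_chance(group_sizes):
    """Accuracy from assigning every case to the largest group."""
    return max(group_sizes) / sum(group_sizes)

def proportional_chance(group_sizes):
    """Expected accuracy from random assignment in proportion to group sizes."""
    total = sum(group_sizes)
    return sum((size / total) ** 2 for size in group_sizes)

def beats_chance(hit_ratio, chance, margin=1.25):
    """Rule-of-thumb check: the hit ratio should be at least margin * chance."""
    return hit_ratio >= margin * chance

# Hypothetical holdout sample: two groups of 60 and 40 cases, 75% classified correctly.
sizes = [60, 40]
hit_ratio = 0.75
c_max = maximum_chance(sizes)
c_pro = proportional_chance(sizes)
print("maximum chance:", c_max)
print("proportional chance:", round(c_pro, 4))
print("beats proportional chance by 25%:", beats_chance(hit_ratio, c_pro))
```

The comparison is most meaningful when the hit ratio comes from a holdout sample rather than from the cases used to estimate the discriminant function.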
Compare and contrast discriminant analysis and cluster analysis.
Describe an effective quantitative way to help choose the number of clusters
in a hierarchical cluster analysis context.
How are quantitative and qualitative factors involved in "naming"
hierarchically derived clusters?
Describe the purpose and use of Kaiser's measure of sampling adequacy (MSA).
Using the concept of variance, argue for a good way to determine the number
of factors to retain in an exploratory factor analysis.
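Two variance-based criteria that an answer to the question above might weigh are the latent root criterion (retain factors whose eigenvalue exceeds 1, that is, factors explaining more variance than a single standardized variable) and a cumulative percentage of variance target. The sketch below applies both to a list of eigenvalues. The eigenvalues, the 60% target, and the function names are hypothetical.

```python
from itertools import accumulate

def variance_shares(eigenvalues):
    """Return (proportion, cumulative proportion) of variance for each factor."""
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    proportions = [ev / total for ev in eigenvalues]
    return proportions, list(accumulate(proportions))

def count_by_latent_root(eigenvalues, cutoff=1.0):
    """Number of factors whose eigenvalue exceeds the cutoff."""
    return sum(1 for ev in eigenvalues if ev > cutoff)

def count_by_cumulative_variance(eigenvalues, target=0.60):
    """Smallest number of leading factors whose cumulative share reaches the target."""
    _, cumulative = variance_shares(eigenvalues)
    for count, share in enumerate(cumulative, start=1):
        if share >= target:
            return count
    return len(eigenvalues)

# Hypothetical eigenvalues of a 6-variable correlation matrix, largest first.
eigs = [2.8, 1.4, 0.8, 0.5, 0.3, 0.2]
proportions, cumulative = variance_shares(eigs)
print([round(p, 3) for p in proportions])
print("latent root criterion:", count_by_latent_root(eigs))
print("60% cumulative variance:", count_by_cumulative_variance(eigs))
```

The two criteria need not agree, so in practice they are usually weighed together with a scree plot and the interpretability of the resulting factors.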
Explain the meaning and use of factor loadings in the context of a principal
components analysis.
What is an orthogonal rotation of factor loadings and what is its purpose?
Provide a possible interpretation of the two retained factors in this
problem (and explain your answer).
Why might one wish to perform a factor analysis in a regression analysis
study? Describe the steps to such an approach.
In what situations might you consider transforming a variable?
How can SAS be used to check for multicollinearity in discriminant
analysis?
What does the adjective "partial" mean when it is used in statistics?