Covariances vs. Pearson Correlations

SEM is also known as "analysis of covariance structures." Does this mean that SEM analysis should always be conducted with covariance matrices, or can (Pearson) correlation matrices serve just as well? Here are some edited remarks from SEMNET subscribers (errors are mine).

James Steiger writes:
Whether or not you should use covariances depends (a) on the specific question you wish to ask, and (b) on the software you are using.

(a) If, for example, your statistical question is not "scale free," i.e., depends directly on the scale of the measurements, then you may be restricted to examining covariances.

Ed Rigdon explains:
Suppose you have a model which includes a latent construct that has only one measure. Then, typically, one must fix the error variance for that measure to a particular value, to reflect the proportion of the measure's variance that is due to error. If you choose a value other than zero, then the value you choose will depend on the variance, or scale, of the measure in question. If you change the scale of the measure, by going from covariances to correlations or vice versa, then the value you chose becomes incorrect. Thus, your model is no longer "scale free."
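A minimal sketch of Rigdon's point, in Python. It assumes the common convention of fixing the error variance at (1 - reliability) times the observed variance. The reliability of 0.80 and the raw variance of 25 are made-up illustrative numbers, and `fixed_error_variance` is a hypothetical helper, not part of any SEM package.

```python
def fixed_error_variance(reliability, observed_variance):
    """Error variance to fix for a single-indicator construct.

    Assumes the convention: error variance = (1 - reliability) * variance.
    """
    return (1.0 - reliability) * observed_variance


reliability = 0.80    # assumed reliability of the measure (illustrative)
raw_variance = 25.0   # variance of the measure in its raw metric (illustrative)

# Covariance matrix analyzed: the measure keeps its raw variance.
theta_cov = fixed_error_variance(reliability, raw_variance)

# Correlation matrix analyzed: the measure is rescaled to variance 1.
theta_cor = fixed_error_variance(reliability, 1.0)

print("fixed error variance, covariance metric: ", round(theta_cov, 4))
print("fixed error variance, correlation metric:", round(theta_cor, 4))
```

The same reliability implies two different fixed values, so a value chosen for one kind of input matrix is wrong for the other.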
On the other hand, if your question is framed in a scale free fashion, then you may, in fact, be better off examining correlations. An example of the latter occurs in the context of factor analysis, where people nearly always factor a correlation matrix.

(b) Classical SEM statistical theory is based on the distributional properties of the elements of a covariance matrix. Unfortunately, covariance and correlation matrices have different distributional properties, so if you use theory based on a covariance matrix to analyze a correlation matrix, you can have problems. If you input a correlation matrix as if it were a covariance matrix, and "fool" your software, you will almost certainly get wrong standard errors. This was the subject of a Psychological Bulletin article by Bob Cudeck.
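The difference in distributional properties can be illustrated with a single off-diagonal element. The Python sketch below assumes bivariate normal data with unit variances. For a sample covariance it uses the Wishart result Var(s_xy) = (1 + rho^2)/(n - 1), and for a sample correlation it uses the large-sample result Var(r) = (1 - rho^2)^2/n. The values rho = 0.6 and n = 100 and all function names are illustrative assumptions. The small simulation is only a rough check, not a reproduction of any published example.

```python
import math
import random


def se_covariance(rho, n):
    """SE of a sample covariance between two unit-variance normal variables."""
    return math.sqrt((1.0 + rho ** 2) / (n - 1))


def se_correlation(rho, n):
    """Large-sample SE of a sample Pearson correlation under normality."""
    return (1.0 - rho ** 2) / math.sqrt(n)


def sample_correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_sd_of_r(rho, n, reps, seed=0):
    """Empirical SD of r over repeated bivariate normal samples."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho ** 2)
    rs = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rho * x + c * rng.gauss(0.0, 1.0) for x in xs]
        rs.append(sample_correlation(xs, ys))
    mean_r = sum(rs) / reps
    return math.sqrt(sum((r - mean_r) ** 2 for r in rs) / (reps - 1))


rho, n = 0.6, 100
wrong_se = se_covariance(rho, n)    # covariance theory applied to a correlation
right_se = se_correlation(rho, n)   # theory appropriate for a correlation
empirical = simulated_sd_of_r(rho, n, reps=2000)

print("SE from covariance theory: ", round(wrong_se, 4))
print("SE from correlation theory:", round(right_se, 4))
print("simulated SD of r:         ", round(empirical, 4))
```

Under these assumptions the covariance-based standard error is nearly twice the correlation-based one, and the simulated spread of r should sit close to the latter.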

David Ronis provided the cite:
Cudeck, R. (1989). Analysis of correlation matrices using covariance structure models. Psychological Bulletin, 105, 317-327.
Jim Steiger earlier pointed to a demonstration of this problem:
Lawley and Maxwell give parameter estimates for a confirmatory factor analysis in chapter 7 of their book. Their Table 7.9 gives the parameter estimates to two decimals. Their Table 7.10 gives correct standard errors, along with the incorrect standard errors obtained when theory applicable for a covariance matrix is applied to their correlation matrix. You will notice that the ratio of the incorrect to correct standard errors exceeds 2 to 1 in several cases, and consequently "T-statistics" for such parameters would be off by a factor of more than 2.
Some software is set up to provide correct standard errors whether a correlation matrix or a covariance matrix is analyzed. These programs include my own SEPATH (part of Statistica for Windows) and RAMONA by Michael Browne, part of SYSTAT 6. Other programs, such as LISREL and EQS, do not produce the correct standard errors by default, and may require some additional programming.

In this regard, Alain Marcharnd passed along a note from Leo Stam, of Scientific Software, which distributes LISREL, demonstrating how to obtain correct standard errors from the analysis of correlation matrices:
I first give the Steiger example producing the "wrong" standard errors in LISREL syntax:

Lawley Factor Analysis Example. "Wrong" standard errors.
DA NI=9 NO=72 MA=KM
LA
VIS_PERC CUBES LOZENGES PAR_COMP SEN_COMP WRD_MNG ADDITION CNT_DOT ST_CURVE
KM=LAWLEY.COR
MO NX=9 NK=3 PH=ST
LK
Visual Verbal Speed
FR LX 1 1 LX 2 1 LX 3 1 LX 4 2 LX 5 2 LX 6 2 LX 7 3 LX 8 3 LX 9 1 LX 9 3
OU
Next is the command file that produces the "correct" standard errors.

Lawley Factor Analysis Example. Correct standard errors.
DA NI=9 NO=72 MA=KM
LA
VIS_PERC CUBES LOZENGES PAR_COMP SEN_COMP WRD_MNG ADDITION CNT_DOT ST_CURVE
KM=LAWLEY.COR
MO NY=9 NE=9 NK=3 LY=DI,FR GA=FI PH=ST PS=DI TE=ZE
LK
Visual Verbal Speed
FR GA 1 1 GA 2 1 GA 3 1 GA 4 2 GA 5 2 GA 6 2 GA 7 3 GA 8 3 GA 9 1 GA 9 3
CO PS 1 1 = 1 - GA 1 1 ** 2
CO PS 2 2 = 1 - GA 2 1 ** 2
CO PS 3 3 = 1 - GA 3 1 ** 2
CO PS 4 4 = 1 - GA 4 2 ** 2
CO PS 5 5 = 1 - GA 5 2 ** 2
CO PS 6 6 = 1 - GA 6 2 ** 2
CO PS 7 7 = 1 - GA 7 3 ** 2
CO PS 8 8 = 1 - GA 8 3 ** 2
CO PS 9 9 = 1 - GA 9 1 ** 2 - 2 * GA 9 1 * PH 3 1 * GA 9 3 - GA 9 3 ** 2
OU SO
The factor loadings with the standard errors can now be found in the Gamma matrix. You may want to check that the fit indices for these two different formulations of the same confirmatory factor analysis model are the same.
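The CO lines in the second command file force each variable's model-implied variance to equal 1, which is what a correlation matrix requires. As a hedged illustration in Python: the unique variance is 1 minus the common variance lambda' Phi lambda. For an indicator that loads on two correlated factors, the common variance includes a cross-product term weighted by the factor correlation. The loadings and factor correlations below are hypothetical numbers, not the Lawley and Maxwell estimates, and `common_variance` is an illustrative helper.

```python
def common_variance(loadings, phi):
    """Common variance of one indicator: sum over j, k of l_j * phi[j][k] * l_k."""
    k = len(loadings)
    return sum(loadings[j] * phi[j][m] * loadings[m]
               for j in range(k) for m in range(k))


# Hypothetical factor correlations for Visual, Verbal, Speed
# (unit diagonal, as with PH=ST).
phi = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]

# An indicator loading on one factor only (hypothetical loading on Visual).
single = [0.7, 0.0, 0.0]

# An indicator loading on Visual and Speed (hypothetical loadings).
double = [0.4, 0.0, 0.5]

psi_single = 1.0 - common_variance(single, phi)
psi_double = 1.0 - common_variance(double, phi)

# Same value written out term by term, including the cross-product term.
psi_double_by_hand = 1.0 - 0.4 ** 2 - 2 * 0.4 * phi[2][0] * 0.5 - 0.5 ** 2

print("unique variance, one loading: ", round(psi_single, 4))
print("unique variance, two loadings:", round(psi_double, 4))
```

With these numbers the unique variances come to 0.51 and 0.47, and in both cases common plus unique variance is 1.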

James Steiger concludes:
So really, succinctly speaking, there are two questions you have to answer. One involves the limitations imposed by the statistical question you are asking, the other involves the limitations imposed by the software you are using.

On this point, David Kaplan adds:
I agree with Jim and simply wish to add that one is always on safe ground by analyzing the covariance matrix and asking for the standardized solution. This is also true for multi-sample modeling where one would want to examine solutions standardized to a common metric. This issue is discussed in the LISREL manual.
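To make Kaplan's suggestion concrete, here is a small Python sketch of how a standardized loading can be recovered from estimates obtained in the covariance metric. It assumes a single indicator x loading on one factor, so that Var(x) = lambda^2 * Var(factor) + error variance. All numbers and the function name are hypothetical, and the sketch shows only the rescaling idea, not the internal computation of LISREL or any other program.

```python
import math


def standardize_loading(loading, factor_variance, error_variance):
    """Rescale a covariance-metric loading to the standardized metric.

    Returns (standardized loading, standardized error variance), assuming
    Var(x) = loading**2 * factor_variance + error_variance.
    """
    indicator_variance = loading ** 2 * factor_variance + error_variance
    std_loading = loading * math.sqrt(factor_variance) / math.sqrt(indicator_variance)
    std_error_variance = error_variance / indicator_variance
    return std_loading, std_error_variance


# Hypothetical covariance-metric estimates.
std_loading, std_error = standardize_loading(
    loading=2.0, factor_variance=4.0, error_variance=9.0
)

print("standardized loading:       ", round(std_loading, 4))
print("standardized error variance:", round(std_error, 4))
```

Here the implied indicator variance is 25, so the standardized loading is 0.8 and the standardized error variance is 0.36, which together account for a total variance of 1.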

Ed Rigdon adds:
Modeling nonlinear and interaction relationships between latent variables also requires the use of covariances instead of correlations. This class of models involves variables that are the products of other variables. Even if the original variables had unit variance, the product will not, unless the original variables are uncorrelated.
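Rigdon's remark about product variables can be checked numerically. The Python sketch below assumes the two variables are standardized and bivariate normal, in which case Var(XY) = 1 + rho^2, so the product has unit variance only when rho = 0. The sample size, seed, and function names are illustrative assumptions.

```python
import math
import random


def product_variance_normal(rho):
    """Var(XY) for standardized bivariate normal X, Y with correlation rho."""
    return 1.0 + rho ** 2


def simulated_product_variance(rho, n, seed=0):
    """Sample variance of X*Y from n simulated bivariate normal pairs."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho ** 2)
    products = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + c * rng.gauss(0.0, 1.0)
        products.append(x * y)
    mean_p = sum(products) / n
    return sum((p - mean_p) ** 2 for p in products) / (n - 1)


var_uncorrelated = simulated_product_variance(0.0, 100000)
var_correlated = simulated_product_variance(0.5, 100000)

print("rho = 0.0: theory", product_variance_normal(0.0),
      "simulated", round(var_uncorrelated, 3))
print("rho = 0.5: theory", product_variance_normal(0.5),
      "simulated", round(var_correlated, 3))
```

Because the product's variance depends on rho, standardizing the original variables does not standardize their product, which is why the product's own variance has to be modeled.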


http://www.gsu.edu/~mkteer/covcorr.html
Last updated: April 29, 1996