# Methodological Alternatives to SEM/CFA

While structural equation modeling (SEM), based on a confirmatory factor analysis (CFA) measurement model, is a very general methodology, it is not the solution to every problem. Here are some methodologies that are "related" to SEM and CFA, but which are appropriate to rather different research problems.

## Alternatives to CFA

• Partial Least Squares (PLS)
• Exploratory Factor Analysis
• Unrestricted Factor Analysis
• Latent Class Analysis
• Item Response Theory (IRT)
• Principal Components Analysis (PCA)
• Formative (or "Causal") Indicators

## Partial least squares

Partial least squares (PLS) was invented by Herman Wold (mentor to Karl Jöreskog, a founder of SEM), as an analytical alternative for situations where theory is weak and where the available manifest variables or measures would likely not conform to a rigorously specified measurement model. For these reasons, Wold labeled his approach, "soft modeling." In recent years, active PLS researchers, including Wold's son, Svante, and Fred Bookstein, have developed and even redefined PLS, so there is room for some confusion. The material presented here adheres most closely to Wold's original formulation, but it is generally applicable to the other forms, as well.

Rather than using the factor analytic measurement model associated with SEM, PLS more often uses a principal components measurement model, where the "latent constructs" are defined as linear composites of the measures associated with them. In restricted cases, these linear composites are equivalent to principal components.
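To make the idea of a "linear composite" concrete, here is a minimal sketch that builds one composite from a block of three indicators, using the first principal component of the block as the weights. This is not Wold's PLS algorithm, only the restricted case in which the composite coincides with a principal component. The simulated data and the function name `first_component_composite` are assumptions made for illustration.

```python
import numpy as np

def first_component_composite(block):
    """Return (weights, scores, largest eigenvalue) for one block of indicators."""
    # Standardize each indicator (column) to mean 0 and variance 1.
    z = (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)
    # Eigen-decompose the correlation matrix of the block.
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    weights = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
    if weights.sum() < 0:                    # the sign is arbitrary; make it positive
        weights = -weights
    scores = z @ weights                     # composite = exact weighted sum of measures
    return weights, scores, eigvals[-1]

# Simulated block (hypothetical data): three measures sharing one common source.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
block = common @ np.array([[0.8, 0.7, 0.6]]) + 0.5 * rng.normal(size=(500, 3))

weights, scores, top_eig = first_component_composite(block)
print("weights:", weights)
print("composite variance:", scores.var(ddof=1), "largest eigenvalue:", top_eig)
```

Because the composite scores are computed directly from the observed measures, they contain no error term of their own, which is the property that critics of PLS point to when they question whether such constructs are "latent."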

Also, rather than seek overall optimization in parameter estimates through a full information estimation technique (such as maximum likelihood), Wold opted for limited information methods that provide statistically inferior estimates but which make minimal demands on the data. Thus, PLS may represent a pragmatic alternative to SEM. In addition, the PLS method is designed to maximize prediction rather than fit. That is, the method is optimized to maximize the proportion of variance of the dependent "construct" that is explained by the predictor "constructs." SEM, by contrast, is designed to maximize and then test the degree of consistency between model and data.

While many researchers are devoted to PLS as a practical method, others find it objectionable. Some argue that the statistical properties of the method are not well understood, and that PLS users rely too much on mere assertions that are not supported by rigorous analysis. Other researchers argue that the "latent constructs" in PLS are not really "latent" at all, since they are strict linear composites of observed variables.

For more information about PLS, see Wynne Chin's PLS Web site or Chin's overview chapter in Marcoulides (1998). Falk and Miller's (1992) short text provides a readable introduction to the practice and philosophy of PLS. McDonald's (1996) paper is a fascinating look at exactly how PLS and SEM differ, and shows how standard SEM programs can be used to mimic PLS estimation methods.

You can download PLS-PC, perhaps the most widely known PLS program, from Jack McArdle's web site.

## Tetrad analysis

(This section is essentially taken from Bollen & Ting 1993, Ting 1995, and Spirtes, Glymour and Scheines 1993.)

While structural equation modeling works generally at the level of variances and covariances, tetrad analysis works with relations between sets of four covariances at once. A tetrad (from the Greek word for "four") for four variables g, h, i, j is defined as:

$$\tau_{ghij} = \sigma_{gh}\sigma_{ij} - \sigma_{gi}\sigma_{hj}$$

where $\sigma$ indicates, of course, the covariance between the subscripted variables. A tetrad that is equal to 0 is called a "vanishing tetrad."
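As a small worked example, the sketch below computes tetrads from a covariance matrix, taking a tetrad to be the difference of two products of covariances, cov(g,h)·cov(i,j) − cov(g,i)·cov(h,j). The matrix is the one implied by a single common factor with four indicators; the loadings and the helper name `tetrad` are hypothetical choices made for illustration. Under a one-factor model every covariance is a product of two loadings, so all three tetrads among the four variables should vanish.

```python
import numpy as np

def tetrad(cov, g, h, i, j):
    """Tetrad for variables g, h, i, j: cov[g,h]*cov[i,j] - cov[g,i]*cov[h,j]."""
    return cov[g, h] * cov[i, j] - cov[g, i] * cov[h, j]

# Model-implied covariance matrix for one factor with four standardized
# indicators (hypothetical loadings; off-diagonal elements are loading products).
loadings = np.array([0.9, 0.8, 0.7, 0.6])
cov = np.outer(loadings, loadings)
np.fill_diagonal(cov, 1.0)

# The three tetrads that can be formed from four variables (0-based indices).
t_1234 = tetrad(cov, 0, 1, 2, 3)
t_1342 = tetrad(cov, 0, 2, 3, 1)
t_1423 = tetrad(cov, 0, 3, 1, 2)
print(t_1234, t_1342, t_1423)  # all zero up to floating-point rounding
```

With sample data the tetrads will not be exactly zero, which is why a confirmatory analysis needs a statistical test rather than a simple comparison with 0.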

Tetrad analysis actually encompasses two very different methodologies. Confirmatory tetrad analysis (Bollen & Ting 1993) evaluates a proposed structural equation model by determining whether the vanishing tetrads implied by the model are really present, within sampling error. Spirtes, Glymour and Scheines' (1993) TETRAD II package explores a covariance matrix, looking for models that are consistent with it. Thus, their work presents an exception to the generally confirmatory character of SEM.

Confirmatory tetrad analysis may be useful to researchers who are facing identification problems. A structural equation model that is not identified in terms of its parameters may still be identified in terms of vanishing tetrads. So a researcher may be able to derive fit information for the model based on tetrad analysis. In addition, two models that are not nested in terms of their parameters may be nested in terms of vanishing tetrads. That is, one model may imply a set of vanishing tetrads that is a subset of those implied by the other model, facilitating a comparative analysis of the two models. However, performing this tetrad analysis by hand is tricky. Many of the tetrads implied by a model will be redundant with other tetrads, but an analysis must be based on a unique set. It can be difficult to determine which tetrads are unique. Ting (1995) describes a SAS macro, called CTA-SAS, which performs this analysis. The program and documentation are available via ftp.

The exploratory tools developed by Spirtes, et al. may be useful to modelers facing a covariance matrix with little prior information. Indeed, some related work by Glymour and others has appeared in the literature on "data mining," a field which deals with the increasingly common situation of an analyst being overwhelmed by the quantity of available data.

However, these tools may also be useful in a confirmatory context. Researchers may wish to evaluate fit information for one model in light of fit information for other models. If many models fit a data set well, then the good fit of one model would seem to offer less compelling evidence in favor of the theory that inspired the model. This procedure might also be used to find equivalent models empirically.

Spirtes et al. have suggested that augmenting "traditional" SEM analysis with their exploratory approach is more likely to produce the "true" model for a data set (if one exists) than is SEM analysis by itself.

In their work, Spirtes et al. use a distinctive, graphically oriented language. A link between two constructs or variables is called an "edge." If the link represents a covariance between the two, it is called an "undirected edge"; otherwise, it is called a "directed edge." Depending on the algorithm used, TETRAD II may begin with a model where all variables are correlated (an "undirected graph") and then proceed to eliminate edges where possible and turn undirected edges into directed edges. The result is a "partially directed acyclic graph" which fits the data set to a tolerance specified by the user. The word "acyclic" indicates that the resulting model will not involve reciprocal or feedback relations--a limitation of the method.


## Latent class models

Latent class analysis (LCA) is appropriate when the dependent variable is intrinsically categorical--that is, when it boils down to distinguishing between those who take an action and those who do not, or between those who vote for different political candidates. In SEM, ordinal measures may be used to reflect continuous latent variables, but in LCA both the measure and the underlying latent variable are categorical. For more information, see Keith Markus' LCA Web site, or Karen Schmidt McCollam's overview chapter in Marcoulides (1998).

## Unrestricted factor analysis

(This section consists of edited SEMNET comments from Stan Mulaik.)

Researchers can use unrestricted factor analysis to test only the hypothesis that relations between a set of measures can be explained by a certain number of common factors, without specifying a particular factor structure. This allows the researcher to determine whether any p-factor model will provide acceptable fit. If this hypothesis fails, then there is no point in testing more constrained CFA models.

The unrestricted model was described by Karl Jöreskog in an addendum to a reprint of the first paper on confirmatory factor analysis that appeared in Psychometrika in 1972. This addendum is to be found in:

Jöreskog, K. G. & Sörbom, D. (1979), In Jay Magidson (Ed.), Advances in Factor Analysis and Structural Equation Models. Cambridge, MA: Abt Books.

On pp.40-43, Jöreskog describes how to specify an unrestricted model for k factors:

• Fix the k variances of the latent common factors to unity, and free the covariances between the common factors.
• For each factor, pick a manifest variable that one expects to have the highest or near highest loading on that factor. Free its loading in the factor loading matrix. Fix the k - 1 remaining loadings of that variable on the remaining factors to 0. You will now have k rows of Lambda with k - 1 zeros in each of them.

For a model with two factors and four measures per factor, the factor covariance and loading matrices should look like this (where 0 indicates an element fixed to 0 and ? indicates a parameter to be estimated):

```
?  0
?  ?
?  ?
?  ?
0  ?
?  ?
?  ?             1
?  ?             ?  1
Lambda            Phi
```

Experience has shown that convergence problems can usually be cleared up with good starting values. Use an exploratory factor analysis as the basis for choosing the variables that specify the above model, and use the corresponding parameter values as starting values for the iterations. The result will fit as well as a two-factor exploratory factor analysis estimated in the same way.
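The specification rules can be sketched as a small helper that builds the free/fixed pattern for Lambda and counts degrees of freedom. The function names and the 0-based `anchors` argument are hypothetical, and the count assumes that all unique variances are free and uncorrelated, which the rules above do not state explicitly.

```python
import numpy as np

def unrestricted_pattern(n_vars, anchors):
    """Pattern for Lambda: True = free loading, False = loading fixed to 0.

    anchors[f] is the row (0-based) of the measure chosen for factor f.
    """
    k = len(anchors)
    free = np.ones((n_vars, k), dtype=bool)   # start with every loading free
    for factor, row in enumerate(anchors):
        free[row, :] = False                  # fix the chosen measure's loadings to 0...
        free[row, factor] = True              # ...except on its own factor
    return free

def degrees_of_freedom(n_vars, k):
    """Degrees of freedom, assuming free, uncorrelated unique variances."""
    loadings = n_vars * k - k * (k - 1)       # k rows each carry k - 1 fixed zeros
    factor_covs = k * (k - 1) // 2            # factor variances are fixed to 1
    unique_vars = n_vars
    moments = n_vars * (n_vars + 1) // 2      # distinct variances and covariances
    return moments - (loadings + factor_covs + unique_vars)

# Two factors, eight measures; the first and fifth measures are the chosen ones.
pattern = unrestricted_pattern(8, anchors=[0, 4])
df = degrees_of_freedom(8, 2)
print(pattern.astype(int))
print("free loadings:", int(pattern.sum()), "df:", df)
```

Under these assumptions the count agrees with the usual degrees of freedom for a k-factor exploratory solution with p measures, ((p - k)^2 - (p + k)) / 2, which is consistent with the two approaches fitting equally well.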

An alternative method for specifying the above model is to free all elements of Phi while fixing the supposed highest loading item on a factor to unity and its loading on the other factors to zero. This is often the way to proceed in doing multiple group analyses where you want to set the metric without explicitly fixing the variances of the common factors so they can differ across groups while the Lambda loadings are constrained to be equal across groups.

If you decide that k is the number of factors, then you can impose additional restrictions on the loadings (e.g. setting some to zero) to specify a measurement model. The measurement model may not fit, and you might then use modification indices to free up zero loadings, one at a time, to get an optimal measurement model (at the expense of lost degrees of freedom).

Once you have a good measurement model, you can then test your structural model involving relations between latent variables.

As Roger Millsap later pointed out, in a message to SEMNET:

In the addendum, Jöreskog attempted to lay out sufficient conditions for "identification" in factor analysis through fixing various loadings to zero or nonzero values. Jöreskog himself noted in a later paper that these conditions were not sufficient in general to identify the model, and were only sufficient to achieve rotational uniqueness. This is all contained in the paper:

Bollen & Jöreskog (1985), Uniqueness does not imply identification. Sociological Methods and Research, 14, 155-163.

The distinction is also mentioned in Bollen's (1989) text. I was also not fully aware of the distinction until a few years ago when I was writing a paper on identification in MTMM factor models, and found the above paper. The basic point is that using Jöreskog's restrictions, one can show that they are sufficient to define a unique rotational position for the factors. They are not sufficient in general to define a unique solution for ALL of the factor model parameters however, which include the unique variances as well as the loadings and common factor covariances.

Stan Mulaik then asked whether a UFA solution would be equivalent to an EFA solution for the same data and same number of factors. Catharina Hartmann provided an answer, originally due to Karl Jöreskog, and referencing Lawley and Maxwell:

Jöreskog indicated that, even if two solutions are equivalent [i.e. the EFA solution vs the CFA unrestricted model solution], the chi-square from LISREL will be slightly higher than the one in EFA because of different multiplicative factors in front of the ML fit function. For example, Lawley and Maxwell (1971), pp. 35-36, provided the following multiplicative factor:

n-(2p+5)/6, where p=number of variables and n=N-1, N=number of observations

Roger Millsap replied:

It turns out that Lawley and Maxwell had advocated a slightly different multiplier:

n-(2p+4r+5)/6 where r=number of factors
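
To see how much the choice of multiplier matters, here is a small arithmetic sketch of the two expressions quoted above. The function names and the sample values (N = 200, p = 8, r = 2) are hypothetical, chosen only for illustration.

```python
def multiplier_without_factors(n_obs, n_vars):
    """n - (2p + 5)/6, with n = N - 1."""
    n = n_obs - 1
    return n - (2 * n_vars + 5) / 6

def multiplier_with_factors(n_obs, n_vars, n_factors):
    """n - (2p + 4r + 5)/6, with n = N - 1 and r = number of factors."""
    n = n_obs - 1
    return n - (2 * n_vars + 4 * n_factors + 5) / 6

# Hypothetical study: N = 200 observations, p = 8 variables, r = 2 factors.
n = 200 - 1
m1 = multiplier_without_factors(200, 8)
m2 = multiplier_with_factors(200, 8, 2)
print(n, m1, m2)
```

Both corrected multipliers are smaller than n = N - 1, so a chi-square computed with either of them will be somewhat lower than one computed with n for the same minimized fit function value.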

## Item response theory

For more information about Rasch modeling and/or item response theory (IRT), there are two online sources to recommend. The web site for the journal, Rasch Measurement Transactions, includes the journal's past contents and instructions for subscribing to a listserv devoted to the Rasch approach. A newer mailing list, maintained by R. J. Harvey, solicits more general discussion of IRT. To subscribe to the latter, send email with body:

subscribe IRT firstname lastname

## References

Bollen, K. A. & Ting, K. F. (1993). Confirmatory tetrad analysis. In P. Marsden (Ed.), Sociological methodology (pp. 147-175), Washington, DC: American Sociological Association.

Chin, W.W. (1998). The partial least squares approach for structural equation modeling. In G.A. Marcoulides (Ed.), Modern methods for business research (pp. 295-336), Mahwah, NJ: Lawrence Erlbaum.

McDonald, R.P. (1996). Path analysis with composite variables. Multivariate Behavioral Research, 31(2), 239-270.

Schmidt McCollam, K.M. (1998). Latent trait and latent class models. In G.A. Marcoulides (Ed.), Modern methods for business research (pp. 23-46), Mahwah, NJ: Lawrence Erlbaum.

Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction and search. New York: Springer-Verlag.

Ting, K. F. (1995). Confirmatory tetrad analysis with SAS. Structural Equation Modeling, 2(2), 163-171.

http://www.gsu.edu/~mkteer/relmeth.html