x + 2y = 7

An infinite number of pairs of values will serve for x and y. These values are "not identified" or "underidentified." There are fewer "knowns" than "unknowns." Here is a different situation:

x + 2y = 7
3x - y = 7

Now there are just as many knowns as unknowns, and there is one best pair of values (x = 3, y = 2). The system of equations is now "just identified."
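The "just identified" system above can be solved directly, since its coefficient matrix has full rank -- a minimal sketch using numpy:

```python
import numpy as np

# The "just identified" system from the text:
#   x + 2y = 7
#   3x - y = 7
A = np.array([[1.0, 2.0],
              [3.0, -1.0]])
b = np.array([7.0, 7.0])

x, y = np.linalg.solve(A, b)
print(x, y)  # x = 3, y = 2
```

With only the first equation, `A` would be a 1x2 matrix and `solve` would have no unique answer -- the same "fewer knowns than unknowns" situation described above.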
In structural equation modeling, the knowns consist chiefly of the variances and covariances of the measured variables (but may include other elements as well), while the unknowns consist of model parameters. Identification is an important concern for SEM researchers because the methodology gives users the freedom to specify models that are not identified. Here, for example, is a model that is not identified: a single latent construct measured by just two observed variables, X1 and X2.
Why is this model not identified? Compare knowns and unknowns. If we specify (for convenience) that the latent construct has unit variance, then this model has 4 parameters with unknown values--one loading and one error variance for each X. Now, the variance/covariance matrix of the X's has 3 distinct elements--the variance of each X, and their covariance. So there are 4 unknowns but only 3 knowns, and the model will not be identified unless additional constraints are imposed.
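The counting argument above generalizes: with p indicators of a single latent construct (latent variance fixed at 1), there are p(p+1)/2 knowns and 2p unknowns. A minimal sketch of this bookkeeping:

```python
# Count "knowns" (distinct variances/covariances) versus "unknowns"
# (free parameters) for a one-factor model with p indicators, where the
# latent variance is fixed to 1 and each indicator contributes one
# loading and one error variance.
def identification_count(p):
    knowns = p * (p + 1) // 2   # distinct elements of the covariance matrix
    unknowns = 2 * p            # one loading + one error variance per indicator
    return knowns, unknowns

print(identification_count(2))  # (3, 4): underidentified, as in the text
print(identification_count(3))  # (6, 6): just identified
```

With p = 2 the model is short one known, matching the example above; with p = 3 the counts balance, which is the intuition behind the "Three Measure Rule" for congeneric models.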
For measurement models, for example, the "Three Measure Rule" states that a congeneric measurement model will be identified if every latent construct is associated with at least 3 measures. The "Two Measure Rule" states that a congeneric measurement model will be identified if every latent construct is associated with at least 2 measures AND every construct is correlated with at least one other construct. More recent contributions on the identification of measurement models have come from Davis (1993) and Reilly (1995).
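The two rules of thumb can be expressed as simple checks. This is an illustrative sketch, not an implementation from any SEM package; the input format (a mapping from constructs to indicator counts, plus a flag for whether every construct correlates with at least one other) is assumed for the example:

```python
# Hypothetical sketch of the two rules of thumb for congeneric
# measurement models. `indicators` maps each construct to its number
# of measures; `all_correlated` says whether every construct is
# correlated with at least one other construct.
def three_measure_rule(indicators):
    return all(n >= 3 for n in indicators.values())

def two_measure_rule(indicators, all_correlated):
    return all_correlated and all(n >= 2 for n in indicators.values())

print(three_measure_rule({"A": 3, "B": 3}))      # True
print(three_measure_rule({"A": 2, "B": 3}))      # False
print(two_measure_rule({"A": 2, "B": 2}, True))  # True
```

Note that these are sufficient conditions: a model failing both checks is not necessarily unidentified, which is why the caveats in the following paragraphs matter.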
The best-known rules of thumb for the structural model are the "Rank and Order Conditions." These conditions are necessary and sufficient for identification of the structural model when all of the disturbance terms are allowed to correlate. There is also the "Recursive Rule," which says that recursive models are always identified. A recent contribution in this area has come from Rigdon (1995).
Researchers who rely on these heuristics must realize, however, that separately assessing the identification of the measurement model and the structural model can lead to errors. The identification status of the two can be intertwined, so that restrictions in one can aid identification of the other.
Besides these rules of thumb, SEM programs commonly assess identification empirically, by evaluating the rank of the information matrix. This approach has two shortcomings. First, the rank of the information matrix is only evaluated after the parameters have been estimated, and the evaluation only applies at that point in parameter space. In the words of McDonald (1982), who wrote many fundamental papers in this area, the model is identified locally, rather than globally over the whole parameter space. This means that a model might be identified at one point in this space but not identified at others.
The second shortcoming is a result of the way this technique is typically implemented in SEM programs. Typically, a program evaluates the rank of the information matrix sequentially, beginning with one row and column (representing one parameter), then going to the first two rows and columns (representing two parameters), and so on until either the entire matrix is evaluated or a rank deficiency occurs. If a deficiency occurs, the program sends a message saying that the corresponding parameter is problematic. The problem is, the researcher is not informed of how many other parameters may also be involved in the identification problem. Thus, identification error messages produced in this way may mislead researchers about the true nature of the problem.

Besides examining the information matrix itself, researchers may also check identification by looking at some by-products. Large standard errors and very high correlations between parameter estimates may signal identification problems, although it can be hard to tell, based only on these values, whether there is an identification problem, a model fit problem, or no problem at all.
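The rank check can be illustrated on the two-indicator model discussed earlier. This sketch uses a finite-difference Jacobian of the model-implied covariances rather than the information matrix proper, but the rank deficiency it exposes is the same: at a trial point in parameter space, the 3 knowns cannot pin down the 4 unknowns.

```python
import numpy as np

# Two-indicator, one-factor model from the text: latent variance fixed
# at 1, parameters theta = (l1, l2, e1, e2). The model-implied distinct
# covariance elements are:
#   sigma = (l1**2 + e1,  l1*l2,  l2**2 + e2)
def sigma(theta):
    l1, l2, e1, e2 = theta
    return np.array([l1**2 + e1, l1 * l2, l2**2 + e2])

def jacobian(theta, h=1e-6):
    # forward-difference Jacobian of sigma with respect to theta
    base = sigma(theta)
    J = np.zeros((base.size, theta.size))
    for j in range(theta.size):
        step = theta.copy()
        step[j] += h
        J[:, j] = (sigma(step) - base) / h
    return J

theta = np.array([0.8, 0.7, 0.3, 0.4])  # an arbitrary trial point
J = jacobian(theta)
print(J.shape, np.linalg.matrix_rank(J))  # (3, 4), rank 3 < 4 parameters
```

Because the check is numerical and tied to one value of `theta`, it is exactly the kind of *local* test the text describes: a different point in parameter space would require a fresh evaluation.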
Using modern computer algebra techniques, Bekker, Merckens and Wansbeek (1994) show that the identification of the model can be assessed by evaluating the rank of a subset of an augmented Jacobian matrix, and that this evaluation can be conducted symbolically, before the parameters are estimated--and thus independently of any particular set of parameter values. Still, the assumptions upon which this method is constructed make this a test of local, rather than global, identification. However, the output of this procedure is a report on the identification status of every model parameter. This means that the researcher has a complete list of all problem parameters, which makes it more likely that the problem will be properly understood.
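The symbolic flavor of this approach can be sketched with sympy. This is not the Bekker, Merckens and Wansbeek procedure itself--only a small illustration, on the same two-indicator model, of evaluating a Jacobian's rank symbolically rather than at one numeric point:

```python
import sympy as sp

# Symbolic rank check for the two-indicator model: latent variance
# fixed at 1, free parameters (l1, l2, e1, e2), model-implied distinct
# covariance elements sigma.
l1, l2, e1, e2 = sp.symbols("l1 l2 e1 e2", positive=True)
sigma = sp.Matrix([l1**2 + e1, l1 * l2, l2**2 + e2])
J = sigma.jacobian(sp.Matrix([l1, l2, e1, e2]))

print(J.rank())  # 3, but there are 4 free parameters: not identified
```

Because the rank is computed on symbols, the conclusion holds generically across parameter values, not just at one estimate--which is the advantage the text attributes to the symbolic approach.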
This approach has not yet been implemented in a mainstream SEM program. However, the authors have implemented their technique through a set of programs provided on a disk that accompanies their book.
McDonald, R. P. (1982). A note on the investigation of local and global identifiability. Psychometrika, 47(1), 101-103.
Reilly, T. (1995). A necessary and sufficient condition for identification of confirmatory factor analysis models of complexity one. Sociological Methods & Research, 23(4), 421-441.
Rigdon, E. E. (1995). A necessary and sufficient identification rule for structural models estimated in practice. Multivariate Behavioral Research, 30(3), 359-383.