x + 2y = 7
An infinite number of pairs of values will serve for x and y. These values are "not identified" or "underidentified." There are fewer "knowns" than "unknowns." Here is a different situation:
x + 2y = 7
3x - y = 7
Now there are just as many knowns as unknowns, and exactly one pair of values satisfies both equations (x = 3, y = 2). The system of equations is now "just identified."
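The contrast between the two situations can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `solve_2x2` is an assumption of this example, and it uses Cramer's rule with exact fractions so that no rounding is involved.

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule.

    Returns (x, y) when the solution is unique, or None when the
    determinant is zero and the equations do not pin down a single pair.
    """
    det = Fraction(a11) * a22 - Fraction(a12) * a21
    if det == 0:
        return None
    x = (Fraction(b1) * a22 - Fraction(a12) * b2) / det
    y = (Fraction(a11) * b2 - Fraction(b1) * a21) / det
    return x, y

# One equation, two unknowns: every choice of x gives a valid y = (7 - x) / 2.
underidentified = [(x, Fraction(7 - x, 2)) for x in (1, 3, 5)]

# Two independent equations, two unknowns: a single solution.
just_identified = solve_2x2(1, 2, 7, 3, -1, 7)

print(underidentified)   # three of the infinitely many pairs that satisfy x + 2y = 7
print(just_identified)   # x = 3, y = 2
```

Adding a second equation that merely repeats the first (for example 2x + 4y = 14) would bring the determinant back to zero, so the count of equations alone is not enough; the equations also have to carry independent information.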
In structural equation modeling, the knowns consist chiefly of the variances and covariances of the measured variables (but may include other elements as well), while the unknowns consist of model parameters. Identification is an important concern for SEM researchers because the methodology gives users the freedom to specify models that are not identified. Here, for example, is a model that is not identified:
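[Figure: a single latent construct with two measured variables (X's), each with its own error term.]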
Why is this model not identified? Compare knowns and unknowns. If we specify (for convenience) that the latent construct has unit variance, then this model has 4 parameters with unknown values--one loading and one error variance for each X. Now, the variance/covariance matrix of the X's has 3 distinct elements--the variance of each X, and their covariance. So there are 4 unknowns but only 3 knowns, and the model will not be identified unless additional constraints are imposed.
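The counting argument, and its practical consequence, can be sketched in code. The function names and the numeric parameter values below are hypothetical, chosen only for illustration; the model is the one described above (one unit-variance latent construct, two measures).

```python
def count_knowns(n_measures):
    """Distinct elements of a covariance matrix: p variances plus p(p-1)/2 covariances."""
    return n_measures * (n_measures + 1) // 2

def implied_moments(loading1, loading2, error1, error2):
    """Variances and covariance implied by one unit-variance construct with two measures."""
    var1 = loading1 ** 2 + error1
    var2 = loading2 ** 2 + error2
    cov12 = loading1 * loading2
    return var1, var2, cov12

knowns = count_knowns(2)   # variance of each X plus their covariance
unknowns = 4               # two loadings and two error variances

# Two different (hypothetical) parameter sets that imply the same
# variances and covariance, so the data cannot choose between them.
set_a = implied_moments(1.0, 0.5, 1.0, 1.75)
set_b = implied_moments(0.5, 1.0, 1.75, 1.0)

print(knowns, unknowns)    # 3 4
print(set_a == set_b)      # True
```

Imposing one additional constraint, such as setting the two loadings equal, would reduce the unknowns to 3 and match the number of knowns.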
For measurement models, for example, the "Three Measure Rule" states that a congeneric measurement model will be identified if every latent construct is associated with at least 3 measures. The "Two Measure Rule" states that a congeneric measurement model will be identified if every latent construct is associated with at least 2 measures AND every construct is correlated with at least one other construct. More recent contributions on the identification of measurement models have come from Davis (1993) and Reilly (1995).
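Both rules are simple enough to express as checks on a model description. The sketch below is a minimal illustration under stated assumptions: the data layout (a dictionary from construct name to its list of measures, plus a list of correlated construct pairs) and the function names are inventions of this example, and the code takes for granted that the model is congeneric rather than verifying it.

```python
def three_measure_rule(measures_per_construct):
    """True if every construct has at least 3 measures."""
    return all(len(m) >= 3 for m in measures_per_construct.values())

def two_measure_rule(measures_per_construct, correlated_pairs):
    """True if every construct has at least 2 measures and is
    correlated with at least one other construct."""
    linked = set()
    for first, second in correlated_pairs:
        if first != second:
            linked.update((first, second))
    return all(
        len(m) >= 2 and name in linked
        for name, m in measures_per_construct.items()
    )

# Hypothetical model: two correlated constructs, two measures each.
model = {"A": ["x1", "x2"], "B": ["x3", "x4"]}
correlations = [("A", "B")]

print(three_measure_rule(model))              # False
print(two_measure_rule(model, correlations))  # True
```

Because the rules are stated as sufficient conditions, a `False` result here does not show that a model is unidentified; it only means that the rule in question does not settle the matter.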
The best-known rules of thumb for the structural model are the "Rank and Order Conditions." These conditions are necessary and sufficient for identification of the structural model when all of the disturbance terms are allowed to correlate. There is also the "Recursive Rule," which says that recursive models are always identified. A recent contribution in this area has come from Rigdon (1995).
Researchers who rely on these heuristics must realize, however, that separately assessing the identification of the measurement model and the structural model can lead to errors. The identification status of the two can be intertwined, so that restrictions in one can aid identification of the other.
In structural equation modeling, the "information matrix" (E, at left: this version is taken from Jöreskog & Sörbom, 1989) is the matrix of second-order derivatives of the fit or discrepancy function with respect to all the free parameters of the model. If the model's parameters are all identified, then the rank of the information matrix will be equal to the number of free parameters in the model (equivalently, the matrix will be positive definite); if not identified, then rank will be deficient. This is analogous to checking for multicollinearity in a regression by evaluating the rank of the covariance matrix of the predictors. In fact, some SEM programs, such as EQS, report identification problems detected in this way by saying that one parameter in the model "is linearly dependent on" some other parameter(s).
This approach has two shortcomings. First, the rank of the information matrix is only evaluated after the parameters have been estimated, and the evaluation only applies at that point in parameter space. In the words of McDonald (1982), who wrote many fundamental papers in this area, the model is identified locally, rather than globally over the whole parameter space. This suggests that a model might be identified at one point in this space but not identified at others.
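The local nature of the rank check can be illustrated with a small example. The sketch below makes several assumptions that are not in the text above: it uses a one-construct, three-measure model with unit construct variance, and it approximates the information matrix by J'J, where J is the Jacobian of the implied variances and covariances with respect to the parameters. That product is what the second-derivative matrix of an unweighted least-squares discrepancy reduces to when the model reproduces the covariances exactly. All function names and loading values are hypothetical.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def jacobian(l1, l2, l3):
    """Derivatives of the implied moments with respect to the parameters.

    Rows:    var1, var2, var3, cov12, cov13, cov23
    Columns: loading1, loading2, loading3, error1, error2, error3
    """
    return [
        [2 * l1, 0, 0, 1, 0, 0],
        [0, 2 * l2, 0, 0, 1, 0],
        [0, 0, 2 * l3, 0, 0, 1],
        [l2, l1, 0, 0, 0, 0],
        [l3, 0, l1, 0, 0, 0],
        [0, l3, l2, 0, 0, 0],
    ]

def information_matrix(jac):
    """J'J, used here as a stand-in for the information matrix (an assumption)."""
    n = len(jac[0])
    return [[sum(row[i] * row[j] for row in jac) for j in range(n)]
            for i in range(n)]

# Same model, two points in parameter space (hypothetical loadings).
rank_typical = matrix_rank(information_matrix(jacobian(1, 2, 3)))
rank_degenerate = matrix_rank(information_matrix(jacobian(0, 2, 3)))

print(rank_typical)     # 6: full rank, all six parameters locally identified
print(rank_degenerate)  # 5: rank deficient when the first loading is zero
```

With six free parameters, full rank at the first point and deficient rank at the second show how the same model can pass the check at one set of values and fail it at another.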
The second shortcoming is a result of the way this technique is typically implemented in SEM programs. Typically, a program evaluates the rank of the information matrix sequentially, beginning with one row and column (representing one parameter) then going to the first two rows and columns (representing two parameters) . . . and so on until either the entire matrix is evaluated or a rank deficiency occurs. If a deficiency occurs, the program sends a message saying that the corresponding parameter is problematic. The problem is, the researcher is not informed of how many other parameters may also be involved in the identification problem. Thus, identification error messages produced in this way may mislead researchers about the true nature of the problem.
Besides examining the information matrix itself, researchers may also check identification by looking at some by-products. Large standard errors and very high correlations between parameter estimates may signal identification problems, although it can be hard to tell, based only on these values, whether there is an identification problem, a model fit problem, or no problem at all.
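The sequential evaluation described above can be mimicked in a few lines to show why its message can mislead. This is a sketch under assumptions, not the procedure of any particular SEM program: it uses the unidentified two-measure model with hypothetical loadings of 1 and 2, approximates the information matrix by J'J (J being the Jacobian of the implied moments), and the names `JACOBIAN_COLUMNS` and `first_flagged` are made up for the example.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Jacobian columns (derivatives of var1, cov12, var2) for each parameter,
# evaluated at hypothetical loadings l1 = 1 and l2 = 2.
JACOBIAN_COLUMNS = {
    "loading1": [2, 2, 0],   # (2*l1, l2, 0)
    "loading2": [0, 1, 4],   # (0, l1, 2*l2)
    "error1": [1, 0, 0],
    "error2": [0, 0, 1],
}

def first_flagged(order):
    """Check leading 1x1, 2x2, ... blocks of J'J and return the first
    parameter at which the block loses full rank (None if none does)."""
    cols = [JACOBIAN_COLUMNS[name] for name in order]
    info = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    for k in range(1, len(order) + 1):
        leading = [row[:k] for row in info[:k]]
        if matrix_rank(leading) < k:
            return order[k - 1]
    return None

flag_a = first_flagged(["loading1", "loading2", "error1", "error2"])
flag_b = first_flagged(["error2", "error1", "loading2", "loading1"])

print(flag_a)  # error2
print(flag_b)  # loading1
```

The flagged parameter changes with nothing more than the order in which the parameters are listed, even though the underlying problem (4 unknowns, 3 knowns) involves all four of them.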
Recently, Bekker, Merckens and Wansbeek (1994) presented an approach which involves evaluating an augmented version of the Jacobian matrix (at right)--the matrix of first order derivatives of the discrepancy function with respect to the free parameters. Their Jacobian is augmented because it also includes equations representing restrictions, such as equality constraints, imposed on the values of the parameters. It is also modified in ways that reduce the computational burden without affecting the conclusions obtained.
Using modern computer algebra techniques, Bekker, Merckens and Wansbeek (1994) show that the identification of the model can be assessed by evaluating the rank of a subset of this augmented Jacobian matrix, and that this evaluation can be conducted symbolically, before the parameters are estimated, and thus independently of any particular set of parameter values. Still, the assumptions upon which this method is constructed make this a test of local, rather than global, identification. However, the output of this procedure is a report on the identification status of every model parameter. This means that the researcher has a complete list of all problem parameters, which makes it more likely that the problem will be properly understood.
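A small worked example may help show what a symbolic rank evaluation looks like. It is only a sketch and not the augmented Jacobian of Bekker, Merckens and Wansbeek: it assumes the plain Jacobian of the model-implied variances and covariances for the unidentified two-measure model discussed earlier, with no restrictions added, and the symbols (loadings \lambda_1, \lambda_2, error variances \psi_1, \psi_2, moment vector \sigma, parameter vector \theta) are notation introduced here.

```latex
\sigma(\theta) =
\begin{pmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{22} \end{pmatrix}
=
\begin{pmatrix} \lambda_1^2 + \psi_1 \\ \lambda_1 \lambda_2 \\ \lambda_2^2 + \psi_2 \end{pmatrix},
\qquad
\theta = (\lambda_1, \lambda_2, \psi_1, \psi_2)'

J(\theta) = \frac{\partial \sigma}{\partial \theta'} =
\begin{pmatrix}
2\lambda_1 & 0          & 1 & 0 \\
\lambda_2  & \lambda_1  & 0 & 0 \\
0          & 2\lambda_2 & 0 & 1
\end{pmatrix}

\operatorname{rank} J(\theta) \le 3 < 4 \quad \text{for every } \theta,
\qquad
J(\theta)\, v = 0 \quad \text{for} \quad
v = (\lambda_1,\; -\lambda_2,\; -2\lambda_1^2,\; 2\lambda_2^2)'
```

Because J has only three rows, its rank cannot reach the number of parameters at any point, so the conclusion does not depend on estimated values. The vector v gives a direction in which the parameters can move without changing the implied moments; when both loadings are nonzero, all four of its entries are nonzero, which is the kind of parameter-by-parameter information that a sequential check on the information matrix does not report.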
This approach has not yet been implemented in a mainstream SEM program. However, the authors have implemented their technique via a set of programs which are provided on a disk that accompanies their book.
McDonald, R. P. (1982). A note on the investigation of local and global identifiability. Psychometrika, 47(1), 101-103.
Reilly, T. (1995). A necessary and sufficient condition for identification of confirmatory factor analysis models of complexity one. Sociological Methods & Research, 23(4), 421-441.
Rigdon, E. E. (1995). A necessary and sufficient identification rule for structural models estimated in practice. Multivariate Behavioral Research, 30(3), 359-383.