Overview
The lvm family implements latent variable models, including confirmatory factor analysis (CFA) and structural equation models (SEM). The key innovation is that both latent covariance matrices and residual covariance matrices can independently be modeled as networks (Gaussian Graphical Models).
The mathematical model for latent variable models is:
Expected values:
$$\text{E}(\boldsymbol{y}) = \boldsymbol{\nu} + \boldsymbol{\Lambda}(\boldsymbol{I} - \boldsymbol{B})^{-1}\boldsymbol{\nu}_\eta$$
Variance-covariance structure:
$$\text{var}(\boldsymbol{y}) = \boldsymbol{\Lambda}(\boldsymbol{I} - \boldsymbol{B})^{-1}\boldsymbol{\Sigma}_\zeta(\boldsymbol{I} - \boldsymbol{B})^{-1\prime}\boldsymbol{\Lambda}^\prime + \boldsymbol{\Sigma}_\varepsilon$$
Where:
- Λ (lambda) is the factor loading matrix linking observed variables to latent variables
- B (beta) is the matrix of structural paths between latent variables
- Σζ (sigma_zeta) is the latent variable covariance matrix
- Σε (sigma_epsilon) is the residual covariance matrix
- ν (nu) is the vector of observed intercepts
- νη (nu_eta) is the vector of latent intercepts
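These two formulas can be checked numerically. The sketch below (written in Python/NumPy rather than R, purely to illustrate the algebra; all parameter values are invented) builds a toy model with two latent variables and four indicators, then computes the model-implied mean vector and covariance matrix:

```python
import numpy as np

# Toy model: 4 observed variables, 2 latent variables,
# one structural path eta1 -> eta2 (all values hypothetical)
Lambda = np.array([[1.0, 0.0],   # y1 loads on eta1
                   [0.8, 0.0],   # y2 loads on eta1
                   [0.0, 1.0],   # y3 loads on eta2
                   [0.0, 0.7]])  # y4 loads on eta2
B = np.array([[0.0, 0.0],        # rows = outcomes, columns = predictors
              [0.5, 0.0]])       # eta2 ~ eta1
nu     = np.zeros(4)             # observed intercepts
nu_eta = np.array([0.2, 0.1])    # latent intercepts
Sigma_zeta    = np.diag([1.0, 0.5])  # latent covariance matrix
Sigma_epsilon = 0.3 * np.eye(4)      # residual covariance matrix

T = np.linalg.inv(np.eye(2) - B)     # (I - B)^{-1}
mu_y  = nu + Lambda @ T @ nu_eta                                  # E(y)
var_y = Lambda @ T @ Sigma_zeta @ T.T @ Lambda.T + Sigma_epsilon  # var(y)
```

Here `var_y[0, 0]` equals 1.3: the variance of eta1 (1.0) passed through its unit loading, plus the residual variance 0.3.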
Core Matrices
The lvm family uses several matrices to specify the model structure. The table below summarizes all core matrices:
| Matrix | Description | Default |
|---|---|---|
| `lambda` | Factor loadings (observed → latent) | User-specified |
| `beta` | Structural paths between latent variables | `"zero"` |
| `nu` | Observed variable intercepts | Estimated |
| `nu_eta` | Latent variable intercepts | Estimated |
| `tau` | Thresholds (for ordinal data) [experimental] | Estimated |
Latent Covariance Decomposition
The latent covariance matrix Σζ can be parameterized in different ways using the latent argument:
| Latent Type | Matrices | Equation |
|---|---|---|
| `"cov"` | `sigma_zeta` | $\boldsymbol{\Sigma}_\zeta$ (directly estimated) |
| `"chol"` | `lowertri_zeta` | $\boldsymbol{\Sigma}_\zeta = \boldsymbol{L}_\zeta\boldsymbol{L}_\zeta'$ |
| `"prec"` | `kappa_zeta` | $\boldsymbol{\Sigma}_\zeta = \boldsymbol{K}_\zeta^{-1}$ |
| `"ggm"` | `omega_zeta`, `delta_zeta` | $\boldsymbol{\Sigma}_\zeta = \boldsymbol{\Delta}_\zeta(\boldsymbol{I} - \boldsymbol{\Omega}_\zeta)^{-1}\boldsymbol{\Delta}_\zeta$ |
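To see that the `"ggm"` parameterization is just a re-expression of the same covariance matrix, the following numeric sketch (Python/NumPy for illustration only; the example matrix is invented) derives Ω and Δ from a covariance matrix and reconstructs it:

```python
import numpy as np

# Hypothetical latent covariance matrix (standardized for simplicity)
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])

K = np.linalg.inv(Sigma)              # precision matrix (the "prec" parameterization)
scale = np.sqrt(np.diag(K))
Omega = -K / np.outer(scale, scale)   # partial correlations: -k_ij / sqrt(k_ii * k_jj)
np.fill_diagonal(Omega, 0.0)
Delta = np.diag(1.0 / scale)          # diagonal scaling matrix

# The GGM decomposition recovers the original covariance matrix
Sigma_ggm = Delta @ np.linalg.inv(np.eye(3) - Omega) @ Delta
```

`np.allclose(Sigma_ggm, Sigma)` is `True`: the parameterizations describe identical covariance structures and differ only in which quantities are treated as free parameters.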
Residual Covariance Decomposition
Similarly, the residual covariance matrix Σε can be parameterized using the residual argument:
| Residual Type | Matrices | Equation |
|---|---|---|
| `"cov"` | `sigma_epsilon` | $\boldsymbol{\Sigma}_\varepsilon$ (directly estimated) |
| `"chol"` | `lowertri_epsilon` | $\boldsymbol{\Sigma}_\varepsilon = \boldsymbol{L}_\varepsilon\boldsymbol{L}_\varepsilon'$ |
| `"prec"` | `kappa_epsilon` | $\boldsymbol{\Sigma}_\varepsilon = \boldsymbol{K}_\varepsilon^{-1}$ |
| `"ggm"` | `omega_epsilon`, `delta_epsilon` | $\boldsymbol{\Sigma}_\varepsilon = \boldsymbol{\Delta}_\varepsilon(\boldsymbol{I} - \boldsymbol{\Omega}_\varepsilon)^{-1}\boldsymbol{\Delta}_\varepsilon$ |
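The `"chol"` and `"prec"` options are likewise re-expressions of the same matrix. A quick numeric check (Python/NumPy for illustration only; the example matrix is invented):

```python
import numpy as np

# Hypothetical residual covariance matrix (positive definite)
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])

L = np.linalg.cholesky(Sigma)        # "chol": lower-triangular factor
K = np.linalg.inv(Sigma)             # "prec": precision matrix

Sigma_from_chol = L @ L.T            # Sigma = L L'
Sigma_from_prec = np.linalg.inv(K)   # Sigma = K^{-1}
```

Both round trips recover the original matrix exactly (up to floating-point error).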
Wrapper Functions
The lvm family provides several wrapper functions that set appropriate defaults for common model types:
| Wrapper | What it sets | Description |
|---|---|---|
| `lvm()` | User chooses | Base function for general latent variable models |
| `lnm()` | `latent = "ggm"` | Latent Network Model (network among latent factors) |
| `rnm()` | `residual = "ggm"` | Residual Network Model (network among residuals) |
| `lrnm()` | `latent = "ggm"` and `residual = "ggm"` | Combined Latent + Residual Network Model |
| `bifactor()` | Augmented $\boldsymbol{\Lambda}$, diagonal latent covariance | Bifactor model with general + specific factors |
| `latentgrowth()` | $\boldsymbol{\Lambda}$ from time points, intercept + slope factors | Latent growth curve model |
Identification
Latent variable models require identification constraints. The identification argument specifies how to achieve this:
"loadings": Fix the first loading of each factor to 1 (default). This allows latent variances to be freely estimated."variance": Fix latent variances to 1. This allows all factor loadings to be freely estimated.
Note: When using identification = "loadings", the first loading of each factor is fixed to 1, so the first indicator for each latent variable must have a non-zero loading on that factor.
Example: CFA on HolzingerSwineford1939
The classic Holzinger-Swineford dataset includes 9 mental ability tests measuring 3 latent factors: visual, textual, and speed abilities. This example demonstrates how to specify and estimate a confirmatory factor analysis model.
The key to specifying a CFA in psychonetrics is constructing the lambda matrix, where rows represent observed variables and columns represent latent factors. A value of 1 indicates a free loading to be estimated, while 0 indicates the loading is fixed to zero.
```r
library("psychonetrics")
library("lavaan")  # for the HolzingerSwineford1939 data
library("dplyr")   # for the %>% pipe
data(HolzingerSwineford1939)

# Define factor loading structure
# Rows = observed variables (9 tests)
# Columns = latent factors (3 factors)
Lambda <- matrix(0, 9, 3)
Lambda[1:3, 1] <- 1  # Visual factor: x1, x2, x3
Lambda[4:6, 2] <- 1  # Textual factor: x4, x5, x6
Lambda[7:9, 3] <- 1  # Speed factor: x7, x8, x9

# Fit CFA model
mod_cfa <- lvm(HolzingerSwineford1939,
               lambda = Lambda,
               vars = paste0("x", 1:9),
               latents = c("visual", "textual", "speed"),
               identification = "variance")
mod_cfa <- mod_cfa %>% runmodel

# Inspect results
mod_cfa %>% fit
mod_cfa %>% parameters
mod_cfa %>% MIs
```
The identification = "variance" argument fixes all latent variances to 1, allowing all factor loadings to be freely estimated.
Example: Latent Network Model (LNM)
A Latent Network Model estimates the covariances among latent factors as a Gaussian Graphical Model. Instead of freely estimating all correlations, the model estimates a sparse network of partial correlations among the latent factors.
Important: A saturated LNM is always equivalent to a CFA model. The typical workflow is to first fit a CFA model and, if it fits well, reparameterize the latent covariance structure as a network and search for a sparse latent network using prune().
```r
# Latent Network Model
mod_lnm <- lnm(HolzingerSwineford1939,
               lambda = Lambda,
               vars = paste0("x", 1:9),
               latents = c("visual", "textual", "speed"),
               identification = "variance")
mod_lnm <- mod_lnm %>% runmodel

# Inspect latent network parameters
mod_lnm %>% parameters

# Prune non-significant latent edges
mod_lnm <- mod_lnm %>% prune(alpha = 0.01)

# Extract latent network (omega_zeta matrix)
omega_latent <- getmatrix(mod_lnm, "omega_zeta")

# Compare CFA vs LNM
compare(cfa = mod_cfa, lnm = mod_lnm)

# Visualize latent network
library("qgraph")
qgraph(omega_latent,
       labels = c("visual", "textual", "speed"),
       theme = "colorblind",
       cut = 0,
       vsize = 15)
```
The omega_zeta matrix contains partial correlations between latent factors, controlling for all other latent factors. This provides insight into direct relationships among latent constructs.
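The relationship between marginal and partial correlations can be made concrete with a small numeric sketch (Python/NumPy for illustration; the correlation values below are invented, not estimates from the data). For three factors, the partial correlation obtained from the precision matrix matches the classic formula r23.1 = (r23 − r12·r13) / sqrt((1 − r12²)(1 − r13²)):

```python
import numpy as np

# Hypothetical latent correlation matrix for three factors
R = np.array([[1.00, 0.46, 0.47],
              [0.46, 1.00, 0.28],
              [0.47, 0.28, 1.00]])

K = np.linalg.inv(R)                 # precision matrix
scale = np.sqrt(np.diag(K))
omega = -K / np.outer(scale, scale)  # partial correlation matrix
np.fill_diagonal(omega, 0.0)

# Marginal correlation between factors 2 and 3 is 0.28, but their
# partial correlation (controlling for factor 1) is much smaller:
partial_23 = omega[1, 2]
```

Here `partial_23` is about 0.08: most of the 0.28 marginal correlation between factors 2 and 3 is accounted for by their shared relationship with factor 1, making this edge a natural candidate for removal by `prune()`.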
Example: Residual Network Model (RNM)
A Residual Network Model estimates a network among the residuals of observed variables, after accounting for the latent factors. This can reveal conditional dependencies between observed variables that are not explained by the latent factor structure.
Residual networks are particularly useful for detecting local dependencies and method effects that violate local independence assumptions in traditional CFA.
```r
# Residual Network Model (start with empty residual network)
mod_rnm <- rnm(HolzingerSwineford1939,
               lambda = Lambda,
               vars = paste0("x", 1:9),
               latents = c("visual", "textual", "speed"),
               identification = "variance",
               omega_epsilon = "empty")  # Start with no residual edges
mod_rnm <- mod_rnm %>% runmodel

# Search for significant residual edges
mod_rnm <- mod_rnm %>% stepup(alpha = 0.01, criterion = "bic")

# Inspect residual network
mod_rnm %>% parameters
omega_resid <- getmatrix(mod_rnm, "omega_epsilon")

# Visualize residual network
qgraph(omega_resid,
       labels = paste0("x", 1:9),
       theme = "colorblind",
       cut = 0,
       layout = "spring")
```
The omega_epsilon matrix contains partial correlations between observed variable residuals. Non-zero edges indicate relationships between variables that are not captured by the latent factor structure alone.
RNM with Pseudo-Maximum Likelihood (PML) [experimental]
With ML estimation, an RNM with a saturated (full) residual network is not identified because the CFA model plus all residual edges is overparameterized. PML estimation avoids this problem by estimating the model based on bivariate marginal likelihoods, which makes even a saturated residual network identified. This allows a top-down approach: start with a full residual network and prune non-significant edges.
Note: PML is an experimental feature and has not yet been fully validated.
```r
# RNM with PML: start with a full residual network
mod_rnm_pml <- rnm(HolzingerSwineford1939,
                   lambda = Lambda,
                   vars = paste0("x", 1:9),
                   latents = c("visual", "textual", "speed"),
                   identification = "variance",
                   omega_epsilon = "full",
                   estimator = "PML")
mod_rnm_pml <- mod_rnm_pml %>% runmodel

# Prune non-significant residual edges
mod_rnm_pml <- mod_rnm_pml %>% prune(alpha = 0.01)

# Inspect and visualize
mod_rnm_pml %>% parameters
omega_resid_pml <- getmatrix(mod_rnm_pml, "omega_epsilon")
qgraph(omega_resid_pml,
       labels = paste0("x", 1:9),
       theme = "colorblind",
       cut = 0,
       layout = "spring")
```
Example: SEM with Structural Paths
Structural equation models extend CFA by adding directional paths between latent variables. This example uses the classic PoliticalDemocracy dataset from lavaan, modeling the effect of industrialization in 1960 on democracy in 1960 and 1965.
The beta matrix specifies structural paths between latent variables. In the beta matrix, rows represent outcome variables and columns represent predictor variables.
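The role of (I − B)⁻¹ in the model equations is to accumulate direct and indirect effects along these paths. A small sketch (Python/NumPy for illustration; the path coefficients are invented, not the estimates for this dataset):

```python
import numpy as np

# Structural paths, rows = outcomes, columns = predictors (hypothetical values)
#             ind60 dem60 dem65
B = np.array([[0.0, 0.0, 0.0],   # ind60 (exogenous)
              [1.4, 0.0, 0.0],   # dem60 ~ ind60
              [0.6, 0.9, 0.0]])  # dem65 ~ ind60 + dem60

T = np.linalg.inv(np.eye(3) - B)  # total-effects matrix

# Total effect of ind60 on dem65 = direct (0.6) + indirect (0.9 * 1.4)
total_ind60_dem65 = T[2, 0]
```

For a recursive (acyclic) model like this one, (I − B)⁻¹ = I + B + B², so `total_ind60_dem65` equals 0.6 + 0.9 × 1.4 = 1.86.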
```r
data(PoliticalDemocracy)

# Factor loadings
# ind60: x1, x2, x3 (industrialization indicators)
# dem60: y1, y2, y3, y4 (democracy 1960)
# dem65: y5, y6, y7, y8 (democracy 1965)
Lambda <- matrix(0, 11, 3)
Lambda[1:3, 1] <- 1   # ind60 indicators: x1, x2, x3
Lambda[4:7, 2] <- 1   # dem60 indicators: y1, y2, y3, y4
Lambda[8:11, 3] <- 1  # dem65 indicators: y5, y6, y7, y8

# Structural paths (beta matrix)
# Rows = outcomes, Columns = predictors
Beta <- matrix(0, 3, 3)
Beta[2, 1] <- 1  # dem60 ~ ind60
Beta[3, 1] <- 1  # dem65 ~ ind60
Beta[3, 2] <- 1  # dem65 ~ dem60

# Equality constraints on factor loadings:
# loadings of corresponding items in dem60 and dem65 are equal
Lambda[5, 2] <- 2; Lambda[9, 3] <- 2   # y2 == y6
Lambda[6, 2] <- 3; Lambda[10, 3] <- 3  # y3 == y7
Lambda[7, 2] <- 4; Lambda[11, 3] <- 4  # y4 == y8

# Residual covariances (correlated errors across time)
Theta <- diag(1, 11)
Theta[4, 8]  <- Theta[8, 4]  <- 1  # y1 ~~ y5
Theta[5, 9]  <- Theta[9, 5]  <- 1  # y2 ~~ y6
Theta[6, 10] <- Theta[10, 6] <- 1  # y3 ~~ y7
Theta[7, 11] <- Theta[11, 7] <- 1  # y4 ~~ y8

# Fit SEM model
mod_sem <- lvm(PoliticalDemocracy,
               lambda = Lambda,
               beta = Beta,
               sigma_epsilon = Theta,
               vars = c(paste0("x", 1:3), paste0("y", 1:8)),
               latents = c("ind60", "dem60", "dem65"),
               identification = "loadings")
mod_sem <- mod_sem %>% runmodel

# Inspect results
mod_sem %>% fit
mod_sem %>% parameters
```
Key points:
- Values greater than 1 in the `beta` matrix represent equality constraints (parameters with the same number are constrained to be equal); the same convention applies to the `lambda` and `sigma_epsilon` matrices
- `identification = "loadings"` fixes the first loading of each factor to 1
- Correlated residuals between corresponding dem60 and dem65 indicators are specified via `sigma_epsilon`
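The equality-coding convention can be illustrated by counting the distinct loading parameters in the Lambda pattern above (a Python sketch of the counting logic only; psychonetrics applies this convention internally, and `identification = "loadings"` additionally fixes the first loading of each factor):

```python
import numpy as np

# The Lambda pattern from the SEM example: 1 = free parameter,
# equal values > 1 = parameters constrained to be equal
Lambda = np.zeros((11, 3), dtype=int)
Lambda[0:3, 0] = 1             # x1-x3 on ind60
Lambda[3:7, 1] = [1, 2, 3, 4]  # y1-y4 on dem60
Lambda[7:11, 2] = [1, 2, 3, 4] # y5-y8 on dem65

n_unconstrained = int((Lambda == 1).sum())        # each 1 is its own parameter
n_shared = len(set(Lambda[Lambda > 1].tolist()))  # each code > 1 is one shared parameter
n_distinct = n_unconstrained + n_shared
```

So the 11 non-zero loadings in this pattern correspond to only 8 distinct parameters before identification constraints are applied.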
Example: Measurement Invariance
Measurement invariance testing examines whether a measurement model operates equivalently across groups. This is essential for valid group comparisons. psychonetrics supports multi-group analysis with flexible equality constraints.
This example uses the StarWars dataset to test measurement invariance of a three-factor model across age groups (young vs. older fans).
```r
data(StarWars)

# Define measurement model
# Q1 cross-loads on all three factors
Lambda <- matrix(0, 10, 3)
Lambda[1:4, 1] <- 1         # Prequels: Q1, Q2, Q3, Q4
Lambda[c(1, 5:7), 2] <- 1   # Original trilogy: Q1, Q5, Q6, Q7
Lambda[c(1, 8:10), 3] <- 1  # Sequels: Q1, Q8, Q9, Q10
obsvars <- paste0("Q", 1:10)
latents <- c("Prequels", "Original", "Sequels")

# Create age groups
StarWars$agegroup <- ifelse(StarWars$Q12 < 30, "young", "older")

# Configural invariance (free across groups)
mod_config <- lvm(StarWars,
                  lambda = Lambda,
                  vars = obsvars,
                  latents = latents,
                  identification = "variance",
                  groups = "agegroup")
mod_config <- mod_config %>% runmodel

# Weak invariance (equal loadings)
mod_weak <- mod_config %>% groupequal("lambda") %>% runmodel

# Strong invariance (equal intercepts)
mod_strong <- mod_weak %>% groupequal("nu") %>% runmodel

# Strict invariance (equal residual variances)
mod_strict <- mod_strong %>% groupequal("sigma_epsilon") %>% runmodel

# Compare all levels of invariance
compare(configural = mod_config,
        weak = mod_weak,
        strong = mod_strong,
        strict = mod_strict)
```
Levels of measurement invariance:
- Configural: Same model structure across groups (no equality constraints)
- Weak (metric): Equal factor loadings (`lambda`)
- Strong (scalar): Equal intercepts (`nu`) in addition to loadings
- Strict: Equal residual variances (`sigma_epsilon`) in addition to loadings and intercepts
The compare() function provides chi-square difference tests and fit indices to evaluate whether invariance constraints are tenable.
Summary
The lvm family provides a flexible and powerful framework for estimating latent variable models in psychonetrics. Key features include:
- Confirmatory Factor Analysis (CFA): Standard measurement models with freely estimated latent covariances
- Structural Equation Models (SEM): Directional paths between latent variables via the beta matrix
- Latent Network Models (LNM): Model latent covariances as a sparse network of partial correlations
- Residual Network Models (RNM): Model residual covariances as a network, revealing local dependencies
- Multi-group analysis: Full support for measurement invariance testing across groups
- Flexible parameterizations: Choose between covariance, precision, Cholesky, or GGM decompositions for both latent and residual structures
For more examples and applications, see the Examples page. For general information on working with psychonetrics models (e.g., model modification, pruning, stepwise search), see the General Tutorial.