Overview
The varcov family models the variance-covariance structure of multivariate normal data, with an optional mean structure. This is the foundation for modeling observed variables directly without latent constructs.
The mathematical model is straightforward:
- Mean structure: $\text{E}(\boldsymbol{y}) = \boldsymbol{\mu}$
- Variance-covariance structure: $\text{var}(\boldsymbol{y}) = \boldsymbol{\Sigma}$
The key question in varcov modeling is: How do we parameterize $\boldsymbol{\Sigma}$?
The psychonetrics package offers five different parameterizations, each with unique properties and use cases. The most important for network analysis is the GGM (Gaussian Graphical Model) parameterization, which represents networks through partial correlations.
The Five Types
The varcov family includes five different parameterizations of the variance-covariance matrix. Each type uses different underlying matrices and transformations:
| Type | Wrapper | Main Matrix | Scaling | Equation |
|---|---|---|---|---|
| cov | varcov() | sigma | — | $\boldsymbol{\Sigma}$ |
| chol | cholesky() | lowertri | — | $\boldsymbol{LL}'$ |
| prec | precision() | kappa | — | $\boldsymbol{K}^{-1}$ |
| ggm | ggm() | omega | delta | $\boldsymbol{\Delta}(\boldsymbol{I} - \boldsymbol{\Omega})^{-1}\boldsymbol{\Delta}$ |
| cor | corr() | P | SD | $\boldsymbol{SD} \cdot \boldsymbol{P} \cdot \boldsymbol{SD}$ |
Brief Descriptions
- cov: Direct parameterization of the covariance matrix $\boldsymbol{\Sigma}$. Most straightforward but doesn't guarantee positive definiteness.
- chol: Uses Cholesky decomposition $\boldsymbol{\Sigma} = \boldsymbol{LL}'$ where $\boldsymbol{L}$ is lower triangular. Guarantees positive definiteness.
- prec: Models the precision matrix (inverse covariance) $\boldsymbol{K} = \boldsymbol{\Sigma}^{-1}$. Useful for sparse models.
- ggm: Gaussian Graphical Model using partial correlations. The standard choice for network analysis.
- cor: Separates correlations from standard deviations. Useful when you want to model correlation structure separately.
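Each wrapper is shorthand for the general varcov() function with the corresponding type argument. A minimal sketch of the five calls, assuming data_bfi and vars as prepared in the BFI example later in this section:
library("psychonetrics")
# The five parameterizations share one interface:
mod_cov  <- varcov(data_bfi, vars = vars, type = "cov")
mod_chol <- cholesky(data_bfi, vars = vars)  # varcov(..., type = "chol")
mod_prec <- precision(data_bfi, vars = vars) # varcov(..., type = "prec")
mod_ggm  <- ggm(data_bfi, vars = vars)       # varcov(..., type = "ggm")
mod_cor  <- corr(data_bfi, vars = vars)      # varcov(..., type = "cor")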
The GGM Parameterization
The Gaussian Graphical Model (GGM) is the most important parameterization for network psychometrics. It represents the variance-covariance matrix as:
$$\boldsymbol{\Sigma} = \boldsymbol{\Delta}(\boldsymbol{I} - \boldsymbol{\Omega})^{-1}\boldsymbol{\Delta}$$
Where:
- $\boldsymbol{\Omega}$ (omega): The partial correlation matrix, representing network edges. This is the key parameter of interest in network models.
- $\boldsymbol{\Delta}$ (delta): A diagonal scaling matrix. Its elements are the residual standard deviations of each variable given all other variables.
- $\boldsymbol{I}$: The identity matrix.
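A quick numeric sketch of this identity in base R, using hypothetical values for a three-variable network:
# Partial correlation matrix (zero diagonal): edges 1-2 and 2-3
omega <- matrix(c(0,   0.3, 0,
                  0.3, 0,   0.2,
                  0,   0.2, 0), 3, 3)
# Diagonal scaling matrix (arbitrary positive values for illustration)
delta <- diag(c(1.2, 0.9, 1.1))
# Implied variance-covariance matrix: Sigma = Delta (I - Omega)^-1 Delta
sigma <- delta %*% solve(diag(3) - omega) %*% delta
sigma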
Why GGM for Networks?
The GGM is a pairwise Markov random field (PMRF) for continuous data—an undirected network model in which edges represent the strength of conditional association between two variables after controlling for all other variables. This makes it a natural fit for psychological network analysis:
- Conditional independence structure: When $\omega_{ij} = 0$, variables $i$ and $j$ are conditionally independent given all other variables. The absence of an edge means the two variables have no direct relationship once all other variables are accounted for (see the simulation sketch after this list).
- Sparse and testable models: A network with only a subset of edges estimated is a constrained statistical model with degrees of freedom, meaning that its fit can be evaluated. For example, a network of 8 variables with 12 edges aims to explain 28 correlations with 12 parameters, leaving 16 degrees of freedom for model testing.
- Predictive interpretation: Edges in the GGM correspond to regression coefficients in multiple regression. A strong edge between two nodes means one is a strong predictor of the other, after controlling for all remaining variables. The network thus visualizes the results of many regression analyses simultaneously, mapping out the multicollinearity structure of the data.
- Hypothesis generation: Although undirected, the GGM can generate causal hypotheses. If coffee is connected to energetic, and energetic to productive, we might hypothesize that coffee increases energy, which increases productivity. Unexpected edge signs may indicate common-effect structures, where two variables that both cause a third become conditionally dependent.
- Connection to factor models: Clusters in a GGM correspond to latent variables in a factor model. This means that the GGM can be used as a tool for exploring potential factor structures, a principle underlying exploratory graph analysis (EGA).
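The conditional-independence interpretation is easy to verify by simulation. A small sketch with made-up data, where two variables are related only through a third:
# y1 and y2 both depend on y3, but not on each other
set.seed(123)
n  <- 5000
y3 <- rnorm(n)
y1 <- 0.5 * y3 + rnorm(n)
y2 <- 0.5 * y3 + rnorm(n)
cor(y1, y2)  # marginal correlation: clearly nonzero (about 0.2)
# Partial correlation given y3: correlate residuals of nodewise regressions
cor(resid(lm(y1 ~ y3)), resid(lm(y2 ~ y3)))  # approximately zero: no edge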
A key advantage of multivariate maximum likelihood estimation in psychonetrics over univariate (nodewise regression) approaches is that it estimates all parameters simultaneously. This enables fit indices, multi-group comparisons, latent variable extensions, and meta-analytic methods that are not available with univariate estimation.
Relationship to Precision Matrix
The omega matrix is closely related to the precision matrix $\boldsymbol{K} = \boldsymbol{\Sigma}^{-1}$. Specifically, omega contains the off-diagonal elements of the standardized precision matrix:
$$\boldsymbol{\Omega} = -\boldsymbol{\Delta}_K^{-1}\boldsymbol{K}\boldsymbol{\Delta}_K^{-1} + \boldsymbol{I}$$
where $\boldsymbol{\Delta}_K$ is a diagonal matrix containing the square roots of the diagonal elements of $\boldsymbol{K}$.
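Reusing sigma and omega from the toy sketch above, this relation can be checked numerically in base R:
# Recover omega from sigma via the precision matrix
K   <- solve(sigma)                 # precision matrix
D_K <- diag(sqrt(diag(K)))          # square roots of the diagonal of K
-solve(D_K) %*% K %*% solve(D_K) + diag(3)  # reproduces omega
# Equivalent shortcut: standardize K with cov2cor(), negate, zero the diagonal
-cov2cor(K) + diag(3)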
Example: Exploratory GGM on BFI Data
This example demonstrates the typical workflow for exploratory network analysis using the Big Five Inventory (BFI) dataset from the psych package.
Step 1: Load and Prepare Data
library("psychonetrics")
library("dplyr")
library("psych")
# Load Big Five Inventory data
data(bfi)
# Select the 25 personality items, remove missing values
vars <- names(bfi)[1:25]
data_bfi <- bfi %>% select(all_of(vars)) %>% na.omit()
# Check dimensions
dim(data_bfi) # Should be 2436 x 25
Step 2: Fit Saturated GGM
Start with a fully saturated model where all edges are freely estimated:
# Fit a saturated GGM (all edges free)
mod_sat <- ggm(data_bfi, vars = vars, omega = "full")
mod_sat <- mod_sat %>% runmodel
# Inspect fit (should be perfect for saturated model)
mod_sat %>% fit
# View parameters (will show all 300 unique edges)
mod_sat %>% parameters
# Confidence interval plot for all partial correlations:
CIplot(mod_sat, "omega")
# Edges are colored by significance level (alpha = 0.05, 0.01, 0.001, 0.0001)
# Non-significant edges are shown in grey
Step 3: Prune Non-Significant Edges
Remove edges that are not statistically significant:
# Prune edges with p > 0.01
mod_pruned <- mod_sat %>% prune(alpha = 0.01)
# Check how many edges remain
mod_pruned %>% parameters %>% filter(matrix == "omega", !fixed) %>% nrow()
Step 4: Stepup Search
Add back edges that significantly improve model fit:
# Stepup search: add edges that improve fit
mod_final <- mod_pruned %>% stepup
# Compare all three models
compare(saturated = mod_sat, pruned = mod_pruned, final = mod_final)
Step 5: Extract and Visualize Network
# Extract the omega (partial correlation) matrix
omega <- getmatrix(mod_final, "omega")
# Visualize with qgraph (if installed)
library("qgraph")
qgraph(omega,
       labels = vars,
       theme = "colorblind",
       cut = 0,
       layout = "spring",
       title = "Big Five Personality Network")
The prune() function removes edges based on statistical significance, while stepup() uses modification indices to add back edges that improve model fit. This two-step procedure is more sensitive than pruning alone, but slower.
Example: Confirmatory GGM
In confirmatory network analysis, we discover the network structure on one dataset (training data) and test whether the same structure holds in another dataset (test data). This is crucial for replication and validation.
Split Data into Training and Test Sets
# Set seed for reproducibility
set.seed(1)
# Randomly split data: 1000 for training, rest for testing
train_idx <- sample(nrow(data_bfi), 1000)
train_data <- data_bfi[train_idx, ]
test_data <- data_bfi[-train_idx, ]
dim(train_data) # 1000 x 25
dim(test_data) # 1436 x 25
Discover Structure on Training Data
# Estimate network structure on training data
mod_train <- ggm(train_data, vars = vars) %>%
  runmodel %>%
  prune(alpha = 0.01) %>%
  stepup
# Check fit on training data
mod_train %>% fit
Extract Adjacency Matrix
# Get the omega matrix from training data
omega_train <- getmatrix(mod_train, "omega")
# Create adjacency matrix (1 = edge present, 0 = edge absent)
adjacency <- 1 * (omega_train != 0)
diag(adjacency) <- 0 # Remove diagonal
# Count number of edges
sum(adjacency) / 2 # Divide by 2 because matrix is symmetric
Confirm Structure on Test Data
# Fit model on test data using the structure from training data
mod_test <- ggm(test_data, vars = vars, omega = adjacency) %>%
  runmodel
# Evaluate fit on test data
mod_test %>% fit
# If fit is good (e.g., CFI > 0.90, RMSEA < 0.08),
# the network structure replicates!
# View parameter estimates on test data
mod_test %>% parameters
Example: Multi-Group Network Comparison
Multi-group models allow us to test whether network structures differ across groups (e.g., gender, age, clinical status). This example compares personality networks between males and females.
Prepare Multi-Group Data
# Include gender variable (1 = male, 2 = female in BFI dataset)
data_groups <- bfi %>%
  select(all_of(vars), gender) %>%
  na.omit()
# Check group sizes
table(data_groups$gender)
Fit Configural Model
First, fit a configural model where each group has its own freely estimated network:
# Configural model: networks free to differ across groups
mod_config <- ggm(data_groups, vars = vars, groups = "gender") %>%
  runmodel
# Check fit
mod_config %>% fit
# View parameters for each group
mod_config %>% parameters %>% filter(group == 1) # Males
mod_config %>% parameters %>% filter(group == 2) # Females
Test Network Invariance
Constrain the omega matrices to be equal across groups:
# Constrain networks to be equal across groups
mod_equal <- mod_config %>%
  groupequal("omega") %>%
  runmodel
# Check fit
mod_equal %>% fit
Compare Models
# Likelihood ratio test and fit comparison
compare(configural = mod_config, equal_networks = mod_equal)
# If p-value is significant, networks differ across groups
# If not significant, networks are statistically equivalent
Additional Tests
You can test more specific constraints. Note that mod_equal above already encodes the "equal networks, free scaling" hypothesis, because only omega was constrained while delta remained free. To additionally equate the scaling parameters:
# Constrain the scaling (delta) to be equal as well
mod_full <- mod_equal %>%
  groupequal("delta") %>%
  runmodel
compare(configural = mod_config,
        equal_networks = mod_equal,
        fully_equal = mod_full)
You can also use partialprune() to identify which specific edges differ between groups.
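A minimal sketch, assuming partialprune() follows the same alpha convention as prune() (check your psychonetrics version for availability):
# Prune edges jointly, freeing only those that must differ across groups
mod_pp <- partialprune(mod_config, alpha = 0.01)
mod_pp %>% fit
mod_pp %>% parameters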
Example: Penalized GGM (PML) [experimental]
Penalized Maximum Likelihood (PML) estimation uses regularization (typically LASSO) to automatically select a sparse network. The penalty parameter lambda is chosen to optimize model fit.
Fit Penalized Model
# Fit GGM with automatic lambda selection
mod_pml <- ggm(data_bfi, vars = vars, estimator = "PML") %>%
  runmodel
# The model automatically searches for optimal lambda
# View the lambda search results
mod_pml@optim$lambda_search
# Check which lambda was selected
mod_pml@optim$lambda
Extract Sparse Network
# Get the regularized network
omega_pml <- getmatrix(mod_pml, "omega")
# Count non-zero edges
sum(omega_pml != 0) / 2 # Divide by 2 for undirected network
# The network should be sparser than the unregularized version
Refit for Inference
The penalized estimates are biased (shrunk toward zero). To get unbiased estimates, standard errors, and p-values, refit the selected structure without penalty:
# Refit without penalty for inference
mod_refit <- mod_pml %>% refit
# Now we have standard errors and p-values
mod_refit %>% parameters
# Compare fit
compare(penalized = mod_pml, unpenalized = mod_refit)
Custom Lambda Values
You can also specify your own lambda values to test:
# Try specific lambda values
mod_custom <- ggm(data_bfi, vars = vars, estimator = "PML") %>%
  setoptimizer(lambda_search = seq(0, 0.5, by = 0.05)) %>%
  runmodel
# View results for all lambda values
mod_custom@optim$lambda_search
Summary
The varcov family provides the foundation for observed-variable network modeling in psychonetrics. Key takeaways:
- Five different parameterizations are available, each suited to different modeling goals
- The GGM parameterization (ggm()) is the standard choice for network analysis
- GGMs represent networks through partial correlations, which control for all other variables
- Exploratory analysis typically involves: saturated model → pruning → stepup search
- Confirmatory analysis tests whether a discovered structure replicates in new data
- Multi-group models test whether networks differ across populations
- Penalized estimation (PML) provides automatic sparsity but requires refitting for inference (experimental)
Next Steps
Now that you understand varcov models, you can explore:
- General Tutorial for the full psychonetrics workflow
- LVM Family for latent variable models and confirmatory factor analysis
- Examples for more advanced applications
Further Reading
For theoretical background on GGMs and network psychometrics:
- Epskamp, S., Waldorp, L. J., Mõttus, R., & Borsboom, D. (2018). The Gaussian graphical model in cross-sectional and time-series data. Multivariate Behavioral Research, 53(4), 453–480.
- Epskamp, S. (2020). Psychometric network models from time-series and panel data. Psychometrika, 85(1), 206-231.
- Epskamp, S., Haslbeck, J. M. B., Isvoranu, A. M., & van Borkulo, C. D. (2022). Pairwise Markov random fields. In A. M. Isvoranu, S. Epskamp, L. J. Waldorp, & D. Borsboom (Eds.), Network psychometrics with R: A guide for behavioral and social scientists (pp. 95–118). Routledge.