Overview
The varcov family models the variance-covariance structure of multivariate normal data, with an optional mean structure. This is the foundation for modeling observed variables directly without latent constructs.
The mathematical model is straightforward:
- Mean structure: $\text{E}(\boldsymbol{y}) = \boldsymbol{\mu}$
- Variance-covariance structure: $\text{var}(\boldsymbol{y}) = \boldsymbol{\Sigma}$
The key question in varcov modeling is: How do we parameterize $\boldsymbol{\Sigma}$?
The psychonetrics package offers five different parameterizations, each with unique properties and use cases. The most important for network analysis is the GGM (Gaussian Graphical Model) parameterization, which represents networks through partial correlations.
The Five Types
The varcov family includes five different parameterizations of the variance-covariance matrix. Each type uses different underlying matrices and transformations:
| Type | Wrapper | Main Matrix | Scaling | Equation |
|---|---|---|---|---|
| cov | varcov() | sigma | — | $\boldsymbol{\Sigma}$ |
| chol | cholesky() | lowertri | — | $\boldsymbol{LL}'$ |
| prec | precision() | kappa | — | $\boldsymbol{K}^{-1}$ |
| ggm | ggm() | omega | delta | $\boldsymbol{\Delta}(\boldsymbol{I} - \boldsymbol{\Omega})^{-1}\boldsymbol{\Delta}$ |
| cor | corr() | P | SD | $\boldsymbol{SD} \cdot \boldsymbol{P} \cdot \boldsymbol{SD}$ |
Brief Descriptions
- cov: Direct parameterization of the covariance matrix $\boldsymbol{\Sigma}$. Most straightforward but doesn't guarantee positive definiteness.
- chol: Uses Cholesky decomposition $\boldsymbol{\Sigma} = \boldsymbol{LL}'$ where $\boldsymbol{L}$ is lower triangular. Guarantees positive definiteness.
- prec: Models the precision matrix (inverse covariance) $\boldsymbol{K} = \boldsymbol{\Sigma}^{-1}$. Useful for sparse models.
- ggm: Gaussian Graphical Model using partial correlations. The standard choice for network analysis.
- cor: Separates correlations from standard deviations. Useful when you want to model correlation structure separately.
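Each wrapper is shorthand for the general varcov() function with the corresponding type argument. A minimal sketch of the five calls, assuming data_bfi and vars as prepared in the BFI example later in this section:
library("psychonetrics")
# The five parameterizations share one interface:
mod_cov  <- varcov(data_bfi, vars = vars, type = "cov")
mod_chol <- cholesky(data_bfi, vars = vars)  # varcov(..., type = "chol")
mod_prec <- precision(data_bfi, vars = vars) # varcov(..., type = "prec")
mod_ggm  <- ggm(data_bfi, vars = vars)       # varcov(..., type = "ggm")
mod_cor  <- corr(data_bfi, vars = vars)      # varcov(..., type = "cor")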
The GGM Parameterization
The Gaussian Graphical Model (GGM) is the most important parameterization for network psychometrics. It represents the variance-covariance matrix as:
$$\boldsymbol{\Sigma} = \boldsymbol{\Delta}(\boldsymbol{I} - \boldsymbol{\Omega})^{-1}\boldsymbol{\Delta}$$
Where:
- $\boldsymbol{\Omega}$ (omega): The partial correlation matrix, representing network edges. This is the key parameter of interest in network models.
- $\boldsymbol{\Delta}$ (delta): A diagonal scaling matrix. Its elements are the residual standard deviations of each variable given all other variables.
- $\boldsymbol{I}$: The identity matrix.
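A quick numeric sketch of this identity in base R, using hypothetical values for a three-variable network:
# Partial correlation matrix (zero diagonal): edges 1-2 and 2-3
omega <- matrix(c(0,   0.3, 0,
                  0.3, 0,   0.2,
                  0,   0.2, 0), 3, 3)
# Diagonal scaling matrix (arbitrary positive values for illustration)
delta <- diag(c(1.2, 0.9, 1.1))
# Implied variance-covariance matrix: Sigma = Delta (I - Omega)^-1 Delta
sigma <- delta %*% solve(diag(3) - omega) %*% delta
sigma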
Why GGM for Networks?
The GGM is a pairwise Markov random field (PMRF) for continuous data—an undirected network model in which edges represent the strength of conditional association between two variables after controlling for all other variables. This makes it a natural fit for psychological network analysis:
- Conditional independence structure: When $\omega_{ij} = 0$, variables $i$ and $j$ are conditionally independent given all other variables. The absence of an edge means the two variables have no direct relationship once all other variables are accounted for (see the simulation sketch after this list).
- Sparse and testable models: A network with only a subset of edges estimated is a constrained statistical model with degrees of freedom, meaning that its fit can be evaluated. For example, a network of 8 variables with 12 edges aims to explain 28 correlations with 12 parameters, leaving 16 degrees of freedom for model testing.
- Predictive interpretation: Edges in the GGM correspond to regression coefficients in multiple regression. A strong edge between two nodes means one is a strong predictor of the other, after controlling for all remaining variables. The network thus visualizes the results of many regression analyses simultaneously, mapping out the multicollinearity structure of the data.
- Hypothesis generation: Although undirected, the GGM can generate causal hypotheses. If coffee is connected to energetic, and energetic to productive, we might hypothesize that coffee increases energy, which increases productivity. Unexpected edge signs may indicate common-effect structures, where two variables that both cause a third become conditionally dependent.
- Connection to factor models: Clusters in a GGM correspond to latent variables in a factor model. This means that the GGM can be used as a tool for exploring potential factor structures, a principle underlying exploratory graph analysis (EGA).
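The conditional-independence interpretation is easy to verify by simulation. A small sketch with made-up data, where two variables are related only through a third:
# y1 and y2 both depend on y3, but not on each other
set.seed(123)
n  <- 5000
y3 <- rnorm(n)
y1 <- 0.5 * y3 + rnorm(n)
y2 <- 0.5 * y3 + rnorm(n)
cor(y1, y2)  # marginal correlation: clearly nonzero (about 0.2)
# Partial correlation given y3: correlate residuals of nodewise regressions
cor(resid(lm(y1 ~ y3)), resid(lm(y2 ~ y3)))  # approximately zero: no edge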
A key advantage of multivariate maximum likelihood estimation in psychonetrics over univariate (nodewise regression) approaches is that it estimates all parameters simultaneously. This enables fit indices, multi-group comparisons, latent variable extensions, and meta-analytic methods that are not available with univariate estimation.
Relationship to Precision Matrix
The omega matrix is closely related to the precision matrix $\boldsymbol{K} = \boldsymbol{\Sigma}^{-1}$. Specifically, omega contains the off-diagonal elements of the standardized precision matrix:
$$\boldsymbol{\Omega} = -\boldsymbol{\Delta}_K^{-1}\boldsymbol{K}\boldsymbol{\Delta}_K^{-1} + \boldsymbol{I}$$
where $\boldsymbol{\Delta}_K$ is a diagonal matrix containing the square roots of the diagonal elements of $\boldsymbol{K}$.
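Reusing sigma and omega from the toy sketch above, this relation can be checked numerically in base R:
# Recover omega from sigma via the precision matrix
K   <- solve(sigma)                 # precision matrix
D_K <- diag(sqrt(diag(K)))          # square roots of the diagonal of K
-solve(D_K) %*% K %*% solve(D_K) + diag(3)  # reproduces omega
# Equivalent shortcut: standardize K with cov2cor(), negate, zero the diagonal
-cov2cor(K) + diag(3)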
Example: Exploratory GGM on BFI Data
This example demonstrates the typical workflow for exploratory network analysis using the Big Five Inventory (BFI) dataset from the psych package.
Step 1: Load and Prepare Data
library("psychonetrics")
library("dplyr")
library("psych")
# Load Big Five Inventory data
data(bfi)
# Select the 25 personality items, remove missing values
vars <- names(bfi)[1:25]
data_bfi <- bfi %>% select(all_of(vars)) %>% na.omit()
# Check dimensions
dim(data_bfi) # Should be 2436 x 25
Step 2: Fit Saturated GGM
Start with a fully saturated model where all edges are freely estimated:
# Fit a saturated GGM (all edges free)
mod_sat <- ggm(data_bfi, vars = vars, omega = "full")
mod_sat <- mod_sat %>% runmodel
# Inspect fit (should be perfect for saturated model)
mod_sat %>% fit
# View parameters (will show all 300 unique edges)
mod_sat %>% parameters
# Confidence interval plot for all partial correlations:
CIplot(mod_sat, "omega")
# Edges are colored by significance level (alpha = 0.05, 0.01, 0.001, 0.0001)
# Non-significant edges are shown in grey
Step 3: Prune Non-Significant Edges
Remove edges that are not statistically significant:
# Prune edges with p > 0.01
mod_pruned <- mod_sat %>% prune(alpha = 0.01)
# Check how many edges remain
mod_pruned %>% parameters %>% filter(matrix == "omega", !fixed) %>% nrow()
Step 4: Stepup Search
Add back edges that significantly improve model fit:
# Stepup search: add edges that improve fit
mod_final <- mod_pruned %>% stepup
# Compare all three models
compare(saturated = mod_sat, pruned = mod_pruned, final = mod_final)
Step 5: Extract and Visualize Network
# Extract the omega (partial correlation) matrix
omega <- getmatrix(mod_final, "omega")
# Visualize with qgraph (if installed)
library("qgraph")
qgraph(omega,
       labels = vars,
       theme = "colorblind",
       cut = 0,
       layout = "spring",
       title = "Big Five Personality Network")
The prune() function removes edges based on statistical significance, while stepup() uses modification indices to add back edges that improve model fit. This two-step procedure is more sensitive than pruning alone, but slower.
Example: Confirmatory GGM
In confirmatory network analysis, we discover the network structure on one dataset (training data) and test whether the same structure holds in another dataset (test data). This is crucial for replication and validation.
Split Data into Training and Test Sets
# Set seed for reproducibility
set.seed(1)
# Randomly split data: 1000 for training, rest for testing
train_idx <- sample(nrow(data_bfi), 1000)
train_data <- data_bfi[train_idx, ]
test_data <- data_bfi[-train_idx, ]
dim(train_data) # 1000 x 25
dim(test_data) # 1436 x 25
Discover Structure on Training Data
# Estimate network structure on training data
mod_train <- ggm(train_data, vars = vars) %>%
  runmodel %>%
  prune(alpha = 0.01) %>%
  stepup
# Check fit on training data
mod_train %>% fit
Extract Adjacency Matrix
# Get the omega matrix from training data
omega_train <- getmatrix(mod_train, "omega")
# Create adjacency matrix (1 = edge present, 0 = edge absent)
adjacency <- 1 * (omega_train != 0)
diag(adjacency) <- 0 # Remove diagonal
# Count number of edges
sum(adjacency) / 2 # Divide by 2 because matrix is symmetric
Confirm Structure on Test Data
# Fit model on test data using the structure from training data
mod_test <- ggm(test_data, vars = vars, omega = adjacency) %>%
  runmodel
# Evaluate fit on test data
mod_test %>% fit
# If fit is good (e.g., CFI > 0.90, RMSEA < 0.08),
# the network structure replicates!
# View parameter estimates on test data
mod_test %>% parameters
Example: Multi-Group Network Comparison
Multi-group models allow us to test whether network structures differ across groups (e.g., gender, age, clinical status). This example compares personality networks between males and females.
Prepare Multi-Group Data
# Include gender variable (1 = male, 2 = female in BFI dataset)
data_groups <- bfi %>%
  select(all_of(vars), gender) %>%
  na.omit()
# Check group sizes
table(data_groups$gender)
Fit Configural Model
First, fit a configural model where each group has its own freely estimated network:
# Configural model: networks free to differ across groups
mod_config <- ggm(data_groups, vars = vars, groups = "gender") %>%
  runmodel
# Check fit
mod_config %>% fit
# View parameters for each group
mod_config %>% parameters %>% filter(group == 1) # Males
mod_config %>% parameters %>% filter(group == 2) # Females
Test Network Invariance
Constrain the omega matrices to be equal across groups:
# Constrain networks to be equal across groups
mod_equal <- mod_config %>%
  groupequal("omega") %>%
  runmodel
# Check fit
mod_equal %>% fit
Compare Models
# Likelihood ratio test and fit comparison
compare(configural = mod_config, equal_networks = mod_equal)
# If p-value is significant, networks differ across groups
# If not significant, networks are statistically equivalent
Additional Tests
You can test more specific constraints. Note that mod_equal above already encodes the "equal networks, free scaling" hypothesis, because only omega was constrained while delta remained free. To additionally equate the scaling parameters:
# Constrain the scaling (delta) to be equal as well
mod_full <- mod_equal %>%
  groupequal("delta") %>%
  runmodel
compare(configural = mod_config,
        equal_networks = mod_equal,
        fully_equal = mod_full)
You can also use partialprune() to identify which specific edges differ between groups.
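A minimal sketch, assuming partialprune() follows the same alpha convention as prune() (check your psychonetrics version for availability):
# Prune edges jointly, freeing only those that must differ across groups
mod_pp <- partialprune(mod_config, alpha = 0.01)
mod_pp %>% fit
mod_pp %>% parameters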
Example: Penalized GGM (PML) [experimental]
Penalized Maximum Likelihood (PML) estimation uses regularization (typically LASSO) to automatically select a sparse network. The penalty parameter lambda is chosen to optimize model fit.
Fit Penalized Model
# Fit GGM with automatic lambda selection
mod_pml <- ggm(data_bfi, vars = vars, estimator = "PML") %>%
  runmodel
# The model automatically searches for optimal lambda
# View the lambda search results
mod_pml@optim$lambda_search
# Check which lambda was selected
mod_pml@optim$lambda
Extract Sparse Network
# Get the regularized network
omega_pml <- getmatrix(mod_pml, "omega")
# Count non-zero edges
sum(omega_pml != 0) / 2 # Divide by 2 for undirected network
# The network should be sparser than the unregularized version
Refit for Inference
The penalized estimates are biased (shrunk toward zero). To get unbiased estimates, standard errors, and p-values, refit the selected structure without penalty:
# Refit without penalty for inference
mod_refit <- mod_pml %>% refit
# Now we have standard errors and p-values
mod_refit %>% parameters
# Compare fit
compare(penalized = mod_pml, unpenalized = mod_refit)
Custom Lambda Values
You can also specify your own lambda values to test:
# Try specific lambda values
mod_custom <- ggm(data_bfi, vars = vars, estimator = "PML") %>%
  setoptimizer(lambda_search = seq(0, 0.5, by = 0.05)) %>%
  runmodel
# View results for all lambda values
mod_custom@optim$lambda_search
Summary
The varcov family provides the foundation for observed-variable network modeling in psychonetrics. Key takeaways:
- Five different parameterizations are available, each suited to different modeling goals
- The GGM parameterization (ggm()) is the standard choice for network analysis
- GGMs represent networks through partial correlations, which control for all other variables
- Exploratory analysis typically involves: saturated model → pruning → stepup search
- Confirmatory analysis tests whether a discovered structure replicates in new data
- Multi-group models test whether networks differ across populations
- Penalized estimation (PML) provides automatic sparsity but requires refitting for inference (experimental)
Next Steps
Now that you understand varcov models, you can explore:
- General Tutorial for the full psychonetrics workflow
- LVM Family for latent variable models and confirmatory factor analysis
- Examples for more advanced applications
Further Reading
For theoretical background on GGMs and network psychometrics:
- Epskamp, S., Waldorp, L. J., Mõttus, R., & Borsboom, D. (2018). The Gaussian graphical model in cross-sectional and time-series data. Multivariate Behavioral Research, 53(4), 453–480.
- Epskamp, S. (2020). Psychometric network models from time-series and panel data. Psychometrika, 85(1), 206-231.
- Epskamp, S., Haslbeck, J. M. B., Isvoranu, A. M., & van Borkulo, C. D. (2022). Pairwise Markov random fields. In A. M. Isvoranu, S. Epskamp, L. J. Waldorp, & D. Borsboom (Eds.), Network psychometrics with R: A guide for behavioral and social scientists (pp. 95–118). Routledge.