
Returns multiple sparse principal components of a dataset using an iterative deflation heuristic. As in the elasticnet package, the data are passed as a single argument M whose interpretation is set by type: "Sigma" (the default) treats M as a covariance/correlation matrix (p x p), and "X" treats M as a raw data matrix (n observations x p variables). With type = "X" the algorithm operates on the data directly via the products \(X^\top(X\beta)\) and never forms the p x p covariance matrix, which scales substantially better when \(n \ll p\).
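The matrix-free product mentioned above can be illustrated in base R. This is only a sketch of the identity involved, not the package's internal code; the names X, Xc, and beta are made up for the illustration, and the n - 1 divisor is assumed.

```r
set.seed(1)
n <- 20
p <- 200                                  # n much smaller than p
X <- matrix(rnorm(n * p), n, p)           # illustrative data, not from the package
Xc <- scale(X, center = TRUE, scale = FALSE)  # column-centered copy of X
beta <- rnorm(p)

# Matrix-free: two matrix-vector products, never allocating a p x p matrix
sigma_beta_free <- crossprod(Xc, Xc %*% beta) / (n - 1)

# Explicit: forms the full p x p sample covariance first
sigma_beta_full <- cov(X) %*% beta

# The two results agree up to floating-point error
same <- all.equal(as.numeric(sigma_beta_free), as.numeric(sigma_beta_full))
```

The matrix-free route costs on the order of n * p operations per product, whereas the explicit route needs p * p storage before any product is taken.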

Usage

mspca(
  M,
  r,
  ks,
  type = c("Sigma", "X"),
  feasibilityConstraintType = 0,
  verbose = TRUE,
  maxIter = 200,
  feasibilityTolerance = 1e-04,
  stallingTolerance = 1e-08,
  timeLimitTPM = 20,
  maxRestartTPM = 30,
  minRestartTPM = 20,
  center = TRUE,
  scale = TRUE,
  divisor = c("n-1", "n"),
  checkPSD = TRUE,
  symTolerance = 1e-08,
  psdTolerance = 1e-08
)

Arguments

M

A matrix. The data, interpreted according to type: a covariance/correlation matrix (p x p) when type = "Sigma", or a raw data matrix (n x p) when type = "X".

r

An integer. Number of principal components (PCs) to be computed.

ks

An integer vector. Target sparsity of each PC.

type

(optional) Either "Sigma" (default; M is a covariance/correlation matrix) or "X" (M is a raw data matrix).

feasibilityConstraintType

(optional) An integer. Type of feasibility constraints to be enforced. 0: orthogonality constraints; 1: uncorrelatedness constraints. Default 0.
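The difference between the two constraint types can be seen on a small example. The sketch below uses two hypothetical loading vectors (u1 and u2 are not package output) and measures the violation as the absolute off-diagonal entry; how the package itself measures the violation is not stated here, so treat that choice as an assumption.

```r
Sigma <- cor(mtcars)
p <- ncol(Sigma)

# Two hypothetical unit-norm loading vectors with disjoint supports
u1 <- numeric(p); u1[c(1, 2, 3, 6)] <- 0.5    # mpg, cyl, disp, wt
u2 <- numeric(p); u2[c(4, 7, 8, 11)] <- 0.5   # hp, qsec, vs, carb
U <- cbind(u1, u2)

# Orthogonality (type 0) looks at u1' u2
orth_violation <- abs(crossprod(U)[1, 2])

# Uncorrelatedness (type 1) looks at u1' Sigma u2
uncorr_violation <- abs((t(U) %*% Sigma %*% U)[1, 2])
```

Disjoint supports make the loadings exactly orthogonal, but the resulting components can still be correlated because the underlying variables are.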

verbose

(optional) A Boolean. Controls console output. Default TRUE.

maxIter

(optional) An integer. Maximum number of iterations of the algorithm. Default 200.

feasibilityTolerance

(optional) A float. Tolerance for constraint violation (orthogonality/uncorrelatedness, according to feasibilityConstraintType). Default 1e-4.

stallingTolerance

(optional) A float. Controls the objective improvement below which the algorithm is considered to have stalled. Default 1e-8.

timeLimitTPM

(optional) An integer. Maximum time in seconds for the truncated power method (inner iteration). Default 20.

maxRestartTPM

(optional) An integer. Number of random restarts of the truncated power method (inner iteration) for the first outer iteration. Default 30.

minRestartTPM

(optional) An integer. Number of random restarts of the truncated power method (inner iteration) for outer iterations >= 2. Default 20.

center

(optional, type = "X") A Boolean. Center the columns of M before computing the covariance. Default TRUE.

scale

(optional, type = "X") A Boolean. Scale the columns of M to unit variance, i.e. operate on the correlation matrix. Default TRUE.

divisor

(optional, type = "X") Either "n-1" (sample covariance, matches cov/cor) or "n" (population covariance). Default "n-1".

checkPSD

(optional, type = "Sigma") A Boolean. Verify that M is positive semidefinite. Default TRUE.

symTolerance

(optional, type = "Sigma") A float. Tolerance for the symmetry check on M. Default 1e-8.

psdTolerance

(optional, type = "Sigma") A float. Tolerance (on the smallest eigenvalue) for the PSD check on M. Default 1e-8.
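As a rough illustration of what a symmetry and PSD check of this kind involves, here is a minimal base R sketch. The helper is_valid_sigma and its argument names are hypothetical and are not part of the package; the package's actual checks may differ in detail.

```r
# Hypothetical helper, for illustration only
is_valid_sigma <- function(M, symTol = 1e-8, psdTol = 1e-8) {
  if (!is.matrix(M) || nrow(M) != ncol(M)) return(FALSE)
  # Symmetry: largest absolute difference between M and its transpose
  if (max(abs(M - t(M))) > symTol) return(FALSE)
  # PSD: smallest eigenvalue of the symmetrized matrix, up to tolerance
  eig <- eigen((M + t(M)) / 2, symmetric = TRUE, only.values = TRUE)$values
  min(eig) >= -psdTol
}

ok_cor <- is_valid_sigma(cor(mtcars))             # a valid correlation matrix
ok_bad <- is_valid_sigma(matrix(c(1, 2, 2, 1), 2))  # eigenvalues 3 and -1
```

An eigenvalue computation costs on the order of p^3 operations, which is one plausible reason to switch such a check off for large, trusted inputs.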

Value

An object of class "mspca" (a list) with fields: x_best (p x r matrix of sparse PC loadings), objective_value, feasibility_violation, runtime, variance_explained (per-PC explained variance), and total_variance (trace of the covariance matrix). With type = "X" it additionally records inputType, center, scale, divisor, nObs, and p. Use print() to display the sparse loadings and summary() for a full per-PC breakdown.
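The percentage of variance explained can be checked by hand. The sketch below assumes that the explained variance of a PC with loading vector u is u' Sigma u and that the percentage is taken relative to the trace of Sigma; this is an assumption about how the reported figures are derived, not a statement of the package's internals. The loadings are copied from the first PC printed in the Examples.

```r
Sigma <- cor(mtcars)

# Loadings of the first sparse PC, copied from the printed example output
u <- setNames(numeric(ncol(Sigma)), colnames(Sigma))
u[c("mpg", "cyl", "disp", "wt")] <-
  c(0.4994876, -0.4952711, -0.5096594, -0.4954453)

explained <- drop(t(u) %*% Sigma %*% u)   # assumed definition: u' Sigma u
total <- sum(diag(Sigma))                 # trace of Sigma (11 for a correlation matrix)
pct <- 100 * explained / total
```

Under this assumption pct should come out close to the figure printed for the first PC in the Examples.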

Examples

# From a covariance/correlation matrix (the default type):
TestMat <- cor(mtcars)
res <- mspca(TestMat, r = 2, ks = c(4, 4), verbose = FALSE)
print(res, TestMat)
#> 
#> msPCA solution: 2 sparse PCs
#> Pct. variance explained: 32.45835 27.98031 
#> Non-zero loadings per PC: 4 4 
#> 
#> Sparse PCs
#>            [,1]       [,2]
#> mpg   0.4994876  0.0000000
#> cyl  -0.4952711  0.0000000
#> disp -0.5096594  0.0000000
#> hp    0.0000000  0.5180512
#> wt   -0.4954453  0.0000000
#> qsec  0.0000000 -0.5056634
#> vs    0.0000000 -0.4935970
#> carb  0.0000000  0.4819641
# Equivalent call from the raw data matrix (the covariance matrix need not be passed to print):
res_X <- mspca(as.matrix(mtcars), r = 2, ks = c(4, 4), type = "X", verbose = FALSE)
print(res_X)
#> 
#> msPCA solution: 2 sparse PCs
#> Pct. variance explained: 32.15198 27.98031 
#> Non-zero loadings per PC: 4 4 
#> 
#> Sparse PCs
#>            [,1]       [,2]
#> mpg  -0.5250073  0.0000000
#> cyl   0.4063729  0.0000000
#> disp  0.5345591  0.0000000
#> hp    0.0000000 -0.5180502
#> wt    0.5229483  0.0000000
#> qsec  0.0000000  0.5056644
#> vs    0.0000000  0.4935988
#> carb  0.0000000 -0.4819624