Multiple Sparse PCA
Returns multiple sparse principal components of a dataset using an iterative
deflation heuristic. As in the elasticnet package, the data is passed as a
single argument M whose interpretation is set by type: "Sigma" (the
default) treats M as a covariance/correlation matrix (p x p) and "X" treats
M as a raw data matrix (n observations x p variables). With type = "X" the
algorithm operates on the data directly via the products \(X^\top(X\beta)\)
and never forms the p x p matrix, which is substantially more scalable when
\(n \ll p\).
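The matrix-free product mentioned above can be illustrated in base R. This is a minimal sketch, not the package's internal code: it assumes column-centered data and the "n-1" divisor, and the variable names (`Xc`, `beta`, `matrix_free`) are chosen here for illustration only.

```r
# Sketch only: compare the explicit covariance product with the matrix-free one.
set.seed(1)
n <- 20
p <- 200                                       # n << p, the regime described above
X <- matrix(rnorm(n * p), n, p)
Xc <- scale(X, center = TRUE, scale = FALSE)   # column-centered data
beta <- rnorm(p)

# Explicit route: forms the p x p covariance matrix (O(p^2) memory).
Sigma <- cov(X)
explicit <- as.vector(Sigma %*% beta)

# Matrix-free route: two matrix-vector products, O(n * p) work and memory.
matrix_free <- as.vector(crossprod(Xc, Xc %*% beta)) / (n - 1)

# Both routes agree up to floating-point error.
max_abs_diff <- max(abs(explicit - matrix_free))
```

The second route never allocates anything larger than the n x p data matrix, which is the source of the scalability gain when p is large.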
Usage
mspca(
M,
r,
ks,
type = c("Sigma", "X"),
feasibilityConstraintType = 0,
verbose = TRUE,
maxIter = 200,
feasibilityTolerance = 1e-04,
stallingTolerance = 1e-08,
timeLimitTPM = 20,
maxRestartTPM = 30,
minRestartTPM = 20,
center = TRUE,
scale = TRUE,
divisor = c("n-1", "n"),
checkPSD = TRUE,
symTolerance = 1e-08,
psdTolerance = 1e-08
)
Arguments
- M
A matrix. The data, interpreted according to
type: a covariance/correlation matrix (p x p) when type = "Sigma", or a raw data matrix (n x p) when type = "X".
- r
An integer. Number of principal components (PCs) to be computed.
- ks
An integer vector. Target sparsity of each PC.
- type
(optional) Either "Sigma" (default;
M is a covariance/correlation matrix) or "X" (M is a raw data matrix).
- feasibilityConstraintType
(optional) An integer. Type of feasibility constraints to be enforced. 0: orthogonality constraints; 1: uncorrelatedness constraints. Default 0.
- verbose
(optional) A Boolean. Controls console output. Default TRUE.
- maxIter
(optional) An integer. Maximum number of iterations of the algorithm. Default 200.
- feasibilityTolerance
(optional) A float. Tolerance for constraint violation (orthogonality/uncorrelatedness, according to
feasibilityConstraintType). Default 1e-4.
- stallingTolerance
(optional) A float. Controls the objective improvement below which the algorithm is considered to have stalled. Default 1e-8.
- timeLimitTPM
(optional) An integer. Maximum time in seconds for the truncated power method (inner iteration). Default 20.
- maxRestartTPM
(optional) An integer. Number of random restarts of the truncated power method (inner iteration) for the first outer iteration. Default 30.
- minRestartTPM
(optional) An integer. Number of random restarts of the truncated power method (inner iteration) for outer iterations >= 2. Default 20.
- center
(optional, type = "X") A Boolean. Center the columns of
M before computing the covariance. Default TRUE.
- scale
(optional, type = "X") A Boolean. Scale the columns of
M to unit variance, i.e. operate on the correlation matrix. Default TRUE.
- divisor
(optional, type = "X") Either "n-1" (default, sample covariance, matches
cov/cor) or "n" (population covariance). Default "n-1".
- checkPSD
(optional, type = "Sigma") A Boolean. Verify that
M is positive semidefinite. Default TRUE.
- symTolerance
(optional, type = "Sigma") A float. Tolerance for the symmetry check on
M. Default 1e-8.
- psdTolerance
(optional, type = "Sigma") A float. Tolerance (on the smallest eigenvalue) for the PSD check on
M. Default 1e-8.
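The two settings of feasibilityConstraintType can be contrasted with a small base-R sketch. It assumes the violation is measured as the largest off-diagonal entry (in absolute value) of the relevant Gram matrix; the package may use a different measure. `max_offdiag` is a hypothetical helper, and the loadings are hand-picked for illustration rather than produced by mspca.

```r
# Hypothetical helper (not part of the package): largest absolute
# off-diagonal entry, i.e. the worst pairwise violation.
max_offdiag <- function(G) {
  diag(G) <- 0
  max(abs(G))
}

Sigma <- cor(mtcars)
p <- ncol(Sigma)

# Two unit-norm loading vectors with disjoint supports (illustrative values).
U <- matrix(0, p, 2, dimnames = list(colnames(Sigma), NULL))
U[c("mpg", "cyl", "disp", "wt"), 1] <- c(0.5, -0.5, -0.5, -0.5)
U[c("hp", "qsec", "vs", "carb"), 2] <- c(0.5, -0.5, -0.5, 0.5)

# Type 0 (orthogonality): off-diagonal entries of U'U should vanish.
orth_violation <- max_offdiag(crossprod(U))

# Type 1 (uncorrelatedness): off-diagonal entries of U' Sigma U should vanish.
uncorr_violation <- max_offdiag(t(U) %*% Sigma %*% U)
```

Loadings with disjoint supports are orthogonal by construction, but the resulting components can still be correlated, which is why the two constraint types are distinct options.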
Value
An object of class "mspca" (a list) with fields: x_best (p x r
matrix of sparse PC loadings), objective_value, feasibility_violation,
runtime, variance_explained (per-PC explained variance), and
total_variance (trace of the covariance matrix). With type = "X" it
additionally records inputType, center, scale, divisor, nObs,
and p. Use print() to display the sparse loadings and summary() for
a full per-PC breakdown.
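As a rough illustration of how variance_explained and total_variance relate, the sketch below computes the share of variance captured by a single loading vector as the quadratic form x' Sigma x divided by the trace. This formula is an assumption based on the field descriptions above, and the loading vector is hand-picked rather than taken from mspca output.

```r
Sigma <- cor(mtcars)
total_variance <- sum(diag(Sigma))   # trace; equals p for a correlation matrix

# Illustrative unit-norm loading vector (not mspca output).
x <- setNames(numeric(ncol(Sigma)), colnames(Sigma))
x[c("mpg", "cyl", "disp", "wt")] <- c(0.5, -0.5, -0.5, -0.5)

# Variance of the projected data along x, and its share of the total.
variance_explained <- as.numeric(t(x) %*% Sigma %*% x)
pct_explained <- 100 * variance_explained / total_variance
```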
Examples
# From a covariance/correlation matrix (the default type):
TestMat <- cor(mtcars)
res <- mspca(TestMat, r = 2, ks = c(4, 4), verbose = FALSE)
print(res, TestMat)
#>
#> msPCA solution: 2 sparse PCs
#> Pct. variance explained: 32.45835 27.98031
#> Non-zero loadings per PC: 4 4
#>
#> Sparse PCs
#> [,1] [,2]
#> mpg 0.4994876 0.0000000
#> cyl -0.4952711 0.0000000
#> disp -0.5096594 0.0000000
#> hp 0.0000000 0.5180512
#> wt -0.4954453 0.0000000
#> qsec 0.0000000 -0.5056634
#> vs 0.0000000 -0.4935970
#> carb 0.0000000 0.4819641
# Equivalent call from the raw data matrix (the covariance matrix need not be passed to print):
res_X <- mspca(as.matrix(mtcars), r = 2, ks = c(4, 4), type = "X", verbose = FALSE)
print(res_X)
#>
#> msPCA solution: 2 sparse PCs
#> Pct. variance explained: 32.15198 27.98031
#> Non-zero loadings per PC: 4 4
#>
#> Sparse PCs
#> [,1] [,2]
#> mpg -0.5250073 0.0000000
#> cyl 0.4063729 0.0000000
#> disp 0.5345591 0.0000000
#> hp 0.0000000 -0.5180502
#> wt 0.5229483 0.0000000
#> qsec 0.0000000 0.5056644
#> vs 0.0000000 0.4935988
#> carb 0.0000000 -0.4819624
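The two calls above target the same underlying matrix. Assuming that center = TRUE and scale = TRUE with divisor = "n-1" amount to standardizing the columns with the sample standard deviation, the following base-R sketch checks that the implied matrix matches cor(). It does not call mspca and is only meant to show the correspondence between the two input types.

```r
X <- as.matrix(mtcars)
n <- nrow(X)

# center = TRUE, scale = TRUE: subtract column means, divide by column sds.
# scale() uses sd(), i.e. the n - 1 divisor, when centering is on.
Z <- scale(X, center = TRUE, scale = TRUE)

# divisor = "n-1": sample covariance of the standardized data.
C_sample <- crossprod(Z) / (n - 1)

# Should agree with the correlation matrix passed in the first example.
max_diff <- max(abs(C_sample - cor(X)))
```

Because the algorithm is a heuristic with random restarts, the two calls can still return slightly different loadings (including sign flips) even though they describe the same problem.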