Title: Sparse Principal Component Analysis (SPCA)
Description: Sparse principal component analysis (SPCA) attempts to find sparse weight vectors (loadings), i.e., weight vectors with only a few 'active' (nonzero) values. This approach provides better interpretability of the principal components in high-dimensional data settings, because each component is formed as a linear combination of only a few of the original variables. This package provides efficient routines to compute SPCA. Specifically, a variable projection solver is used to compute the sparse solution. In addition, a fast randomized accelerated SPCA routine and a robust SPCA routine are provided. Robust SPCA can capture grossly corrupted entries in the data.
Authors: N. Benjamin Erichson, Peng Zheng, and Sasha Aravkin
Maintainer: N. Benjamin Erichson <[email protected]>
License: GPL (>= 3)
Version: 0.1.0
Built: 2024-10-29 03:00:11 UTC
Source: https://github.com/erichson/spca
Implementation of robust SPCA, using variable projection as an optimization strategy.
robspca(X, k = NULL, alpha = 1e-04, beta = 1e-04, gamma = 100, center = TRUE, scale = FALSE, max_iter = 1000, tol = 1e-05, verbose = TRUE)
X | array_like; a real n-by-p input matrix (or data frame) to be decomposed.
k | integer; specifies the target rank, i.e., the number of components to be computed.
alpha | float; sparsity controlling parameter. Higher values lead to sparser components.
beta | float; amount of ridge shrinkage to apply in order to improve conditioning.
gamma | float; sparsity controlling parameter for the error matrix S. Smaller values lead to a larger amount of noise removal.
center | bool; logical value which indicates whether the variables should be shifted to be zero centered (TRUE by default).
scale | bool; logical value which indicates whether the variables should be scaled to have unit variance (FALSE by default).
max_iter | integer; maximum number of iterations to perform before exiting.
tol | float; stopping tolerance for the convergence criterion.
verbose | bool; logical value which indicates whether progress is printed.
Sparse principal component analysis is a modern variant of PCA. Specifically, SPCA attempts to find sparse weight vectors (loadings), i.e., weight vectors with only a few 'active' (nonzero) values. This approach leads to improved interpretability of the model, because the principal components are formed as a linear combination of only a few of the original variables. Further, SPCA avoids overfitting in high-dimensional data settings where the number of variables is greater than the number of observations.
Such a parsimonious model is obtained by introducing prior information in the form of sparsity promoting regularizers. More concretely, given an n-by-p data matrix X, robust SPCA attempts to minimize the following objective function:

    f(A, B, S) = 1/2 ||X - X B A' - S||_F^2 + psi(B) + gamma ||S||_1,   subject to A' A = I,

where B is the sparse weight matrix (loadings) and A is an orthonormal matrix. psi denotes a sparsity inducing regularizer such as the LASSO (l1 norm) or the elastic net (a combination of the l1 and l2 norm). The matrix S captures grossly corrupted outliers in the data.
The principal components Z are formed as Z = X B, and the data can be approximately rotated back as X~ = Z A'.
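The elastic net regularizer named above has a simple closed-form proximal operator: soft-threshold by alpha, then shrink by the ridge term. A minimal base-R sketch of this elementwise update (an illustration of the penalty, not the package's internal solver):

```r
# Proximal operator of the elastic-net penalty alpha*|b| + (beta/2)*b^2:
# soft-threshold each entry by alpha, then shrink by 1/(1 + beta).
prox_elastic_net <- function(B, alpha, beta) {
  sign(B) * pmax(abs(B) - alpha, 0) / (1 + beta)
}

B <- matrix(c(-0.5, 0.02, 0.3, -0.01), nrow = 2)
prox_elastic_net(B, alpha = 0.05, beta = 0.1)
# entries with |b| <= alpha become exactly zero, which is what
# produces the sparse 'inactive' loadings
```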
The print and summary methods can be used to present the results in a nice format.
robspca
returns a list containing the following components:
loadings | array_like; sparse loadings (weight) matrix; the sparse principal directions.
transform | array_like; the approximated inverse transform to rotate the scores back to high-dimensional space.
scores | array_like; the principal component scores; the coordinates of the data in the rotated space.
sparse | array_like; sparse matrix which captures the grossly corrupted entries in the data.
eigenvalues | array_like; the approximated eigenvalues.
center, scale | array_like; the centering and scaling used, if any.
N. Benjamin Erichson, Peng Zheng, and Sasha Aravkin
[1] N. B. Erichson, P. Zheng, K. Manohar, S. Brunton, J. N. Kutz, A. Y. Aravkin. "Sparse Principal Component Analysis via Variable Projection." Submitted to IEEE Journal of Selected Topics on Signal Processing (2018). (Available at arXiv: https://arxiv.org/abs/1804.00341.)
# Create artificial data
m <- 10000
V1 <- rnorm(m, 0, 290)
V2 <- rnorm(m, 0, 300)
V3 <- -0.1*V1 + 0.1*V2 + rnorm(m, 0, 100)

X <- cbind(V1, V1, V1, V1, V2, V2, V2, V2, V3, V3)
X <- X + matrix(rnorm(length(X), 0, 1), ncol = ncol(X), nrow = nrow(X))

# Compute robust SPCA
out <- robspca(X, k = 3, alpha = 1e-3, beta = 1e-5, gamma = 5, center = TRUE, scale = FALSE, verbose = 0)
print(out)
summary(out)
Randomized accelerated implementation of SPCA, using variable projection as an optimization strategy.
rspca(X, k = NULL, alpha = 1e-04, beta = 1e-04, center = TRUE, scale = FALSE, max_iter = 1000, tol = 1e-05, o = 20, q = 2, verbose = TRUE)
X | array_like; a real n-by-p input matrix (or data frame) to be decomposed.
k | integer; specifies the target rank, i.e., the number of components to be computed.
alpha | float; sparsity controlling parameter. Higher values lead to sparser components.
beta | float; amount of ridge shrinkage to apply in order to improve conditioning.
center | bool; logical value which indicates whether the variables should be shifted to be zero centered (TRUE by default).
scale | bool; logical value which indicates whether the variables should be scaled to have unit variance (FALSE by default).
max_iter | integer; maximum number of iterations to perform before exiting.
tol | float; stopping tolerance for the convergence criterion.
o | integer; oversampling parameter to improve the approximation (o = 20 by default).
q | integer; number of additional power iterations to reduce the approximation error (q = 2 by default).
verbose | bool; logical value which indicates whether progress is printed.
Sparse principal component analysis is a modern variant of PCA. Specifically, SPCA attempts to find sparse weight vectors (loadings), i.e., weight vectors with only a few 'active' (nonzero) values. This approach leads to improved interpretability of the model, because the principal components are formed as a linear combination of only a few of the original variables. Further, SPCA avoids overfitting in high-dimensional data settings where the number of variables is greater than the number of observations.
Such a parsimonious model is obtained by introducing prior information in the form of sparsity promoting regularizers. More concretely, given an n-by-p data matrix X, SPCA attempts to minimize the following objective function:

    f(A, B) = 1/2 ||X - X B A'||_F^2 + psi(B),   subject to A' A = I,

where B is the sparse weight (loadings) matrix and A is an orthonormal matrix. psi denotes a sparsity inducing regularizer such as the LASSO (l1 norm) or the elastic net (a combination of the l1 and l2 norm).
The principal components Z are formed as Z = X B, and the data can be approximately rotated back as X~ = Z A'.
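The scores and back-rotation can be illustrated with toy matrices (a hypothetical A and B, not output of the package); in the unregularized limit alpha = beta = 0 the sparse weight matrix B coincides with the orthonormal A and SPCA reduces to ordinary PCA:

```r
set.seed(1)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)       # toy 5 x 4 data matrix

A <- qr.Q(qr(matrix(rnorm(8), 4, 2)))            # orthonormal 4 x 2 matrix
B <- A                                           # unregularized case: B equals A

Z <- X %*% B                                     # principal component scores
Xtilde <- Z %*% t(A)                             # approximate back-rotation

crossprod(A)                                     # ~ identity: A'A = I holds
dim(Xtilde)                                      # same shape as X: 5 x 4
```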
The print and summary methods can be used to present the results in a nice format.
rspca
returns a list containing the following components:
loadings | array_like; sparse loadings (weight) matrix; the sparse principal directions.
transform | array_like; the approximated inverse transform to rotate the scores back to high-dimensional space.
scores | array_like; the principal component scores; the coordinates of the data in the rotated space.
eigenvalues | array_like; the approximated eigenvalues.
center, scale | array_like; the centering and scaling used, if any.
This implementation uses randomized methods for linear algebra to speed up the computations.
The parameter o is an oversampling parameter to improve the approximation.
A value of at least 10 is recommended, and o = 20
is set by default.
The parameter q specifies the number of power (subspace) iterations
to reduce the approximation error. The power scheme is recommended
if the singular values decay slowly. In practice, 2 or 3 iterations
achieve good results; however, computing power iterations increases the
computational costs. The power scheme is set to q = 2
by default.
If k > (min(n, p) / 4), the deterministic
spca
algorithm might be faster.
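The roles of o and q can be sketched with a standard randomized range finder (an assumption about the general randomized scheme, not the package's exact internals): oversampling enlarges the random test matrix, and power iterations sharpen the captured subspace when the singular values decay slowly.

```r
# Randomized range finder: returns an orthonormal basis Q whose span
# approximates the dominant range of X.
rand_range <- function(X, k, o = 20, q = 2) {
  p <- ncol(X)
  Omega <- matrix(rnorm(p * (k + o)), p, k + o)   # random test matrix, oversampled by o
  Y <- X %*% Omega                                # sample the range of X
  for (i in seq_len(q)) {                         # q power (subspace) iterations
    Y <- X %*% crossprod(X, qr.Q(qr(Y)))          # re-orthonormalize for stability
  }
  qr.Q(qr(Y))                                     # orthonormal basis, n x (k + o)
}

set.seed(42)
X <- matrix(rnorm(200 * 50), 200, 50)
Q <- rand_range(X, k = 3)
dim(Q)   # 200 x (k + o) = 200 x 23
```

The subsequent (deterministic) computation then operates on the much smaller projected matrix `crossprod(Q, X)`, which is where the speedup comes from.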
N. Benjamin Erichson, Peng Zheng, and Sasha Aravkin
[1] N. B. Erichson, P. Zheng, K. Manohar, S. Brunton, J. N. Kutz, A. Y. Aravkin. "Sparse Principal Component Analysis via Variable Projection." Submitted to IEEE Journal of Selected Topics on Signal Processing (2018). (Available at arXiv: https://arxiv.org/abs/1804.00341.)
[2] N. B. Erichson, S. Voronin, S. Brunton, J. N. Kutz. "Randomized matrix decompositions using R." Submitted to Journal of Statistical Software (2016). (Available at arXiv: http://arxiv.org/abs/1608.02148.)
# Create artificial data
m <- 10000
V1 <- rnorm(m, 0, 290)
V2 <- rnorm(m, 0, 300)
V3 <- -0.1*V1 + 0.1*V2 + rnorm(m, 0, 100)

X <- cbind(V1, V1, V1, V1, V2, V2, V2, V2, V3, V3)
X <- X + matrix(rnorm(length(X), 0, 1), ncol = ncol(X), nrow = nrow(X))

# Compute randomized SPCA
out <- rspca(X, k = 3, alpha = 1e-3, beta = 1e-3, center = TRUE, scale = FALSE, verbose = 0)
print(out)
summary(out)
Implementation of SPCA, using variable projection as an optimization strategy.
spca(X, k = NULL, alpha = 1e-04, beta = 1e-04, center = TRUE, scale = FALSE, max_iter = 1000, tol = 1e-05, verbose = TRUE)
X | array_like; a real n-by-p input matrix (or data frame) to be decomposed.
k | integer; specifies the target rank, i.e., the number of components to be computed.
alpha | float; sparsity controlling parameter. Higher values lead to sparser components.
beta | float; amount of ridge shrinkage to apply in order to improve conditioning.
center | bool; logical value which indicates whether the variables should be shifted to be zero centered (TRUE by default).
scale | bool; logical value which indicates whether the variables should be scaled to have unit variance (FALSE by default).
max_iter | integer; maximum number of iterations to perform before exiting.
tol | float; stopping tolerance for the convergence criterion.
verbose | bool; logical value which indicates whether progress is printed.
Sparse principal component analysis is a modern variant of PCA. Specifically, SPCA attempts to find sparse weight vectors (loadings), i.e., weight vectors with only a few 'active' (nonzero) values. This approach leads to improved interpretability of the model, because the principal components are formed as a linear combination of only a few of the original variables. Further, SPCA avoids overfitting in high-dimensional data settings where the number of variables is greater than the number of observations.
Such a parsimonious model is obtained by introducing prior information in the form of sparsity promoting regularizers. More concretely, given an n-by-p data matrix X, SPCA attempts to minimize the following objective function:

    f(A, B) = 1/2 ||X - X B A'||_F^2 + psi(B),   subject to A' A = I,

where B is the sparse weight (loadings) matrix and A is an orthonormal matrix. psi denotes a sparsity inducing regularizer such as the LASSO (l1 norm) or the elastic net (a combination of the l1 and l2 norm).
The principal components Z are formed as Z = X B, and the data can be approximately rotated back as X~ = Z A'.
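In the variable projection scheme, the orthonormal matrix A has a closed-form update for fixed B: the minimizer of ||X - X B A'||_F^2 over A'A = I is the orthogonal Procrustes solution A = U V', where U D V' is the SVD of X'XB. A minimal sketch of this single step (a plausible illustration, not the package's exact code):

```r
# Closed-form A-update for fixed B (orthogonal Procrustes):
# the constrained minimizer is A = U V' with U D V' = svd(X'XB).
update_A <- function(X, B) {
  s <- svd(crossprod(X) %*% B)   # thin SVD of X'XB
  s$u %*% t(s$v)
}

set.seed(7)
X <- matrix(rnorm(50 * 6), 50, 6)
B <- matrix(rnorm(6 * 2), 6, 2)   # hypothetical current loadings
A <- update_A(X, B)
crossprod(A)                      # ~ identity: the constraint A'A = I holds
```

Alternating this update with a proximal step on B (which handles the nonsmooth regularizer psi) is the basic structure of a variable projection iteration.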
The print and summary methods can be used to present the results in a nice format.
spca
returns a list containing the following components:
loadings | array_like; sparse loadings (weight) matrix; the sparse principal directions.
transform | array_like; the approximated inverse transform to rotate the scores back to high-dimensional space.
scores | array_like; the principal component scores; the coordinates of the data in the rotated space.
eigenvalues | array_like; the approximated eigenvalues.
center, scale | array_like; the centering and scaling used, if any.
N. Benjamin Erichson, Peng Zheng, and Sasha Aravkin
[1] N. B. Erichson, P. Zheng, K. Manohar, S. Brunton, J. N. Kutz, A. Y. Aravkin. "Sparse Principal Component Analysis via Variable Projection." Submitted to IEEE Journal of Selected Topics on Signal Processing (2018). (Available at arXiv: https://arxiv.org/abs/1804.00341.)
# Create artificial data
m <- 10000
V1 <- rnorm(m, 0, 290)
V2 <- rnorm(m, 0, 300)
V3 <- -0.1*V1 + 0.1*V2 + rnorm(m, 0, 100)

X <- cbind(V1, V1, V1, V1, V2, V2, V2, V2, V3, V3)
X <- X + matrix(rnorm(length(X), 0, 1), ncol = ncol(X), nrow = nrow(X))

# Compute SPCA
out <- spca(X, k = 3, alpha = 1e-3, beta = 1e-3, center = TRUE, scale = FALSE, verbose = 0)
print(out)
summary(out)