Title: | Randomized Singular Value Decomposition |
---|---|
Description: | Low-rank matrix decompositions are fundamental tools and widely used for data analysis, dimension reduction, and data compression. Classically, highly accurate deterministic matrix algorithms are used for this task. However, the emergence of large-scale data has severely challenged our computational ability to analyze big data. The concept of randomness has been demonstrated as an effective strategy to quickly produce approximate answers to familiar problems such as the singular value decomposition (SVD). The rsvd package provides several randomized matrix algorithms such as the randomized singular value decomposition (rsvd), randomized principal component analysis (rpca), randomized robust principal component analysis (rrpca), randomized interpolative decomposition (rid), and the randomized CUR decomposition (rcur). In addition several plot functions are provided. |
Authors: | N. Benjamin Erichson [aut, cre] |
Maintainer: | N. Benjamin Erichson <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.4 |
Built: | 2024-10-29 03:00:54 UTC |
Source: | https://github.com/erichson/rsvd |
Subsampled MNIST database of handwritten digits. This smaller dataset has 3000 samples for each of the digits corresponding to the class labels 0,1,2,3. Each 28x28 image patch is stored as a flattened row vector.
data('digits')
data('digits')
An object of class rsvd
.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998.
## Not run: library('rsvd') data('digits') #Display first digit digit <- matrix(digits[1,], nrow = 28, ncol = 28) image(digit[,28:1], col = gray(255:0 / 255)) ## End(Not run)
## Not run: library('rsvd') data('digits') #Display first digit digit <- matrix(digits[1,], nrow = 28, ncol = 28) image(digit[,28:1], col = gray(255:0 / 255)) ## End(Not run)
rpca
using ggplot
.Creates a pretty biplot which is showing the individual factor map overlayed by the variables factor map, i.e, plotting both the principal component scores and directions.
ggbiplot( rpcaObj, pcs = c(1, 2), loadings = TRUE, groups = NULL, alpha = 0.6, ellipse = TRUE, alpha.ellipse = 0.2, var_labels = TRUE, var_labels.names = NULL, ind_labels = TRUE, ind_labels.names = NULL )
ggbiplot( rpcaObj, pcs = c(1, 2), loadings = TRUE, groups = NULL, alpha = 0.6, ellipse = TRUE, alpha.ellipse = 0.2, var_labels = TRUE, var_labels.names = NULL, ind_labels = TRUE, ind_labels.names = NULL )
rpcaObj |
Object returned by the |
pcs |
Array_like. |
loadings |
Bool ( |
groups |
Factor, optional. |
alpha |
Scalar, optional. |
ellipse |
Bool ( |
alpha.ellipse |
Scalar, optional. |
var_labels |
Bool ( |
var_labels.names |
Array_like, optional. |
ind_labels |
Bool ( |
ind_labels.names |
Array_like, optional. |
N. Benjamin Erichson, [email protected]
#See ?rpca
#See ?rpca
rpca
using ggplot
.Creates a pretty plot which is showing the correlation of the original variable with the principal component (PCs).
ggcorplot( rpcaObj, pcs = c(1, 2), loadings = TRUE, var_labels = FALSE, var_labels.names = NULL, alpha = 1, top.n = NULL )
ggcorplot( rpcaObj, pcs = c(1, 2), loadings = TRUE, var_labels = FALSE, var_labels.names = NULL, alpha = 1, top.n = NULL )
rpcaObj |
Object returned by the |
pcs |
Array_like. |
loadings |
Bool ( |
var_labels |
Bool ( |
var_labels.names |
Array_like, optional. |
alpha |
Scalar, optional. |
top.n |
Scalar, optional. |
N. Benjamin Erichson, [email protected]
#
#
rpca
using ggplot
.Creates a pretty plot which is showing the individual factor map, i.e, plotting the principal component scores.
ggindplot( rpcaObj, pcs = c(1, 2), groups = NULL, alpha = 0.6, ellipse = TRUE, alpha.ellipse = 0.2, ind_labels = TRUE, ind_labels.names = NULL )
ggindplot( rpcaObj, pcs = c(1, 2), groups = NULL, alpha = 0.6, ellipse = TRUE, alpha.ellipse = 0.2, ind_labels = TRUE, ind_labels.names = NULL )
rpcaObj |
Object returned by the |
pcs |
Array_like. |
groups |
Factor, optional. |
alpha |
Scalar, optional. |
ellipse |
Bool ( |
alpha.ellipse |
Scalar, optional. |
ind_labels |
Bool ( |
ind_labels.names |
Array_like, optional. |
N. Benjamin Erichson, [email protected]
#See ?rpca
#See ?rpca
Creates a pretty screeplpot using ggplot
. By default the explained variance is plotted
agaings the number of the principal component.
Alternatively the explained variance ratio, the cumulative
explained variance ratio, or the eigenvalues can be plotted.
ggscreeplot(rpcaObj, type = c("var", "ratio", "cum", "eigenvals"))
ggscreeplot(rpcaObj, type = c("var", "ratio", "cum", "eigenvals"))
rpcaObj |
Object returned by the |
type |
String c('var', 'ratio', 'cum', 'eigenvals'), optional. |
N. Benjamin Erichson, [email protected]
#
#
Creates a screeplot, variables and individual factor maps to
summarize the results of the rpca
function.
## S3 method for class 'rpca' plot(x, ...)
## S3 method for class 'rpca' plot(x, ...)
x |
Object returned by the |
... |
Additional arguments passed to the individual plot functions (see below). |
ggscreeplot
, ggcorplot
, ggindplot
#
#
Randomized CUR matrix decomposition.
rcur(A, k = NULL, p = 10, q = 0, idx_only = FALSE, rand = TRUE)
rcur(A, k = NULL, p = 10, q = 0, idx_only = FALSE, rand = TRUE)
A |
array_like; |
k |
integer; |
p |
integer, optional; |
q |
integer, optional; |
idx_only |
bool, optional; |
rand |
bool, optional; |
Algorithm for computing the CUR matrix decomposition of a rectangular matrix
,
with target rank
. The input matrix is factored as
using the rid
decomposition. The factor matrix is formed using actual
columns of
, also called the partial column skeleton. The factor matrix
is formed
using actual rows of
, also called the partial row skeleton.
If a probabilistic strategy is used to compute the decomposition, otherwise a
deterministic algorithm is used.
rcur
returns a list with class containing the following components:
array_like;
column subset ;
dimensional array.
Array_like.
row subset ;
dimensional array.
array_like;
connector matrix; dimensional array.
array_like;
index set of the selected columns used to form
.
array_like;
index set of the selected rows used to form
.
array_like;
scores of the selected columns.
array_like;
scores of the selected rows.
N. Benjamin Erichson, [email protected]
[1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. http://doi.org/10.18637/jss.v089.i11.
[2] N. Halko, P. Martinsson, and J. Tropp. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions" (2009). (available at arXiv http://arxiv.org/abs/0909.4061).
## Not run: # Load test image data('tiger') # Compute (column) randomized interpolative decompsition # Note that the image needs to be transposed for correct plotting out <- rcur(tiger, k = 150) # Reconstruct image tiger.re <- out$C %*% out$U %*% out$R # Compute relative error print(norm(tiger-tiger.re, 'F') / norm(tiger, 'F')) # Plot approximated image image(tiger.re, col = gray((0:255)/255)) ## End(Not run)
## Not run: # Load test image data('tiger') # Compute (column) randomized interpolative decompsition # Note that the image needs to be transposed for correct plotting out <- rcur(tiger, k = 150) # Reconstruct image tiger.re <- out$C %*% out$U %*% out$R # Compute relative error print(norm(tiger-tiger.re, 'F') / norm(tiger, 'F')) # Plot approximated image image(tiger.re, col = gray((0:255)/255)) ## End(Not run)
Randomized interpolative decomposition.
rid(A, k = NULL, mode = "column", p = 10, q = 0, idx_only = FALSE, rand = TRUE)
rid(A, k = NULL, mode = "column", p = 10, q = 0, idx_only = FALSE, rand = TRUE)
A |
array_like; |
k |
integer, optional; |
mode |
string c('column', 'row'), optional; |
p |
integer, optional; |
q |
integer, optional. |
idx_only |
bool, optional; |
rand |
bool, optional; |
Algorithm for computing the ID of a rectangular matrix
, with target rank
. The input matrix is factored as
using the column pivoted QR decomposition. The factor matrix is formed as a subset of
columns of
, also called the partial column skeleton.
If
mode='row'
, then the input matrix is factored as
using the row pivoted QR decomposition. The factor matrix is now formed as
a subset of rows of
, also called the partial row skeleton.
The factor matrix
contains a
identity matrix as a submatrix,
and is well-conditioned.
If a probabilistic strategy is used to compute the decomposition, otherwise a
deterministic algorithm is used.
rid
returns a list containing the following components:
array_like;
column subset , if
mode='column'
; array with dimensions .
array_like;
row subset , if
mode='row'
; array with dimensions .
array_like;
well conditioned matrix; Depending on the selected mode, this is an
array with dimensions or
.
array_like;
index set of the selected columns or rows used to form
or
.
array_like;
information on the pivoting strategy used during the decomposition.
array_like;
scores of the columns or rows of the input matrix .
array_like;
scores of the selected columns or rows in
or
.
N. Benjamin Erichson, [email protected]
[1] N. Halko, P. Martinsson, and J. Tropp. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions" (2009). (available at arXiv http://arxiv.org/abs/0909.4061).
[2] N. B. Erichson, S. Voronin, S. Brunton, J. N. Kutz. "Randomized matrix decompositions using R" (2016). (available at 'arXiv http://arxiv.org/abs/1608.02148).
rcur
,
## Not run: # Load test image data("tiger") # Compute (column) randomized interpolative decompsition # Note that the image needs to be transposed for correct plotting out <- rid(t(tiger), k = 150) # Show selected columns tiger.partial <- matrix(0, 1200, 1600) tiger.partial[,out$idx] <- t(tiger)[,out$idx] image(t(tiger.partial), col = gray((0:255)/255), useRaster = TRUE) # Reconstruct image tiger.re <- t(out$C %*% out$Z) # Compute relative error print(norm(tiger-tiger.re, 'F') / norm(tiger, 'F')) # Plot approximated image image(tiger.re, col = gray((0:255)/255)) ## End(Not run)
## Not run: # Load test image data("tiger") # Compute (column) randomized interpolative decompsition # Note that the image needs to be transposed for correct plotting out <- rid(t(tiger), k = 150) # Show selected columns tiger.partial <- matrix(0, 1200, 1600) tiger.partial[,out$idx] <- t(tiger)[,out$idx] image(t(tiger.partial), col = gray((0:255)/255), useRaster = TRUE) # Reconstruct image tiger.re <- t(out$C %*% out$Z) # Compute relative error print(norm(tiger-tiger.re, 'F') / norm(tiger, 'F')) # Plot approximated image image(tiger.re, col = gray((0:255)/255)) ## End(Not run)
Fast computation of the principal components analysis using the randomized singular value decomposition.
rpca( A, k = NULL, center = TRUE, scale = TRUE, retx = TRUE, p = 10, q = 2, rand = TRUE )
rpca( A, k = NULL, center = TRUE, scale = TRUE, retx = TRUE, p = 10, q = 2, rand = TRUE )
A |
array_like; |
k |
integer; |
center |
bool, optional; |
scale |
bool, optional; |
retx |
bool, optional; |
p |
integer, optional; |
q |
integer, optional; |
rand |
bool, optional; |
Principal component analysis is an important linear dimension reduction technique.
Randomized PCA is computed via the randomized SVD algorithm (rsvd
).
The computational gain is substantial, if the desired number of principal components
is relatively small, i.e. .
The print and summary method can be used to present the results in a nice format.
A scree plot can be produced with ggscreeplot
.
The individuals factor map can be produced with ggindplot
,
and a correlation plot with ggcorplot
.
The predict function can be used to compute the scores of new observations. The data will automatically be centered (and scaled if requested). This is not fully supported for complex input matrices.
rpca
returns a list with class containing the following components:
array_like;
the rotation (eigenvectors); dimensional array.
array_like;
eigenvalues; dimensional vector.
array_like;
standard deviations of the principal components; dimensional vector.
array_like;
the scores / rotated data; dimensional array.
array_like;
the centering and scaling used.
The principal components are not unique and only defined up to sign (a constant of modulus one in the complex case) and so may differ between different PCA implementations.
Similar to prcomp
the variances are computed with the usual divisor N - 1.
N. Benjamin Erichson, [email protected]
[1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. http://doi.org/10.18637/jss.v089.i11.
[2] N. Halko, P. Martinsson, and J. Tropp. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions" (2009). (available at arXiv http://arxiv.org/abs/0909.4061).
ggscreeplot
, ggindplot
,
ggcorplot
, plot.rpca
,
predict
, rsvd
library('rsvd') # # Load Edgar Anderson's Iris Data # data('iris') # # log transform # log.iris <- log( iris[ , 1:4] ) iris.species <- iris[ , 5] # # Perform rPCA and compute only the first two PCs # iris.rpca <- rpca(log.iris, k=2) summary(iris.rpca) # Summary print(iris.rpca) # Prints the rotations # # Use rPCA to compute all PCs, similar to \code{\link{prcomp}} # iris.rpca <- rpca(log.iris) summary(iris.rpca) # Summary print(iris.rpca) # Prints the rotations plot(iris.rpca) # Produce screeplot, variable and individuls factor maps.
library('rsvd') # # Load Edgar Anderson's Iris Data # data('iris') # # log transform # log.iris <- log( iris[ , 1:4] ) iris.species <- iris[ , 5] # # Perform rPCA and compute only the first two PCs # iris.rpca <- rpca(log.iris, k=2) summary(iris.rpca) # Summary print(iris.rpca) # Prints the rotations # # Use rPCA to compute all PCs, similar to \code{\link{prcomp}} # iris.rpca <- rpca(log.iris) summary(iris.rpca) # Summary print(iris.rpca) # Prints the rotations plot(iris.rpca) # Produce screeplot, variable and individuls factor maps.
Compute the near-optimal QB decomposition of a rectangular matrix.
rqb(A, k = NULL, p = 10, q = 2, sdist = "normal", rand = TRUE)
rqb(A, k = NULL, p = 10, q = 2, sdist = "normal", rand = TRUE)
A |
array_like; |
k |
integer, optional; |
p |
integer, optional; |
q |
integer, optional; |
sdist |
string |
rand |
bool, optional; |
The randomized QB decomposition factors a rectangular matrix
as
.
is an
matrix with orthogonal columns, and
a
matrix.
The target rank is assumed to be
.
is an oversampling parameter to improve the approximation.
A value between 5 and 10 is recommended, and
is set by default.
The parameter specifies the number of power (subspace) iterations
to reduce the approximation error. This is recommended
if the the singular values decay slowly. In practice 1 or 2 iterations
achieve good results, however, computing power iterations increases the
computational time. The number of power iterations is set to
by default.
rqb
returns a list containing the following components:
array_like;
matrix with orthogonal columns; dimensional array.
array_like;
smaller matrix; dimensional array.
N. Benjamin Erichson, [email protected]
[1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. http://doi.org/10.18637/jss.v089.i11.
[2] N. Halko, P. Martinsson, and J. Tropp. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions" (2009). (available at arXiv http://arxiv.org/abs/0909.4061).
Robust principal components analysis separates a matrix into a low-rank plus sparse component.
rrpca( A, lambda = NULL, maxiter = 50, tol = 1e-05, p = 10, q = 2, trace = FALSE, rand = TRUE )
rrpca( A, lambda = NULL, maxiter = 50, tol = 1e-05, p = 10, q = 2, trace = FALSE, rand = TRUE )
A |
array_like; |
lambda |
scalar, optional; |
maxiter |
integer, optional; |
tol |
scalar, optional; |
p |
integer, optional; |
q |
integer, optional; |
trace |
bool, optional; |
rand |
bool, optional; |
Robust principal component analysis (RPCA) is a method for the robust seperation of a
a rectangular matrix
into a low-rank component
and a
sparse comonent
:
To decompose the matrix, we use the inexact augmented Lagrange multiplier method (IALM). The algorithm can be used in combination with either the randomized or deterministic SVD.
rrpca
returns a list containing the following components:
array_like;
low-rank component; dimensional array.
array_like
sparse component; dimensional array.
N. Benjamin Erichson, [email protected]
[1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. http://doi.org/10.18637/jss.v089.i11.
[2] Lin, Zhouchen, Minming Chen, and Yi Ma. "The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices." (2010). (available at arXiv http://arxiv.org/abs/1009.5055).
library('rsvd') # Create toy video # background frame xy <- seq(-50, 50, length.out=100) mgrid <- list( x=outer(xy*0,xy,FUN="+"), y=outer(xy,xy*0,FUN="+") ) bg <- 0.1*exp(sin(-mgrid$x**2-mgrid$y**2)) toyVideo <- matrix(rep(c(bg), 100), 100*100, 100) # add moving object for(i in 1:90) { mobject <- matrix(0, 100, 100) mobject[i:(10+i), 45:55] <- 0.2 toyVideo[,i] = toyVideo[,i] + c( mobject ) } # Foreground/Background separation out <- rrpca(toyVideo, trace=TRUE) # Display results of the seperation for the 10th frame par(mfrow=c(1,4)) image(matrix(bg, ncol=100, nrow=100)) #true background image(matrix(toyVideo[,10], ncol=100, nrow=100)) # frame image(matrix(out$L[,10], ncol=100, nrow=100)) # seperated background image(matrix(out$S[,10], ncol=100, nrow=100)) #seperated foreground
library('rsvd') # Create toy video # background frame xy <- seq(-50, 50, length.out=100) mgrid <- list( x=outer(xy*0,xy,FUN="+"), y=outer(xy,xy*0,FUN="+") ) bg <- 0.1*exp(sin(-mgrid$x**2-mgrid$y**2)) toyVideo <- matrix(rep(c(bg), 100), 100*100, 100) # add moving object for(i in 1:90) { mobject <- matrix(0, 100, 100) mobject[i:(10+i), 45:55] <- 0.2 toyVideo[,i] = toyVideo[,i] + c( mobject ) } # Foreground/Background separation out <- rrpca(toyVideo, trace=TRUE) # Display results of the seperation for the 10th frame par(mfrow=c(1,4)) image(matrix(bg, ncol=100, nrow=100)) #true background image(matrix(toyVideo[,10], ncol=100, nrow=100)) # frame image(matrix(out$L[,10], ncol=100, nrow=100)) # seperated background image(matrix(out$S[,10], ncol=100, nrow=100)) #seperated foreground
The randomized SVD computes the near-optimal low-rank approximation of a rectangular matrix using a fast probablistic algorithm.
rsvd(A, k = NULL, nu = NULL, nv = NULL, p = 10, q = 2, sdist = "normal")
rsvd(A, k = NULL, nu = NULL, nv = NULL, p = 10, q = 2, sdist = "normal")
A |
array_like; |
k |
integer; |
nu |
integer, optional; |
nv |
integer, optional; |
p |
integer, optional; |
q |
integer, optional; |
sdist |
string |
The singular value decomposition (SVD) plays an important role in data analysis, and scientific computing.
Given a rectangular matrix
, and a target rank
,
the SVD factors the input matrix
as
The left singular vectors are the columns of the
real or complex unitary matrix
. The
right singular vectors are the columns
of the real or complex unitary matrix
. The
dominant singular values are the
entries of
, and non-negative and real numbers.
is an oversampling parameter to improve the approximation.
A value of at least 10 is recommended, and
is set by default.
The parameter specifies the number of power (subspace) iterations
to reduce the approximation error. The power scheme is recommended,
if the singular values decay slowly. In practice, 2 or 3 iterations
achieve good results, however, computing power iterations increases the
computational costs. The power scheme is set to
by default.
If , a deterministic partial or truncated
svd
algorithm might be faster.
rsvd
returns a list containing the following three components:
array_like;
singular values; vector of length .
array_like;
left singular vectors; or
dimensional array.
array_like;
right singular vectors; or
dimensional array.
The singular vectors are not unique and only defined up to sign (a constant of modulus one in the complex case). If a left singular vector has its sign changed, changing the sign of the corresponding right vector gives an equivalent decomposition.
N. Benjamin Erichson, [email protected]
[1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. http://doi.org/10.18637/jss.v089.i11.
[2] N. Halko, P. Martinsson, and J. Tropp. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions" (2009). (available at arXiv http://arxiv.org/abs/0909.4061).
library('rsvd') # Create a n x n Hilbert matrix of order n, # with entries H[i,j] = 1 / (i + j + 1). hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") } H <- hilbert(n=50) # Low-rank (k=10) matrix approximation using rsvd k=10 s <- rsvd(H, k=k) Hre <- s$u %*% diag(s$d) %*% t(s$v) # matrix approximation print(100 * norm( H - Hre, 'F') / norm( H,'F')) # percentage error # Compare to truncated base svd s <- svd(H) Hre <- s$u[,1:k] %*% diag(s$d[1:k]) %*% t(s$v[,1:k]) # matrix approximation print(100 * norm( H - Hre, 'F') / norm( H,'F')) # percentage error
library('rsvd') # Create a n x n Hilbert matrix of order n, # with entries H[i,j] = 1 / (i + j + 1). hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") } H <- hilbert(n=50) # Low-rank (k=10) matrix approximation using rsvd k=10 s <- rsvd(H, k=k) Hre <- s$u %*% diag(s$d) %*% t(s$v) # matrix approximation print(100 * norm( H - Hre, 'F') / norm( H,'F')) # percentage error # Compare to truncated base svd s <- svd(H) Hre <- s$u[,1:k] %*% diag(s$d[1:k]) %*% t(s$v[,1:k]) # matrix approximation print(100 * norm( H - Hre, 'F') / norm( H,'F')) # percentage error
1600x1200 grayscaled (8 bit [0-255]/255) image.
data('tiger')
data('tiger')
An object of class rsvd
.
S. Taheri (2006). "Panthera tigris altaica", (Online image)
## Not run: library('rsvd') data('tiger') #Display image image(tiger, col = gray((0:255)/255)) ## End(Not run)
## Not run: library('rsvd') data('tiger') #Display image image(tiger, col = gray((0:255)/255)) ## End(Not run)