Architecture of linear systems solver

by Pavel Holoborodko on October 7, 2016

Computational complexity of direct algorithms for solving linear systems of equations is O(n3). The only way to reduce enormous effect of cubic growth is to take advantage of various matrix properties and use specialized solvers1.

Toolbox follows this strategy by relying on poly-solver which automatically detects matrix properties and applies the best matching algorithm for the particular case. In this post we outline our solver architecture for full/dense matrices.

Poly-solver for dense input matrices (click on flowchart to see it in higher resolution):

Architecture of linear systems solver in Advanpix toolbox

Toolbox analyses the structure of input matrix, computes its bandwidth, checks symmetry/Hermitianity, determines permutation to convert matrix to trivial cases (diagonal or triangular) if possible. Then solver selects the most appropriate algorithm depending on matrix properties. Special attention is paid to n-diagonal matrices (banded), frequently encountered in solution of PDE.

Solver relies on following decompositions (from slowest to fastest): rank-revealing QR, LU with partial pivoting, LDLT and Cholesky2. Each decomposition is implemented in several variants, tailored for particular matrix type (tridiagonal3, n-diagonal, full) and enabled with parallelism – thus performance scales with number of CPU cores in all cases.

The need for such diversity of algorithms can be seen from timings comparison4:

Timing comparison LU vs. LDLT and Cholesky

Solving 2000x2000 matrix by LU under 25 seconds is notable fact on its own. However timing can be reduced further to 17 seconds (LDLT) if matrix is symmetric, or even to 13 seconds if matrix is positive definite (Cholesky).

In case of n-diagonal matrices, specialized solver delivers significantly better timings as well:

Performance gain is around 20 times for 2000x2000 matrices. But most importantly, speed-up increases with matrix size (since complexity growth of banded LU decomposition is sub-cubical).

Similarly to quadruple (34-digits), specialized solvers works well in higher precision:

Still we see substantial speed-up (>4 times) but it is lower since higher precision needs more memory and algorithms slowly become memory bound in addition to large constant in front of n3.

1Iterative solvers use another strategy by relying only on matrix-vector operations of O(n2).

2The LDLT decomposition is considered not stable in some cases, but we rely on Bunch-Kaufman diagonal pivoting which is as stable as LU with pivoting.

3Computation of reciprocal condition number might take up to 50% of total solver time, thus in trivial cases (including tridiagonal) it is not computed.

4We used newest version of toolbox ( and MATLAB R2016a on Core i7 990x to built the plots above.

{ 2 comments… read them below or add one }

Mark L. Stone January 17, 2017 at 3:19 pm

For linear system solve for real symmetric positive definite matrix which is not-banded, flowchart shows Cholesky solver if Cholesky succeeds, and LDL’ solver if Cholesky fails. Is there a way to find out whether Cholesky succeeded without making a separate call [R,p] = chol(A) ?

Consider a case in which A is extremely ill-conditioned. Presumably, for a given A, there will be some minimum precision at which Cholesky succeeds.


Pavel Holoborodko January 17, 2017 at 3:51 pm

@”Is there a way to find out whether Cholesky succeeded without making a separate call [R,p] = chol(A) ?”
Currently MLDIVIDE does not provide this information.

To avoid extra call for “chol” and to get precise control you can use something like:

 [R,p] = chol(A);
 if p==0
   x = R\(R'\b);
   [L, D, p] = ldl(A, 'vector');
   x(p,:) = L'\(D\(L\(b(p,:))));

This mimics the flowchart branch for symmetric matrices, but it might be slower than MLDIVIDE itself (since we do not store factors explicitly and there is no data exchange with MATLAB).


Leave a Comment

Previous post:

Next post: