Version 3.9.0 Release Notes

by Pavel Holoborodko on November 20, 2015

Over the course of the last three months we have added many important improvements and bug fixes to the toolbox. We decided to produce a new official release because it contains critical updates for all platforms (even though the eigensolver for large sparse matrices is not included yet).

Here are the main changes:

  1. The Core: The multithreading manager has been updated with better load balancing and non-blocking synchronization. This results in a 15-20% performance boost in all operations with dense matrices, including matrix multiplication.
  2. Eigenvalues.
    • The speed of the unsymmetric generalized eigen-decomposition has been improved. The toolbox now uses a blocked version of the Hessenberg-triangular reduction[1], which lowers timings by 20%.
    • The speed of the symmetric eigen-decomposition has been improved as well. All cases are covered: standard and generalized problems, in quadruple and arbitrary precision modes, for both real and complex matrices.

      The actual speed-up factor depends on the number of cores and the matrix size. In our environment (Core i7, R2015a, Win7 x64) we see a ratio of about 3x for a 1000×1000 matrix:

      mp.Digits(34)
      A = mp(rand(1000)-0.5);
      A = A+A';
       
      % Previous versions
      tic; eig(A); toc;
      Elapsed time is 48.154 seconds.
       
      % 3.9.0.9919
      tic; eig(A); toc;      % x3.4 times faster
      Elapsed time is 14.212 seconds.

      Speed-up is even higher for other precision levels:

      mp.Digits(50)
      A = mp(rand(1000)-0.5);
      A = A+A';
       
      % Previous versions
      tic; eig(A); toc;
      Elapsed time is 204.431 seconds.
       
      % 3.9.0.9919
      tic; eig(A); toc;      % x5.3 times faster
      Elapsed time is 38.534 seconds.

      In fact, all operations with dense symmetric/Hermitian and SPD/HPD matrices have become faster: chol, inv, det, solvers, etc. For example, the speed-up ratio for inv is up to 10x, depending on matrix structure and the number of cores.
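The reason symmetric/SPD cases admit such gains is structural: an SPD system can be solved via the Cholesky factorization A = L*L', which needs roughly half the arithmetic of a general LU decomposition and no pivoting. As a language-neutral illustration, here is a plain Python sketch of the idea (an illustration only, not the toolbox's actual kernels):

```python
import math

def cholesky(A):
    """Lower-triangular L with A = L*L^T, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)   # diagonal entry
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]  # sub-diagonal entry
    return L

def solve_spd(A, b):
    """Solve A x = b via Cholesky: L y = b (forward), L^T x = y (backward)."""
    n = len(b)
    L = cholesky(A)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Small SPD system as a demo.
A = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
b = [2.0, 5.0, 6.0]
x = solve_spd(A, b)   # exact solution is [5/22, 6/11, 20/11]
```

Only the lower triangle of A is ever touched, which is also why memory traffic (and cache behavior) improves alongside the flop count.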

Read More


In connection with our previous study, in this post we analyse existing rational approximations for the modified Bessel function of the first kind of order one, I_1(x).

First, we consider well-known and widely used methods; then we propose new approximations which deliver the lowest relative error and have a favorable computational structure (pure polynomials).

We study the properties of the following rational approximations: Blair & Edwards[1], Numerical Recipes[2], SPECFUN 2.5[3], and Boost 1.59.0[4]. Similar to I_0(x), all of them share the structure and intervals proposed by Blair & Edwards[1]:

(1)    \begin{alignat*}{2} I_1(x)&\approx xP_{n}(x^2)/Q_{m}(x^2),              \qquad &&|x| &\le 15.0 \\ x^{1/2} e^{-x} I_1(x)&\approx P_{r}(1/x)/Q_{s}(1/x), \qquad &&  x &\ge 15.0 \end{alignat*}

where P_*(x) and Q_*(x) are polynomials of corresponding degrees.

The actual accuracy of each approximation, in units of machine epsilon (eps = 2^-52 ≈ 2.22e-16), is plotted below (please see our report on I_0(x) for technical details).

(Plot: accuracy of the Blair & Edwards[1] approximation.)


Read More


In this post we study the properties of rational approximations for the modified Bessel function of the first kind, I_0(x), commonly used to compute the function values in double precision for real arguments. We compare the actual accuracy of each method, discuss ways for improvement and propose a new approximation which delivers the lowest relative error.

(Figure: modified Bessel functions of the first kind, I_n(x), n = 0, 1, 2, 3.)

We consider the following rational approximations: Blair & Edwards[1], Numerical Recipes[2], SPECFUN 2.5[3], and Boost 1.59.0[4]. All libraries use an approximation of the same form, proposed in the work of Blair & Edwards[1]:

(1)    \begin{alignat*}{2} I_0(x)&\approx P_n(x^2)/Q_m(x^2),              \qquad &&|x| &\le 15.0 \\ x^{1/2} e^{-x} I_0(x)&\approx P_r(1/x)/Q_s(1/x), \qquad &&  x &\ge 15.0 \end{alignat*}

where P_*(x) and Q_*(x) are polynomials of corresponding degrees.
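Accuracy plots of this kind are produced by comparing a double precision implementation against a higher-precision reference and expressing the difference in units of eps. A stdlib-only Python sketch of that measurement, using the power series of I_0 as the function under test (it illustrates the procedure, not the rational approximations themselves):

```python
from decimal import Decimal, getcontext

def i0_double(x):
    """I0(x) via its power series in double precision (all terms positive)."""
    q = x * x / 4.0
    term, total, k = 1.0, 1.0, 0
    while term > 1e-20 * total:
        k += 1
        term *= q / (k * k)   # next term: (x^2/4)^k / (k!)^2
        total += term
    return total

def i0_reference(x, digits=40):
    """Same series evaluated in 40-digit decimal arithmetic as a reference."""
    getcontext().prec = digits
    q = Decimal(x) ** 2 / 4
    term = total = Decimal(1)
    k = 0
    while term > Decimal(10) ** (-digits) * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

eps = 2.0 ** -52
x = 7.5
relerr = abs((Decimal(i0_double(x)) - i0_reference(x)) / i0_reference(x))
err_in_eps = float(relerr) / eps   # accuracy in units of machine epsilon
```

In our reports the same comparison is done over a dense grid of arguments, with the reference computed in much higher precision.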

Read More


DevNotes #1. Multiplicative Operations with Complex Infinity

by Pavel Holoborodko on October 19, 2015

Every day we learn something new, gain experience and insights, collect interesting facts and tips. This process is especially intensive in software development, where you deal with various algorithms, libraries, cross-platform development, performance optimization, etc.

To collect these summaries, ideas and facts I have been keeping development notes on paper for a long time. But the old-style notebooks started to pile up, and I have decided to write short posts for the blog instead. Hopefully it will be interesting to someone besides myself. Today is the first attempt.
***

Different programming languages follow different rules in complex multiplicative operations when one of the operands is infinity. This came as quite a surprise (thanks to Ciprian Necula, who drew my attention to the issue).

All major mathematical packages (MATLAB, Mathematica and Maple) agree on the following:

>> 1/(Inf+1i)                  % 1 
ans =
     0
 
>> Inf*(0+1i)
ans =
            0 +        Inf*i    % 2
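For comparison, Python (CPython's complex arithmetic) agrees with these packages on the division case but not on the multiplication case, where the IEEE 754 rule inf*0 = NaN leaks into the real part:

```python
inf = float("inf")

# Division by a complex number with an infinite part: Python agrees
# with the packages above and returns zero.
z1 = 1 / complex(inf, 1)      # 0j

# Multiplication is where languages diverge: the real part is computed as
# inf*0 - 0*1 = NaN, so Python returns (nan+infj) instead of 0 + Inf*i.
z2 = inf * complex(0, 1)
```

This is exactly the kind of cross-language discrepancy the post is about: the componentwise formula and the "complex infinity" interpretation give different answers, and each language picks one.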

Read More


The multiquadric (MQ) collocation method [1-3] is one of the most efficient algorithms for the numerical solution of partial differential equations. It achieves an exponential error convergence rate [4] and handles irregularly shaped domains well (in contrast to spectral methods).

The accuracy of the solution is controlled by the shape parameter c of the MQ radial basis function (RBF). As c increases (flatter basis functions), accuracy grows as well. At the same time, however, a large c leads to an ill-conditioned system matrix. This effect has received much attention from researchers [5-9], with focus on finding preconditioners, optimal values of c, etc.

In this post we study the MQ collocation method by applying it to the Poisson equation. To tackle the ill-conditioning and observe the related effects we use different levels of precision.

First, using arbitrary precision computations, we evaluate the empirical dependence of the approximation error on the shape parameter. In the second simulation we check how accuracy changes with grid refinement. See similar studies in [10-12].

The idea of running such tests was inspired by the excellent book of Scott A. Sarra and Edward J. Kansa: Multiquadric Radial Basis Function Approximation Methods for the Numerical Solution of Partial Differential Equations (pdf). The original scripts from the book were modified to work with the toolbox.

Problem

We consider the two-dimensional Poisson problem on a unit disk domain (see section 3.1 in the book):

    \[ u_{xx}+u_{yy}=f(x,y) \]

The function f(x,y) is specified so that the exact solution is

    \[ u(x,y)=\frac{65}{65+\left(x-\frac{1}{5}\right)^2+\left(y+\frac{1}{10}\right)^2} \]

Dirichlet boundary conditions are prescribed using the exact solution.

To solve the problem we apply the asymmetric MQ RBF collocation method, using a set of N near-equidistant centers generated in the domain:

(Figures: collocation centers for N = 346 and N = 660.)
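For readers who want to experiment, the asymmetric (Kansa) collocation scheme itself is small enough to sketch in a few dozen lines. The Python sketch below uses a deliberately simpler setting than the book's: a unit square instead of the unit disk, a manufactured solution u = exp(x+y) (so f = 2*exp(x+y)), a hand-picked shape parameter c, and plain double precision:

```python
import math

def mq(px, py, cx, cy, c):
    """Multiquadric basis centered at (cx, cy): sqrt(r^2 + c^2)."""
    r2 = (px - cx) ** 2 + (py - cy) ** 2
    return math.sqrt(r2 + c * c)

def mq_laplacian(px, py, cx, cy, c):
    """Laplacian of the MQ basis: (r^2 + 2c^2) / (r^2 + c^2)^(3/2)."""
    r2 = (px - cx) ** 2 + (py - cy) ** 2
    return (r2 + 2 * c * c) / (r2 + c * c) ** 1.5

def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (sufficient for a sketch)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

# Manufactured problem: u_xx + u_yy = f on the unit square,
# with exact solution u(x,y) = exp(x + y), hence f = 2*exp(x + y).
u_exact = lambda x, y: math.exp(x + y)
f_rhs = lambda x, y: 2.0 * math.exp(x + y)

m = 6                                   # 6x6 grid of collocation centers
pts = [(i / (m - 1), j / (m - 1)) for i in range(m) for j in range(m)]
boundary = [p for p in pts if 0.0 in p or 1.0 in p]
interior = [p for p in pts if p not in boundary]
centers = interior + boundary
c = 0.5                                 # MQ shape parameter (hand-picked)

# Asymmetric collocation: PDE rows at interior points,
# Dirichlet rows at boundary points.
A, b = [], []
for (px, py) in interior:
    A.append([mq_laplacian(px, py, cx, cy, c) for (cx, cy) in centers])
    b.append(f_rhs(px, py))
for (px, py) in boundary:
    A.append([mq(px, py, cx, cy, c) for (cx, cy) in centers])
    b.append(u_exact(px, py))

lam = solve_dense(A, b)
u_approx = lambda x, y: sum(l * mq(x, y, cx, cy, c) for l, (cx, cy) in zip(lam, centers))
err = abs(u_approx(0.37, 0.52) - u_exact(0.37, 0.52))
```

Increasing c in this sketch improves accuracy until the system matrix becomes too ill-conditioned for double precision, which is precisely the trade-off the arbitrary precision experiments below are designed to expose.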


Read More


Citations Digest v.4

by Pavel Holoborodko on April 2, 2015

New toolbox citations found in Google Scholar or reported by the authors:

Books

Read More


Version 3.8.0 release notes

by Pavel Holoborodko on February 12, 2015

Over the last several months we have been focusing on making the toolbox fast at processing large matrices. Although many updates were released along the way (see the full changelog), here we outline only the most important changes:

  1. High-level dense linear algebra routines have been updated to use modern performance-oriented algorithms:
    • Divide & Conquer for SVD [1, 2].
    • Multi-shift QR for nonsymmetric eigenproblems [3, 4, 7].
    • MRRR for Hermitian, symmetric and tridiagonal symmetric eigenproblems [5, 6].
  2. All algorithms are implemented in quadruple as well as in arbitrary precision modes.

  3. Most of the dense matrix algorithms now support parallel execution on multi-core CPUs. In the future we plan to add GPU support (at least for quadruple precision).
  4. Many significant improvements have been made to the toolbox architecture: low-level kernel modules were redesigned to minimize memory fragmentation, cache misses, copy overhead in data exchange with MATLAB, etc.

All combined, this allows the toolbox to reach the following results:

UPDATE (April 22, 2017): Timings for Mathematica 11.1 have been added to the table, thanks to test script contributed by Bor Plestenjak.


Matrix factorization in quadruple precision (500×500, 34 digits).
A and B are pseudo-random real matrices (500×500); timings in seconds, speed-ups in times.

Factorization          MATLAB     Maple   Mathematica   Advanpix   Over MATLAB   Over Maple   Over Mathematica
[L,U] = lu(A)          249.13     85.16        15.12       0.47        534.38       182.67              32.42
[Q,R] = qr(A)          514.34    458.86        44.39       3.08        167.25       149.21              14.43
[U,S,V] = svd(A)      5595.26   4317.40       376.94       9.57        584.62       451.11              39.38
S = svd(A)            2765.11    645.60        56.96       2.98        927.17       216.48              19.10
[V,D] = eig(A)       21806.30   6060.90       584.86      33.75        646.05       179.57              17.33
lambda = eig(A)       3384.58   7822.30       205.46      23.55        143.70       332.11               8.72
[V,D] = eig(A,B)          n/a  11358.00       928.89     112.05           n/a       101.36               8.29
lambda = eig(A,B)         n/a   5273.00       510.80      60.37           n/a        87.34               8.46

The Advanpix toolbox outperforms VPA by about 500 times, Maple by about 210 times and Wolfram Mathematica by about 18 times on average.
Read More


Advanpix vs. VPA – Array Manipulation Operations

by Pavel Holoborodko on November 14, 2014

The new version of the toolbox (≥ 3.7.2) has an updated engine for array manipulation operations. We would like to show the improved timings and compare them with the previous version.

The very basic array manipulation operations (subscripted assignment, reference and concatenation) play a crucial role in the overall performance of a program. If poorly designed and implemented, they might degrade the speed to the point where the program actually spends more time in array indexing than in computation itself.

The MATLAB interpreter has extremely optimized basic operations for built-in "double" and "single" precision matrices. However, the situation is quite different when it comes to arrays of user-defined types (like our multiple-precision class, mp).

As reported earlier [1], the array manipulation functions MATLAB provides for user-defined types have an absolutely unacceptable level of performance. We investigated the issue and proposed our own solution [1,2,3].

As a quick demo, here is a comparison of sequential element-wise assignment (without pre-allocation):

% Variable-precision arithmetic (Symbolic Math Toolbox, MATLAB R2014a)
p = vpa('pi');
tic
for ii = 1:300
    for jj = 1:300
        A(ii,jj) = p;
    end
end
toc

Elapsed time is 848.209 seconds.

% Multiprecision Computing Toolbox >= 3.7.2
p = mp('pi');
tic
for ii = 1:300
    for jj = 1:300
        A(ii,jj) = p;
    end
end
toc

Elapsed time is 7.17263 seconds.

The Symbolic Math Toolbox/VPA requires 14 minutes just to fill a 300×300 matrix!
The Advanpix toolbox does the same in 7 seconds (118 times faster).

Read More


Citations Digest v.3

by Pavel Holoborodko on February 3, 2014

The last few months were fruitful for toolbox citations:

The toolbox was also mentioned in the following works:

Please let me know if there are other papers that should be added to the list.

If you are using the toolbox in your research, please cite it in your paper – this greatly supports future development.


We continue comparing our toolbox with Maple and the Symbolic Math Toolbox (VPA). This time we focus on the main operations in dense linear algebra: solvers and factorizations.

Every case/routine is tested on a random 100×100 matrix. The average timing over ten runs is reported.

Comparison of the linear system solver ("\") in quadruple precision (timings in seconds, speed-ups in times)

Matrix Type                     VPA   Maple   Advanpix   Over VPA   Over Maple
General Real                   0.99    0.58       0.02      42.22        24.67
Symmetric Positive Definite    0.99    0.63       0.01      77.10        49.31
General Complex                3.13    2.43       0.08      38.34        29.74
Hermitian Positive Definite    3.72    2.56       0.05      70.85        48.82

On average the Advanpix solver is faster than VPA by 60 times and than Maple by 40 times.

Comparison of matrix factorization in quadruple precision (timings in seconds, speed-ups in times)

Factorization          VPA     Maple   Advanpix   Over VPA   Over Maple
A is a general real matrix:
[L,U] = lu(A)         1.67      0.61       0.02      78.26        28.29
[Q,R] = qr(A)         3.46      2.60       0.08      44.44        33.46
[U,S,V] = svd(A)     30.85     20.99       0.30     101.66        69.17
S = svd(A)           15.88      2.51       0.07     223.54        35.35
[V,D] = eig(A)       57.25     29.54       0.72      79.20        40.86
lambda = eig(A)      21.21     26.90       0.34      62.31        79.01
A is a general complex matrix:
[L,U] = lu(A)         2.74      2.56       0.07      41.63        38.89
[Q,R] = qr(A)        10.99     11.39       0.25      43.38        44.99
[U,S,V] = svd(A)     65.56     57.53       0.81      80.96        71.04
S = svd(A)           25.28     10.22       0.26      98.24        39.71
[V,D] = eig(A)       89.30    129.54       2.85      31.34        45.46
lambda = eig(A)      33.14    113.99       1.29      25.67        88.32

As for factorizations, the Advanpix toolbox outperforms VPA by 75 times and Maple by 50 times on average. The massive speed gain of the singular value decomposition (SVD) is particularly worth noting.

Read More

{ 0 comments }