Performance of Elementary Functions

UPDATE (April 22, 2017): Timings for Mathematica 11.1 have been added to the table, thanks to a test script contributed by Bor Plestenjak. I suggest taking a look at his excellent toolbox for multiparameter eigenvalue problems, MultiParEig.

UPDATE (November 17, 2016): All timings in the table have been updated to reflect speed improvements in the new version of the toolbox (4.3.0). The toolbox now computes elementary functions using multi-core parallelism. We have also included timings for the latest version of MATLAB, R2016b.

UPDATE (June 1, 2016): The initial version of this post stated that the newest version of MATLAB, R2016a, uses the MAPLE engine for variable precision arithmetic (instead of MuPAD, as in previous versions). After more detailed checks we found that this is not true. As it turned out, MAPLE 2016 silently replaced the VPA functionality of MATLAB during installation. Thus, without knowing it, we tested the MAPLE Toolbox for MATLAB instead of the MathWorks Symbolic Math Toolbox. We apologize for the misinformation. The post now provides correct comparison results for Symbolic Math Toolbox/VPA.

Thanks to Nick Higham, Massimiliano Fasi and Samuel Relton for their help in finding this mistake!

From the very beginning we have focused on improving the performance of matrix computations, linear algebra, solvers and other high-level algorithms (see, e.g., the 3.8.0 release notes).

Over time, as the advanced algorithms became faster, elementary functions started to show up among the top hot-spots more often. For example, the main bottleneck of the multiquadric collocation method in extended precision was the element-wise power function (.^).

Thus we decided to polish our library for computing elementary functions. Here we present intermediate results of this work, along with our traditional comparison against the latest MATLAB R2016b (Symbolic Math Toolbox/Variable Precision Arithmetic), MAPLE 2016 and Wolfram Mathematica 11.1.0.0.

Timing of logarithmic and power functions in 3.9.4.10481:

>> mp.Digits(34);
>> A = mp(rand(2000)-0.5);
>> B = mp(rand(2000)-0.5);
 
>> tic; C = A.^B; toc;
Elapsed time is 67.199782 seconds.
 
>> tic; C = log(A); toc;
Elapsed time is 22.570701 seconds.

Speed of the same functions after optimization, in 4.3.0.12057:

>> mp.Digits(34);
>> A = mp(rand(2000)-0.5);
>> B = mp(rand(2000)-0.5);
 
>> tic; C = A.^B; toc;             % 130 times faster
Elapsed time is 0.514553 seconds.
 
>> tic; C = log(A); toc;           % 95 times faster
Elapsed time is 0.238416 seconds.

The toolbox now computes 4 million logarithms in quadruple precision (including negative arguments) in less than a second!
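
Since rand(2000)-0.5 is negative for roughly half of its entries, the logarithm (and the power with a non-integer exponent) has to produce complex values for those elements. The sketch below uses ordinary doubles as a stand-in to show the principal-value convention of built-in MATLAB. That the mp type follows the same convention is an assumption here, not something the timings themselves demonstrate.

```matlab
x = -0.25;            % a negative argument, like about half the entries of A
y = log(x);           % principal value: log(abs(x)) + pi*1i

fprintf('result is real:             %d\n', isreal(y));
fprintf('real part equals log(|x|):  %d\n', abs(real(y) - log(abs(x))) < 1e-15);
fprintf('imaginary part equals pi:   %d\n', abs(imag(y) - pi) < 1e-15);

z = x .^ 0.5;         % negative base with a non-integer exponent is complex too
fprintf('power result is real:       %d\n', isreal(z));
```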

Inspired by this result, we applied the same ideas to speed up other elementary functions. The summary table below shows timings and a comparison against MATLAB R2016b (VPA), MAPLE 2016 and Wolfram Mathematica 11.1.0.0, measured on a Core i7 990x / Windows 7 64-bit:

Computation of elementary functions using quadruple precision
(argument is real pseudo-random 2Kx2K matrix)^†
Function	MATLAB VPA (sec)	Maple (sec)	Mathematica (sec)	Advanpix (sec)	Speed-up over VPA (times)	Speed-up over Maple (times)	Speed-up over Mathematica (times)
Power & exponential:
EXP	107.34	756.14	4.54	0.12	886.34	6243.90	37.49
LOG	1161.18	593.98	6.61	0.23	5133.40	2625.91	29.21
LOG10	1438.91	639.46	11.13	0.24	5958.23	2647.88	46.09
LOG2	1442.71	643.17	11.08	0.25	5789.35	2580.94	44.48
SQRT	28.75	427.40	2.60	0.27	105.74	1571.90	9.55
Trigonometric & hyperbolic:
SIN	85.28	736.89	6.07	0.15	570.80	4932.33	40.62
COS	78.96	513.73	6.10	0.15	516.44	3359.92	39.89
TAN	1261.92	844.05	8.91	0.17	7277.51	4867.64	51.37
ASIN	105.12	1181.83	12.39	0.39	266.40	2995.01	31.39
ACOS	100.49	1330.99	23.10	0.39	257.55	3411.03	59.19
ATAN	131.92	1039.55	5.71	0.14	974.28	7677.64	42.17
SEC	1466.09	778.14	8.00	0.18	8199.59	4352.01	44.76
CSC	1503.75	793.87	8.35	0.18	8490.95	4482.60	47.13
COT	1511.67	1014.76	10.46	0.20	7728.36	5187.95	53.48
ASEC	1610.29	1962.87	18.45	0.28	5815.44	7088.72	66.62
ACSC	1648.31	1720.76	21.96	0.28	5965.65	6227.86	79.47
ACOT	140.37	1179.84	16.61	0.16	867.58	7291.96	102.63
SINH	117.85	781.78	6.88	0.13	910.78	6041.59	53.17
COSH	117.73	795.34	7.00	0.13	924.06	6242.87	54.92
TANH	121.37	976.78	9.20	0.10	1198.14	9642.45	90.78
ASINH	92.55	778.46	13.51	0.14	656.38	5521.02	95.81
ACOSH	103.78	1349.79	20.51	0.31	332.10	4319.31	65.65
ATANH	121.46	2287.94	11.60	0.32	378.49	7129.76	36.14
SECH	1922.54	978.91	9.10	0.17	11602.53	5907.73	54.93
CSCH	1947.11	960.78	8.96	0.17	11652.35	5749.72	53.63
COTH	1958.51	1268.98	10.90	0.12	16266.72	10539.72	90.50
ASECH	2378.24	2921.78	18.75	0.43	5476.04	6727.56	43.18
ACSCH	2087.72	1188.18	17.78	0.16	12831.71	7302.87	109.26
ACOTH	2117.19	2335.23	19.77	0.26	8083.95	8916.49	75.47
Selected special:
gamma	2491.81	7734.53	228.35	0.76	3266.23	13018.78	299.31
erf	104.11	321.20	125.88	0.16	669.96	2163.26	810.02
bessely(0,x)	7855.70	14923.53	250.38	0.83	9482.98	18014.89	302.25
bessely(1,x)	7302.29	14964.26	267.94	0.83	8786.29	18005.36	322.39
besselj(0,x)	7273.29	9998.60	90.54	0.75	9684.81	13313.72	120.55
besselj(1,x)	5987.67	10153.13	91.89	0.74	8077.25	13696.38	123.96
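
As a quick sanity check of the table, the speed-up column over VPA can be summarized by its arithmetic mean. The sketch below copies that one column by hand and averages it in plain MATLAB; it comes out close to 5000. Only the VPA column is recomputed here, and the variable name is illustrative.

```matlab
% "Speed-up over VPA" column, copied from the table in row order.
over_vpa = [ ...
      886.34   5133.40   5958.23   5789.35    105.74 ...  % EXP .. SQRT
      570.80    516.44   7277.51    266.40    257.55    974.28 ...  % SIN .. ATAN
     8199.59   8490.95   7728.36   5815.44   5965.65    867.58 ...  % SEC .. ACOT
      910.78    924.06   1198.14    656.38    332.10    378.49 ...  % SINH .. ATANH
    11602.53  11652.35  16266.72   5476.04  12831.71   8083.95 ...  % SECH .. ACOTH
     3266.23    669.96   9482.98   8786.29   9684.81   8077.25];    % gamma .. besselj(1,x)

fprintf('functions: %d\n', numel(over_vpa));
fprintf('mean speed-up over VPA: %.0f\n', mean(over_vpa));
```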

On average, the Advanpix toolbox is about 5000 times faster than MATLAB/VPA, 6766 times faster than MAPLE and 100 times faster than Wolfram Mathematica. Test scripts are available for download:

Timings of elementary functions: Matlab, Maple, Mathematica and Advanpix (zip)

Run timing_elementary_advanpix to test the Advanpix toolbox, and timing_elementary_vpa to test VPA. Don’t forget to add the toolbox directory to the search path before running the toolbox tests!
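
A minimal sketch of that setup step, assuming the scripts from the archive are in the current folder. The toolbox location below is a hypothetical placeholder and has to be replaced with the real installation folder.

```matlab
% Hypothetical install location; replace with the actual toolbox folder.
toolbox_dir = 'C:\path\to\toolbox';
addpath(toolbox_dir);

% Stop early if the mp class is still not visible on the search path.
if isempty(which('mp'))
    error('The mp class was not found; check toolbox_dir.');
end

timing_elementary_advanpix   % script from the downloaded archive
```

The VPA script, timing_elementary_vpa, needs the Symbolic Math Toolbox rather than the search path change.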

***
^†The toolbox’s timings are higher on GNU/Linux and Apple Mac OS X. We can do deeper performance optimization on Windows since we have a full license for the Intel developer tools on that platform.
