Benchmarks

Script Benchmarks

How fast should your scripts run? If you read the Lua literature or follow user groups, you'll often read about Lua's remarkable speed. This speed comes from Lua's design: the script is compiled to bytecode at run time, when it is loaded, so the statements inside loops and functions are not reinterpreted on every pass. The Pro Script library is designed to place numerically intensive array processing in Mira's core library of highly optimized functions. However, when no built-in method or function exists for a task, the array processing must be done within the script itself. The table below shows what to expect from your numerically intensive scripts.
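The compile-once behavior described above can be seen in plain Lua. The sketch below is an illustration only and does not show how Mira itself loads scripts. It compiles a small chunk of source text a single time and then calls the resulting function many times. It assumes a stock Lua 5.1 to 5.4 interpreter; `loadstring` is used where it exists (Lua 5.1) and `load` otherwise.

```lua
-- Compile a chunk of Lua source once, then run it many times.
-- loadstring exists in Lua 5.1; Lua 5.2+ accepts source strings in load.
local compile = loadstring or load

-- The source text is parsed and compiled to bytecode only here.
local chunk = assert(compile("local x = ... return x * x + 1"))

-- Each call runs the already compiled function; nothing is re-parsed.
local sum = 0
for i = 1, 1000 do
  sum = sum + chunk(i)
end
print(sum)  --> 333834500
```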

The test machine was a modest Windows machine with a 3.0 GHz Intel Core 2 Duo E6850 CPU, a 1333 MHz front-side bus, and 4 GB of 800 MHz DDR2 RAM, running 32-bit Windows XP SP3. Modern 64-bit machines significantly outperform the results listed below.

To make the timings more meaningful, most procedures were repeated in a loop of 10 to 10,000 cycles, and the total execution time was divided by the number of cycles. Each such timing was then repeated 3 to 10 times, and the average of these runs was adopted as the benchmark.
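The timing procedure above can be sketched in plain Lua as follows. The `benchmark` helper and the `fill` test function are hypothetical names introduced for illustration; they are not part of Mira or the Pro Script library. The sketch assumes `os.clock`, which reports processor time rather than wall-clock time; the original measurements may have used a different clock.

```lua
-- Hypothetical helper (not a Mira function): run fn `cycles` times per
-- trial, divide the elapsed time by `cycles`, and average over `trials`.
local function benchmark(fn, cycles, trials)
  local total = 0
  for _ = 1, trials do
    local start = os.clock()              -- processor time in seconds
    for _ = 1, cycles do
      fn()
    end
    local elapsed = os.clock() - start
    total = total + elapsed / cycles      -- time for a single call
  end
  return total / trials                   -- average over all trials
end

-- Example workload: store 100,000 elements in a fresh table.
local function fill()
  local t = {}
  for k = 1, 100000 do t[k] = k end
end

local seconds = benchmark(fill, 10, 3)
print(string.format("%.6f sec per call, %.2f million elements / sec",
  seconds, 100000 / seconds / 1e6))
```

Dividing by the cycle count spreads the clock's limited resolution over many calls, which is why very short operations are looped thousands of times.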

Benchmark Results

Benchmark	Time (sec)	Speed
Unpack a 1 million element 1-dimensional array and save it to memory as 64-bit real numbers: CImage:Set(table) where table has 1 million elements.	0.0791	12.7 million elements / sec
Unpack a 250,000 element 1-dimensional array and save it to memory as 64-bit real numbers: CImage:Set(table) using 250,000 elements.	0.0181	13.8 million elements / sec
Unpack a 10,000 element 1-dimensional array and save it to memory as 64-bit real numbers: CImage:Set(table) using 10,000 elements.	0.000766	13.1 million elements / sec
Store 1 million table elements: t={} for k=1,1000000 do t[k]=k end	0.220	4.54 million elements / sec
Store 10 million table elements in a global table: t={} for k=1,10000000 do t[k]=k end	2.53	3.95 million elements / sec
Perform 10 million multiply operations using local values: local n=0 local m=0 for i=1,10000000 do k=n*m end	0.314	31.9 million multiplies / sec
Perform 10 million divides using local values: local n=0 local m=15 for i=1,10000000 do k=n/m end	0.323	31.0 million divides / sec
Perform 10 million adds using local values: local m=0 local n=0 for i=1,10000000 do k=n+m end	0.316	31.6 million adds / sec
Perform 10 million adds using global values: n=0 m=0 for i = 1,10000000 do k=n+m end	0.898	11.1 million adds / sec
Perform 10 million adds to a number using global values: k=0 for i=1,10000000 do k=k+1 end	0.809	12.4 million adds / sec
Perform 10 million empty loops: k=0 for k=1,10000000 do end	0.177	56.5 million loops / sec
Perform 10 million divides and save in a local table: local t={} local m=3 for i=1,10000000 do t[i]=i/m end	1.46	6.85 million divides / sec
Least squares solution of 100 points with 4 parameters and 3 variables using a "hyperplane" basis function declared in the script	0.042	24 fits / sec
Least squares solution of 100 points with 4 parameters and 3 variables using internal "hyperplane" basis function	0.000556	1,800 fits / sec
Least squares solution of 10 points with 4 parameters and 3 variables using internal "hyperplane" basis function	0.000055	18,000 fits / sec
Least squares solution of 1000 points using a 3x2 (6 parameter) 2-D polynomial.	0.0008	1,250 fits / sec
Least squares solution using CLsqFit class to fit 10 points with a 3x2 (6 parameter) 2-D polynomial.	0.000041	24,400 fits / sec
Least squares solution using CLsqFit class to fit 1000 points with a 6th-order 1-D polynomial.	0.0011	900 fits / sec
Create 1 million uniformly distributed random numbers.	0.243	4.1 million numbers / sec
Create 1 million Gaussian distributed random numbers.	1.16	862,000 numbers / sec
Histogram of 1 million real numbers using 100 bins	0.167	6 million numbers / sec
Add two 1200x800 64-bit real images	0.00425	236 images / sec
Add two 1200x800 32-bit real images	0.00198	500 images / sec
Add two 1200x800 16-bit integer images	0.00142	704 images / sec
Add two 1200x800 24-bit RGB images	0.00414	242 images / sec
Add two 1200x960 48-bit URGB images	0.00444	225 images / sec
Multiply two 1200x800 32-bit real images	0.0033	300 images / sec
Multiply 1200x800 32-bit real image by a number	0.00475	210 images / sec
Divide two 1200x800 32-bit real images	0.0094	106 images / sec
Execute the following sequence of arithmetic operations: create a table of CImage objects, Im = {}; Im[1] = create a 16-bit image of size 1200x800; Im[1] = convert to 32-bit real pixel data; Im[2] = Im[1] + 1000; Im[3] = Im[1] / Im[2]; Im[4] = Im[1] ^ Im[3]. All 4 images have 32-bit real pixel type. This process involves creating four 4 MB images plus the operations applied to them. The last operation, an exponentiation, is the most numerically intensive computation.	0.097	10.3 million pixels / sec
Apply the sequence of operations in the previous example but also display all four images in a new image window. This requires computing the image histogram, transfer function, and palette mapping for each image.	0.369
Load a 1 megapixel 16-bit image from a hard drive, compute the image histogram, autoscale the transfer function using gamma=0.6, and display in a new window.	0.125	8 images / sec
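
For comparison with the library timings for random numbers and histograms, the sketch below performs the same kind of work entirely in the script. It is an illustration under stated assumptions, not Mira's implementation: it uses only the standard `math.random` generator, and the `uniform_histogram` function is a hypothetical name. No Mira functions are called.

```lua
-- Script-side sketch: generate n uniform random numbers in [0, 1) and
-- count them into nbins equal-width bins. No Mira functions are used.
local function uniform_histogram(n, nbins)
  local counts = {}
  for b = 1, nbins do counts[b] = 0 end
  for _ = 1, n do
    local x = math.random()                -- uniform in [0, 1)
    local b = math.floor(x * nbins) + 1    -- bin index 1..nbins
    if b > nbins then b = nbins end        -- guard against round-off
    counts[b] = counts[b] + 1
  end
  return counts
end

math.randomseed(42)
local counts = uniform_histogram(1000000, 100)
```

Timing such a loop against the corresponding library rows shows how much of the cost is interpreter overhead rather than arithmetic.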

Interpretation of Benchmarks

The results above lead to the following conclusions:

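Lua provides high performance for a scripting language. Loops that use local variables run roughly three times faster than the same loops using global variables (31.6 versus 11.1 million adds / sec).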
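The local-versus-global difference in the table can be reproduced with a sketch like the one below. The function names and the global variables `gn`, `gm`, and `gk` are hypothetical, chosen for illustration. Absolute timings depend on the machine and the Lua version; only the ratio between the two loops is of interest.

```lua
-- Compare additions on local variables with the same loop on globals.
local N = 10000000

local function add_locals()
  local n, m, k = 0, 0, 0
  for _ = 1, N do k = n + m end          -- locals live in VM registers
  return k
end

local function add_globals()
  gn, gm = 0, 0                          -- hypothetical global names
  for _ = 1, N do gk = gn + gm end       -- each access is a table lookup
  return gk
end

local function elapsed_seconds(fn)
  local start = os.clock()
  fn()
  return os.clock() - start
end

local t_local = elapsed_seconds(add_locals)
local t_global = elapsed_seconds(add_globals)
print(string.format("locals: %.3f s  globals: %.3f s  ratio: %.1f",
  t_local, t_global, t_global / t_local))
```

Locals are held in the virtual machine's registers, while each global access is a lookup in the globals table, which accounts for the gap.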

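Mira's high-performance array processing, called from Lua, provides high numeric performance for repetitive operations. For example, the least squares fit using the internal "hyperplane" basis function runs about 75 times faster than the same fit using a basis function declared in the script (1,800 versus 24 fits / sec).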
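To show what "a basis function declared in the script" involves, the sketch below fits a hyperplane y = p1 + p2*x1 + p3*x2 + p4*x3 (4 parameters, 3 variables) to 100 points entirely in Lua. It is an illustration only: it assumes the normal-equations method with Gaussian elimination, which may differ from the method Mira uses internally, and the names `hyperplane_basis`, `solve`, and `lsq_fit` are hypothetical. The test data are noise-free points from known coefficients so the fit can be checked.

```lua
-- Hypothetical script-declared basis: {1, x1, x2, x3} for one point x.
local function hyperplane_basis(x)
  return { 1, x[1], x[2], x[3] }
end

-- Solve M p = v by Gaussian elimination with partial pivoting.
-- M and v are modified in place.
local function solve(M, v)
  local n = #v
  for col = 1, n do
    local pivot = col                    -- row with the largest pivot
    for r = col + 1, n do
      if math.abs(M[r][col]) > math.abs(M[pivot][col]) then pivot = r end
    end
    M[col], M[pivot] = M[pivot], M[col]
    v[col], v[pivot] = v[pivot], v[col]
    for r = col + 1, n do                -- eliminate below the pivot
      local f = M[r][col] / M[col][col]
      for c = col, n do M[r][c] = M[r][c] - f * M[col][c] end
      v[r] = v[r] - f * v[col]
    end
  end
  local p = {}
  for r = n, 1, -1 do                    -- back substitution
    local s = v[r]
    for c = r + 1, n do s = s - M[r][c] * p[c] end
    p[r] = s / M[r][r]
  end
  return p
end

-- Accumulate the normal equations (A^T A) p = A^T y and solve them.
local function lsq_fit(basis, xs, ys, nparams)
  local M, v = {}, {}
  for j = 1, nparams do
    M[j] = {}
    for k = 1, nparams do M[j][k] = 0 end
    v[j] = 0
  end
  for i = 1, #xs do
    local b = basis(xs[i])               -- script call for every point
    for j = 1, nparams do
      for k = 1, nparams do M[j][k] = M[j][k] + b[j] * b[k] end
      v[j] = v[j] + b[j] * ys[i]
    end
  end
  return solve(M, v)
end

-- 100 noise-free points from y = 2 + 0.5*x1 - 3*x2 + 1.5*x3.
math.randomseed(1)
local xs, ys = {}, {}
for i = 1, 100 do
  local x = { math.random(), math.random(), math.random() }
  xs[i] = x
  ys[i] = 2 + 0.5 * x[1] - 3 * x[2] + 1.5 * x[3]
end

local p = lsq_fit(hyperplane_basis, xs, ys, 4)
print(string.format("%.4f %.4f %.4f %.4f", p[1], p[2], p[3], p[4]))
```

Every point costs an interpreted function call plus a table allocation, which is the kind of overhead an internal basis function avoids.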

Related Topics