Script Benchmarks
How fast should your scripts run? If you read the Lua literature or follow user groups, you will often see Lua's remarkable speed mentioned. This results from Lua's design, which compiles each script to byte code for the Lua virtual machine when the script is loaded. Statements inside loops and functions are therefore not reinterpreted from source on every pass. The Pro Script library is designed to place numerically intensive array processing in Mira's core library of highly optimized functions. However, when no method or function is provided for a task, the array processing must be done within the script itself. The table below gives some examples of what to expect from your numerically intensive scripts.
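When no Mira function covers a task, the array processing is coded directly in the script. As a hedged illustration, the plain-Lua sketch below bins an array of real numbers into equal-width bins, the same task as the histogram benchmark in the table. The function `histogram` and its arguments are hypothetical; this is not Mira's built-in histogram routine. Caching `math.floor` in a local variable reflects the speed advantage of local over global values that the table shows.

```lua
-- Plain-Lua histogram sketch: counts the values of array x into nbins
-- equal-width bins spanning [lo, hi]. Values outside the range are ignored.
-- The function name and arguments are hypothetical, not part of Mira's API.
function histogram(x, nbins, lo, hi)
  local counts = {}
  for b = 1, nbins do counts[b] = 0 end
  local scale = nbins / (hi - lo)        -- bins per data unit
  local floor = math.floor               -- cache the global function in a local
  for i = 1, #x do
    local v = x[i]
    if v >= lo and v <= hi then
      local b = floor((v - lo) * scale) + 1
      if b > nbins then b = nbins end    -- put v == hi in the last bin
      counts[b] = counts[b] + 1
    end
  end
  return counts
end

-- Example: bin 1 million uniform random numbers in [0, 1) into 100 bins.
local x = {}
for i = 1, 1000000 do x[i] = math.random() end
local counts = histogram(x, 100, 0, 1)

local total = 0
for b = 1, #counts do total = total + counts[b] end
print(total)  -- every value lies in [0, 1), so this prints 1000000
```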
The test machine was a modest Windows PC with a 3.0 GHz Intel Core 2 Duo E6850 CPU, a 1333 MHz front-side bus, and 4 GB of 800 MHz DDR2 RAM, running 32-bit Windows XP SP3. Modern 64-bit machines significantly outperform the results listed below.
To increase the significance of the timings, most procedures were repeated in a loop of 10 to 10,000 cycles, and the execution time was divided by the number of cycles. Each timing was repeated 3 to 10 times, and the average was adopted as the benchmark.
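The timing procedure described above can be sketched in Lua as follows. This is a minimal harness under stated assumptions, not the script used to produce the table: it uses the standard `os.clock` function, which measures CPU time rather than wall-clock time, and the function name `benchmark` is hypothetical.

```lua
-- Minimal timing harness (hypothetical name) following the procedure above:
-- run f() 'cycles' times per timing, divide by the cycle count,
-- repeat the timing 'repeats' times, and average the results.
-- os.clock() returns CPU time in seconds.
function benchmark(f, cycles, repeats)
  local total = 0
  for r = 1, repeats do
    local t0 = os.clock()
    for c = 1, cycles do f() end
    total = total + (os.clock() - t0) / cycles   -- time per cycle
  end
  return total / repeats                          -- average time per cycle (sec)
end

-- Example: time filling a 1 million element table.
local function fill()
  local t = {}
  for k = 1, 1000000 do t[k] = k end
end
local sec = benchmark(fill, 10, 3)
-- 1 million elements per cycle, so millions of elements / sec = 1 / sec
print(string.format("%.3f sec per cycle, %.2f million elements / sec", sec, 1 / sec))
```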
Benchmark | Time (sec) | Speed
Unpack a 1,000,000 element 1-dimensional array and store it in memory as 64-bit real numbers: CImage:Set(table) where table has 1,000,000 elements | 0.0791 | 12.7 million elements / sec
Unpack a 250,000 element 1-dimensional array and store it in memory as 64-bit real numbers: CImage:Set(table) where table has 250,000 elements | 0.0181 | 13.8 million elements / sec
Unpack a 10,000 element 1-dimensional array and store it in memory as 64-bit real numbers: CImage:Set(table) where table has 10,000 elements | 0.000766 | 13.1 million elements / sec
Store 1 million elements in a global table: t={} for k=1,1000000 do t[k]=k end | 0.220 | 4.54 million elements / sec
Store 10 million elements in a global table: t={} for k=1,10000000 do t[k]=k end | 2.53 | 3.95 million elements / sec
Perform 10 million multiplies using local values: local n=0 local m=0 for i=1,10000000 do k=n*m end | 0.314 | 31.9 million multiplies / sec
Perform 10 million divides using local values: local n=0 local m=15 for i=1,10000000 do k=n/m end | 0.323 | 31.0 million divides / sec
Perform 10 million adds using local values: local m=0 local n=0 for i=1,10000000 do k=n+m end | 0.316 | 31.6 million adds / sec
Perform 10 million adds using global values: n=0 m=0 for i=1,10000000 do k=n+m end | 0.898 | 11.1 million adds / sec
Perform 10 million increments of a global number: k=0 for i=1,10000000 do k=k+1 end | 0.809 | 12.4 million adds / sec
Perform 10 million empty loops: k=0 for k=1,10000000 do end | 0.177 | 56.5 million loops / sec
Perform 10 million divides and store the results in a local table: local t={} local m=3 for i=1,10000000 do t[i]=i/m end | 1.46 | 6.85 million divides / sec
Least squares solution of 100 points with 4 parameters and 3 variables using a "hyperplane" basis function declared in the script | 0.042 | 24 fits / sec
Least squares solution of 100 points with 4 parameters and 3 variables using the internal "hyperplane" basis function | 0.000556 | 1,800 fits / sec
Least squares solution of 10 points with 4 parameters and 3 variables using the internal "hyperplane" basis function | 0.000055 | 18,000 fits / sec
Least squares solution of 1000 points using a 3x2 (6 parameter) 2-D polynomial | 0.0008 | 1,250 fits / sec
Least squares solution using the CLsqFit class to fit 10 points with a 3x2 (6 parameter) 2-D polynomial | 0.000041 | 24,400 fits / sec
Least squares solution using the CLsqFit class to fit 1000 points with a 6th-order 1-D polynomial | 0.0011 | 900 fits / sec
Create 1 million uniformly distributed random numbers | 0.243 | 4.1 million numbers / sec
Create 1 million Gaussian-distributed random numbers | 1.16 | 862,000 numbers / sec
Compute a histogram of 1 million real numbers using 100 bins | 0.167 | 6 million numbers / sec
Add two 1200x800 64-bit real images | 0.00425 | 236 images / sec
Add two 1200x800 32-bit real images | 0.00198 | 500 images / sec
Add two 1200x800 16-bit integer images | 0.00142 | 704 images / sec
Add two 1200x800 24-bit RGB images | 0.00414 | 242 images / sec
Add two 1200x960 48-bit URGB images | 0.00444 | 225 images / sec
Multiply two 1200x800 32-bit real images | 0.0033 | 300 images / sec
Multiply a 1200x800 32-bit real image by a number | 0.00475 | 210 images / sec
Divide two 1200x800 32-bit real images | 0.0094 | 106 images / sec
Create a table of CImage objects, Im = {}, and execute the following sequence of arithmetic operations: Im[1] = create a 16-bit image of size 1200x800; Im[1] = convert to 32-bit real pixel data; Im[2] = Im[1] + 1000; Im[3] = Im[1] / Im[2]; Im[4] = Im[1] ^ Im[3]. All 4 images have 32-bit real pixel type, so the process involves creating four images of about 4 MB each plus the operations applied to them. The last operation, an exponentiation, is the most numerically intensive computation. | 0.097 | 10.3 million pixels / sec
Apply the sequence of operations in the previous row and also display all four images in a new image window. This requires computing the image histogram, transfer function, and palette mapping for each image. | 0.369 |
Load a 1 megapixel 16-bit image from a hard drive, compute the image histogram, autoscale the transfer function using gamma=0.6, and display it in a new window | 0.125 | 8 images / sec
The results above lead to the following conclusions:
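Lua provides high performance for a scripting language, and using local rather than global values makes simple arithmetic loops roughly 3 times faster (31.6 vs. 11.1 million adds / sec).
Mira's high-performance array processing inside Lua provides high numeric performance for repetitive operations; for example, the internal "hyperplane" least squares fit runs about 75 times faster than the same fit using a basis function declared in the script (1,800 vs. 24 fits / sec).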
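To compare a few of the pure-Lua rows on your own machine, a sketch such as the following can be used. It mirrors the loop bodies of the local-value adds, global-value adds, and divide-and-store rows, runs 1 million rather than 10 million iterations to keep the run short, and reports CPU time from `os.clock`. The function names are hypothetical, and absolute times will differ from the table with the machine and Lua version; expect the local-value loop to run noticeably faster than the global-value loop, as in the table.

```lua
-- Sketch for re-timing some pure-Lua rows of the benchmark table.
-- Loop bodies mirror the table rows; the function names are hypothetical.

-- Returns the CPU time (seconds, from os.clock) taken by one call f(count).
function time_it(f, count)
  local t0 = os.clock()
  f(count)
  return os.clock() - t0
end

-- Adds using local values; the result k is global, as in the table row.
function add_locals(count)
  local n = 0
  local m = 0
  for i = 1, count do k = n + m end
  return k
end

-- Adds using global values: n, m, and k are all globals.
function add_globals(count)
  n = 0
  m = 0
  for i = 1, count do k = n + m end
  return k
end

-- Divides stored in a local table, keyed by the loop index i.
function divide_store(count)
  local t = {}
  local m = 3
  for i = 1, count do t[i] = i / m end
  return t
end

-- The table uses 10 million iterations; 1 million keeps this run short.
local count = 1000000
print(string.format("local adds:   %.4f sec", time_it(add_locals, count)))
print(string.format("global adds:  %.4f sec", time_it(add_globals, count)))
print(string.format("divide/store: %.4f sec", time_it(divide_store, count)))
```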
Mira Pro x64 Script User's Guide, Copyright © 2023 Mirametrics, Inc. All Rights Reserved.