Script Benchmarks
How fast should your scripts run? If you read the Lua literature or follow the user groups, you will often read about Lua's remarkable speed. The Pro Script library is designed so that most numerically intensive processing uses Mira's core library of highly optimized array functions. However, there are times when you need to do intensive numeric processing inside the script itself, such as when working with large tables of data or processing millions of values with your own script code. So that you are not left in the dark about what "fast" means, here are some benchmarks measured at Mirametrics. The table below gives you some idea of what to expect from your numerically intensive scripts.
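For example, rather than storing values one at a time with repeated calls from the script, you can build the values in an ordinary Lua table and hand the whole table to the core library in one call. A minimal sketch of that pattern, based on the CImage:Set(table) call used in the benchmarks below; the image object img (and how it is created and sized) is assumed here purely for illustration:

    -- Build the values in a plain Lua table (this part is script-side work).
    local t = {}
    for k = 1, 1000000 do
        t[k] = k * 0.5              -- any per-element computation
    end
    -- Unpack the entire table into the image buffer in a single call.
    -- 'img' is assumed to be an existing CImage object of suitable size.
    img:Set(t)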
The test machine was chosen to be representative of a typical "fast" machine in use by Mira users: a 3.0 GHz Intel Core 2 Duo E6850 CPU with a 1333 MHz front-side bus and 4 GB of 800 MHz DDR2 RAM, running Windows XP SP3. Applications running on screen during the tests included Visual Studio 2008, Calculator, Outlook, Windows Explorer, and Mira MX. To increase the significance of the timings, most procedures were repeated in a loop of 10 to 10,000 cycles and the measured time was divided by the number of cycles. Each timing was then repeated 3 to 10 times, and the typical value, rather than the lowest value, was adopted as the benchmark.
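In script form, that timing procedure amounts to the following pattern; a minimal sketch in plain Lua using the standard os.clock() function, with the cycle count and the statement under test left as placeholders:

    -- Repeat the statement under test in a loop, then divide the elapsed
    -- time by the number of cycles to get the per-call time.
    local cycles = 10000                 -- 10 to 10,000 in the tests below
    local t0 = os.clock()
    for i = 1, cycles do
        -- statement being benchmarked goes here
    end
    local perCall = (os.clock() - t0) / cycles
    print(string.format("%.3g sec per call", perCall))
    -- Each such measurement was repeated several times and a typical
    -- (not the lowest) value was adopted as the benchmark.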
Benchmark | Time (sec) | Speed
--- | --- | ---
Unpack a 1 million element Lua table and save it to a memory buffer as 64-bit real numbers. CImage:Set(table) where table has 1 million elements. | 0.0791 | 12.7 million elements / sec
Unpack a 250,000 element Lua table and save it to a memory buffer as 64-bit real numbers. CImage:Set(table) using 250,000 elements. | 0.0181 | 13.8 million elements / sec
Unpack a 10,000 element Lua table and save it to a memory buffer as 64-bit real numbers. CImage:Set(table) using 10,000 elements. | 0.000766 | 13.1 million elements / sec
Set 1 million table elements: t={} for k=1,1000000 do t[k]=k end | 0.220 | 4.54 million elements / sec
Set 10 million table elements in a global table: t={} for k=1,10000000 do t[k]=k end | 2.53 | 3.95 million elements / sec
Create and set 1 million elements in a local table: local t={} for k=1,1000000 do t[k]=k end | 0.096 | 10.4 million elements / sec
Perform 10 million multiplies using local values: local n=0 local m=0 for i=1,10000000 do k=n*m end | 0.314 | 31.9 million multiplies / sec
Perform 10 million divides using local values: local n=0 local m=0 for i=1,10000000 do k=n/m end | 0.323 | 31.0 million divides / sec
Perform 10 million adds using local values: local m=0 local n=0 for i=1,10000000 do k=n+m end | 0.316 | 31.6 million adds / sec
Perform 10 million adds using global values: n=0 m=0 for i=1,10000000 do k=n+m end | 0.898 | 11.1 million adds / sec
Perform 10 million adds to a number using global values: k=0 for i=1,10000000 do k=k+1 end | 0.809 | 12.4 million adds / sec
Perform 10 million empty loops: for k=1,10000000 do end | 0.177 | 56.5 million loops / sec
Perform 10 million divides and save in a local table: local t={} local m=3 for i=1,10000000 do t[i]=i/m end | 1.46 | 6.85 million divides / sec
Least squares solution of 100 points with 4 parameters and 3 variables using a "hyperplane" basis function declared in the script | 0.042 | 24 fits / sec
Least squares solution of 100 points with 4 parameters and 3 variables using the internal "hyperplane" basis function | 0.000556 | 1,800 fits / sec
Least squares solution of 10 points with 4 parameters and 3 variables using the internal "hyperplane" basis function | 0.000055 | 18,000 fits / sec
Least squares solution of 1000 points using a 3x2 (6 parameter) 2-D polynomial | 0.0008 | 1,250 fits / sec
Least squares solution using the CLsqFit class to fit 10 points with a 3x2 (6 parameter) 2-D polynomial | 0.000041 | 24,400 fits / sec
Least squares solution using the CLsqFit class to fit 1000 points with a 6th order 1-D polynomial | 0.0011 | 900 fits / sec
Create 1 million uniformly distributed random numbers | 0.243 | 4.1 million numbers / sec
Create 1 million Gaussian distributed random numbers | 1.16 | 862,000 numbers / sec
Histogram of 1 million real numbers using 100 bins | 0.167 | 6 million numbers / sec
Add two 1200x800 64-bit real images | 0.00425 | 236 images / sec
Add two 1200x800 32-bit real images | 0.00198 | 500 images / sec
Add two 1200x800 16-bit integer images | 0.00142 | 704 images / sec
Add two 1200x800 24-bit RGB images | 0.00414 | 242 images / sec
Add two 1200x960 48-bit URGB images | 0.00444 | 225 images / sec
Multiply two 1200x800 32-bit real images | 0.0033 | 300 images / sec
Multiply a 1200x800 32-bit real image by a number | 0.00475 | 210 images / sec
Divide two 1200x800 32-bit real images | 0.0094 | 106 images / sec
Convert a 1200x800 16-bit image to 32-bit real as I[1], then compute I[2] = I[1] + 1000, I[3] = I[1] / I[2], and I[4] = I[1] ^ I[3]. All 4 images have 32-bit real pixel type. This process involves creating four 4 MB images as well as the image mathematical operations between them. The last operation raises I[1] to the power of I[3], pixel by pixel (a very intensive computation). | 0.097 | 10.3 million pixels / sec
Same as above, but also display all four 32-bit real images in a new image window. This also includes computing an image histogram, transfer function, and palette mapping for each image. | 0.369 | 
Load a 1 megapixel image of 16-bit pixels from the hard drive, compute the image histogram, autoscale the transfer function using gamma=0.6, and display it in a new window. | 0.125 | 8 images / sec
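One pattern that stands out in the table is the cost of global variables: the 10 million add loop runs at about 31.6 million adds per second with local operands but only 11.1 million with globals. A self-contained sketch that reproduces the comparison in plain Lua (the exact numbers will of course depend on your machine):

    -- Global-variable version: every access of n, m, and k is a lookup
    -- in the global environment table.
    n, m, k = 0, 0, 0
    local t0 = os.clock()
    for i = 1, 10000000 do k = n + m end
    print("globals:", os.clock() - t0, "sec")

    -- Local-variable version: the operands live in Lua VM registers.
    -- In the table above the corresponding loop ran roughly three times
    -- faster than the global version.
    local ln, lm, lk = 0, 0, 0
    t0 = os.clock()
    for i = 1, 10000000 do lk = ln + lm end
    print("locals: ", os.clock() - t0, "sec")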
If there are two major results to be gleaned from the table above, they are as follows: