Script Benchmarks


How fast should your scripts run? If you read the Lua literature or follow user groups, you'll often read about Lua's remarkable speed. This results from Lua's design, which converts the script to machine-dependent byte code at run time; this avoids reinterpreting the same statements inside loops and functions. The Pro Script library is designed to place numerically intensive array processing in Mira's core library of highly optimized functions. However, at times no suitable method or function is provided, and array processing must be done within the script itself. The table below gives some examples of what to expect from your numerically intensive scripts.

The test machine was a minimal Windows machine with a 3.0 GHz Intel Core 2 Duo E6850 CPU, a 1333 MHz front-side bus, and 4 GB of 800 MHz DDR2 RAM. The operating system was 32-bit Windows XP/SP3. Modern 64-bit machines significantly outperform the results listed below.

To increase the significance of the timings, most procedures were repeated in a loop of 10 to 10000 cycles, and the execution time was divided by the number of cycles. Each such timing was repeated 3 to 10 times, and the average time was adopted as the benchmark.
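
The timing procedure described above can be sketched in plain Lua. This is only an illustration, not the script used to produce the table: the function name time_per_call and its parameters are hypothetical, and os.clock from the standard Lua library is assumed to be an adequate timer.

```lua
-- Minimal timing sketch; time_per_call and its parameters are hypothetical names.
-- Runs fn `cycles` times per trial, divides the elapsed CPU time by `cycles`,
-- repeats this for `trials` trials, and returns the average time per call.
local function time_per_call(fn, cycles, trials)
  local total = 0
  for _ = 1, trials do
    local t0 = os.clock()
    for _ = 1, cycles do
      fn()
    end
    total = total + (os.clock() - t0) / cycles
  end
  return total / trials
end

-- Example workload: store 100,000 elements in a table.
local function store_elements()
  local t = {}
  for k = 1, 100000 do
    t[k] = k
  end
  return t
end

local seconds = time_per_call(store_elements, 10, 3)
print(string.format("%.6f sec per call, %.2f million elements / sec",
      seconds, 100000 / seconds / 1e6))
```

Repeating the workload inside the timed loop keeps the measured interval well above the resolution of the timer, which matters for operations that finish in microseconds.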

Benchmark Results

 

Benchmark

Time (sec)

Speed

Unpack a 1 million element 1-dimensional array and save it to memory as 64-bit real numbers:

CImage:Set(table) where table has 1 million elements.

0.0791

12.7 million elements / sec

Unpack a 250,000 element 1-dimensional array and save it to memory as 64-bit real numbers:

CImage:Set(table) using 250,000 elements.

0.0181

13.8 million elements / sec

Unpack a 10,000 element 1-dimensional array and save it to memory as 64-bit real numbers:

CImage:Set(table) using 10,000 elements.

0.000766

13.1 million elements / sec

Store 1 million table elements:

t={}

for k=1,1000000 do

  t[k]=k

end

0.220

4.54 million elements / sec

Store 10 million table elements in a global table:

t={}

for k=1,10000000 do

  t[k]=k

end

2.53

3.95 million elements / sec

Perform 10 million multiply operations using local values:

local n=0

local m=0

for i=1,10000000 do

  k=n*m

end

0.314

31.9 million multiplies / sec

Perform 10 million divides using local values:

local n=0

local m=15

for i=1,10000000 do

  k=n/m

end

0.323

31.0 million divides / sec

Perform 10 million adds using local values:

local m=0

local n=0

for i=1,10000000 do

  k=n+m

end

0.316

31.6 million adds / sec

Perform 10 million adds using global values:

n=0

m=0

for i = 1,10000000 do

  k=n+m

end

0.898

11.1 million adds / sec

Perform 10 million adds to a number using global values:

k=0

for i=1,10000000 do

  k=k+1

end

0.809

12.4 million adds / sec

Perform 10 million empty loops:

k=0

for k=1,10000000 do

end

0.177

56.5 million loops / sec

Perform 10 million divides and save in a local table:

local t={}

local m=3

for k=1,10000000 do

  t[k]=k/m

end

1.46

6.85 million / sec

Least squares solution of 100 points with 4 parameters and 3 variables using a "hyperplane" basis function declared in the script

0.042

24 fits / sec

Least squares solution of 100 points with 4 parameters and 3 variables using internal "hyperplane" basis function

0.000556

1,800 fits / sec

Least squares solution of 10 points with 4 parameters and 3 variables using internal "hyperplane" basis function

0.000055

18,000 fits / sec

Least squares solution of 1000 points using a 3x2 (6 parameter) 2-D polynomial.

0.0008

1,250 fits / sec

Least squares solution using CLsqFit class to fit 10 points with a 3x2 (6 parameter) 2-D polynomial.

0.000041

24,400 fits / sec

Least squares solution using CLsqFit class to fit 1000 points with a 6-th order 1-D polynomial.

0.0011

900 fits / sec

Create 1 million uniformly distributed random numbers.

0.243

4.1 million numbers / sec

Create 1 million Gaussian distributed random numbers

1.16

862,000 numbers / sec

Histogram of 1 million real numbers using 100 bins

0.167

6 million numbers / sec

Add two 1200x800 64-bit real images

0.00425

236 images / sec

Add two 1200x800 32-bit real images

0.00198

500 images / sec

Add two 1200x800 16-bit integer images

0.00142

704 images / sec

Add two 1200x800 24-bit RGB images

0.00414

242 images / sec

Add two 1200x960 48-bit URGB images

0.00444

225 images / sec

Multiply two 1200x800 32-bit real images

0.0033

300 images / sec

Multiply 1200x800 32-bit real image by a number

0.00475

210 images / sec

Divide two 1200x800 32-bit real images

0.0094

106 images / sec

The following sequence of arithmetic operations is executed:

•    Create a table of CImage objects, as Im = {}

•    Im[1] = create 16-bit image of size 1200x800

•    Im[1] = convert to 32-bit real pixel data

•    Im[2] = Im[1] + 1000

•    Im[3] = Im[1] / Im[2]

•    Im[4] = Im[1] ^ Im[3]

All 4 images have 32-bit real pixel type. This process involves creation of four 4MB images plus the operations applied to them. The last operation, an exponentiation, is the most numerically intensive computation.

0.097

10.3 million pixels / sec

Apply the sequence of operations in the previous example but also display all four images in a new image window. This requires computing the image histogram, transfer function, and palette mapping for each image.

0.369

 

Load a 1 megapixel 16-bit image from a hard drive, compute the image histogram, autoscale the transfer function using gamma=0.6, and display in a new window.

0.125

8 images / sec

Interpretation of Benchmarks

The results above lead to the following conclusions:

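•    Lua provides high performance for a scripting language.

•    Mira's built-in array processing, called from Lua, provides high numeric performance for repetitive operations. For example, the 100-point least squares fit ran about 75 times faster with the internal "hyperplane" basis function than with a basis function declared in the script (0.000556 sec versus 0.042 sec).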
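
The table also shows a practical point for loops written in the script itself: 10 million adds took 0.316 sec with local operands but 0.898 sec with global operands. The following is a minimal sketch for trying this comparison on your own machine. It uses only standard Lua; the variable names are illustrative, and the loop count is reduced to 1 million to keep the run short.

```lua
-- Sketch comparing global and local variable access in a loop.
-- Standard Lua only; the names gn, gm, gk, and N are illustrative.
local N = 1000000

-- Global operands: each access is a lookup in the table of globals.
gn, gm = 1, 2
local t0 = os.clock()
for i = 1, N do
  gk = gn + gm
end
local global_time = os.clock() - t0

-- Local operands: the values are held in the Lua virtual machine's registers.
local n, m, k = 1, 2, 0
t0 = os.clock()
for i = 1, N do
  k = n + m
end
local local_time = os.clock() - t0

print(string.format("global: %.4f sec, local: %.4f sec", global_time, local_time))
```

When a loop must run in the script, declaring its working variables with the local keyword is usually the simplest way to reduce its run time.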

Related Topics

Contents

Working with Scripts

CImage class


Mira Pro x64 Script User's Guide, Copyright Ⓒ 2023 Mirametrics, Inc. All Rights Reserved.