How fast should your scripts run? If you read the
Lua literature or follow user groups, you will often read about Lua's
remarkable speed. This results from Lua's design, which compiles the
script to byte code at run time; this avoids
reinterpreting the same statements inside loops and functions. The
Pro Script library is designed to place numerically intensive array
processing in Mira's core library of highly optimized functions.
However, there are times when no method or function is provided,
so that array processing must be done within the script
itself. The table below gives some examples of what to
expect from your numerically intensive scripts.
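The compile-once behavior described above can be seen in a minimal sketch using only the standard Lua library. It assumes a stock Lua interpreter (5.1 or later) in which `loadstring` or `load` is available; the chunk text and the variable names are illustrative only and are not part of the Pro Script library.

```lua
-- Lua 5.1 provides loadstring; Lua 5.2 and later accept a string in load.
local compile = loadstring or load

-- Compile the source text once. The result is an ordinary Lua function
-- backed by byte code; the text is not parsed again when it is called.
local source = "local x = ...; return x * x + 1"
local f = assert(compile(source))

local sum = 0
for i = 1, 1000 do
  sum = sum + f(i)   -- runs the already compiled chunk
end
print(sum)
```

The parsing cost is paid once, at the `compile` call; each pass through the loop executes only the compiled byte code.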
The test machine was a modest Windows PC with
a 3.0 GHz Intel Core 2 Duo E6850 CPU, a
1333 MHz front-side bus, and 4 GB of 800 MHz DDR2 RAM. The
operating system was 32-bit Windows XP SP3. Modern 64-bit machines
significantly outperform the results listed below.
To increase the significance of the timings, most
procedures were repeated in a loop of 10 to 10000 cycles, and the
execution time was divided by the number of cycles. Each such timing was
repeated 3 to 10 times to obtain independent timings, and the average
time was adopted as the benchmark.
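The procedure described above can be sketched in plain Lua as follows. This is not the script used to produce the table: the helper names `time_once` and `benchmark` are hypothetical, and the use of `os.clock` (which reports CPU time) is an assumption, since the guide does not say which timer was used.

```lua
-- Time fn over `cycles` iterations and return the time per call.
local function time_once(fn, cycles)
  local t0 = os.clock()
  for _ = 1, cycles do
    fn()
  end
  return (os.clock() - t0) / cycles
end

-- Repeat the timing `repeats` times and return the mean.
local function benchmark(fn, cycles, repeats)
  local total = 0
  for _ = 1, repeats do
    total = total + time_once(fn, cycles)
  end
  return total / repeats
end

-- Example procedure: fill a 100,000 element table.
local function fill_table()
  local t = {}
  for k = 1, 100000 do
    t[k] = k
  end
  return t
end

local seconds = benchmark(fill_table, 10, 5)
print(string.format("%.6f sec per call", seconds))
```

The inner loop adds a small overhead of its own to each measurement; the empty-loop row in the table gives an idea of its size.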
|
Benchmark
|
Time (sec)
|
Speed
|
|
Unpack a 1 million element 1-dimensional array and
save it to memory as 64-bit real numbers:
CImage:Set(table) where table has 1
million elements.
|
0.0791
|
12.7 million elements / sec
|
|
Unpack a 250,000 element 1-dimensional array and
save it to memory as 64-bit real numbers:
CImage:Set(table) using 250,000
elements.
|
0.0181
|
13.8 million elements / sec
|
|
Unpack a 10,000 element 1-dimensional array and
save it to memory as 64-bit real numbers:
CImage:Set(table) using 10,000
elements.
|
0.000766
|
13.1 million elements / sec
|
|
Store 1 million table elements:
t={}
for k=1,1000000
do
t[k]=k
end
|
0.220
|
4.54 million elements / sec
|
|
Store 10 million table elements in a global
table:
t={}
for k=1,10000000
do
t[k]=k
end
|
2.53
|
3.95 million elements / sec
|
|
Perform 10 million multiply operations using local
values:
local n=0
local m=0
for i=1,10000000 do
k=n*m
end
|
0.314
|
31.9 million multiplies / sec
|
|
Perform 10 million divides using local values:
local n=0
local m=15
for i=1,10000000
do
k=n/m
end
|
0.323
|
31.0 million divides / sec
|
|
Perform 10 million adds using local values:
local m=0
local n=0
for i=1,10000000
do
k=n+m
end
|
0.316
|
31.6 million adds / sec
|
|
Perform 10 million adds using global values:
n=0
m=0
for i = 1,10000000
do
k=n+m
end
|
0.898
|
11.1 million adds / sec
|
|
Perform 10 million adds to a number using global
values:
k=0
for i=1,10000000
do
k=k+1
end
|
0.809
|
12.4 million adds / sec
|
|
Perform 10 million empty loops:
k=0
for k=1,10000000
do
end
|
0.177
|
56.5 million loops / sec
|
|
Perform 10 million divides and save in a local
table:
local t={}
local m=3
for i=1,10000000
do
t[i]=i/m
end
|
1.46
|
6.85 million / sec
|
|
Least squares solution of 100 points with 4
parameters and 3 variables using a "hyperplane" basis function
declared in the script
|
0.042
|
24 fits / sec
|
|
Least squares solution of 100 points with 4
parameters and 3 variables using internal "hyperplane" basis
function
|
0.000556
|
1,800 fits / sec
|
|
Least squares solution of 10 points with 4
parameters and 3 variables using internal "hyperplane" basis
function
|
0.000055
|
18,000 fits / sec
|
|
Least squares solution of 1000 points using a 3x2
(6 parameter) 2-D polynomial.
|
0.0008
|
1,250 fits / sec
|
|
Least squares solution using CLsqFit class to fit
10 points with a 3x2 (6 parameter) 2-D polynomial.
|
0.000041
|
24,400 fits / sec
|
|
Least squares solution using CLsqFit class to fit
1000 points with a 6-th order 1-D polynomial.
|
0.0011
|
900 fits / sec
|
|
Create 1 million uniformly distributed
random numbers.
|
0.243
|
4.1 million numbers / sec
|
|
Create 1 million Gaussian distributed random
numbers
|
1.16
|
862,000 numbers / sec
|
|
Histogram of 1 million real numbers using
100 bins
|
0.167
|
6 million numbers / sec
|
|
Add two 1200x800 64-bit real images
|
0.00425
|
236 images / sec
|
|
Add two 1200x800 32-bit real images
|
0.00198
|
500 images / sec
|
|
Add two 1200x800 16-bit integer images
|
0.00142
|
704 images / sec
|
|
Add two 1200x800 24-bit RGB images
|
0.00414
|
242 images / sec
|
|
Add two 1200x960 48-bit URGB images
|
0.00444
|
225 images / sec
|
|
Multiply two 1200x800 32-bit real images
|
0.0033
|
300 images / sec
|
|
Multiply 1200x800 32-bit real image by a
number
|
0.00475
|
210 images / sec
|
|
Divide two 1200x800 32-bit real images
|
0.0094
|
106 images / sec
|
|
The following sequence of arithmetic operations is
executed:
Create a table of CImage objects, as Im = {}
Im[1] = create 16-bit image of size 1200x800
Im[1] = convert to 32-bit real pixel data
Im[2] = Im[1] + 1000
Im[3] = Im[1] / Im[2]
Im[4] = Im[1] ^ Im[3]
All 4 images have 32-bit real pixel type. This
process involves creation of four 4MB images plus the operations
applied to them. The last operation, an exponentiation, is the most
numerically intensive computation.
|
0.097
|
10.3 million pixels / sec
|
|
Apply the sequence of operations in the previous
example but also display all four images in a new image window.
This requires computing the image histogram, transfer function, and
palette mapping for each image.
|
0.369
|
|
|
Load a 1 megapixel 16-bit image from a hard drive,
compute the image histogram, autoscale the transfer function using
gamma=0.6, and display in a new window.
|
0.125
|
8 images / sec
|
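The table rows for adds using local and global values show a roughly threefold difference. The sketch below is one way to reproduce that comparison on your own machine. It is not the original benchmark script: the names `gn`, `gm`, and `gk` are hypothetical, the loop count is reduced to 1 million to keep the run short, `os.clock` is an assumed timer, and unlike the table's example the local case also stores its result in a local variable.

```lua
local N = 1000000   -- fewer iterations than the table's 10 million

-- Adds using global operands: each access is a lookup in the globals table.
gn, gm = 0, 0
local t0 = os.clock()
for i = 1, N do
  gk = gn + gm
end
local global_time = os.clock() - t0

-- The same adds using local operands, which the Lua VM keeps in registers.
local n, m = 0, 0
local k
t0 = os.clock()
for i = 1, N do
  k = n + m
end
local local_time = os.clock() - t0

print(string.format("global: %.3f sec, local: %.3f sec",
                    global_time, local_time))
```

When a script must do arithmetic in a tight loop, copying frequently used globals into local variables before the loop is a common way to move toward the faster figures.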
Mira Pro x64 Script User's Guide, v.8.76 Copyright Ⓒ 2024
Mirametrics, Inc. All Rights Reserved.