y***@dodgeit.com
2007-08-10 03:58:55 UTC
I have a completely deterministic executable on an idle machine, and
yet I'm getting some wildly fluctuating running times.
The executable in question is a SPEC benchmark, and so should be
completely deterministic. The machine is idle, no one else is logged
on, and the benchmark gets 99% of the CPU for the duration. The
machine is a CELL Blade running Fedora 7 and the benchmark is single-
threaded and running completely on the PPE, although I've seen this on
POWER5 as well. I'm using the time command to get these measurements.
My problem is simple: I have no explanation for the variable running
time, which fluctuates pretty drastically. The last two runs gave me
7m24s and 9m19s. Does anyone have any idea why this would happen? The
only significant difference between the two runs was in involuntary
context switches (457 vs 564), which suggests that for some reason one
run is getting much less work done per time slice than the other...
and I have no clue why. Things like # of page faults and voluntary
context switches are the same. Now, I don't expect the exact same
running time every invocation, since the rest of the machine isn't
free from outside influences, but like I said, I've made sure that
disturbances to the machine have been minimized and so I don't expect
over 2 minutes difference in running time. Typically, most of the
times are clustered closely around either 7m or 9m, for whatever reason.
Have I missed something? I've considered cache, the SMP nature of the
PPE, and scheduling, but I don't see how those might contribute to
this problem. If anyone has any ideas I'd love to hear them.
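
For anyone who wants to reproduce the measurements, here is a minimal sketch of how the per-run numbers could be collected in one place. It is not the setup used above (that was the plain time command). It assumes a Unix system with Python 3, and the function name measure and the dictionary keys are made up for illustration. It reads the child's counters from getrusage before and after each run and reports the difference.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd once and return wall time plus the child's rusage deltas.

    `measure` and the key names are hypothetical, chosen for this sketch.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=False)  # waits for the child to exit
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_s": wall,
        "user_s": after.ru_utime - before.ru_utime,
        "sys_s": after.ru_stime - before.ru_stime,
        "invol_cs": after.ru_nivcsw - before.ru_nivcsw,  # involuntary switches
        "vol_cs": after.ru_nvcsw - before.ru_nvcsw,      # voluntary switches
        "major_faults": after.ru_majflt - before.ru_majflt,
        "minor_faults": after.ru_minflt - before.ru_minflt,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python measure.py <benchmark command and its arguments>
    for run in range(5):
        stats = measure(sys.argv[1:])
        # Seconds of elapsed time per involuntary context switch.
        per_switch = stats["wall_s"] / max(stats["invol_cs"], 1)
        print(run, stats, "wall_s per involuntary switch: %.3f" % per_switch)
```

Applying the same ratio to the two runs quoted above, and taking the reported times as elapsed seconds, gives 444 s / 457 switches, about 0.97 s per switch, and 559 s / 564 switches, about 0.99 s per switch. So the interval between involuntary switches is roughly the same in both runs, and the higher count in the slow run mostly tracks its longer running time.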