hi all,
i'm playing a bit with hardware prefetching on our dual L5420 nodes.
most of what i've read says the performance impact should vary from application to application (e.g.
http://software.intel.com/en-us/articles/optimizing-application-performa... )
i have disabled one or even both of the hardware prefetcher and the adjacent cache-line prefetch option (both through the bios and through the msr), but much to my surprise, i don't see any difference at all.
i've tried both real applications and synthetic ones, and now i'm starting to suspect something else is wrong.
my question is the following: what simple executable/benchmark should clearly demonstrate a difference? i have already tried the whole HPCC suite, and things like STREAM, RandomAccess or HPL differ by less than 2% between prefetching on and off, where i would expect otherwise.
thanks for any suggestions,
stijn