So, I fired up the profiler and SkeletonInstance::resetToPose appeared in the hotspot list. Since in theory it’s not a performance-sensitive function, that felt strange.
I looked at the generated code, and to my surprise, it was awful. These pictures summarize it all. It affects Visual Studio 2008, 2012, 2013 (and probably 2010).
Basically, VS thinks the address may be unaligned when storing and falls back to the movq trick, even though it already knows the address is aligned, since aligned operations were performed on that same address a few lines above.
At least 2013 uses movdqu instead of movq, which is a major improvement. Because VS is using a movaps-then-movq pattern to store to memory “safely”, I don’t need an advanced profiler to tell me that it will cause a load blocked by store forwarding.
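To make the pattern concrete, here is a rough hand-written sketch with intrinsics. It is an assumption based on the description above, not a transcription of the actual disassembly: storeDirect and storeViaTemporary are hypothetical names, and an optimizing compiler is free to emit different instructions for either one. Whether the second version actually stalls depends on the microarchitecture.

```cpp
#include <emmintrin.h> // SSE2 intrinsics

// What one would hope for: a single aligned 16-byte store (movaps).
// dst is assumed to be 16-byte aligned.
inline void storeDirect(float* dst, __m128 value)
{
    _mm_store_ps(dst, value);
}

// Hypothetical approximation of the "movaps then movq" pattern:
// spill the register to an aligned temporary, then copy it out in two
// 8-byte halves. The narrow loads read data that was just written by
// the wide store, which is where store forwarding comes into play.
// Note: alignas needs C++11; on older Visual Studio versions
// __declspec(align(16)) would be the equivalent.
inline void storeViaTemporary(float* dst, __m128 value)
{
    alignas(16) float tmp[4];
    _mm_store_ps(tmp, value); // 16-byte store
    const __m128i lo = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(tmp));     // 8-byte load
    const __m128i hi = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(tmp + 2)); // 8-byte load
    _mm_storel_epi64(reinterpret_cast<__m128i*>(dst), lo);     // 8-byte store
    _mm_storel_epi64(reinterpret_cast<__m128i*>(dst + 2), hi); // 8-byte store
}
```

Both functions leave the same four floats in dst; the only difference is the number and width of the memory operations needed to get there.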
I’ll be trying to isolate the bug into a test case to file a bug report tomorrow and see what happens.
Update: A bug report has been filed. Turns out the issue is quite easy to trigger.
Well, for the record, in some of my experiments aligned vs. unaligned loads/stores made no difference on a Core i7 when using SSE2 intrinsics.
Bruce Dawson has just recommended that I explain the implications and, ideally, include some performance measurements, so I will try to do that soon.
A variety of hardware has to be tested. On some systems there is no difference, while on others it can matter quite a lot.
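For anyone who wants to try this on their own hardware, a minimal timing sketch could look like the following. The kernel names and the averageNanoseconds helper are hypothetical, std::chrono needs a C++11 standard library, and the numbers it produces are only meaningful with optimizations enabled, a large enough buffer, and many iterations. It is a starting point, not a rigorous benchmark.

```cpp
#include <xmmintrin.h> // SSE intrinsics
#include <chrono>
#include <cstddef>

// Hypothetical kernels: identical work, differing only in the
// load/store flavour. count must be a multiple of 4; the aligned
// version also requires data to be 16-byte aligned.
inline void addOneAligned(float* data, std::size_t count)
{
    const __m128 one = _mm_set1_ps(1.0f);
    for (std::size_t i = 0; i < count; i += 4)
        _mm_store_ps(data + i, _mm_add_ps(_mm_load_ps(data + i), one));
}

inline void addOneUnaligned(float* data, std::size_t count)
{
    const __m128 one = _mm_set1_ps(1.0f);
    for (std::size_t i = 0; i < count; i += 4)
        _mm_storeu_ps(data + i, _mm_add_ps(_mm_loadu_ps(data + i), one));
}

// Runs kernel(data, count) `iterations` times and returns the average
// wall-clock time per call in nanoseconds.
template <typename Kernel>
double averageNanoseconds(Kernel kernel, float* data, std::size_t count, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        kernel(data, count);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(end - start).count() / iterations;
}
```

Calling averageNanoseconds(addOneAligned, buffer, count, iterations) and then the same with addOneUnaligned on each machine gives a first impression of whether the distinction matters there.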
Furthermore, it’s not just the fact that it’s unaligned, but also that the integer variant is being used instead of the floating point one.
On some architectures, mixing integer with floating point instructions introduces extra clocks of latency; not to mention that movdqu requires more bytes to encode than movups/movaps.
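At the intrinsics level, the two flavours look roughly like this. The function names are hypothetical, and the instruction mentioned in each comment is the one the intrinsic nominally maps to; a compiler may pick a different one.

```cpp
#include <emmintrin.h> // SSE2 intrinsics

// Floating point variant: nominally movups (opcode 0F 11).
inline void storeFloatDomain(float* dst, __m128 value)
{
    _mm_storeu_ps(dst, value);
}

// Integer variant: nominally movdqu (opcode F3 0F 7F), which carries
// an extra prefix byte. The cast only reinterprets the register
// contents; it generates no instruction by itself.
inline void storeIntegerDomain(float* dst, __m128 value)
{
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), _mm_castps_si128(value));
}
```

The bytes written to memory are identical in both cases; what differs is the encoding size and, on some CPUs, the execution domain the data passes through.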