Tag: vectorization
Why is gcc so much worse than clang at vectorizing a conditional multiply over std::vector?
Consider the following float loop, compiled with -O3 -mavx2 -mfma:

    for (auto i = 0; i < a.size(); ++i) {
        a[i] = (b[i] > c[i]) ? (b[i] * c[i]) : 0;
    }

Clang does a perfect job of vectorizing it. It uses 256-bit ymm registers and understands the difference between vblendps and vandps, picking whichever gives the best performance.…
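To make the vblendps/vandps point concrete, here is a hand-written AVX sketch of roughly what a good vectorization of this loop does. It is an illustration, not the code either compiler actually emits, and it rests on a few assumptions: x86-64, GCC or Clang (for `__attribute__((target("avx")))` and `__builtin_cpu_supports`), b and c at least as long as a, and hypothetical function names `cond_mul*`. Because the "else" value is 0, the comparison mask can simply be ANDed with the product, so a blend is unnecessary. Only AVX is required here, even though the question compiles with -mavx2 -mfma.

```cpp
#include <cstddef>
#include <vector>
#include <immintrin.h>  // AVX intrinsics (x86 only)

// Reference version: same semantics as the loop in the question.
// Assumes b and c are at least as long as a.
void cond_mul_scalar(std::vector<float>& a, const std::vector<float>& b,
                     const std::vector<float>& c) {
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = (b[i] > c[i]) ? (b[i] * c[i]) : 0.0f;
}

// Hand-vectorized sketch: compare -> multiply -> AND, 8 floats per ymm register.
__attribute__((target("avx")))
void cond_mul_avx(std::vector<float>& a, const std::vector<float>& b,
                  const std::vector<float>& c) {
    const std::size_t n = a.size();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vb = _mm256_loadu_ps(b.data() + i);
        __m256 vc = _mm256_loadu_ps(c.data() + i);
        // Lanes where b > c become all-ones bits; all others (including NaN) become 0.
        __m256 mask = _mm256_cmp_ps(vb, vc, _CMP_GT_OQ);
        __m256 prod = _mm256_mul_ps(vb, vc);
        // AND keeps the product where the mask is set and yields +0.0f elsewhere,
        // so no blend is needed because the "else" value is zero.
        _mm256_storeu_ps(a.data() + i, _mm256_and_ps(mask, prod));
    }
    for (; i < n; ++i)  // scalar tail for the last n % 8 elements
        a[i] = (b[i] > c[i]) ? (b[i] * c[i]) : 0.0f;
}

// Take the AVX path only if the CPU supports it (GCC/Clang builtin).
void cond_mul(std::vector<float>& a, const std::vector<float>& b,
              const std::vector<float>& c) {
    if (__builtin_cpu_supports("avx"))
        cond_mul_avx(a, b, c);
    else
        cond_mul_scalar(a, b, c);
}
```

Testing on a length that is not a multiple of 8 (for example 11 elements) exercises both the vector body and the scalar tail. Comparing the result against `cond_mul_scalar` is a quick way to confirm the masked-AND trick matches the original `?:` semantics.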