c++ - What does clang's `-Ofast` option do in practical terms, especially for any differences from gcc?


Similar to the question What does gcc's ffast-math actually do?, and related to the question Clang optimization levels, I'm wondering what clang's -Ofast optimization does in practical terms, and whether any effects it has differ between clang and gcc or are more hardware- or compiler-dependent.

According to the accepted answer to Clang optimization levels, -Ofast adds to the -O3 optimizations: -fno-signed-zeros -freciprocal-math -ffp-contract=fast -menable-unsafe-fp-math -menable-no-nans -menable-no-infs. It seems to be entirely floating-point-math related. What do these optimizations mean in practical terms for C++'s common mathematical functions and floating-point numbers on a CPU such as an Intel Core i7, and how reliable are these differences?

For example, in practical terms:

The code std::isnan(std::numeric_limits<float>::infinity() * 0) returns true for me with -O3. I believe that's what's expected of IEEE-compliant math.

With -Ofast, however, it returns false. Additionally, the expression (std::numeric_limits<float>::infinity() * 0) == 0.0f returns true.

I don't know whether the same is seen with gcc. It's not clear to me how architecture-dependent these results are, nor how compiler-dependent they are, nor whether there's any applicable standard for -Ofast.

If anyone has perhaps produced a set of unit tests or code koans that answers this, that may be ideal. I've started on some, but would rather not reinvent the wheel.

Describing how each of these flags affects each math function would require a lot of work, so I'll try to give an example for each instead, leaving to the reader the burden of seeing how each affects a given function.


-fno-signed-zeros

Assumes that the code doesn't depend on the sign of zero.
In FP arithmetic, 0 is not an absorbing element w.r.t. multiplication: 0 · x = x · 0 ≠ 0, because 0 has a sign and thus, for example, -3 · 0 = -0 ≠ 0 (where 0 here denotes +0).

You can see live on Godbolt how a multiplication by 0 is folded into the constant 0 only with -Ofast:

float f(float a) {
    return a*0;
}

;with -Ofast
f(float):                               # @f(float)
        xorps   xmm0, xmm0
        ret

;with -O3
f(float):                               # @f(float)
        xorps   xmm1, xmm1
        mulss   xmm0, xmm1
        ret

As @eof noted in the comments, this folding also depends on the assumption of finite arithmetic: with an infinity or a NaN operand, x · 0 is not ±0.

-freciprocal-math

Use reciprocals instead of divisors: a/b = a · (1/b).
Due to the finiteness of FP precision, the equals sign is not really there.
Multiplication is faster than division; see Agner Fog's instruction tables.
See also Why is freciprocal-math unsafe in GCC?.

Live example on Godbolt:

float f(float a){
    return a/3;
}

;with -Ofast
.LCPI0_0:
        .long   1051372203              # float 0.333333343
f(float):                               # @f(float)
        mulss   xmm0, dword ptr [rip + .LCPI0_0]
        ret

;with -O3
.LCPI0_0:
        .long   1077936128              # float 3
f(float):                               # @f(float)
        divss   xmm0, dword ptr [rip + .LCPI0_0]
        ret

-ffp-contract=fast

Enables contraction of FP expressions.
Contraction is here an umbrella term for any law that would apply in the field ℝ and would result in a simplified expression.
For example, a · k / k = a.

However, the set of FP numbers equipped with + and · is not, in general, a field, due to finite precision.
This flag allows the compiler to contract FP expressions at the cost of correctness.

Live example on Godbolt:

float f(float a){
    return a/3*3;
}

;with -Ofast
f(float):                               # @f(float)
        ret

;with -O3
.LCPI0_0:
        .long   1077936128              # float 3
f(float):                               # @f(float)
        movss   xmm1, dword ptr [rip + .LCPI0_0]  # xmm1 = mem[0],zero,zero,zero
        divss   xmm0, xmm1
        mulss   xmm0, xmm1
        ret

-menable-unsafe-fp-math

This is like the above, but in a broader sense.

Enable optimizations that make unsafe assumptions about IEEE math (e.g. that addition is associative) or may not work for all input ranges. These optimizations allow the code generator to make use of instructions that would otherwise not be usable (such as fsin on x86).

See this discussion of the error precision of the fsin instruction.

Live example on Godbolt, where a⁴ is expanded as (a²)²:

float f(float a){
    return a*a*a*a;
}

;with -Ofast
f(float):                               # @f(float)
        mulss   xmm0, xmm0
        mulss   xmm0, xmm0
        ret

;with -O3
f(float):                               # @f(float)
        movaps  xmm1, xmm0
        mulss   xmm1, xmm1
        mulss   xmm1, xmm0
        mulss   xmm1, xmm0
        movaps  xmm0, xmm1
        ret

-menable-no-nans

Assumes that the code generates no NaN values.
In a previous answer of mine I analysed how ICC dealt with complex number multiplication by assuming no NaNs.

Most FP instructions deal with NaNs automatically.
There are exceptions though, such as comparisons, as can be seen live on Godbolt:

bool f(float a, float b){
    return a<b;
}

;with -Ofast
f(float, float):                        # @f(float, float)
        ucomiss xmm0, xmm1
        setb    al
        ret

;with -O3
f(float, float):                        # @f(float, float)
        ucomiss xmm1, xmm0
        seta    al
        ret

Note that the two versions are not equivalent: the -O3 one excludes the case where a and b are unordered, while the other one includes it in the true result.
While the performance is the same in this case, in more complex expressions this asymmetry can lead to different unfoldings/optimisations.

-menable-no-infs

Just like the above, but for infinities.

I was unable to reproduce a simple example on Godbolt, but the trigonometric functions need to deal with infinities carefully, especially for complex numbers.

If you browse a glibc implementation's math dir (e.g. sinc) you'll see a lot of checks that should be omitted when compiling with -Ofast.

