(x % 2), What's all this and-ing and or-ing?
Posted on Thu 24 February 2011 in blog
While trying to understand some x86 disassembly, I came across one particularly confusing bit of code (modified for this example):
mov eax, edx
L0:
and eax, -2147483647 ; 80000001H
L1:
jns SHORT L4
dec eax
L2:
or eax, -2 ; fffffffeH
L3:
inc eax
L4:
mov DWORD PTR _r$[ebp], eax
I couldn't intuitively look at this one and understand what it meant. I could tell that it was primarily looking at only the highest (sign) and the lowest bits. I ended up putting together a table, showing eax at each line (ie. after the result of the previous line)
L0 L1 L2 L3 L4 result
----------------------------------------------------
-ODD 10..01 10..00 11..10 11..11 -1
-EVEN 10..00 01..11 11..11 00..00 0
+ODD 00..01 00..01 1
+EVEN 00..00 00..00 0
Oh, now it's painfully obvious; it's just a modulo-2 (% 2
) operation. But why
was that so complicated? After some searching, it appears that one could just
use IDIV
(Signed Divide) instruction, which places the quotient in (E)AX
and the remainder in (E)DX
. One instruction, and there's your result, what
was so hard about that?
Turns out that IDIV
is really slow. IDIV
on a Pentium processor takes a
whopping 46 clock cycles! Let's compare that to the worst case of this other
funky algorithm:
and (r,i) = 1
jns (short) = 1 (4 if mispredicted)
dec (r) = 1
or (r,i) = 1
inc (r) = 1
Wow, only 5 clock cycles, (8 if the jns
was mispredicted). No wonder they go
through all that hassle!
So next time you see this bizarre x86 assembly, hopefully you can identify it
as just a %2
!
References: - siyobik.info - agner.org