Создает ли этот простой код, не содержащий никаких циклов, цикл на ассемблере? ⇐ C++
-
Гость
Создает ли этот простой код, не содержащий никаких циклов, цикл на ассемблере?
I was playing with this code to check if an integer is a power of 4:
// C++ version bool is_pow_4(unsigned a) { return (std::popcount(a) == 1) && (std::countr_zero(a) % 2 == 0); } // C version int is_pow_4(unsigned a) { return (__builtin_popcount(a) == 1) && (__builtin_ctz(a) % 2 == 0); } Basically I check that there is just one bit set and it is on an odd position.
I was expecting a branchless code, however I see two jumps and a rep instruction. From what I recall a rep is basically a loop, but on assembly level. It seems that std::countr_zero/__builtin_ctz generates the rep instruction.
C++ output:
is_pow_4(unsigned int): lea edx, [rdi-1] mov ecx, edi xor eax, eax xor ecx, edx cmp edx, ecx jb .L7 .L1: ret .L7: mov eax, 1 test edi, edi je .L1 xor eax, eax rep bsf eax, edi not eax and eax, 1 ret C output is similar.
I understand that the loop is bound by the width of the integer (32), so I think the complexity of the code is O(1), but I was still surprised to find a loop.
Is my understanding correct? Is this code a loop on x86? Is this because while there is a x86 popcount instruction, there is no count leading/trailing zeros x86 instruction?
Источник: https://stackoverflow.com/questions/781 ... n-assembly
I was playing with this code to check if an integer is a power of 4:
// C++ version bool is_pow_4(unsigned a) { return (std::popcount(a) == 1) && (std::countr_zero(a) % 2 == 0); } // C version int is_pow_4(unsigned a) { return (__builtin_popcount(a) == 1) && (__builtin_ctz(a) % 2 == 0); } Basically I check that there is just one bit set and it is on an odd position.
I was expecting a branchless code, however I see two jumps and a rep instruction. From what I recall a rep is basically a loop, but on assembly level. It seems that std::countr_zero/__builtin_ctz generates the rep instruction.
C++ output:
is_pow_4(unsigned int): lea edx, [rdi-1] mov ecx, edi xor eax, eax xor ecx, edx cmp edx, ecx jb .L7 .L1: ret .L7: mov eax, 1 test edi, edi je .L1 xor eax, eax rep bsf eax, edi not eax and eax, 1 ret C output is similar.
I understand that the loop is bound by the width of the integer (32), so I think the complexity of the code is O(1), but I was still surprised to find a loop.
Is my understanding correct? Is this code a loop on x86? Is this because while there is a x86 popcount instruction, there is no count leading/trailing zeros x86 instruction?
Источник: https://stackoverflow.com/questions/781 ... n-assembly
Мобильная версия