I’ve been meaning to start a series of posts about what I’m researching in my PhD, but I got distracted this week thinking about a few algorithms I saw for counting the number of bits set to one in a given binary number. I’m currently developing simulations of DNA damage, and I need to classify the severity of strand breaks induced by ionising radiation based on how many base pairs they hit. To do this, you look at a given section of damaged DNA and identify how many breaks occur within given ten-base-pair-long segments, as sketched below.

I kept track of whether, at a given base pair position, each strand had been damaged by radiation with two numbers represented in binary, each ten digits long. The idea was that if I had the number 0011000000, I would know that the base pairs seven and eight places behind the current base pair had been hit.

A cool part of this is that to count the number of damage events in the ten base pairs preceding whatever base pair is being considered, it’s sufficient to just count the number of 1 bits in the binary number representing the ten previous positions. This is rather easy to do, and a quick Google shows a few ways to do it; a few of them are below:

```c
unsigned int loop_count(unsigned int u) {
    unsigned int uc = u & 1;
    do {
        u = u >> 1;
        uc += u & 1;
    } while (u);
    return uc;
}

unsigned int decrement_count(unsigned int u) {
    unsigned int uc;
    for (uc = 0; u; uc++) {
        u &= u - 1; /* clears the lowest set bit */
    }
    return uc;
}

unsigned int fiddle_count(unsigned int u) {
    unsigned int uc;
    uc = u - ((u >> 1) & 033333333333) - ((u >> 2) & 011111111111);
    return ((uc + (uc >> 3)) & 030707070707) % 63;
}
```

The first function, loop_count, is the most naive way to count the number of bits that are one, literally checking bit by bit whether each is one. The second method, decrement_count, clears the lowest set bit on each pass, so the loop only iterates for as many iterations as there are set bits. The final method, fiddle_count (named for the fiddling of bits therein), astonished me and is the reason I’m writing this right now. It’s a very compact way of counting the active bits whilst avoiding a loop entirely.

What’s going on here is quite cool: in effect, fiddle_count sums all the possible active bits in parallel, by breaking the number up into sections. To briefly illustrate this, let’s just consider the last six bits, written as $u_5 u_4 u_3 u_2 u_1 u_0$. Our binary number $u$ is then:

$$u = 32u_5 + 16u_4 + 8u_3 + 4u_2 + 2u_1 + u_0$$
And the bit-shifted values in fiddle_count are then (let’s forget that higher bits exist for now):

$$u \gg 1 = 16u_5 + 8u_4 + 4u_3 + 2u_2 + u_1$$
$$u \gg 2 = 8u_5 + 4u_4 + 2u_3 + u_2$$
The last six bits of the octal number 033333333333 are 0b011011, and for the octal number 011111111111, these are 0b001001. So then:

$$(u \gg 1) \,\&\, \mathtt{0b011011} = 16u_5 + 8u_4 + 2u_2 + u_1$$
$$(u \gg 2) \,\&\, \mathtt{0b001001} = 8u_5 + u_2$$
So the subtraction on the first line is then:

$$uc = u - (16u_5 + 8u_4 + 2u_2 + u_1) - (8u_5 + u_2)$$
And a spot of rearranging gets you:

$$uc = 8(u_5 + u_4 + u_3) + (u_2 + u_1 + u_0)$$
Because this is binary, each $u_i$ is just 1 or 0. So every three bits here, we have the sum of the number of set bits in that group (one octal digit each) – cool! The next line contains two more tricks. ‘uc >> 3’ shifts all of these bits along by 3. With our numbers here, that means that we would find:

$$uc \gg 3 = u_5 + u_4 + u_3$$
And so,

$$uc + (uc \gg 3) = 8(u_5 + u_4 + u_3) + (u_5 + u_4 + u_3 + u_2 + u_1 + u_0)$$
So every six bits, the bottom three contain the sum of the bits that were originally in all six. Unfortunately, the top three bits now contain unnecessary data and we need to get rid of them. This is done using a bitwise AND with the octal number 030707070707 (which repeats the binary pattern 000111000111). So if you’ve followed me to here, you’ll believe me when I say that if you extended this up to considering 12 bits, what you’re left with is

$$(uc + (uc \gg 3)) \,\&\, \mathtt{030707070707} = 64 s_1 + s_0,$$

where $s_1$ and $s_0$ are the numbers of set bits in the top and bottom six bits respectively.
Going all the way to 32 bits and writing each group’s sum as $s_i$, one gets the total sum $s$:

$$s = s_5 \cdot 64^5 + s_4 \cdot 64^4 + s_3 \cdot 64^3 + s_2 \cdot 64^2 + s_1 \cdot 64 + s_0$$
A little further mathematical gymnastics lets you recover the following:

$$s = (s_5 + s_4 + s_3 + s_2 + s_1 + s_0) + s_1(64 - 1) + s_2(64^2 - 1) + s_3(64^3 - 1) + s_4(64^4 - 1) + s_5(64^5 - 1)$$
And recalling the identities for differences of perfect cubes and squares:

$$64^2 - 1 = (64 - 1)(64 + 1)$$
$$64^3 - 1 = (64 - 1)(64^2 + 64 + 1)$$

(and, more generally, $64^n - 1 = (64 - 1)(64^{n-1} + \dots + 64 + 1)$).
Essentially, the thing to realise here is that 63 is a factor of every coefficient $64^i - 1$ in the big sum just above. And that’s why the last operation in fiddle_count is modulo 63: it gets rid of everything else and just leaves the very tidy sum of all the bits. (Since at most 32 bits can be set and 32 < 63, the modulo returns that sum exactly.)

Anyway, that’s some neat maths. The next obvious question is which way of counting bits is the fastest, and plenty of others have already checked this (this Stack Overflow thread, for example). I was curious, however, about how many instructions each implementation takes up in assembly. For the curious, this turns out to be (compiling with clang with no optimisations, and using otool to disassemble the code):

- loop_count: 17 instructions, looping over 9
- decrement_count: 17 instructions, looping over 10
- fiddle_count: 27 instructions, no loop.

In most cases it would seem that fiddle_count is the winner in terms of instructions executed (though not all instructions were created equal). Intriguingly, when you compile with the -O2 flag, both loops become much faster, dropping to 11 instructions in total with 4 instructions within each loop.

If you’re still interested in learning more about counting bits, here are a few resources:

- This kind of problem is very much related to the Hamming weight
- Here are some more hacks for fiddling with bits, including the 64-bit version of fiddle_count
- This is the first mathematical explanation of fiddle_count that I read