Datacenters provide low-cost, efficient and flexible computing resources, so
that users and applications (*e.g.*, Google Search, Pokemon Go) share a pool of
resources, and can expand into that pool as needed. Being such a critical
building block of our digital world, every aspect of datacenters undergoes
constant research and development both by industrial and academic communities.

At the physical level, all datacenters are built upon a network of servers, but
there are many variations on exactly which hardware is used, and the topology
(or graph) of the interconnections. A particularly interesting design is the
*server-centric datacenter network* (SCDCN), for several reasons: it encourages
the use of very cheap, and even legacy, components; many network topologies are
possible; it allows the incorporation of existing, and extensive, research on
high-performance interconnection network topologies; communication within the
datacenter is fully programmable within the servers, making it an excellent
platform for independent and academic research and development. Our work is
precisely on algorithms for such communication, called *routing algorithms*.

Before I go on, let me describe how we model SCDCNs as a graph. An SCDCN with N servers and N/n switches, each with n ports, is represented by a graph on N server-nodes and N/n switch-nodes, connected by links. Each server-node, switch-node, and link represents the corresponding physical component of the topology. I’ll avoid excessive formality here by dropping the “-node” part. For reasons detailed in the paper, there are no switch-to-switch connections.

Many DCN topologies can be defined recursively, where a level k network is built from a number of level k-1 networks that are connected to one another with a number of bridge links. In our research we use the recursive structure and the bridge links to yield efficient routing algorithms for recursively defined networks. We apply them to two well-known SCDCNs that were proposed by Microsoft Research, DCell and FiConn,

We show, via an extensive experimental evaluation using our flow-level simulator, that our newly-proposed routing algorithms can yield significant improvements in terms of hop-length, fault-tolerance, load-balance, network-throughput, end-to-end latency, and workload completion time when we compare our algorithms with the existing routing algorithms for (Generalized) DCell and FiConn.

DCell, FiConn, and many other recursively-defined networks ship with a natural
*Dimensional* routing algorithm, which is described for DCell below. Given a
source server *src* and a destination server *dst*:

- Identify the smallest
*m*so that a level*m*DCell contains both*src*and*dst*, - Compute the bridge-link
*(dst’,src’)*between the level*(m-1)*DCell containing*src*and the level*(m-1)*DCell containing*dst*, - Recursively compute paths from
*src*to*dst’*and from*src’*to*dst*; and, - Piece together the recursively computed paths with the bridge-link.

It may be the case, however, that an alternative, *proxy*, level *(m-1)* DCell
can be used such that a shorter path can be constructed from two bridge links
and three recursively computed paths. An algorithm that searches for the proxy
substructure is called a *Proxy* routing algorithm. This is precisely the
situation depicted the the Figure below, for a level 2 FiConn that uses switches
with 4 ports, where the proxy route is 2 server-server hops shorter than the
dimensional route.

In our paper we identify and evaluate efficient ways of searching for and
identifying proxy substructures which yield short paths. It’s a bit technical,
and the details are in the paper, but the basic idea is depicted schematically
in the figure below. We restrict the length of at least one of the recursively
computed paths by looking at the bridge links incident to nodes near *src* or
*dst*, and considering only the proxies that those bridge links are connected
to. We compute the length of the proxy path through each such proxy, as well as
link-states (*e.g.*, traffic congestion) along the path, and then output the
best path.

The details are in our paper:

A. Erickson, J. Pascual Saiz, J. Navaridas, and I. A. Stewart.

**Routing Algorithms for Recursively-Defined Datacenter Networks**.
Submitted to a journal. Previous version in Proc. of
Trustcom/BigDataSE/ISPA, 2015 IEEE, 3, 84–91, 2015.

In 2012-2013 I travelled to several remote First Nations communities around British Columbia to do hands-on mathematical activities with elementary school kids and teachers. Math Catcher, which provided the free math presentations, has also born several mathematical stories about a young boy, named Small Number, that have been translated into Aboriginal languages from across Canada. I have embedded all of Small Number’s stories and translations here (updated 31 August 2016) for your ears to enjoy the sound of these languages.

**Click here to support this program directly by giving to Simon Fraser University.**

### Small Number Counts to 100

Written by Veselin Jungic & Mark MacLean Illustrated by Simon Roy.

### Small Number and the Old Canoe

Written by Veselin Jungic & Mark MacLean Illustrated by Simon Roy

### Small Number and the Basketball Tournament

Written by Veselin Jungic, SFU, and Mark MacLean, UBC Illustrated by Jean-Paul Csuka, Canada

### Small Number and the Big Tree

Written by Veselin Jungic and Mark MacLean Illustrated by Simon Roy and Jess Pollard

### Small Number and the Skateboard Park

Written by Veselin Jungic & Mark MacLean Illustrated by Simon Roy

### Small Number and the Salmon Harvest

Written by Veselin Jungic & Mark MacLean Illustrated by Simon Roy & Jess Pollard

### Small Number and the Old Totem Pole

Written by Veselin Jungic and Mark MacLean Illustrated by Bethani L’Heureux

I am unfamiliar with these (and all) Aboriginal languages, so please let me know if I have made any errors in this blog post, and especially if the special characters are not rendering correctly.

“Cracking the Coding Interview 6e” question 4.10: T1 and T2 are binary trees with m and n nodes, respectively. Test whether T1 contains T2 as a subtree. Here is my solution, which performs much better than the book’s solution.

The problem asks, more specifically, if there is a node in T1 such that the
subtree rooted there has exactly the same structure as T2, as well as the same
values at each node. **Spoilers follow the image below**, so if you are looking
for hints, proceed carefully.

CCtI6e discusses two solutions, the first one makes the observation that a tree
is uniquely identified by its pre-order traversal if you include special
identifiers for null child-nodes, so the tree containment problem is reduced to
a substring problem: does `toString(T1)`

contain `toString(T2)`

? CCTI6e blurs
the solution by using a library call to solve the substring problem
(it is not trivial to do it efficiently),
but they do claim that this solution runs in O(m+n) time and O(m+n) space. It’s
a valid answer, but I want something better.

The second solution tries to improve on the O(m+n) space of the first solution. The naïve way would be to do a depth-first search comparison of T2 and the subtree rooted at each node of T1, resulting in an O(log(m) + log(n)) space algorithm that uses O(min(m*log(m),m*n)) time. CCtI6e suggests using the values of the root of the T1-subtree and the root of T2 to decide whether or not to perform the search, but this ultimately makes no difference at all. First, if the values are different, then the search would abort immediately, and second, the nodes of T1 and T2 could have all the same data, say 0. Below is a drawing of a bad case, where most of T2 occurs at many nodes of T1, but T1 does not contain T2.

I was surprised to see such naïve solution to this problem, as CCtI6e has been very good so far. The recursive solution can be improved upon considerably by using a better heuristic for whether or not to search/compare T2 with the subtree at a given node of T1.

### Hash the structure and data of a binary tree

We want to hash each node so that if the hashes of two trees match, there is a high probability that their subtrees match. As a proof of concept, I adapted Knuth’s multiplicative method to work on binary trees with integer values. The idea is to re-hash the hash values of the left and right children, as well as the height and value of the current node, using distinct 32-bit primes for each part, and then add the four multiplicative hash values. I’ve read that hashing is part guess-work and part theory, and there you have it.

```
uint32_t hash(uint64_t left,
uint64_t right,
uint64_t value,
uint64_t height){
return ((left&(0xFFFFFFFF))*2654435761 +
(right&(0xFFFFFFFF))*2654435741 +
(value&(0xFFFFFFFF))*2654435723 +
(height&(0xFFFFFFFF))*2654435713);
}
uint32_t dfs_hash(node *t){
if(!t) return 0;
return hash(dfs_hash(t->left),dfs_hash(t->right),t->value,t->height);
}
```

My hash function seems to work quite well for randomly generated trees, but it is not thoroughly tested or anything. I’d like to eventually adapt a more sophisticated function, perhaps from Eternally Confuzzled.

### Using the tree-hashes

My algorithm works as follows:

- pre-compute the hash of T2.
- Do a depth-first search of T1 where at each node:
- if one of child nodes of the current node contains T2, return True. Otherwise, record the hashes of the child nodes.
- hash the current node in O(1) time
- if the hash of the current node matches the hash of T2:
- do a depth-first search that compares T2 to the subtree rooted at the current node
- if T2 is identical to the subtree rooted at the current node, then return True

- return the hash of the current node.

Note that I do not cache hash values. The idea in this exercise is not to use unnecessary space, beyond the O(log(m)+log(n)) space necessarily used by depth-first searches. Caching hash values would not only use O(n+m) additional space, but keeping the hash values up to date in a non-static binary tree requires modifying the binary tree implementation.

The above pseudo-code is implemented below, and the complete source-code is at the end of this blog post.

```
typedef struct node{
int value,height;
struct node *left;
struct node *right;
}node;
node *rec_contains_subtree(node *t1, node *t2, uint32_t *t1ChildHash, uint32_t t2hash){
uint32_t leftChildHash,rightChildHash;
node *res;
*t1ChildHash=0;
if(!t1)
return NULL;
if((res=rec_contains_subtree(t1->left,t2,&leftChildHash, t2hash))||
(res=rec_contains_subtree(t1->right,t2,&rightChildHash,t2hash)))
return res;
count_nodes_hashed++;
*t1ChildHash=hash(leftChildHash,rightChildHash,t1->value,t1->height);
if(*t1ChildHash==t2hash){
count_dfs_compare++;
if(dfs_compare_trees(t1,t2))
return t1;
}
return NULL;
}
node *contains_subtree(node *t1, node *t2){
//hash t2
uint32_t t2hash = dfs_hash(t2);
uint32_t t1ChildHash;
if(!t2)
exit(-1);
return rec_contains_subtree(t1,t2,&t1ChildHash,t2hash);
}
```

Below is the other depth-first search, that compares T2 with a subtree of T1. Notice that the CCtI6e solution is implicit in the last conditional.

```
int dfs_compare_trees(node *t1, node *t2){
if(!t1 && !t2)
return 1;
if(!t1 && t2)
return 0;
if(t1 && !t2)
return 0;
if(t1->value!= t2->value)
return 0;
return dfs_compare_trees(t1->left,t2->left) &&
dfs_compare_trees(t1->right,t2->right);
}
```

### Testing on some large inputs

I did some tests on randomly generated trees. This is by no means a robust test, especially as regards hash collisions. I’ll paste some output and then explain it.

Below, I insert 10,000,000 nodes into T1 and 100 nodes into T1, by descending
the trees on randomly selected branches and inserting at the first empty node.
All data values are 0s. Since the result is that T1 doesn’t contain T2, then all
10,000,000 nodes of T1 are searched, but notice that the hashing saves me from
doing *any* node-by-node comparisons with T2.

```
1e-05 seconds to allocate trees
10.3298 seconds to create t1
2e-05 seconds to create t2
t1 has 10000000 nodes
t2 has 100 nodes
0.928297 seconds to check if t1 contains t2
t1 does not contain t2
Did dfs_compare_trees 0 times
Hashed 10000000 nodes of t1
```

With the same tree T1 as above, I regenerated T2 100 times and repeated the same
test as above. Again, the node-by-node comparison, `dfs compares`

, was done 0
times (per search).

```
Test containment in t1 of randomly generated trees t2
Nodes: t1 10000000 t2 100; Trials: 100
All nodes hold value=0
Average per search:
time 0.944213;
dfs compares 0;
prop. nodes hashed 1;
containment 0
```

I repeated the previous experiment, below, with smaller trees T2, on 20 nodes. Now we
see that 6/100 of these T2s are contained in T1, but `dfs_compares`

is done
10 times more often than we find something. It shows that the hashing is giving
quite a few false positives for small T2! More testing needed.

```
Test containment in t1 of randomly generated trees t2
Nodes: t1 10000000 t2 20; Trials: 100
All nodes hold value=0
Average per search:
time 0.96471;
dfs compares 0.58;
prop. nodes hashed 0.964319;
containment 0.06
```

In order to test containment of larger trees T2, I simply used subtrees of T1,
selected (pseudo) uniformly at random. Since my search does not consider the
values of the references to the nodes, it is unaware that T1 and T2 share
locations in memory. Below we see that very few nodes of T1 are hashed, and that
a hash collision is experienced in about 12% of the searches, since
`1.12=dfs_compares/containment`

.

```
Test containment in t1 of randomly selected subtrees of t1
Nodes: t1 10000000 t2 (unknown); Trials: 100
All nodes hold value=0
Average per search:
time 0.0682518;
dfs compares 1.12;
prop. nodes hashed 0.0711984;
containment 1
```

The running time is difficult to compute (and maybe impossible to prove), because of the way it depends on the hash function. Informally, we can state an expected time of O(m+(1+c)n), where c is the number of hash collisions. If c is small enough, this becomes O(m). This is much better than CCtI6e’s time of O(m+kn), where k is the number of nodes of T1 that have the same value as the root of T2, since, in our 0-value trees, that could easily amount to O(m*log(m)); the cost of doing a full depth-first search of T1 at every node.

The code, which is a bit rough and quick, is available as a gist:

I saw a nice question on leetcode today, apparently asked recently at a Google on-site interview.

**Given N rectangles with integer coordinates, check whether they are an exact
cover of a rectangular region**

If you don’t know already, Google’s on-site interview process involves 5 independent interviews in one day. In each one a Google Software Engineer poses a problem that should be solved in 45 min to 1 hr. The interviewee can code on a white board or in Google Docs (check with your recruiter for details) in a language of their choice, but not pseudo-code.

By itself, the question is not all that intimidating, but the time-limit can make it quite challenging. That’s why I’m practising!

As usual, there is a fairly straightforward “brute force” solution, and some better solutions. I’ll give hints and solutions immediately after the jump so avoid spoilers by not looking past the example inputs yet.

### Hint 1

Try to find a brute-force solution by looking at the total area of the rectangles.

### Hint 2

After computing the area, can you avoid comparing every pair of rectangles?

### Hint 3

The problem can be solved in O(N log N), both with and without computing areas.

### Hint 4

What needs to occur at every left-hand segment of an input rectangle that is interior to the smallest rectangle that bounds the input rectangles? At every right-hand segment? At the vertical segments not interior?

### Solutions from area

One solution is to compute the area of the smallest bounding rectangle, and the combined areas of the N input rectangles. The input data is unsorted, so you need to do an O(N) scan of the input to find the minimum and maximum points, both vertically and horizontally (up to 4 points). If the areas differ, output False, and if they are the same, you need to check that no pair of input rectangles intersect. To see why, look at the input below.

Testing whether a pair of rectangles intersect is a simple, O(1) calculation (though I would understand if you failed to get it onto the whiteboard in 5 minutes under pressure), so we know we can finish in O(N^2) time by comparing (N choose 2) pairs of input rectangles.

Some light reading suggests that the intersections can be tested in O(N log N) time in some cases, but the solutions seem far from simple. I didn’t look at this very deeply, but for further reading:

R-trees on StackOverflow

R-trees on Wikipedia

Another answer on StackOverflow

The Princeton lecture cited in the above StackOverflow answer

I’ll now present a solution that does not require any 2D geometric tests.

### My Solution

Imagine, first, that the (smallest) rectangle bounding the inputs is a cylinder, so that its leftmost edge coincides with its rightmost edge. Now we can state the ‘algorithm’ in one sentence:

If **(P) every point interior to a right-hand edge of an input rectangle is unique,
and it coincides with a point of a left-hand edge of an input rectangle, and
vise versa**, then we

**(Q) output True**, and otherwise we output False.

Take a moment to let that sink in. We are claiming that P if and only if Q. Let us make clear here that when we say rectangles intersect, we mean that their interiors intersect. The explanation is a bit dense, but hopefully you’ll come to the same conclusions by making some drawings.

If P is false, then one of two things can happen. If points interior to the
right-hand edges of two distinct rectangles coincide, then the rectangles
intersect. Otherwise, without loss of generality, we may assume there is there
is some point `p`

on a left-hand edge of an input rectangle, call it `A`

, that
does not coincide with a point on a right-hand edge of an input rectangle. Thus,
`p`

is interior to a small open disc that both contains points interior to `A`

and points exterior to `A`

. The same disc is either entirely inside a rectangle
distinct from `A`

, which will necessarily intersect with `A`

, or it contains
points exterior to all rectangles distinct from `A`

, and thus contains a hole in
the bounding cylinder. Thus we output False, so Q is false.

If Q is false, on the other hand, then either two distinct rectangles intersect, or a point in the bounding cylinder falls outside of all input rectangles. If rectangles intersect and they have intersecting vertical segments, then P is false, so we may assume that is not the case. But then, the vertical segment of one rectangle must be interior to another rectangle, which again falsifies P. Finally, if a point falls outside of all rectangles, then there are vertical segments to its left and right that falsify P.

Thus, we have shown that P is a necessary and sufficient condition for the N
rectangles to be an exact cover of a rectangle, *q.e.d*.

Now we need to code the solution. The short of it is, we sort a copy of the input rectangles by left-edge x-coordinates, and we sort another copy by right-edge x-coordinates. The first left edge and last right edge are special cases, but are easily dealt with. We then process the vertical segments in the sorted lists, at each x-coordinate, and simply compare the lists.

The bottleneck in time-complexity is in sorting the inputs, plus we need O(N) space to make left-edge-sorted and right-edge-sorted copies of the inputs.

I am assuming that an input rectangle is given as a bottom-left point and a
top-right point; for example, a unit square is `[[1,1],[2,2]]`

.

A final word of warning before you delve into my Python code. I am primarily a C-programmer leveraging some features of Python, so my code is not very “Pythonic”. Suggestions, improvements, and more test cases welcome.

```
def isBigRect(rectangles):
if rectangles==[]:
return True
L=processBoundaries(rectangles,leftOrRight='left')
R=processBoundaries(rectangles,leftOrRight='right')
if L==False or R==False or not len(L)==len(R):
return False
L[-1][0]=R[-1][0]
R[0][0]=L[0][0]
if not L==R:
return False
else:
return True
def processBoundaries(B,leftOrRight='left'):
x0orx1=0
if leftOrRight=='right':
x0orx1=1
res=[[]]
elif leftOrRight=='left':
x0orx1=0
res=[]
else:
print 'process boundaries input error'
return False
B=[ [x[x0orx1][0],[x[0][1],x[1][1]]] for x in B]
B.sort(cmp=lambda x,y: x[0]-y[0])
i=0
while i<len(B):
x=B[i][0]
res.append( [x,[]] )
while i<len(B) and B[i][0]==x:
res[-1][1].append(B[i][1])
i=i+1
res[-1][1]=mergeSpecialIntervals(res[-1][1])
if res[-1][1]==False:
return False
if leftOrRight=='right':
res[0]=['first',res[-1][1]]
else:
res.append(['last',res[0][1]])
return res
def mergeSpecialIntervals(L):
if L==[]:
return False
if len(L)==1:
return L
L.sort(cmp=lambda x,y: x[0]-y[0])
res=[L[0]]
for i in range(len(L)-1):
if L[i][1]>L[i+1][0]:
return False
elif L[i][1]==L[i+1][0]:
res[-1][1]=L[i+1][1]
i=i+2
else:
res.append(L[i])
i=i+1
if i == len(L)-1:
res.append(L[i])
return res
A=[[[[0,0],[4,1]],
[[7,0],[8,2]],
[[6,2],[8,3]],
[[5,1],[6,3]],
[[4,0],[5,1]],
[[6,0],[7,2]],
[[4,2],[5,3]],
[[2,1],[4,3]],
[[0,1],[2,2]],
[[0,2],[2,3]],
[[4,1],[5,2]],
[[5,0],[6,1]]],
#shuffled the above a little
[[[0,0],[4,1]],
[[7,0],[8,2]],
[[5,1],[6,3]],
[[6,0],[7,2]],
[[4,0],[5,1]],
[[4,2],[5,3]],
[[2,1],[4,3]],
[[0,2],[2,3]],
[[0,1],[2,2]],
[[6,2],[8,3]],
[[5,0],[6,1]],
[[4,1],[5,2]]],
[[[0,0],[4,1]]],
[],
#vertical stack
[[[0,0],[1,1]],
[[0,1],[1,2]],
[[0,2],[1,3]],
[[0,3],[1,4]],],
#horizontal stack
[[[0,0],[1,1]],
[[1,0],[2,1]],
[[2,0],[3,1]],
[[3,0],[4,1]],],
]
print 'should be True'
for a in A:
print isBigRect(a)
B=[
#missing a square
[[[0,0],[4,1]],
[[7,0],[8,2]],
[[6,2],[8,3]],
[[5,1],[6,3]],
[[6,0],[7,2]],
[[4,2],[5,3]],
[[2,1],[4,3]],
[[0,1],[2,2]],
[[0,2],[2,3]],
[[4,1],[5,2]],
[[5,0],[6,1]]],
#exceeds top boundary
[[[0,0],[4,1]],
[[7,0],[8,2]],
[[5,1],[6,4]],
[[6,0],[7,2]],
[[4,0],[5,1]],
[[4,2],[5,3]],
[[2,1],[4,3]],
[[0,2],[2,3]],
[[0,1],[2,2]],
[[6,2],[8,3]],
[[5,0],[6,1]],
[[4,1],[5,2]]],
#two overlapping rectangles
[[[0,0],[4,1]],
[[0,0],[4,1]]],
#exceeds right boundary
[[[0,0],[4,1]],
[[7,0],[8,3]],
[[5,1],[6,3]],
[[6,0],[7,2]],
[[4,0],[5,1]],
[[4,2],[5,3]],
[[2,1],[4,3]],
[[0,2],[2,3]],
[[0,1],[2,2]],
[[6,2],[8,3]],
[[5,0],[6,1]],
[[4,1],[5,2]]],
# has an intersecting pair
[[[0,0],[5,1]],
[[7,0],[8,2]],
[[5,1],[6,3]],
[[6,0],[7,2]],
[[4,0],[5,1]],
[[4,2],[5,3]],
[[2,1],[4,3]],
[[0,2],[2,3]],
[[0,1],[2,2]],
[[6,2],[8,3]],
[[5,0],[6,1]],
[[4,1],[5,2]]],
#skips column 4
[[[0,0],[4,1]],
[[7,0],[8,2]],
[[5,1],[6,3]],
[[6,0],[7,2]],
[[2,1],[4,3]],
[[0,2],[2,3]],
[[0,1],[2,2]],
[[6,2],[8,3]],
[[5,0],[6,1]],],
#exceed left boundary
[[[0,0],[4,1]],
[[7,0],[8,2]],
[[5,1],[6,3]],
[[6,0],[7,2]],
[[4,0],[5,1]],
[[4,2],[5,3]],
[[2,1],[4,3]],
[[-1,2],[2,3]],
[[0,1],[2,2]],
[[6,2],[8,3]],
[[5,0],[6,1]],
[[4,1],[5,2]]],
#horizontal stack with overlapping left boundaries at x=1
[[[0,0],[1,1]],
[[1,0],[2,1]],
[[1,0],[3,1]],
[[3,0],[4,1]],],
]
print 'should be False'
for b in B:
print isBigRect(b)
```

### Bonus: what if the input rectangles can be moved?

If you are a mathematician you might be inclined to view a rectangle as simply a width and height, with no fixed location in the plane. This is exactly what Tony Huynh, self-described Mathematical Vagabond (and matroid theorist by trade), pointed out to me, so I asked him if the alternate interpretation of the problem is NP-hard?

The new problem:

**Given N rectangles with integer dimensions, check whether they can be
translated and rotated to exactly cover a rectangular region.**

Indeed the answer is yes, and Tony came back to me a few minutes later with this very nice reduction:

Consider the following input of rectangles, where a rectangle is given by a pair (height, width): (1,a1), (1,a2), …, (1,aN) and (S,S) where S is half the sum of all the ai. If you can solve the rectangle partitioning problem for this instance, then you can find a subset of a1, …, aN whose sum is exactly half the total sum (which is NP-hard).

Suppose you have two unsigned 16-bit integers,

`X=0000 0000 0111 1111 1000 1010 0010 0100`

and

`Y=0000 0000 0000 0000 0011 0000 1010 1111`

.

How can you tell if `Y`

is more than half as long as `X`

? I came across this
because I wanted some code to branch on whether or not `Y`

was a lot bigger or
smaller than the square root of a large positive integer `X`

(behaviour when `Y`

was around the square root was arbitrary). The square root of `X`

has half as
many bits as `X`

does, and I figured that I could get the lengths of these
numbers super fast to make a fairly coarse comparison.

I was right about that, but the efficiency of the answer depends on the CPU the code is running on. Even so, the portable answers should be quite fast, and they are interesting, so let’s look at them.

As usual, I did some research on the internet for this, and the bit twiddling I did is from Stackoverflow and Haacker’s Delight (it’s a book, of course, but I linked to some snippets on their website).

I wrote some code that explains the steps of a smearing function and a
population count. In fact, that’s the spoiler: I will smear the bits of `X`

to
the right, and then count the population of 1-bits.

```
//smear
x = x | x>>1;
x = x | x>>2;
x = x | x>>4;
x = x | x>>8;
x = x | x>>16;
//population count
x = ( x & 0x55555555 ) + ( (x >> 1 ) & 0x55555555 );
x = ( x & 0x33333333 ) + ( (x >> 2 ) & 0x33333333 );
x = ( x & 0x0F0F0F0F ) + ( (x >> 4 ) & 0x0F0F0F0F );
x = ( x & 0x00FF00FF ) + ( (x >> 8 ) & 0x00FF00FF );
x = ( x & 0x0000FFFF ) + ( (x >> 16) & 0x0000FFFF );
```

The idea behind the population count is to divide and conquer. For example with
`01 11`

, I first count the 1-bits in `01`

: there is 1 1-bit on the right, and
there are 0 1-bits on the left, so I record that as `01`

(in place). Similarly,
`11`

becomes `10`

, so the updated bit-string is `01 10`

, and now I will add the
numbers in buckets of size 2, and replace the pair of them with the result;
1+2=3, so the bit string becomes `0011`

, and we are done. The original
bit-string is replaced with the population count.

There are faster ways to do the pop count given in Hacker’s Delight, but this one is easier to explain, and seems to be the basis for most of the others. You can get my code as a Gist here..

`X=0000 0000 0111 1111 1000 1010 0010 0100`

Set every bit that is 1 place to the right of a 1

`0000 0000 0111 1111 1100 1111 0011 0110`

Set every bit that is 2 places to the right of a 1

`0000 0000 0111 1111 1111 1111 1111 1111`

Set every bit that is 4 places to the right of a 1

`0000 0000 0111 1111 1111 1111 1111 1111`

Set every bit that is 8 places to the right of a 1

`0000 0000 0111 1111 1111 1111 1111 1111`

Set every bit that is 16 places to the right of a 1

`0000 0000 0111 1111 1111 1111 1111 1111`

Accumulate pop counts of bit buckets size 2

`0000 0000 0110 1010 1010 1010 1010 1010`

Accumulate pop counts of bit buckets size 4

`0000 0000 0011 0100 0100 0100 0100 0100`

Accumulate pop counts of bit buckets size 8

`0000 0000 0000 0111 0000 1000 0000 1000`

Accumulate pop counts of bit buckets size 16

`0000 0000 0000 0111 0000 0000 0001 0000`

Accumulate pop counts of bit buckets size 32

`0000 0000 0000 0000 0000 0000 0001 0111`

The length of 8358436 is 23 bits

`Y=0000 0000 0000 0000 0011 0000 1010 1111`

Set every bit that is 1 place to the right of a 1

`0000 0000 0000 0000 0011 1000 1111 1111`

Set every bit that is 2 places to the right of a 1

`0000 0000 0000 0000 0011 1110 1111 1111`

Set every bit that is 4 places to the right of a 1

`0000 0000 0000 0000 0011 1111 1111 1111`

Set every bit that is 8 places to the right of a 1

`0000 0000 0000 0000 0011 1111 1111 1111`

Set every bit that is 16 places to the right of a 1

`0000 0000 0000 0000 0011 1111 1111 1111`

Accumulate pop counts of bit buckets size 2

`0000 0000 0000 0000 0010 1010 1010 1010`

Accumulate pop counts of bit buckets size 4

`0000 0000 0000 0000 0010 0100 0100 0100`

Accumulate pop counts of bit buckets size 8

`0000 0000 0000 0000 0000 0110 0000 1000`

Accumulate pop counts of bit buckets size 16

`0000 0000 0000 0000 0000 0000 0000 1110`

Accumulate pop counts of bit buckets size 32

`0000 0000 0000 0000 0000 0000 0000 1110`

The length of 12463 is 14 bits

So now I know that 12463 is significantly larger than the square root of 8358436, without taking square roots, or casting to floats, or dividing or multiplying.

It needs to be said that there are built-in functions that get the number of
leading or trailing 0s. They are `__builtin_clz()`

and `__builtin_ctz()`

(*i.e.*, clz = count leading zeros), and I read that most modern CPUs will do
this computation in 1 instruction. So, alternatively, the length of `uint32_t x`

is

```
32-__builtin_clz(x);
```