Alejandro Erickson
Postdoctoral Research Associate with Iain Stewart at Durham University
flickr/ geoburst/ youtube/ instructables/ facebook/ linkedin/

Datacenters provide low-cost, efficient and flexible computing resources, so that users and applications (e.g., Google Search, Pokemon Go) share a pool of resources, and can expand into that pool as needed. Being such a critical building block of our digital world, every aspect of datacenters undergoes constant research and development both by industrial and academic communities.

At the physical level, all datacenters are built upon a network of servers, but there are many variations on exactly which hardware is used, and the topology (or graph) of the interconnections. A particularly interesting design is the server-centric datacenter network (SCDCN), for several reasons: it encourages the use of very cheap, and even legacy, components; many network topologies are possible; it allows the incorporation of existing, and extensive, research on high-performance interconnection network topologies; communication within the datacenter is fully programmable within the servers, making it an excellent platform for independent and academic research and development. Our work is precisely on algorithms for such communication, called routing algorithms.

Before I go on, let me describe how we model SCDCNs as a graph. An SCDCN with N servers and N/n switches, each with n ports, is represented by a graph on N server-nodes and N/n switch-nodes, connected by links. Each server-node, switch-node, and link represents the corresponding physical component of the topology. I’ll avoid excessive formality here by dropping the “-node” part. For reasons detailed in the paper, there are no switch-to-switch connections.

Many DCN topologies can be defined recursively, where a level k network is built from a number of level k-1 networks that are connected to one another with a number of bridge links. In our research we use the recursive structure and the bridge links to yield efficient routing algorithms for recursively defined networks. We apply them to two well-known SCDCNs that were proposed by Microsoft Research, DCell and FiConn,

We show, via an extensive experimental evaluation using our flow-level simulator, that our newly-proposed routing algorithms can yield significant improvements in terms of hop-length, fault-tolerance, load-balance, network-throughput, end-to-end latency, and workload completion time when we compare our algorithms with the existing routing algorithms for (Generalized) DCell and FiConn.

DCell, FiConn, and many other recursively-defined networks ship with a natural Dimensional routing algorithm, which is described for DCell below. Given a source server src and a destination server dst:

  • Identify the smallest m so that a level m DCell contains both src and dst,
  • Compute the bridge-link (dst’,src’) between the level (m-1) DCell containing src and the level (m-1) DCell containing dst,
  • Recursively compute paths from src to dst’ and from src’ to dst; and,
  • Piece together the recursively computed paths with the bridge-link.

It may be the case, however, that an alternative, proxy, level (m-1) DCell can be used such that a shorter path can be constructed from two bridge links and three recursively computed paths. An algorithm that searches for the proxy substructure is called a Proxy routing algorithm. This is precisely the situation depicted the the Figure below, for a level 2 FiConn that uses switches with 4 ports, where the proxy route is 2 server-server hops shorter than the dimensional route.

Proxy Routing in FiConn

Dimensional route (dashed black-grey), with 7 hops, and proxy route (thick black), with 5 hops, highlighted on a FiConn(2,4).

In our paper we identify and evaluate efficient ways of searching for and identifying proxy substructures which yield short paths. It’s a bit technical, and the details are in the paper, but the basic idea is depicted schematically in the figure below. We restrict the length of at least one of the recursively computed paths by looking at the bridge links incident to nodes near src or dst, and considering only the proxies that those bridge links are connected to. We compute the length of the proxy path through each such proxy, as well as link-states (e.g., traffic congestion) along the path, and then output the best path.

Intelligent search for proxies

Strategy for reducing size of proxy search: exploit the structure of the network to choose proxy substructures X_{k-1}^c that are inherently close to src or dst. Substructures are depicted as rounded rectangles, where GP_I and GP_0 are two possible schemes.

The details are in our paper:

A. Erickson, J. Pascual Saiz, J. Navaridas, and I. A. Stewart.
Routing Algorithms for Recursively-Defined Datacenter Networks. Submitted to a journal. Previous version in Proc. of Trustcom/BigDataSE/ISPA, 2015 IEEE, 3, 84–91, 2015.


In 2012-2013 I travelled to several remote First Nations communities around British Columbia to do hands-on mathematical activities with elementary school kids and teachers. Math Catcher, which provided the free math presentations, has also born several mathematical stories about a young boy, named Small Number, that have been translated into Aboriginal languages from across Canada. I have embedded all of Small Number’s stories and translations here (updated 22 August 2016) for your ears to enjoy the sound of these languages.

Small Number Counts to 100

Written by Veselin Jungic & Mark MacLean Illustrated by Simon Roy.

Original English.

Apsch Akichikiwan akichikeet isko mitatomitano. Cree Translation by Barry Cardinal.

Poksskaaksini Kiipipo Ito’tokstakiiwa. Blackfoot Translation by Connie Crop Eared Wolf and Eldon Yellowhorn.

Small Number and the Old Canoe

Written by Veselin Jungic & Mark MacLean Illustrated by Simon Roy

Original English.

Small Number and the Old Canoe: Squamish. Squamish Translation by T'naxwtn, Peter Jacobs of the Squamish Nation.

Skwi’ikwexam Qes te Wet’ot’e Slexwelh. Halq'eméylem Translation by Siyamiyateliyot, Kwelaxtelot, and Kwosel, Seabird Island First Nation.

šɛ nuxʷɛɬ hɛga Mɛnaθey. Sliammon Translation by Mabel Harry, Karen Galligos, and Oshelle from the Sliammon Nation.

Gadim G̱an g̱anhl W̓ii Mm̓aal. Nisga'a Translation by Hlguwilksihlgum Maaksgum Hlbin (Emma Nyce), Ksim Git Wil Aksnakw (Edna Nyce-Tait), and Wilp Sim’oogit Hleeḵ (Allison Nyce).

Háusayáyu'u du glwa. Heiltsuk Translation by Constance Tallio and Evelyn Windsor.

ʔanaḥʔis Huksyuu ʔuḥʔiš qiickʷiiʔaƛʔi č̓apac. Huu-ay-aht Translation by Benson Nookemis from the Nuu-Chah-Nulth Nation.

Small Number and the Old Canoe. Hul'q'umi'num' Translation by Ruby Peter (Sti’tum’at).

Small Number and the Basketball Tournament

Written by Veselin Jungic, SFU, and Mark MacLean, UBC Illustrated by Jean-Paul Csuka, Canada

Original English.

APSCH AKICHIKIWAN IKWA KAMISIKTIT KOSKEWNTOWAN METAWEWINA. Cree Translation by Barry Cardinal from Bigstone Cree Nation.

Poksskaaksini Itsinssiwa Ki’tsa’komootsiisini. Blackfoot Translation by Connie Crop Eared Wolf and Eldon Yellowhorn.

čišɛtawɬ kʷ ƛiqɛtəs ta p̓əʔač - Mɛnaθey. Sliammon Translation by Mabel Harry, Karen Galligos, and Oshelle from the Sliammon Nation.

I am unfamiliar with these (and all) Aboriginal languages, so please let me know if I have made any errors in this blog post, and especially if the special characters are not rendering correctly.


“Cracking the Coding Interview 6e” question 4.10: T1 and T2 are binary trees with m and n nodes, respectively. Test whether T1 contains T2 as a subtree. Here is my solution, which performs much better than the book’s solution.

The problem asks, more specifically, if there is a node in T1 such that the subtree rooted there has exactly the same structure as T2, as well as the same values at each node. Spoilers follow the image below, so if you are looking for hints, proceed carefully.

Test subtree containment

Test subtree containment

CCtI6e discusses two solutions, the first one makes the observation that a tree is uniquely identified by its pre-order traversal if you include special identifiers for null child-nodes, so the tree containment problem is reduced to a substring problem: does toString(T1) contain toString(T2)? CCTI6e blurs the solution by using a library call to solve the substring problem (it is not trivial to do it efficiently), but they do claim that this solution runs in O(m+n) time and O(m+n) space. It’s a valid answer, but I want something better.

The second solution tries to improve on the O(m+n) space of the first solution. The naïve way would be to do a depth-first search comparison of T2 and the subtree rooted at each node of T1, resulting in an O(log(m) + log(n)) space algorithm that uses O(min(m*log(m),m*n)) time. CCtI6e suggests using the values of the root of the T1-subtree and the root of T2 to decide whether or not to perform the search, but this ultimately makes no difference at all. First, if the values are different, then the search would abort immediately, and second, the nodes of T1 and T2 could have all the same data, say 0. Below is a drawing of a bad case, where most of T2 occurs at many nodes of T1, but T1 does not contain T2.

A bad case for recursive subtree testing

A bad case for recursive subtree testing

I was surprised to see such naïve solution to this problem, as CCtI6e has been very good so far. The recursive solution can be improved upon considerably by using a better heuristic for whether or not to search/compare T2 with the subtree at a given node of T1.

Hash the structure and data of a binary tree

We want to hash each node so that if the hashes of two trees match, there is a high probability that their subtrees match. As a proof of concept, I adapted Knuth’s multiplicative method to work on binary trees with integer values. The idea is to re-hash the hash values of the left and right children, as well as the height and value of the current node, using distinct 32-bit primes for each part, and then add the four multiplicative hash values. I’ve read that hashing is part guess-work and part theory, and there you have it.

uint32_t hash(uint64_t left,
              uint64_t right,
              uint64_t value,
              uint64_t height){
  return ((left&(0xFFFFFFFF))*2654435761 +
          (right&(0xFFFFFFFF))*2654435741 +
          (value&(0xFFFFFFFF))*2654435723 +
          (height&(0xFFFFFFFF))*2654435713);
}

uint32_t dfs_hash(node *t){
  if(!t) return 0;
  return hash(dfs_hash(t->left),dfs_hash(t->right),t->value,t->height);
}

My hash function seems to work quite well for randomly generated trees, but it is not thoroughly tested or anything. I’d like to eventually adapt a more sophisticated function, perhaps from Eternally Confuzzled.

Using the tree-hashes

My algorithm works as follows:

  • pre-compute the hash of T2.
  • Do a depth-first search of T1 where at each node:
    • if one of child nodes of the current node contains T2, return True. Otherwise, record the hashes of the child nodes.
    • hash the current node in O(1) time
    • if the hash of the current node matches the hash of T2:
      • do a depth-first search that compares T2 to the subtree rooted at the current node
      • if T2 is identical to the subtree rooted at the current node, then return True
    • return the hash of the current node.

Note that I do not cache hash values. The idea in this exercise is not to use unnecessary space, beyond the O(log(m)+log(n)) space necessarily used by depth-first searches. Caching hash values would not only use O(n+m) additional space, but keeping the hash values up to date in a non-static binary tree requires modifying the binary tree implementation.

The above pseudo-code is implemented below, and the complete source-code is at the end of this blog post.

typedef struct node{
  int value,height;
  struct node *left;
  struct node *right;
}node;

node *rec_contains_subtree(node *t1, node *t2, uint32_t *t1ChildHash, uint32_t t2hash){
  uint32_t leftChildHash,rightChildHash;
  node *res;
  *t1ChildHash=0;
  if(!t1)
    return NULL;
  if((res=rec_contains_subtree(t1->left,t2,&leftChildHash, t2hash))||
     (res=rec_contains_subtree(t1->right,t2,&rightChildHash,t2hash)))
    return res;
  count_nodes_hashed++;
  *t1ChildHash=hash(leftChildHash,rightChildHash,t1->value,t1->height);
  if(*t1ChildHash==t2hash){
    count_dfs_compare++;
    if(dfs_compare_trees(t1,t2))
      return t1;
  }
  return NULL;
}

node *contains_subtree(node *t1, node *t2){
  //hash t2
  uint32_t t2hash = dfs_hash(t2);
  uint32_t t1ChildHash;
  if(!t2)
    exit(-1);
  return rec_contains_subtree(t1,t2,&t1ChildHash,t2hash);
}

Below is the other depth-first search, that compares T2 with a subtree of T1. Notice that the CCtI6e solution is implicit in the last conditional.

int dfs_compare_trees(node *t1, node *t2){
  if(!t1 && !t2)
    return 1;
  if(!t1 && t2)
    return 0;
  if(t1 && !t2)
    return 0;
  if(t1->value!= t2->value)
    return 0;
  return dfs_compare_trees(t1->left,t2->left) &&
    dfs_compare_trees(t1->right,t2->right);
}

Testing on some large inputs

I did some tests on randomly generated trees. This is by no means a robust test, especially as regards hash collisions. I’ll paste some output and then explain it.

Below, I insert 10,000,000 nodes into T1 and 100 nodes into T1, by descending the trees on randomly selected branches and inserting at the first empty node. All data values are 0s. Since the result is that T1 doesn’t contain T2, then all 10,000,000 nodes of T1 are searched, but notice that the hashing saves me from doing any node-by-node comparisons with T2.

1e-05 seconds to allocate trees
10.3298 seconds to create t1
2e-05 seconds to create t2
t1 has 10000000 nodes
t2 has 100 nodes
0.928297 seconds to check if t1 contains t2
t1 does not contain t2
Did dfs_compare_trees 0 times
Hashed 10000000 nodes of t1

With the same tree T1 as above, I regenerated T2 100 times and repeated the same test as above. Again, the node-by-node comparison, dfs compares, was done 0 times (per search).

Test containment in t1 of randomly generated trees t2
Nodes: t1 10000000 t2 100; Trials: 100
All nodes hold value=0
Average per search:
	time 0.944213;
	dfs compares 0;
	prop. nodes hashed 1;
	containment 0

I repeated the previous experiment, below, with smaller trees T2, on 20 nodes. Now we see that 6/100 of these T2s are contained in T1, but dfs_compares is done 10 times more often than we find something. It shows that the hashing is giving quite a few false positives for small T2! More testing needed.

Test containment in t1 of randomly generated trees t2
Nodes: t1 10000000 t2 20; Trials: 100
All nodes hold value=0
Average per search:
	time 0.96471;
	dfs compares 0.58;
	prop. nodes hashed 0.964319;
	containment 0.06

In order to test containment of larger trees T2, I simply used subtrees of T1, selected (pseudo) uniformly at random. Since my search does not consider the values of the references to the nodes, it is unaware that T1 and T2 share locations in memory. Below we see that very few nodes of T1 are hashed, and that a hash collision is experienced in about 12% of the searches, since 1.12=dfs_compares/containment.

Test containment in t1 of randomly selected subtrees of t1
Nodes: t1 10000000 t2 (unknown); Trials: 100
All nodes hold value=0
Average per search:
	time 0.0682518;
	dfs compares 1.12;
	prop. nodes hashed 0.0711984;
	containment 1

The running time is difficult to compute (and maybe impossible to prove), because of the way it depends on the hash function. Informally, we can state an expected time of O(m+(1+c)n), where c is the number of hash collisions. If c is small enough, this becomes O(m). This is much better than CCtI6e’s time of O(m+kn), where k is the number of nodes of T1 that have the same value as the root of T2, since, in our 0-value trees, that could easily amount to O(m*log(m)); the cost of doing a full depth-first search of T1 at every node.

The code, which is a bit rough and quick, is available as a gist:


I saw a nice question on leetcode today, apparently asked recently at a Google on-site interview.

Given N rectangles with integer coordinates, check whether they are an exact cover of a rectangular region

If you don’t know already, Google’s on-site interview process involves 5 independent interviews in one day. In each one a Google Software Engineer poses a problem that should be solved in 45 min to 1 hr. The interviewee can code on a white board or in Google Docs (check with your recruiter for details) in a language of their choice, but not pseudo-code.

By itself, the question is not all that intimidating, but the time-limit can make it quite challenging. That’s why I’m practising!

As usual, there is a fairly straightforward “brute force” solution, and some better solutions. I’ll give hints and solutions immediately after the jump so avoid spoilers by not looking past the example inputs yet.

Sample N rectangle inputs

Example inputs

Hint 1

Try to find a brute-force solution by looking at the total area of the rectangles.

Hint 2

After computing the area, can you avoid comparing every pair of rectangles?

Hint 3

The problem can be solved in O(N log N), both with and without computing areas.

Hint 4

What needs to occur at every left-hand segment of an input rectangle that is interior to the smallest rectangle that bounds the input rectangles? At every right-hand segment? At the vertical segments not interior?

Solutions from area

One solution is to compute the area of the smallest bounding rectangle, and the combined areas of the N input rectangles. The input data is unsorted, so you need to do an O(N) scan of the input to find the minimum and maximum points, both vertically and horizontally (up to 4 points). If the areas differ, output False, and if they are the same, you need to check that no pair of input rectangles intersect. To see why, look at the input below.

Area problem input

Area problem input

Testing whether a pair of rectangles intersect is a simple, O(1) calculation (though I would understand if you failed to get it onto the whiteboard in 5 minutes under pressure), so we know we can finish in O(N^2) time by comparing (N choose 2) pairs of input rectangles.

Some light reading suggests that the intersections can be tested in O(N log N) time in some cases, but the solutions seem far from simple. I didn’t look at this very deeply, but for further reading:

R-trees on StackOverflow
R-trees on Wikipedia
Another answer on StackOverflow
The Princeton lecture cited in the above StackOverflow answer

I’ll now present a solution that does not require any 2D geometric tests.

My Solution

Imagine, first, that the (smallest) rectangle bounding the inputs is a cylinder, so that its leftmost edge coincides with its rightmost edge. Now we can state the ‘algorithm’ in one sentence:

If (P) every point interior to a right-hand edge of an input rectangle is unique, and it coincides with a point of a left-hand edge of an input rectangle, and vise versa, then we (Q) output True, and otherwise we output False.

Take a moment to let that sink in. We are claiming that P if and only if Q. Let us make clear here that when we say rectangles intersect, we mean that their interiors intersect. The explanation is a bit dense, but hopefully you’ll come to the same conclusions by making some drawings.

isBigRect uses the interfaces of vertical segments

isBigRect uses the interfaces of vertical segments

If P is false, then one of two things can happen. If points interior to the right-hand edges of two distinct rectangles coincide, then the rectangles intersect. Otherwise, without loss of generality, we may assume there is there is some point p on a left-hand edge of an input rectangle, call it A, that does not coincide with a point on a right-hand edge of an input rectangle. Thus, p is interior to a small open disc that both contains points interior to A and points exterior to A. The same disc is either entirely inside a rectangle distinct from A, which will necessarily intersect with A, or it contains points exterior to all rectangles distinct from A, and thus contains a hole in the bounding cylinder. Thus we output False, so Q is false.

If Q is false, on the other hand, then either two distinct rectangles intersect, or a point in the bounding cylinder falls outside of all input rectangles. If rectangles intersect and they have intersecting vertical segments, then P is false, so we may assume that is not the case. But then, the vertical segment of one rectangle must be interior to another rectangle, which again falsifies P. Finally, if a point falls outside of all rectangles, then there are vertical segments to its left and right that falsify P.

Thus, we have shown that P is a necessary and sufficient condition for the N rectangles to be an exact cover of a rectangle, q.e.d.

Now we need to code the solution. The short of it is, we sort a copy of the input rectangles by left-edge x-coordinates, and we sort another copy by right-edge x-coordinates. The first left edge and last right edge are special cases, but are easily dealt with. We then process the vertical segments in the sorted lists, at each x-coordinate, and simply compare the lists.

The bottleneck in time-complexity is in sorting the inputs, plus we need O(N) space to make left-edge-sorted and right-edge-sorted copies of the inputs.

I am assuming that an input rectangle is given as a bottom-left point and a top-right point; for example, a unit square is [[1,1],[2,2]].

A final word of warning before you delve into my Python code. I am primarily a C-programmer leveraging some features of Python, so my code is not very “Pythonic”. Suggestions, improvements, and more test cases welcome.

def isBigRect(rectangles):
    if rectangles==[]:
        return True
    L=processBoundaries(rectangles,leftOrRight='left')
    R=processBoundaries(rectangles,leftOrRight='right')
    if L==False or R==False or not len(L)==len(R):
        return False
    L[-1][0]=R[-1][0]
    R[0][0]=L[0][0]
    if not L==R:
        return False
    else:
        return True

def processBoundaries(B,leftOrRight='left'):
    x0orx1=0
    if leftOrRight=='right':
        x0orx1=1
        res=[[]]
    elif leftOrRight=='left':
        x0orx1=0
        res=[]
    else:
        print 'process boundaries input error'
        return False
    B=[ [x[x0orx1][0],[x[0][1],x[1][1]]] for x in B]
    B.sort(cmp=lambda x,y: x[0]-y[0])
    i=0
    while i<len(B):
        x=B[i][0]
        res.append( [x,[]] )
        while i<len(B) and B[i][0]==x:
            res[-1][1].append(B[i][1])
            i=i+1
        res[-1][1]=mergeSpecialIntervals(res[-1][1])
        if res[-1][1]==False:
            return False
    if leftOrRight=='right':
        res[0]=['first',res[-1][1]]
    else:
        res.append(['last',res[0][1]])
    return res

def mergeSpecialIntervals(L):
    if L==[]:
        return False
    if len(L)==1:
        return L
    L.sort(cmp=lambda x,y: x[0]-y[0])
    res=[L[0]]
    for i in range(len(L)-1):
        if L[i][1]>L[i+1][0]:
            return False
        elif L[i][1]==L[i+1][0]:
            res[-1][1]=L[i+1][1]
            i=i+2
        else:
            res.append(L[i])
            i=i+1
    if i == len(L)-1:
        res.append(L[i])
    return res


A=[[[[0,0],[4,1]],
    [[7,0],[8,2]],
    [[6,2],[8,3]],
    [[5,1],[6,3]],
    [[4,0],[5,1]],
    [[6,0],[7,2]],
    [[4,2],[5,3]],
    [[2,1],[4,3]],
    [[0,1],[2,2]],
    [[0,2],[2,3]],
    [[4,1],[5,2]],
    [[5,0],[6,1]]],

   #shuffled the above a little
   [[[0,0],[4,1]],
    [[7,0],[8,2]],
    [[5,1],[6,3]],
    [[6,0],[7,2]],
    [[4,0],[5,1]],
    [[4,2],[5,3]],
    [[2,1],[4,3]],
    [[0,2],[2,3]],
    [[0,1],[2,2]],
    [[6,2],[8,3]],
    [[5,0],[6,1]],
    [[4,1],[5,2]]],

   [[[0,0],[4,1]]],

   [],

   #vertical stack
   [[[0,0],[1,1]],
	[[0,1],[1,2]],
	[[0,2],[1,3]],
	[[0,3],[1,4]],],
   
   #horizontal stack
   [[[0,0],[1,1]],
	[[1,0],[2,1]],
    [[2,0],[3,1]],
    [[3,0],[4,1]],],

   ]

print 'should be True'
for a in A:
    print isBigRect(a)


B=[
    #missing a square
    [[[0,0],[4,1]],
     [[7,0],[8,2]],
     [[6,2],[8,3]],
     [[5,1],[6,3]],
     [[6,0],[7,2]],
     [[4,2],[5,3]],
     [[2,1],[4,3]],
     [[0,1],[2,2]],
     [[0,2],[2,3]],
     [[4,1],[5,2]],
     [[5,0],[6,1]]],
    
    #exceeds top boundary
    [[[0,0],[4,1]],
     [[7,0],[8,2]],
     [[5,1],[6,4]],
     [[6,0],[7,2]],
     [[4,0],[5,1]],
     [[4,2],[5,3]],
     [[2,1],[4,3]],
     [[0,2],[2,3]],
     [[0,1],[2,2]],
     [[6,2],[8,3]],
     [[5,0],[6,1]],
     [[4,1],[5,2]]],
    
    #two overlapping rectangles
    [[[0,0],[4,1]],
     [[0,0],[4,1]]],
    
    #exceeds right boundary
    [[[0,0],[4,1]],
     [[7,0],[8,3]],
     [[5,1],[6,3]],
     [[6,0],[7,2]],
     [[4,0],[5,1]],
     [[4,2],[5,3]],
     [[2,1],[4,3]],
     [[0,2],[2,3]],
     [[0,1],[2,2]],
     [[6,2],[8,3]],
     [[5,0],[6,1]],
     [[4,1],[5,2]]],

    # has an intersecting pair
    [[[0,0],[5,1]],
     [[7,0],[8,2]],
     [[5,1],[6,3]],
     [[6,0],[7,2]],
     [[4,0],[5,1]],
     [[4,2],[5,3]],
     [[2,1],[4,3]],
     [[0,2],[2,3]],
     [[0,1],[2,2]],
     [[6,2],[8,3]],
     [[5,0],[6,1]],
     [[4,1],[5,2]]],
    
    #skips column 4
    [[[0,0],[4,1]],
     [[7,0],[8,2]],
     [[5,1],[6,3]],
     [[6,0],[7,2]],
     [[2,1],[4,3]],
     [[0,2],[2,3]],
     [[0,1],[2,2]],
     [[6,2],[8,3]],
     [[5,0],[6,1]],],


    #exceed left boundary
    [[[0,0],[4,1]],
     [[7,0],[8,2]],
     [[5,1],[6,3]],
     [[6,0],[7,2]],
     [[4,0],[5,1]],
     [[4,2],[5,3]],
     [[2,1],[4,3]],
     [[-1,2],[2,3]],
     [[0,1],[2,2]],
     [[6,2],[8,3]],
     [[5,0],[6,1]],
     [[4,1],[5,2]]],


   #horizontal stack with overlapping left boundaries at x=1
    [[[0,0],[1,1]],
     [[1,0],[2,1]],
     [[1,0],[3,1]],
     [[3,0],[4,1]],],
]

print 'should be False'
for b in B:
    print isBigRect(b)

Bonus: what if the input rectangles can be moved?

If you are a mathematician you might be inclined to view a rectangle as simply a width and height, with no fixed location in the plane. This is exactly what Tony Huynh, self-described Mathematical Vagabond (and matroid theorist by trade), pointed out to me, so I asked him if the alternate interpretation of the problem is NP-hard?

The new problem:

Given N rectangles with integer dimensions, check whether they can be translated and rotated to exactly cover a rectangular region.

Indeed the answer is yes, and Tony came back to me a few minutes later with this very nice reduction:

Consider the following input of rectangles, where a rectangle is given by a pair (height, width): (1,a1), (1,a2), …, (1,aN) and (S,S) where S is half the sum of all the ai. If you can solve the rectangle partitioning problem for this instance, then you can find a subset of a1, …, aN whose sum is exactly half the total sum (which is NP-hard).


Suppose you have two unsigned 16-bit integers,
X=0000 0000 0111 1111 1000 1010 0010 0100
and
Y=0000 0000 0000 0000 0011 0000 1010 1111.
How can you tell if Y is more than half as long as X? I came across this because I wanted some code to branch on whether or not Y was a lot bigger or smaller than the square root of a large positive integer X (behaviour when Y was around the square root was arbitrary). The square root of X has half as many bits as X does, and I figured that I could get the lengths of these numbers super fast to make a fairly coarse comparison.

I was right about that, but the efficiency of the answer depends on the CPU the code is running on. Even so, the portable answers should be quite fast, and they are interesting, so let’s look at them.

As usual, I did some research on the internet for this, and the bit twiddling I did is from Stackoverflow and Haacker’s Delight (it’s a book, of course, but I linked to some snippets on their website).

I wrote some code that explains the steps of a smearing function and a population count. In fact, that’s the spoiler: I will smear the bits of X to the right, and then count the population of 1-bits.

//smear
x = x | x>>1;
x = x | x>>2;
x = x | x>>4;
x = x | x>>8;
x = x | x>>16;

//population count
x = ( x & 0x55555555 ) + ( (x >> 1 ) & 0x55555555 );
x = ( x & 0x33333333 ) + ( (x >> 2 ) & 0x33333333 );
x = ( x & 0x0F0F0F0F ) + ( (x >> 4 ) & 0x0F0F0F0F );
x = ( x & 0x00FF00FF ) + ( (x >> 8 ) & 0x00FF00FF );
x = ( x & 0x0000FFFF ) + ( (x >> 16) & 0x0000FFFF );

The idea behind the population count is to divide and conquer. For example with 01 11, I first count the 1-bits in 01: there is 1 1-bit on the right, and there are 0 1-bits on the left, so I record that as 01 (in place). Similarly, 11 becomes 10, so the updated bit-string is 01 10, and now I will add the numbers in buckets of size 2, and replace the pair of them with the result; 1+2=3, so the bit string becomes 0011, and we are done. The original bit-string is replaced with the population count.

There are faster ways to do the pop count given in Hacker’s Delight, but this one is easier to explain, and seems to be the basis for most of the others. You can get my code as a Gist here..

X=0000 0000 0111 1111 1000 1010 0010 0100
Set every bit that is 1 place to the right of a 1
0000 0000 0111 1111 1100 1111 0011 0110
Set every bit that is 2 places to the right of a 1
0000 0000 0111 1111 1111 1111 1111 1111
Set every bit that is 4 places to the right of a 1
0000 0000 0111 1111 1111 1111 1111 1111
Set every bit that is 8 places to the right of a 1
0000 0000 0111 1111 1111 1111 1111 1111
Set every bit that is 16 places to the right of a 1
0000 0000 0111 1111 1111 1111 1111 1111
Accumulate pop counts of bit buckets size 2
0000 0000 0110 1010 1010 1010 1010 1010
Accumulate pop counts of bit buckets size 4
0000 0000 0011 0100 0100 0100 0100 0100
Accumulate pop counts of bit buckets size 8
0000 0000 0000 0111 0000 1000 0000 1000
Accumulate pop counts of bit buckets size 16
0000 0000 0000 0111 0000 0000 0001 0000
Accumulate pop counts of bit buckets size 32
0000 0000 0000 0000 0000 0000 0001 0111

The length of 8358436 is 23 bits

Y=0000 0000 0000 0000 0011 0000 1010 1111
Set every bit that is 1 place to the right of a 1
0000 0000 0000 0000 0011 1000 1111 1111
Set every bit that is 2 places to the right of a 1
0000 0000 0000 0000 0011 1110 1111 1111
Set every bit that is 4 places to the right of a 1
0000 0000 0000 0000 0011 1111 1111 1111
Set every bit that is 8 places to the right of a 1
0000 0000 0000 0000 0011 1111 1111 1111
Set every bit that is 16 places to the right of a 1
0000 0000 0000 0000 0011 1111 1111 1111
Accumulate pop counts of bit buckets size 2
0000 0000 0000 0000 0010 1010 1010 1010
Accumulate pop counts of bit buckets size 4
0000 0000 0000 0000 0010 0100 0100 0100
Accumulate pop counts of bit buckets size 8
0000 0000 0000 0000 0000 0110 0000 1000
Accumulate pop counts of bit buckets size 16
0000 0000 0000 0000 0000 0000 0000 1110
Accumulate pop counts of bit buckets size 32
0000 0000 0000 0000 0000 0000 0000 1110

The length of 12463 is 14 bits

So now I know that 12463 is significantly larger than the square root of 8358436, without taking square roots, or casting to floats, or dividing or multiplying.

It needs to be said that there are built-in functions that get the number of leading or trailing 0s. They are __builtin_clz() and __builtin_ctz() (i.e., clz = count leading zeros), and I read that most modern CPUs will do this computation in 1 instruction. So, alternatively, the length of uint32_t x is

32-__builtin_clz(x);