Binary tree traversals are a staple of the technical interview process at many software companies, small and large. For anyone with an understanding of recursion, the family of traversal techniques are quite straightforward. A common twist on these concepts that show up more in technical interviews than undergraduate computer science problem sets is the rather artificial constraint that asks one to implement the traversals using iteration rather than recursion.
I’ve always found the reference implementations of iterative tree traversals, particularly inorder traversal, to be lacking in intuitive understanding. The classic way of iteratively traversing a binary tree is to use a stack data structure, and the first snippet of code you often see is something like this:
stack = []
curr = root
while curr is not null || stack is not empty
while curr is not null
stack.push(curr)
curr = curr.left
...
When I see this code, I immediately have more questions than insight.
- Why is the code following left pointers while pushing all the nodes onto the stack?
- Why is the loop condition checking against both the current node and the stack size?
- Why is there a nested
while
loop?
To me, the intuitive way to reason about an iterative implementation of a recursive function is to simulate a call stack, and that usually begins with a pen and paper. For example, suppose we have a binary tree that looks like this:
A
/ \
B C
/ \
D E
An inorder traversal of such a tree should yield the nodes in this order:
D, B, E, A, C
. The best way to simulate the call stack that yields such a
traversal is to draw out the contents of the stack as the traversal makes its
way through the tree. I like to model my stacks after the real world, with a
physical base to indicate the bottom of the stack, and elements being pushed on
and popped off. Here’s the visualization of an empty stack, and its
transformation following two push (push(A)
, push(B)
) operations and one pop
(pop()
) operation:
B ~B~
A A A
_____ _____ _____ _____
stack stack stack stack
At the end of the operations, the stack contains a single element B
. The
notation here marks any items popped off the stack with strikethrough-like
markers (~), but leaves it on the stack in its original location to better
illustrate ordering. We can now use this notation to visually simulate the first
few steps of what an inorder traversal on the example tree might look like using
a standard depth-first
search approach. Initially,
the stack is empty, and the root node is pushed onto the stack.
A
_____
stack
We invoke the same logic repeatedly while the stack has items: pop a node off
and process it. When a node is popped off the stack, we need to process it in
inorder fashion: traverse its left child first, visit itself, then traverse its
right child. This translates to push(C)
, push(A)
, and push(B)
. Notice
that the elements are pushed onto the stack in reverse order from the way they
would be processed in a standard recursive implementation, as to achieve the
desired order. Perhaps more importantly, the node itself (A
) is pushed back
onto the stack.
B
A
C
A ~A~ ~A~
_____ _____ _____
stack stack stack
At this point, B
is popped off the stack and the same logic is applied. The
traversal proceeds down to the left child of B
, followed by a visit to B
,
and a subsequent traversal down the right child of B
. That is, push(E)
,
push(B)
, push(D)
. Here’s what the stack looks like after reaching the first
leaf node D
:
D
B
E
B ~B~ ~B~
A A A
C C C
~A~ ~A~ ~A~
_____ _____ _____
stack stack stack
Since D
has no child nodes, it will be popped off the stack then pushed back
onto the stack. Here’s the second piece of logic that is core to our traversal:
if a node being processed has already been discovered, then it should be
visited. With that, the algorithm for our inorder traversal - with respect to
processing a single node - can be expressed as follows:
if the node has already been discovered
"visit" or do something with the node
else
mark the node as discovered
push the right child of the node onto the stack
push the node onto the stack
push the left child of the node onto the stack
As noted earlier, the symmetry between this iterative approach and the standard recursive implementation is clear. The recursive implementation will first (in an eager, depth-first manner) traverse down the left child of a given node, then visit the node, followed by a traversal down the right child. Using an explicit stack data structure to simulate the calling pattern simply means pushing the nodes onto the explicit stack in reverse order as compared to the implicit recursion call stack.
To illustrate the process more clearly, we can push nodes onto the stack with
an explicit status: a start
status to indicate that the node has yet to be
processed and an end
status to indicate that it has been processed. For
example, A.start
and A.end
will represent the start and end state for a node
A
, respectively. Here’s the same visualization for the same first few
operations as above, with explicit status attributes for each node:
B.start
A.end
C.start
A.start ~A.start~ ~A.start~
_____ _____ _____
stack stack stack
With this extended notation, the logic needed to process any given node at the
top of the stack is clear. If a node at the top of the stack has a start
status, push its right child onto the stack with a start
status, push itself
back onto the stack with an end
status, and push its left child onto the stack
with a start
status.
An astute reader will notice that since there are only two states, a binary flag is sufficient for storing the same information. In fact, this can represented in exactly the same way as the classic implementations of graph traversals introduced in CLRS, in which nodes are assigned colors to keep track of traversal progress. In the case of binary tree traversals, we only need two colors: white for undiscovered nodes and black for discovered nodes. With that insight, the iterative version of any of the traversals becomes easy to derive:
def iterative_inorder_traversal(root):
stack = []
stack.append(root)
discovered = {}
while len(stack) > 0:
node = stack.pop()
if node in discovered:
pass # "Visit" or do something with the node
else:
discovered[node] = True
if node.right:
stack.append(node.right)
stack.append(node)
if node.left:
stack.append(node.left)
An iterative implementation of preorder or postorder traversal should easily follow from the inorder traversal; the sequence in which the nodes should be pushed onto the stack simply needs to be modified to match the desired traversal behavior. The cost of the intuitive version of these iterative traversals is a larger constant in the runtime complexity, as each node is actually processed twice. Ultimately, the runtime still grows at a rate linearly proportional to the size of the input.
Comments