Tricks of the trade: Recursion to Iteration, Part 2: Eliminating Recursion with the Time-Traveling Secret Feature Trick

Tags: programming, recursion, iteration, python, google code jam, puzzles, recursion-to-iteration series, tail calls

This is the second post in a series on converting recursive algorithms into iterative algorithms. If you haven’t read the previous post, you probably should. It introduces some terms and background that will be helpful.

Last time, if you’ll recall, we discussed The Simple Method of converting recursive functions into iterative functions. The method, as its name implies, is straightforward and mostly mechanical. The only potential trouble is that, for the method to work, you must convert all of the recursive calls in your function into tail calls.

This task can be tricky. So we also discussed the Secret Feature trick for putting recursive calls into tail-call form. That trick works well for simple recursive calls, but when your calls aren’t so simple, you need a beefier version of the trick.

That’s the subject of this post: the Time-Traveling Secret Feature trick. It’s like sending a T-800 back in time to terminate a function’s recursiveness before it can ever offer resistance in the present.


But we’ll have to work up to it. So stick with the early examples to prepare for the cybernetic brain augmentations to come.

Enough talk! Let’s start with a practical example.

Update 2013-06-17: If you want another example, here’s also a step-by-step conversion of the Fibonacci function.

Computing binomial coefficients

The binomial coefficient \(\binom{n}{k}\) for integers \(n\) and \(k\) gives the number of ways to choose \(k\) items from a set of \(n\) items. For this reason, it’s often pronounced “\(n\) choose \(k\).” It’s very common in combinatorics and statistics and often pops up in the analysis of algorithms. A whole chapter, in fact, is dedicated to them in Graham, Knuth, and Patashnik’s Concrete Mathematics.

Math textbooks define the coefficient like this: \[\binom{n}{k} = \frac{n!}{k!(n-k)!},\] but that form causes all sorts of problems for computers. Fortunately, Concrete Mathematics helpfully points out a lifesaving absorption identity: \[\binom{n}{k} = \frac{n}{k}\binom{n - 1}{k - 1},\] and we know what that is, don’t we? That’s a recursive function just waiting to happen!

And that identity, along with the base case \(\binom{n}{0} = 1\), gives us today’s running example:
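Rendered in Python, the recursion might look like this (a minimal sketch; note the careful ordering of the multiply and the divide, which we'll return to in a moment):

```python
def binomial(n, k):
    """Compute C(n, k) via the absorption identity (recursive sketch)."""
    if k == 0:
        return 1  # base case: C(n, 0) = 1
    # Multiply before dividing so the integer division is always exact.
    return binomial(n - 1, k - 1) * n // k
```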

Before going further, a cautionary point. Math math and computer math are not the same. In math math, \(\frac{nx}{k} = \frac{n}{k}x = \frac{x}{k}n\), but in computer math the equation does not necessarily hold because of finite precision or, in our case, because we’re dealing with integer division that throws away remainders. (By the way, in Python, // is the division operator that throws away remainders.) For this reason, I have been careful to use the form

instead of the more literal translation

which, if you try it, you’ll see often produces the wrong answer.
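To make the contrast concrete, here is a sketch of both forms side by side (the names `binomial_careful` and `binomial_literal` are mine, for illustration):

```python
def binomial_careful(n, k):
    # Careful form: multiply the recursive result by n *before* dividing
    # by k, so the division never throws away a remainder.
    if k == 0:
        return 1
    return binomial_careful(n - 1, k - 1) * n // k

def binomial_literal(n, k):
    # Literal translation of (n / k) * C(n - 1, k - 1): the early n // k
    # discards a remainder and often yields the wrong answer.
    if k == 0:
        return 1
    return n // k * binomial_literal(n - 1, k - 1)
```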

Okay, our challenge is set before us. Ready to de-recursivify binomial?

Once again, the Secret Feature trick

First, however, we must put the function’s recursive call into tail-call form. And you remember how to do that, right? With the Secret Feature trick, of course! As a refresher from last time, here’s the trick in a nutshell:

  1. Find a recursive call that’s not a tail call.
  2. Identify what work is being done between that call and its return statement.
  3. Extend the function with a secret feature to do that work, as controlled by a new accumulator argument with a default value that causes it to do nothing.
  4. Use the secret feature to eliminate the old work.
  5. You’ve now got a tail call!
  6. Repeat until all recursive calls are tail calls.

Let’s go through it, by the numbers.

  1. Find a recursive call that’s not a tail call.

    Well, there are only three lines of logic. We’re not talking rocket science here.

  2. Identify what work is being done between that call and its return statement.

    In our mind’s eye, let’s lift the recursive call out of the return statement and replace it with the variable x.

    Now we can visualize the additional work as a separate function operating on x: multiplying it on the left by one number and dividing it on the right by another:
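In code, the lifted-out call and the extracted work function might look like this (the helper name `work` matches how the article refers to it later):

```python
def work(x, lmul, rdiv):
    # The "additional work": multiply x on the left, divide it on the right.
    return lmul * x // rdiv

def binomial(n, k):
    if k == 0:
        return 1
    x = binomial(n - 1, k - 1)  # recursive call, lifted out of the return
    return work(x, n, k)        # the leftover work, now a separate function
```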

    For functions this simple, you can just hold them in your head and inline them into your code as needed. But for more-complicated functions, it really does help to break them out like this. For this example, I’m going to pretend that our work function is more complicated, just to show how to do it.

  3. Extend the function with a secret feature to do that work, as controlled by a new accumulator argument – in this case a pair (lmul, rdiv) – with a default value that causes it to do nothing.
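A sketch of the extended function, with the accumulator pair defaulting to the do-nothing values (1, 1):

```python
def work(x, lmul, rdiv):
    return lmul * x // rdiv

def binomial(n, k, lmul=1, rdiv=1):
    # Secret feature: callers who pass (lmul, rdiv) get lmul * result // rdiv.
    # The defaults (1, 1) make the feature a no-op for everyone else.
    if k == 0:
        return work(1, lmul, rdiv)
    return work(binomial(n - 1, k - 1) * n // k, lmul, rdiv)
```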

    Note that I just mechanically converted all return {whatever} statements into return work({whatever}, lmul, rdiv).

  4. Use the secret feature to eliminate the old work.

    Watch what happens to that final line.
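Instead of doing the work after the recursive call returns, we ask the recursive call to do it for us via its secret feature:

```python
def work(x, lmul, rdiv):
    return lmul * x // rdiv

def binomial(n, k, lmul=1, rdiv=1):
    if k == 0:
        return work(1, lmul, rdiv)
    # The pending "* n // k" is folded into the accumulators instead of
    # being applied after the call returns, giving us a tail call at last.
    return binomial(n - 1, k - 1, lmul * n, rdiv * k)
```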

  5. You’ve now got a tail call!

    Indeed, we do! All that’s left is to inline the sole remaining work call:

    And to simplify away the needless multiplication by 1 in the return statement:
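After inlining the remaining work call and dropping the multiplication by 1, the function reads:

```python
def binomial(n, k, lmul=1, rdiv=1):
    if k == 0:
        return lmul // rdiv  # was work(1, lmul, rdiv) == lmul * 1 // rdiv
    return binomial(n - 1, k - 1, lmul * n, rdiv * k)
```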

  6. Repeat until all recursive calls are tail calls.

    There was only one, so we’re done! Yay!

With all the recursive calls (just the one) converted into tail calls, we can easily convert the function into iterative form by applying the Simple Method. Here’s what we get after we load the function body into a one-shot loop and convert the tail call into an assignment-and-continue pair.
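A sketch of that mechanical result:

```python
def binomial(n, k, lmul=1, rdiv=1):
    while True:  # one-shot loop holding the old function body
        if k == 0:
            return lmul // rdiv
        # The tail call becomes an assignment-and-continue pair.
        n, k, lmul, rdiv = n - 1, k - 1, lmul * n, rdiv * k
        continue
        break    # makes the loop one-shot; never actually reached
```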

As a final touch, we can tidy up and in the process tuck the accumulators inside the function body to keep them completely secret:
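One tidy version, with the accumulators now private locals:

```python
def binomial(n, k):
    lmul = rdiv = 1  # the once-secret accumulators, hidden inside the body
    while k > 0:
        n, k, lmul, rdiv = n - 1, k - 1, lmul * n, rdiv * k
    return lmul // rdiv
```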

And now we have an iterative version of our original binomial function!

A short, cautionary lesson

This next part is subtle but important. To understand what’s going on, you first need to see it. So, once again, let’s use the Online Python Tutor’s excellent Python-runtime visualizer. Open the link below in a new tab if you can, and then read on.

Visualize the execution of binomial(39, 9).

Click the Forward button to advance through each step of the computation. When you get to step 22, where the recursive version has fully loaded the stack with its nested frames, click slowly. Watch the return value (in red) carefully as you advance. See how it gently climbs to the final answer of 211915132, never exceeding that value?

Now continue stepping through to the iterative version. Watch the value of lmul as you advance. See how it grows rapidly, finally reaching a whopping 76899763100160?

This difference matters. While both versions computed the correct answer, the original recursive version did so without exceeding the capacity of 32-bit signed integers.1 The iterative version, however, needed a comparatively hoggish 47 bits to faithfully arrive at the same answer.

Python’s integers, lucky for us, grow as needed to hold their values, so we need not fear overflow in this case, but not all languages for which we might want to use our recursion-to-iteration techniques offer such protections. It’s something to keep in mind the next time you’re taking the recursion out of an algorithm in C.

In any case, bigger integers are slower and consume more memory. In one important way, the original, recursive algorithm was better. We have lost something important.

Now let’s get it back.

The importance of respecting associativity and commutativity

It turns out that our iterative version of binomial is the original’s evil twin. Its stack usage looks charming and all, but something else about it is off. Something hard to put our finger on. What is the difference?

It all comes down to grouping and ordering decisions. Implicit grouping and ordering decisions that we overlooked during our code transformations.

Consider how both versions compute \(\binom{5}{2}\).

First, the recursive version. We start out with \(n=5, k=2\) and then recursively expand the expression \[n\binom{n-1}{k-1}/k\] until we hit the base case of \(k=0\). The series of expansions proceeds like so:

\[\binom{5}{2} = 5{\binom{4}{1}} / 2 = 5\cdot\left(4\binom{3}{0} / 1\right) / 2 = 5\cdot(4\cdot(1)/1)/2 = 10.\]

At every step (except for the base case) we perform a multiplication by \(n\) and then a division by \(k\). It’s that division by \(k\) that keeps intermediate results from getting out of hand.

Now let’s look at the iterative version. It’s not so obvious what’s going on. (Score another point for recursion over iteration!) But we can puzzle it out. We start out with both accumulators at \(1\) and then loop over the decreasing values of \(k\), building up the accumulators until \(k=0\), at which point we return the quotient of the accumulators.

So the calculation is given by

\[\binom{5}{2} = \frac{\mathit{lmul}}{\mathit{rdiv}} = \frac{((1)\cdot 5)\cdot 4}{((1)\cdot 2)\cdot 1} = 10.\]

Both the numerator and denominator grow and grow and grow until the final division. Not a good trend.

So why did the Secret Feature trick work great for factorial in our previous article but fail us, albeit subtly, now? The answer is that in factorial the extra work being done between each recursive call and its return statement was multiplication and nothing more. And multiplication (of integers that don’t overflow) is commutative and associative. Meaning, ordering and grouping don’t matter.

The lesson, then, is this: Think carefully about whether ordering and grouping matter before using the Secret Feature trick. If it matters, you have two options: One, you can modify your algorithm so that ordering and grouping don’t matter and then use the Secret Feature trick. (But this option is often intractable.) Two, you can use the Time-Traveling Secret Feature trick, which preserves ordering and grouping. And that trick is what we’ve been waiting for.

It’s time.

The Time-Traveling Secret Feature trick

What’s about to happen next is so mind-bendingly awesome that you should probably get a drink to prepare yourself. It’s like combining a T-800 from The Terminator, with the dreams-within-dreams layering from Inception, with the momentary inversion of reality that occurs when the quantum field surrounding an inductive proof collapses and everything snaps into place.

It’s. Really. Cool.

Got your drink? Okay, let’s do this.

Let’s go back to our original, recursive binomial function:
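As a reminder, a sketch of that starting point:

```python
def binomial(n, k):
    if k == 0:
        return 1
    return binomial(n - 1, k - 1) * n // k
```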

Our goal is to create an equivalent iterative version that exactly preserves the properties of the original. Well, how the hell are we going to do that?

Hold that thought for a sec.

Let’s just stop for a moment and think about what that recursive function is doing. We call it to compute some answer \(x_t\). But that answer depends on some earlier answer \(x_{t-1}\) that the function doesn’t know, so it calls itself to get that answer. And so on. Until, finally, it finds one concrete answer \(x_0\) that it actually knows and, from which, it can build every subsequent answer.

So, in truth, our answer \(x_t\) is just the final element in a whole timeline of needed earlier answers:

\[x_0, x_1, \ldots, x_t.\]

Well, so what? Why should we care?

Because we’ve watched The Terminator about six hundred times! We know how to get rid of a problem in the present when we’ve seen its timeline: We send a Terminator into the past to rewrite that timeline so the problem never gets created in the first place.

And our little recursive problem with binomial here? We’ve seen its timeline.

That’s right. It’s about to get real.

One more thing. Every single one of these steps preserves the original function’s behavior. You can run your unit tests after every step, and they should all pass.

Let’s begin.

  1. Send a T-800 terminator unit into the function’s timeline, back to the time of \(x_0\).

    Oh, we don’t have a T-800 terminator unit? No problem. Assume we’ve sent a T-800 terminator unit into the function’s timeline, back to the time of \(x_0\).

    That was easy. What’s next?

  2. Move the recursive call and surrounding work into a helper function.

    This might seem like overkill but prevents mistakes. Do you make mistakes? Then just make the helper function already. I’m calling it step for reasons that will shortly become obvious.
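A sketch of the split:

```python
def binomial(n, k):
    if k == 0:
        return 1
    return step(n, k)  # all the recursive machinery now lives in the helper

def step(n, k):
    return binomial(n - 1, k - 1) * n // k
```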

  3. Partition the helper function into its 3 fundamental parts.

    They are:
    1. the recursive call to get the previous answer \(x_{i-1}\),
    2. the work to compute the current answer \(x_i\) from \(x_{i-1}\), and
    3. a return statement applied to \(x_i\) alone:
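Spelled out, the three parts of the helper might look like this:

```python
def binomial(n, k):
    if k == 0:
        return 1
    return step(n, k)

def step(n, k):
    x1 = binomial(n - 1, k - 1)  # 1. recursive call: the previous answer
    x = x1 * n // k              # 2. work: current answer from previous one
    return x                     # 3. return statement applied to x alone
```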
  4. Secretly prepare to receive communications from the T-800 terminator unit.

    You see, once the T-800 gets its cybernetic hands on earlier values in the timeline, we can use its knowledge to eliminate the recursion. We just need some way to get that knowledge from the T-800, now stuck in the past, to us, stuck in the present.

    Fortunately, we’ve seen time-travel movies, so we know exactly how this problem is solved. We’ll just use a dead drop! That’s a prearranged secret location where the T-800 will drop values in the past and where we will check for them in the future.

    So, per prior arrangement with the T-800, we’ll extend our helper function with a secret feature that checks the dead drop for a previous value \(x_{i-1}\) and, if one’s there, uses it to – muahahahaha! – break the recursive call chain:
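A sketch of the dead drop: the new `previous_x` argument defaults to `None`, and only a value dropped into it changes anything:

```python
def binomial(n, k):
    if k == 0:
        return 1
    return step(n, k)

def step(n, k, previous_x=None):
    if previous_x is None:
        previous_x = binomial(n - 1, k - 1)  # no drop: recurse as before
    # A dropped value breaks the recursive call chain right here.
    x = previous_x * n // k
    return x
```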

    Ordinarily, we would think of the helper as a function of \(n_i\) and \(k_i\) that computes \(x_i\). That’s because the \(x_{i-1}\) argument can safely be ignored. It has no effect unless the T-800 drops a value into it.

    But if the T-800 does drop a value into it, we can see the function as representing the transformation

    \[(n_i, k_i, x_{i-1}) \to x_i\]

    Note that \(x_{i-1}\) goes in and \(x_i\) comes out. And from that little kernel of power we can do some seriously crazy stuff. For example: reverse the backward flow of time. That is, compute forward through the recursive timeline instead of backward.


    Read on.

  5. Modify the helper’s return statement to also pass through its non-secret arguments.
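In code it might look like this; one way to keep `binomial` behaving exactly as before is to pluck \(x_i\) back out of the returned triple:

```python
def binomial(n, k):
    if k == 0:
        return 1
    return step(n, k)[2]  # x_i is the last element of the triple

def step(n, k, previous_x=None):
    if previous_x is None:
        previous_x = binomial(n - 1, k - 1)
    x = previous_x * n // k
    return (n, k, x)  # pass the non-secret arguments through
```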

    Now our helper represents the transformation

    \[(n_i, k_i, x_{i-1}) \to (n_i, k_i, x_i)\]


  6. Apply, to the non-secret arguments in the helper’s return statement, the opposite of the transformations applied to them in the recursive call.

    Since we passed \(n-1,k-1\) into the recursive call, we’ll pass \(n+1,k+1\) out through the return statement:
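A sketch, with `binomial` still plucking \(x_i\) out of the triple, so behavior is preserved:

```python
def binomial(n, k):
    if k == 0:
        return 1
    return step(n, k)[2]

def step(n, k, previous_x=None):
    if previous_x is None:
        previous_x = binomial(n - 1, k - 1)
    x = previous_x * n // k
    return (n + 1, k + 1, x)  # undo the n - 1, k - 1 of the recursive call
```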

    Now our helper represents the transformation

    \[(n_i, k_i, x_{i-1}) \to (n_{i+1}, k_{i+1}, x_i)\]

    And now we can reverse the flow of time.

    When we started out with our original recursive function, if we asked it for \(x_t\), the function had to go back in the timeline to get \(x_{t-1}\). And to get \(x_{t-1}\), it had to go back even further to get \(x_{t-2}\). And so on, chewing up stack every backward step of the way to \(x_0\). It was heartbreaking, watching it work like that.

    But now, we can step the other way. If we have any (\(n_i, k_i, x_{i-1})\) we can get \(x_i\) straight away, no recursion required. In goes \(x_{i-1}\), out comes \(x_i\). We can go forward.

    Why, if we knew \(n_1\), \(k_1\), and \(x_0\), we could compute the entire series \(x_0, x_1, \ldots x_t\) with zero recursion by starting at \(i=0\) and working forward.

    So let’s get those values!

  7. Determine the initial conditions at the start of the timeline.

    For this simple example, most of you can probably determine the initial conditions by inspection. But I’m going to go through the process anyway. You never know when you might need it. So:

    What’s the start of the timeline? It’s when the recursive binomial function calls itself so many times that it finally hits one of its base cases, which defines the first entry in the timeline, anchoring the timeline at time \(i = 0\). The base cases in this example are easy to find since we’ve already split out the step logic; all that’s left in binomial is the base-case logic. It’s easy to see that there is only one base case, and it’s when \(k=0\):
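Here is the base-case logic in question (with the current `step` included so the sketch stands on its own):

```python
def step(n, k, previous_x=None):
    if previous_x is None:
        previous_x = binomial(n - 1, k - 1)
    return (n + 1, k + 1, previous_x * n // k)

def binomial(n, k):
    if k == 0:      # the lone base case: it anchors the timeline at i = 0,
        return 1    # giving k_0 = 0 and x_0 = 1
    return step(n, k)[2]
```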

    From this base case we know that at the very start of the timeline we must have \(i=0\), \(k=0\), and \(x=1\). Thus we know that \(k_0=0\) and \(x_0=1\). And because the timeline ends at time \(i = t\), we know that \((n_t, k_t) = (n, k)\), the original arguments binomial was called with.

    Let’s visualize what we know about the timeline so far:

    | time \(i\) | \(0\) | \(1\) | \(2\) | \(\cdots\) | \(t\) |
    |---|---|---|---|---|---|
    | \(n_i\) |  |  |  | \(\cdots\) | \(n\) |
    | \(k_i\) | \(0\) |  |  | \(\cdots\) | \(k\) |
    | \(x_i\) | \(1\) |  |  | \(\cdots\) | ? |

    The answer we ultimately seek is \(x_t\), marked by a question mark in the timeline.

    Now to determine \(n_1\) and \(k_1\). To do that, we must answer this question: How do we go from \((n_i, k_i)\) to \((n_{i+1}, k_{i+1})\)?

    We already know the answer! Our step helper computes \((n_i, k_i, x_{i-1}) \to (n_{i+1}, k_{i+1}, x_i)\). So look at its logic: How does it transform its \(n\) and \(k\) arguments? It adds one to them. Thus,

    \[\begin{eqnarray*} n_{i+1} & = & n_i + 1,\\ k_{i+1} & = & k_i + 1. \end{eqnarray*}\]

    Therefore, \[k_1 = k_0 + 1 = 0 + 1 = 1.\] Since \(k_i\) is \(i\) steps from \(k_0 = 0\) and each step adds \(+1\), we have \[k_i = k_0 + i (+1) = i.\] And when \(i=t\), we know from the equation above that \(t = k_t = k\).

    Finally, we can compute \(n_1\), which is \(t-1\) reverse steps from \(n_t=n\), the only \(n_i\) that we know so far: \[n_1 = n_t - (t - 1) (+1) = n - k + 1.\]

    And now we have our initial conditions:

    \[(n_1, k_1, x_0) = (n - k + 1, 1, 1).\]

    So now our knowledge of the timeline looks like this:

    | time \(i\) | \(0\) | \(1\) | \(2\) | \(\cdots\) | \(t=k\) |
    |---|---|---|---|---|---|
    | \(n_i\) | \(n-k\) | \(n-k+1\) | \(n-k+2\) | \(\cdots\) | \(n\) |
    | \(k_i\) | \(0\) | \(1\) | \(2\) | \(\cdots\) | \(k\) |
    | \(x_i\) | \(1\) |  |  | \(\cdots\) | ? |

    And with this knowledge, we can step forward through the timeline, from time \(i=1\) to time \(i=2\), and so on, until finally, at the last step when \(i=t\) we will have determined \(x_t\), our desired answer!

    Let’s do it!

  8. In the main function, iterate the step helper \(t\) times, starting from the initial conditions. Then return \(x_t\).
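A sketch: the helper is unchanged, and the main function now walks the timeline forward from the initial conditions:

```python
def step(n, k, previous_x=None):
    if previous_x is None:
        previous_x = binomial(n - 1, k - 1)
    return (n + 1, k + 1, previous_x * n // k)

def binomial(n, k):
    if k == 0:
        return 1
    ni, ki, x = n - k + 1, 1, 1    # initial conditions (n_1, k_1, x_0)
    for _i in range(k):            # t = k steps forward through the timeline
        ni, ki, x = step(ni, ki, x)
    return x                       # x_t
```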

    Boom! That’s 100% iterative. But there’s more!

  9. Remove the base cases from the original function.

    Since our iterations start from a base case, the base cases are already incorporated into our new, iterative logic.
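With the base case gone, the zero-iteration loop already returns \(x_0 = 1\) when \(k = 0\):

```python
def step(n, k, previous_x=None):
    if previous_x is None:
        previous_x = binomial(n - 1, k - 1)
    return (n + 1, k + 1, previous_x * n // k)

def binomial(n, k):
    ni, ki, x = n - k + 1, 1, 1
    for _i in range(k):   # runs zero times when k == 0, leaving x == 1
        ni, ki, x = step(ni, ki, x)
    return x
```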

    Boom again!

  10. Remove the secret feature from the step function.

    Since previous_x always gets a value now, we can make it a required part of the function.
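A sketch of the de-secreted helper:

```python
def step(n, k, previous_x):   # previous_x is now a required argument
    return (n + 1, k + 1, previous_x * n // k)

def binomial(n, k):
    ni, ki, x = n - k + 1, 1, 1
    for _i in range(k):
        ni, ki, x = step(ni, ki, x)
    return x
```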

  11. Inline the step function.

    We’re on it!
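After inlining, the whole computation fits in one small loop (Python evaluates the tuple's right-hand side before assigning, so the old `ni` and `ki` feed the update, just as `step` did):

```python
def binomial(n, k):
    ni, ki, x = n - k + 1, 1, 1
    for _i in range(k):
        ni, ki, x = ni + 1, ki + 1, x * ni // ki
    return x
```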

  12. Apply finishing touches.

    This example is pretty tight already, but we can substitute away the x variable.

    And, because \(k_i = i\), we can replace the for loop’s induction variable _i with \(k_i\) and correspondingly eliminate \(k_i\) from the step logic.

    And, while we’re making stuff better, there’s an obvious optimization. One property of binomial coefficients is that \(\binom{n}{k} = \binom{n}{n-k}\). One property of our code is that it runs for \(t=k\) steps. So when \(k > n-k\), we can reduce the number of steps by solving for \(\binom{n}{n-k}\) instead. Let’s add a couple of lines at the start of the logic to claim this optimization.

    And, of course, let’s add a docstring – we’re nice like that.

    The final result:
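A sketch of the finished function, incorporating the substitutions, the \(k_i = i\) induction variable, the symmetry optimization, and the docstring:

```python
def binomial(n, k):
    """Compute C(n, k), the number of ways to choose k of n items."""
    if k > n - k:
        k = n - k                  # exploit C(n, k) == C(n, n - k)
    x = 1                          # x_0
    for i in range(1, k + 1):      # k_i == i, so i doubles as k_i
        x = x * (n - k + i) // i   # n_i == n - k + i; multiply, then divide
    return x
```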

Let’s review what we just did. It’s mind blowing:

  1. We sent an imaginary T-800 terminator unit back into the function’s timeline.
  2. Then we added a secret feature to our function that allowed the T-800 to send us values from the timeline’s past.
  3. Then we used those values to break the recursive chain, reverse the backward flow of time, and compute the timeline forward and iteratively instead of backward and recursively.
  4. The function then being fully iterative, we removed the secret feature, and the imaginary T-800 winked out of existence. (In an ironic twist of fate, the T-800, via the secret-feature dead drop, had delivered into our hands precisely the knowledge we needed to write both of them out of history!)
  5. And now we have a fast, efficient, and property-preserving iterative version of our original recursive function, and it’s about as good as anything we could hope to conjure up from scratch. (See it in action – just one stack frame and easy on the ints, too.)
  6. And – most importantly – we did it all using a mechanical, repeatable process!


Well, that’s it for this installment. I hope you enjoyed reading it as much as I did writing it. If you liked it (or didn’t), or if you found a mistake, or especially if you can think of any way to help make my explanations better, let me know. Just post a comment or fire me a tweet at @tmoertel.

Until next time, keep your brain recursive and your Python code iterative.

  1. In the visualization, you can’t actually see the largest integer produced by the recursive version’s computation. It’s produced between steps 32 and 33, when the return value from step 32 is multiplied by step 33’s \(n=39\) to arrive at an integer requiring \(\log_2(39 \cdot 48903492) \approx 30.8\) bits, which is then immediately divided by \(k=9\) to produce the final answer.