Tricks of the trade: Recursion to Iteration, Part 2: Eliminating Recursion with the Time-Traveling Secret Feature Trick

By Tom Moertel
Posted on
Tags: programming, recursion, iteration, python, google code jam, puzzles, recursion-to-iteration series, tail calls

This is the second post in a series on converting recursive algorithms into iterative algorithms. If you haven’t read the previous post, you probably should. It introduces some terms and background that will be helpful.

Last time, if you’ll recall, we discussed The Simple Method of converting recursive functions into iterative functions. The method, as its name implies, is straightforward and mostly mechanical. The only potential trouble is that, for the method to work, you must convert all of the recursive calls in your function into tail calls.

This task can be tricky. So we also discussed the Secret Feature trick for putting recursive calls into tail-call form. That trick works well for simple recursive calls, but when your calls aren’t so simple, you need a beefier version of the trick.

That’s the subject of this post: the Time-Traveling Secret Feature trick. It’s like sending a T-800 back in time to terminate a function’s recursiveness before it can ever offer resistance in the present.

Yeah.

But we’ll have to work up to it. So stick with the early examples to prepare for the cybernetic brain augmentations to come.

Enough talk! Let’s start with a practical example.

Update 2013-06-17: If you want another example, here’s also a step-by-step conversion of the Fibonacci function.

Computing binomial coefficients

The binomial coefficient nk for integers n and k gives the number of ways to choose k items from a set of n items. For this reason, it’s often pronounced “n choose k.” It’s very common in combinatorics and statistics and often pops up in the analysis of algorithms. A whole chapter, in fact, is dedicated to them in Graham, Knuth, and Patashnik’s Concrete Mathematics.

Math textbooks define the coefficient like this: nk=n!k!(nk)!, but that form causes all sorts of problems for computers. Fortunately, Concrete Mathematics helpfully points out a lifesaving absorption identity: nk=nkn1k1, and we know what that is, don’t we? That’s a recursive function just waiting to happen!

And that identity, along with the base case n0=1, gives us today’s running example:

def binomial(n, k):
    if k == 0:
        return 1
    return n * binomial(n - 1, k - 1) // k

Before going further, a cautionary point. Math math and computer math are not the same. In math math, nxk=nkx=xkn, but in computer math the equation does not necessarily hold because of finite precision or, in our case, because we’re dealing with integer division that throws away remainders. (By the way, in Python, // is the division operator that throws away remainders.) For this reason, I have been careful to use the form

n * binomial(n - 1, k - 1) // k

instead of the more literal translation

(n // k) * binomial(n - 1, k - 1)

which, if you try it, you’ll see often produces the wrong answer.

Okay, our challenge is set before us. Ready to de-recursivify binomial?

Once again, the Secret Feature trick

First, however, we must put the function’s recursive call into tail-call form. And you remember how to do that, right? With the Secret Feature trick, of course! As a refresher from last time, here’s the trick in a nutshell:

  1. Find a recursive call that’s not a tail call.
  2. Identify what work is being done between that call and its return statement.
  3. Extend the function with a secret feature to do that work, as controlled by a new accumulator argument with a default value that causes it to do nothing.
  4. Use the secret feature to eliminate the old work.
  5. You’ve now got a tail call!
  6. Repeat until all recursive calls are tail calls.

Let’s go through it, by the numbers.

  1. Find a recursive call that’s not a tail call.

    Well, there’s only three lines of logic. We’re not talking rocket science here.

    def binomial(n, k):
        if k == 0:
            return 1
        return n * binomial(n - 1, k - 1) // k
        #          ^^^^^^^^^^^^^^^^^^^^^^
        #               right here!
  2. Identify what work is being done between that call and its return statement.

    In our mind’s eye, let’s lift the recursive call out of the return statement and replace it with the variable x.

        x = binomial(n - 1, k - 1)
        return n * x // k

    Now we can visualize the additional work as a separate function operating on x: multiplying it on the left by one number and dividing it on the right by another:

    def work(x, lmul, rdiv):
        return lmul * x // rdiv

    For functions this simple, you can just hold them in your head and inline them into your code as needed. But for more-complicated functions, it really does help to break them out like this. For this example, I’m going to pretend that our work function is more complicated, just to show how to do it.

  3. Extend the function with a secret feature to do that work, as controlled by a new accumulator argument – in this case a pair (lmul, rdiv) – with a default value that causes it to do nothing.

    def work(x, lmul, rdiv):
        return lmul * x // rdiv
    
    def binomial(n, k, lmul=1, rdiv=1):
        if k == 0:
            return work(1, lmul, rdiv)
        return work(n * binomial(n - 1, k - 1) // k, lmul, rdiv)

    Note that I just mechanically converted all return {whatever} statements into return work({whatever}, lmul, rdiv).

  4. Use the secret feature to eliminate the old work.

    Watch what happens to that final line.

    def work(x, lmul, rdiv):
        return lmul * x // rdiv
    
    def binomial(n, k, lmul=1, rdiv=1):
        if k == 0:
            return work(1, lmul, rdiv)
        return binomial(n - 1, k - 1, lmul * n, k * rdiv)
  5. You’ve now got a tail call!

    Indeed, we do! All that’s left is to inline the sole remaining work call:

    def binomial(n, k, lmul=1, rdiv=1):
        if k == 0:
            return lmul * 1 // rdiv
        return binomial(n - 1, k - 1, lmul * n, k * rdiv)

    And to simplify away the needless multiplication by 1 in the return statement:

    def binomial(n, k, lmul=1, rdiv=1):
        if k == 0:
            return lmul // rdiv
        return binomial(n - 1, k - 1, lmul * n, k * rdiv)
  6. Repeat until all recursive calls are tail calls.

    There was only one, so we’re done! Yay!

With all the recursive calls (just the one) converted into tail calls, we can easily convert the function into iterative form by applying the Simple Method. Here’s what we get after we load the function body into a one-shot loop and convert the tail call into an assignment-and-continue pair.

def binomial_iter(n, k, lmul=1, rdiv=1):
    while True:
        if k == 0:
            return lmul // rdiv
        (n, k, lmul, rdiv) = (n - 1, k - 1, lmul * n, k * rdiv)
        continue
        break

As a final touch, we can tidy up and in the process tuck the accumulators inside the function body to keep them completely secret:

def binomial_iter(n, k):
    lmul = rdiv = 1
    while k > 0:
        (n, k, lmul, rdiv) = (n - 1, k - 1, lmul * n, k * rdiv)
    return lmul // rdiv

And now we have an iterative version of our original binomial function!

A short, cautionary lesson

This next part is subtle but important. To understand what’s going on, you first need to see it. So, once again, let’s use the Online Python Tutor’s excellent Python-runtime visualizer. Open the link below in a new tab if you can, and then read on.

Visualize the execution of binomial(39, 9).

Click the Forward button to advance through each step of the computation. When you get to step 22, where the recursive version has fully loaded the stack with its nested frames, click slowly. Watch the return value (in red) carefully as you advance. See how it gently climbs to the final answer of 211915132, never exceeding that value?

Now continue stepping through to the iterative version. Watch the value of lmul as you advance. See how it grows rapidly, finally reaching a whopping 76899763100160?

This difference matters. While both versions computed the correct answer, the original recursive version would do so without exceeding the capacity of 32-bit signed integers.1 The iterative version, however, needs a comparatively hoggish 47 bits to faithfully arrive at the correct answer.

Python’s integers, lucky for us, grow as needed to hold their values, so we need not fear overflow in this case, but not all languages for which we might want to use our recursion-to-iteration techniques offer such protections. It’s something to keep in mind the next time you’re taking the recursion out of an algorithm in C:

typedef int32_t Int;

Int binomial(Int n, Int k) {
    if (k == 0)
        return 1;
    return n * binomial(n - 1, k - 1) / k;
}

Int binomial_iter(Int n, Int k) {
    Int lmul = 1, rdiv = 1;
    while (k > 0) {
        lmul *= n; rdiv *= k; n -= 1; k -= 1;
    }
    return lmul / rdiv;
}

int main(int argc, char* argv[]) {
    printf("binomial(39, 9)      = %ld\n", (long) binomial(39, 9));
    printf("binomial_iter(39, 9) = %ld\n", (long) binomial_iter(39, 9));
}

/* Output with Int = int32_t:

binomial(39, 9)      = 211915132
binomial_iter(39, 9) = -4481      <-- oops!

*/

In any case, bigger integers are slower and consume more memory. In one important way, the original, recursive algorithm was better. We have lost something important.

Now let’s get it back.

The importance of respecting associativity and commutativity

It turns out that our iterative version of binomial is the original’s evil twin. Its stack usage looks charming and all, but something else about it is off. Something hard to put our finger on. What is the difference?

It all comes down to grouping and ordering decisions. Implicit grouping and ordering decisions that we overlooked during our code transformations.

Consider how both versions compute 52.

First, the recursive version. We start out with n=5,k=2 and then recursively expand the expression nn1k1/k until we hit the base case of k=0. The series of expansions proceeds like so:

52=541/2=5(430/1)/2=5(4(1)/1)/2=10.

At every step (except for the base case) we perform a multiplication by n and then a division by k. It’s that division by k that keeps intermediate results from getting out of hand.

Now lets look at the iterative version. It’s not so obvious what’s going on. (Score another point for recursion over iteration!) But we can puzzle it out. We start out with both accumulators at 1 and then loop over the decreasing values of k, building up the accumulators until k=0, at which point we return the quotient of the accumulators.

So the calculation is given by

52=lmulrdiv=((1)5)4((1)2)1=10.

Both the numerator and denominator grow and grow and grow until the final division. Not a good trend.

So why did the Secret Feature trick work great for factorial in our previous article but fail us, albeit subtly, now? The answer is that in factorial the extra work being done between each recursive call and its return statement was multiplication and nothing more. And multiplication (of integers that don’t overflow) is commutative and associative. Meaning, ordering and grouping don’t matter.

The lesson, then, is this: Think carefully about whether ordering and grouping matter before using the Secret Feature trick. If it matters, you have two options: One, you can modify your algorithm so that ordering and grouping don’t matter and then use the Secret Feature trick. (But this option is often intractable.) Two, you can use the Time-Traveling Secret Feature trick, which preserves ordering and grouping. And that trick is what we’ve been waiting for.

It’s time.

The Time-Traveling Secret Feature trick

What’s about to happen next is so mind-bendingly awesome that you should probably get a drink to prepare yourself. It’s like combining a T-800 from The Terminator, with the dreams-within-dreams layering from Inception, with the momentary inversion of reality that occurs when the quantum field surrounding an inductive proof collapses and everything snaps into place.

It’s. Really. Cool.

Got your drink? Okay, let’s do this.

Let’s go back to our original, recursive binomial function:

def binomial(n, k):
    if k == 0:
        return 1
    return n * binomial(n - 1, k - 1) // k

Our goal is to create an equivalent iterative version that exactly preserves the properties of the original. Well, how the hell are we going to do that?

Hold that thought for a sec.

Let’s just stop for a moment and think about what that recursive function is doing. We call it to compute some answer xt. But that answer depends on some earlier answer xt1 that the function doesn’t know, so it calls itself to get that answer. And so on. Until, finally, it finds one concrete answer x0 that it actually knows and, from which, it can build every subsequent answer.

So, in truth, our answer xt is just the final element in a whole timeline of needed earlier answers xi:

(x0,x1,xt)=(xii=0,1,t).

Well, so what? Why should we care?

Because we’ve watched The Terminator about six hundred times! We know how to get rid of a problem in the present when we’ve seen its timeline: We send a Terminator into the past to rewrite that timeline so the problem never gets created in the first place.

And our little recursive problem with binomial here? We’ve seen its timeline.

That’s right. It’s about to get real.

One more thing. Every single one of these steps preserves the original function’s behavior. You can run your unit tests after every step, and they should all pass.

Let’s begin.

  1. Send a T-800 terminator unit into the function’s timeline, back to the time of x0.

    Oh, we don’t have a T-800 terminator unit? No problem. We’ll just invoke an induction hypothesis: Assume we’ve sent a T-800 terminator unit into the function’s timeline, back to the time of x0.

    That was easy. What’s next?

  2. Move the recursive call and surrounding work into a helper function.

    This might seem like overkill but prevents mistakes. Do you make mistakes? Then just make the helper function already. I’m calling it step for reasons that will shortly become obvious.

    def step(n, k):
        return n * binomial(n - 1, k - 1) // k
    
    def binomial(n, k):
        if k == 0:
            return 1
        return step(n, k)
  3. Partition the helper function into its 3 fundamental parts.

    They are:
    1. the recursive call to get the previous answer xi1,
    2. the work to compute the current answer xi from xi1, and
    3. a return statement applied to xi alone:
    def step(n, k):
        previous_x = binomial(n - 1, k - 1)  # part (1)
        x = n * previous_x // k              # part (2)
        return x                             # part (3)
    
    def binomial(n, k):
        if k == 0:
            return 1
        return step(n, k)
  4. Secretly prepare to receive communications from the T-800 terminator unit.

    You see, once the T-800 gets its cybernetic hands on earlier values in the timeline, we can use its knowledge to eliminate the recursion. We just need some way to get that knowledge from the T-800, now stuck in the past, to us, stuck in the present.

    Fortunately, we’ve seen time-travel movies, so we know exactly how this problem is solved. We just use a dead drop! That’s a prearranged secret location where the T-800 will drop values in the past and where we will check for them in the future.

    So, per prior arrangement with the T-800, we’ll extend our helper function with a secret feature that checks the dead drop for a previous value xi1 and, if one’s there, uses it to – muahahahaha! – break the recursive call chain:

    def step(n, k, previous_x=None):             # <- new stuff
        if previous_x is None:                   # <
            previous_x = binomial(n - 1, k - 1)  # <
        x = n * previous_x // k
        return x
    
    def binomial(n, k):
        if k == 0:
            return 1
        return step(n, k)

    Ordinarily, we would think of the helper as a function of ni and ki that computes xi. That’s because the xi1 argument can safely be ignored. It has no effect unless the T-800 drops a value into it.

    But if the T-800 does drop a value into it, we can see it as a function representing the transformation

    (ni,ki,xi1)xi

    And from that little kernel of power we can do some seriously crazy stuff. For example: reverse the flow of time. That is, compute forward through the timeline instead of backward.

    Yeah.

    Read on.

  5. Modify the helper’s return statement to also pass through its non-secret arguments.

    def step(n, k, previous_x=None):
        if previous_x is None:
            previous_x = binomial(n - 1, k - 1)
        x = n * previous_x // k
        return (n, k, x)  # <-- here
    
    def binomial(n, k):
        if k == 0:
            return 1
        return step(n, k)[2]  # <-- note also

    Now our helper represents the transformation

    (ni,ki,xi1)(ni,ki,xi)

    Interesting.

  6. Apply, to the non-secret arguments in the helper’s return statement, the opposite of the transformations applied to them for the recursive call.

    Since we passed n1,k1 for the recursive call, in the return statement we’ll pass n+1,k+1:

    def step(n, k, previous_x=None):
        if previous_x is None:
            previous_x = binomial(n - 1, k - 1)  # <-- look here
        x = n * previous_x // k
        return (n + 1, k + 1, x)  # <-- apply opposite here
    
    def binomial(n, k):
        if k == 0:
            return 1
        return step(n, k)[2]

    Now our helper represents the transformation

    (ni,ki,xi1)(ni+1,ki+1,xi)

    And now we can reverse the flow of time.

    When we started out with our original recursive function, if we asked it for xt, the function had to go back in the timeline to get xt1. And to get xt1, it had to go back even further to get xt2. And so on, chewing up stack every backward step of the way to x0. It was heartbreaking, watching it work like that.

    But now, we can step the other way. If we have any (ni,ki,xi1) we can get xi straight away, no recursion required. In goes xi1, out comes xi. We can go forward.

    Why, if we knew n1, k1, and x0, we could compute the entire series x0,x1,xt with zero recursion by starting at i=0 and working forward.

    So let’s get those values!

  7. Determine the initial conditions at the start of the timeline.

    For this simple example, most of you can probably determine the initial conditions by inspection. But I’m going to go through the process anyway. You never know when you might need it. So:

    What’s the start of the timeline? It’s when the recursive binomial function calls itself so many times that it finally hits one of its base cases, which defines the first entry in the timeline, anchoring the timeline at time i=0. The base cases in this example are easy to find since we’ve already split out the step logic; all that’s left in binomial is the base-case logic. It’s easy to see that there is only one base case, and it’s when k=0:

    def step(n, k, previous_x=None):
        if previous_x is None:
            previous_x = binomial(n - 1, k - 1)
        x = n * previous_x // k
        return (n + 1, k + 1, x)
    
    def binomial(n, k):
        if k == 0:    # <-- base case
            return 1
        return step(n, k)[2]

    From this base case we know that at the very start of the timeline we must have i=0, k=0, and x=1. Thus we know that k0=0 and x0=1. And because the timeline ends at time i=t, and we know that (nt,kt)=(n,k), the original arguments binomial was called with.

    Let’s visualize what we know about the timeline so far as a table:

    time i: 0 1 2 t
    ni: n
    ki: 1 k
    xi: 0 ?

    The answer we ultimately seek is xt, marked by a question mark in the timeline.

    Now to determine n1 and k1. To do that, we must answer this question: How do we go from (ni,ki) to (ni+1,ki+1)?

    We already know the answer! Our step helper computes (ni,ki,xi1)(ni+1,ki+1,xi). So look at its logic: How does it transform its n and k arguments? It adds one to them. Thus,

    ni+1=ni+1,ki+1=ki+1.

    Therefore, k1=k0+1=0+1=1. Since ki is i steps from k0=0 and each step adds +1 we have ki=k0+i(+1)=i. And when i=t we know that t=i=ki=t=kt=k. Therefore t=k.

    Finally, we can compute n1, which is t1 reverse steps from nt=n, the only ni that we know so far: n1=nt(t1)(+1)=nk+1.

    And now we have our initial conditions:

    (n1,k1,x0)=(nk+1,1,0).

    So now our knowledge of the timeline looks like this:

    time i: 0 1 2 t=k
    ni: nk nk+1 n
    ki: 0 1 k
    xi: 1 ?

    And with this knowledge, we can step forward through the timeline, from time i=1 to time i=2, and so on, until finally, at the last step when i=t we will have determined xt, our desired answer!

    Let’s do it!

  8. In the main function, iterate the step helper t times, starting from the initial conditions. Then return xt.

    def step(n, k, previous_x=None):
        if previous_x is None:
            previous_x = binomial(n - 1, k - 1)
        x = n * previous_x // k
        return (n + 1, k + 1, x)
    
    def binomial(n, k):
        if k == 0:
            return 1
        t = k                                            # <- new
        (n, k, previous_x) = (n - k + 1, 1, 1)           #
        for _i in xrange(1, t + 1):                      #
            (n, k, previous_x) = step(n, k, previous_x)  #
        return previous_x  # = x_t                       #

    Boom! That’s 100% iterative. But there’s more!

  9. Remove the base cases from the original function.

    Since our iterations start from a base case, the base cases are already incorporated into our new, iterative logic.

    def step(n, k, previous_x=None):
        if previous_x is None:
            previous_x = binomial(n - 1, k - 1)
        x = n * previous_x // k
        return (n + 1, k + 1, x)
    
    def binomial(n, k):
        # if k == 0:   # <- remove these lines
        #    return 1  #
        t = k
        (n, k, previous_x) = (n - k + 1, 1, 1)
        for _i in xrange(1, t + 1):
            (n, k, previous_x) = step(n, k, previous_x)
        return previous_x  # = x_t

    Boom! That’s 100% iterative. But there’s more!

  10. Remove the secret feature from the step function.

    Since previous_x always gets a value now, make it a required part of the function and not an optional secret feature.

    def step(n, k, previous_x):  # <- remove default value for previous_x
        # if previous_x is None:                  # <- remove these 2 lines
        #    previous_x = binomial(n - 1, k - 1)  #
        x = n * previous_x // k
        return (n + 1, k + 1, x)
    
    def binomial(n, k):
        t = k
        (n, k, previous_x) = (n - k + 1, 1, 1)
        for _i in xrange(1, t + 1):
            (n, k, previous_x) = step(n, k, previous_x)
        return previous_x  # = x_t
  11. Inline the step function.

    We’re on it!

    def binomial(n, k):
        t = k
        (n, k, previous_x) = (n - k + 1, 1, 1)
        for _i in xrange(1, t + 1):
            x = n * previous_x // k
            (n, k, previous_x) = (n + 1, k + 1, x)
        return previous_x  # = x_t
  12. Apply finishing touches.

    This example is pretty tight already, but we can substitute away the x variable.

    And, because ki=i, we can replace the for loop’s induction variable _i with ki and correspondingly eliminate ki from the step logic.

    And, while we’re making stuff better, there’s an obvious optimization. One property of binomial coefficients is that nk=nnk. One property of our code is that it runs for t=k steps. So when k>nk, we can reduce the number of steps by solving for nnk instead. Let’s add a couple of lines at the start of the logic to claim this optimization.

    And, of course, let’s add a docstring – we’re nice like that.

    The final result:

    def binomial(n, k):
        """Compute binomial coefficient C(n, k) = n! / (k! * (n-k)!)."""
        if k > n - k:
            k = n - k   # 'pute C(n, n-k) instead if it's easier
        t = k
        (n, previous_x) = (n - k + 1, 1)
        for k in xrange(1, t + 1):
            (n, previous_x) = (n + 1, n * previous_x // k)
        return previous_x  # = x_t

Let’s review what we just did. It’s mind blowing:

  1. We sent an imaginary T-800 terminator unit back into the function’s timeline.
  2. Then we added a secret feature to our function that allowed the T-800 to send us values from the timeline’s past.
  3. Then we used those values to break the recursive chain, reverse the flow of time, and compute the timeline forward and iteratively instead of backward and recursively.
  4. The function being fully iterative then, we removed the secret feature, and the imaginary T-800 winked out of existence, because we had in effect written the need for them both out of history.
  5. And now we have a fast, efficient, and property-preserving iterative version of our original recursive function, and it’s about as good as anything we could hope to conjure up from scratch. (See it in action – just one stack frame and easy on the ints, too.)
  6. And – most importantly – we did it all using a mechanical, repeatable process!

Thanks!

Well, that’s it for this installment. I hope you enjoyed reading it as much as I did writing it. If you liked it (or didn’t), or if you found a mistake, or especially if you can think of any way to help make my explanations better, let me know. Just post a comment or fire me a tweet at @tmoertel.

Until next time, keep your brain recursive and your Python code iterative.



  1. In the visualization, you can’t actually see the largest integer produced by the recursive version’s computation. It’s produced between steps 32 and 33, when the return value from step 32 is multiplied by step 33’s n=39 to arrive at an integer of log2(3948903492)30.8 bits, which is then immediately divided by k=9 to produce the final answer.

comments powered by Disqus