Wondrous oddities: R's function-call semantics

Posted on January 20, 2006

Every so often, I am going to write about wondrous oddities – obscure programming-language features that are so cool they deserve wider notice. Today, in the first installment, I want to show you the function-call semantics of R, a great system for statistical computing.

You might not expect a statistics system to have a first-class programming language at its heart, but if you think about it, it does make sense. The R language, actually a dialect of the S language, is described as “a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.” All true. It gives me the feeling of an infix Lisp or Scheme whose syntax is slanted toward mathematics and vector operations. The language has an object layer, too, but that’s not why we are here.

No, we are here to look at R’s uncommonly interesting function-call semantics, in particular argument binding and evaluation. Let’s dig in.

Flexible argument binding

Here is a simple function of two arguments:

f <- function(tens, ones = tens)
    ones + 10 * tens

The function f has two formal arguments, tens and ones. The second has a default value, tens, which refers back to the first argument. R lets you call the function like so, passing in arguments by position:

f(3, 4)  # 34

But you can also specify arguments by name, in any order:

f(tens=3, ones=4)  # 34
f(ones=4, tens=5)  # 54

And, if you leave off the ones argument, it will get its value from tens because of its default definition:

f(3)       # 33
f(tens=2)  # 22
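Inside the body, R can also tell whether the caller supplied an argument or fell back on its default. Here is a small sketch using R's built-in missing() function; the name f_report is hypothetical, introduced only for this illustration:

```r
# Hypothetical variant of f that also reports how `ones` was bound
f_report <- function(tens, ones = tens) {
  # missing(ones) is TRUE when the caller omitted `ones`,
  # even though the default still supplies a value
  list(used_default = missing(ones), value = ones + 10 * tens)
}

str(f_report(3))     # used_default is TRUE,  value is 33
str(f_report(3, 4))  # used_default is FALSE, value is 34
```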

At this point, you’re probably thinking that this is nice and all, but not “wondrous oddity” material. Hold that thought for a moment.

Moving on, you can mix positional and named arguments and even shuffle the argument ordering:

f(tens=2, 6)       # 26
f(6, tens=2)       # 26
f(ones=9, tens=8)  # 89

You can even abbreviate arguments:

f(tens=2, o=6)  # 26
f(t=3, ones=9)  # 39
f(o=9, t=4)     # 49
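If you want to see exactly how R bound a particular call, match.call() will rewrite it with every argument under its full formal name, without running the function. A quick sketch (f is redefined so the snippet runs on its own):

```r
f <- function(tens, ones = tens)
    ones + 10 * tens

# match.call() reports the binding R chose, using full argument names,
# without actually calling f
match.call(f, quote(f(6, tens = 2)))   # f(tens = 2, ones = 6)
match.call(f, quote(f(o = 9, t = 4)))  # f(tens = 4, ones = 9)
```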

To explore the full abbreviation semantics, we need a more complex function:

g <- function(ones=1, tens=2, hundreds=3, thousands=4)
    ones + 10 * tens + 100 * hundreds + 1000 * thousands

You can call the function with no arguments, as expected:

g()  # 4321

But you can’t get away with an ambiguous argument abbreviation:

g(t=0) # Error in g(t = 0) :
       # argument 1 matches multiple formal arguments

So you must disambiguate:

g(te=0) # 4301

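However, R does not treat an abbreviation as ambiguous if the ambiguity disappears once the other arguments are matched exactly. That is because R binds arguments in three passes: first by exact name, then by unique name prefix, and only then by position: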

g(t=0, thousands=9) # 9301
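Both behaviors can be observed from a script. The sketch below catches the ambiguity error as a string with tryCatch() rather than letting it halt execution, then uses match.call() to confirm which formal the prefix t ended up bound to:

```r
g <- function(ones=1, tens=2, hundreds=3, thousands=4)
    ones + 10 * tens + 100 * hundreds + 1000 * thousands

# Capture the ambiguity error as a string instead of stopping the script
msg <- tryCatch(g(t = 0), error = function(e) conditionMessage(e))
print(msg)  # argument 1 matches multiple formal arguments

# Exact names are matched before prefixes, so once `thousands` is taken,
# the prefix `t` can only mean `tens`
match.call(g, quote(g(t = 0, thousands = 9)))  # g(tens = 0, thousands = 9)
```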

Before we move on, let’s review R’s argument-binding features:

you can pass arguments by position or by name
you can omit arguments that have defaults
you can abbreviate argument names
you can use any combination of the above features, provided the combination results in no ambiguity

Lazy argument evaluation

Unlike most programming languages, R evaluates arguments lazily: each expression you pass is wrapped in a promise and is not evaluated until the function first needs its value. This lets you create functions that act like control structures. For example, the following function acts like an if-then-else control structure:

myif <- function(test, valT, valF)
    if (test) valT else valF

myif(T, print("true"), print("false"))  # prints "true"
myif(F, print("true"), print("false"))  # prints "false"

Even though the valT and valF arguments are calls to print, neither is evaluated until the test argument selects it. The unchosen argument is not evaluated at all.
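Two quick ways to convince yourself that the unchosen argument really is never evaluated: pass an expression that would raise an error, or use substitute() to look at the raw expression behind an argument. The helper show_expr is hypothetical, added only for this demonstration:

```r
myif <- function(test, valT, valF)
    if (test) valT else valF

# The unchosen branch would signal an error if it were ever evaluated
myif(TRUE, "safe", stop("never evaluated"))  # "safe"

# substitute() exposes the unevaluated expression behind an argument;
# a and b need not exist, because nothing is evaluated
show_expr <- function(x) substitute(x)
show_expr(a + b)                              # a + b
```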

In contrast, most common languages evaluate arguments before passing them into functions. For example, Ruby:

# Ruby code

def myif(test, valT, valF)
  if (test) then valT else valF; end
end

myif(true, puts("true"), puts("false"))
# prints true *and* false

Another benefit of R’s lazy argument evaluation is that you can provide mutually recursive defaults, which is a great way to implement adaptive interfaces. For example, here is a function that computes a point’s representation in both Cartesian and polar coordinate systems. You can specify the input point in either system, and the function adapts automatically:

# R code

polar <- function(x = r * cos(theta), y = r * sin(theta),
                  r = sqrt(x*x + y*y), theta = atan2(y, x))
    c(x, y, r, theta)

polar(1,1)                    # provide (x,y) pair
# 1.0000000 1.0000000 1.4142136 0.7853982

polar(r=sqrt(2), theta=pi/4)  # provide (r, theta) pair
# 1.0000000 1.0000000 1.4142136 0.7853982

Notice how there was no need for me to test the arguments to see how the function was called. All I did was define each set of argument defaults in terms of the other set of arguments. R can figure out the rest based on how the function is called. That’s programmer friendly.
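The trick only works when the caller supplies one complete pair. In the sketch below, the two supported calling styles agree up to floating-point rounding, while a call with no arguments leaves every default waiting on another; R detects the cycle and signals an error instead of looping forever. The exact wording of that error may differ between R versions, so the snippet just prints it:

```r
polar <- function(x = r * cos(theta), y = r * sin(theta),
                  r = sqrt(x*x + y*y), theta = atan2(y, x))
    c(x, y, r, theta)

# Both calling styles describe the same point
all.equal(polar(1, 1), polar(r = sqrt(2), theta = pi / 4))  # TRUE

# No arguments: x needs r, and r needs x, so evaluation cannot finish
msg <- tryCatch(polar(), error = function(e) conditionMessage(e))
print(msg)
```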

Let’s review. R’s lazy argument evaluation provides cool benefits:

you can define your own control structures
you can provide mutually recursive defaults for arguments, which makes smart, flexible interfaces easy
if you don’t use an argument, you don’t have to pay for R to evaluate it

Split-horizon scoping

R’s scoping rules give passed arguments and default values different perspectives – split horizons, if you will. Passed arguments are evaluated in the caller’s environment, so they see what was visible at the call site. No biggie here; most languages work this way. Default values, on the other hand, are evaluated inside the function’s own environment. That means defaults have access to bound arguments and local variables, which means you can write functions whose defaults rely upon values computed in the function body.
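A minimal sketch of the two horizons, using a hypothetical function h and a variable name, scale_by, that exists both globally and locally. The same expression, scale_by, means different things depending on whether it arrives as a passed argument or as a default:

```r
scale_by <- 10   # global variable, visible at the call site

# Hypothetical function whose default refers to a name it defines locally
h <- function(x, k = scale_by) {
  scale_by <- 2  # local; the default for k is evaluated in h's environment
  x * k          # k is forced only here, after the local assignment
}

h(3)             # 6:  the default sees the local scale_by (2)
h(3, scale_by)   # 30: the passed expression sees the caller's scale_by (10)
```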

This is a great feature that combines with R’s lazy argument evaluation to eliminate argument-handling logic. For example, a lot of R’s library code takes advantage of the following idiom:

myplot <- function(vals, ymin=bnds$ymin, ymax=bnds$ymax) {
    bnds <- compute.bounds(vals)
    # plot the values, constrained by ymin and ymax ...
}

The myplot function plots the values you pass it in vals. By default the function scales the plot to show all of the values. If you want, however, you can constrain the vertical extent of the plot by passing in ymin and/or ymax arguments. Note the refreshing lack of logic to handle the arguments. The code just gets down to business.
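To run the idiom end to end, you need some definition of compute.bounds, which the snippet above leaves open. The sketch below assumes a hypothetical version based on range(), and has myplot return the limits it would use rather than drawing anything:

```r
# Hypothetical helper: assumed here to take the full range of the data
compute.bounds <- function(vals) {
  rng <- range(vals)
  list(ymin = rng[1], ymax = rng[2])
}

# Sketch of myplot that returns its vertical limits instead of drawing;
# a real version might call plot(vals, ylim = c(ymin, ymax))
myplot <- function(vals, ymin = bnds$ymin, ymax = bnds$ymax) {
  bnds <- compute.bounds(vals)
  c(ymin = ymin, ymax = ymax)
}

myplot(c(3, 9, 1))            # ymin 1, ymax 9
myplot(c(3, 9, 1), ymax = 5)  # ymin 1, ymax 5
```

Because ymax is passed explicitly in the second call, its default, bnds$ymax, is never evaluated; only ymin falls back on the computed bounds.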

For comparison, here is a Ruby version of the function. When it comes to this kind of thing, Ruby is better than most mainstream languages, but it still makes us do about twice the work that R does:

def myplot(vals, ymin = nil, ymax = nil)
  bnds = compute_bounds(vals)
  ymin ||= bnds.ymin
  ymax ||= bnds.ymax
  # plot the values, constrained by ymin and ymax ...
end

To recap, R’s scoping rules, when combined with lazy argument evaluation, let you shave away tedious argument tests and placeholder defaults such as nil. Instead, you can focus on the core logic and let R carry the burden of argument handling. The win might seem small, but when you write a lot of code, the clarity and code reduction add up.

That’s it

So there you have it: a surprisingly sophisticated function-call semantics that does away with argument-handling tedium. That you’ll find it in a statistics system and not in a mainstream programming language makes it a wondrous oddity.