Wondrous oddities: R's function-call semantics
Posted by Tom Moertel Fri, 20 Jan 2006 23:02:00 GMT
Every so often, I am going to write about wondrous oddities – obscure programming-language features that are so cool they deserve wider notice. Today, in the first installment, I want to show you the function-call semantics of R, a great system for statistical computing.
You might not expect a statistics system to have a first-class programming language at its heart, but if you think about it, it does make sense. The R language, actually a dialect of the S language, is described as “a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.” All true. It gives me the feeling of an infix Lisp or Scheme whose syntax is slanted toward mathematics and vector operations. The language has an object layer, too, but that’s not why we are here.
No, we are here to look at R’s uncommonly interesting function-call semantics, in particular argument binding and evaluation. Let’s dig in.
Flexible argument binding
Here is a simple function of two arguments:
f <- function(tens, ones = tens)
  ones + 10 * tens
The function f has two formal arguments, tens and ones, the second of which has a default value, defined to be tens, referring back to the first argument. R lets you call the function like so, passing in arguments by position:
f(3, 4) # 34
But you can also specify arguments by name, in any order:
f(tens=3, ones=4) # 34
f(ones=4, tens=5) # 54
And, if you leave off the ones argument, it will get its value from tens because of its default definition:
f(3) # 33
f(tens=2) # 22
Up to this point, you’re probably thinking that this is nice and all, but not “wondrous oddity” material. Hold that thought for a moment.
Moving on, you can mix positional and named arguments and even shuffle the argument ordering:
f(tens=2, 6) # 26
f(6, tens=2) # 26
f(ones=9, tens=8) # 89
You can even abbreviate arguments:
f(tens=2, o=6) # 26
f(t=3, ones=9) # 39
f(o=9, t=4) # 49
To explore the full abbreviation semantics, we need a more complex function:
g <- function(ones=1, tens=2, hundreds=3, thousands=4)
  ones + 10 * tens + 100 * hundreds + 1000 * thousands
You can call the function with no arguments, as expected:
g() # 4321
But you can’t get away with an ambiguous argument abbreviation:
g(t=0) # Error in g(t = 0) :
# argument 1 matches multiple formal arguments
So you must disambiguate:
g(te=0) # 4301
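But R is smart enough not to consider an abbreviation ambiguous if the ambiguity goes away when other arguments are matched exactly. Exact names are matched first, and abbreviations are then checked only against the formal arguments that remain: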
g(t=0, thousands=9) # 9301
Before we move on, let’s review R’s argument-binding features:
- you can pass arguments by position or by name
- you can omit arguments that have defaults
- you can abbreviate argument names
- you can use any combination of the above features, provided the combination results in no ambiguity
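One way to watch these rules at work is to ask R how it matched a call. The following is a minimal sketch using base R’s match.call(). The helper name show_match is hypothetical; it reuses the formals of g but reports the binding instead of computing a number.

```r
# Hypothetical helper with the same formals as g. Instead of computing a
# number, it reports which formal argument each supplied value was bound to.
show_match <- function(ones = 1, tens = 2, hundreds = 3, thousands = 4) {
  matched <- match.call()   # the call, rewritten with full argument names
  as.list(matched)[-1]      # drop the function name, keep the arguments
}

# thousands matches exactly, which leaves tens as the only candidate for t.
exact_then_partial <- show_match(t = 0, thousands = 9)
names(exact_then_partial)     # "tens" and "thousands"

# h abbreviates hundreds; the unnamed 7 is then bound by position.
partial_then_position <- show_match(h = 5, 7)
names(partial_then_position)  # "ones" and "hundreds"
```

The second call shows the order of the passes: named arguments are bound first, and only then do unnamed values fill the remaining formals from left to right, which is why the 7 lands in ones.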
Lazy argument evaluation
Unlike most programming languages, R evaluates bound arguments lazily, meaning that the expressions you pass as arguments are not converted into values until needed. This lets you create functions that act like control structures. For example, the following function acts like an if-then-else control structure:
myif <- function(test, valT, valF)
  if (test) valT else valF
myif(TRUE, print("true"), print("false")) # prints "true"
myif(FALSE, print("true"), print("false")) # prints "false"
Even though the valT and valF arguments are calls to print, they are not evaluated until they are chosen by the test argument. The unchosen argument is not evaluated at all.
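The same behavior can be checked without printing anything. This is a small sketch with hypothetical function names (first_only, double_it). It relies on two properties of lazy arguments: an unused argument is never evaluated, and a used argument is evaluated only once.

```r
# An argument that is never used is never evaluated, so the stop() call
# below does not fire.
first_only <- function(a, b) a
result <- first_only(1, stop("b was evaluated"))
result        # 1

# An argument that is used several times is still evaluated only once:
# the braces run a single time even though x appears twice in the body.
evaluations <- 0
double_it <- function(x) x + x
doubled <- double_it({ evaluations <- evaluations + 1; 21 })
doubled       # 42
evaluations   # 1
```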
In contrast, most common languages evaluate arguments before passing them into functions. For example, Ruby:
# Ruby code
def myif(test, valT, valF)
  if (test) then valT else valF; end
end
myif(true, puts("true"), puts("false"))
# prints true *and* false
Another benefit of R’s lazy argument evaluation is that you can provide mutually recursive defaults, which is a great way to implement adaptive interfaces. For example, here is a function that computes a coordinate’s representation in both Cartesian and polar coordinate systems. You can specify the input coordinate in either system, and the function adapts automatically:
# R code
polar <- function(x = r * cos(theta), y = r * sin(theta),
                  r = sqrt(x*x + y*y), theta = atan2(y, x))
  c(x, y, r, theta)
polar(1,1) # provide (x,y) pair
# 1.0000000 1.0000000 1.4142136 0.7853982
polar(r=sqrt(2), theta=pi/4) # provide (r, theta) pair
# 1.0000000 1.0000000 1.4142136 0.7853982
Notice how there was no need for me to test the arguments to see how the function was called. All I did was define each set of argument defaults in terms of the other set of arguments. R can figure out the rest based on how the function is called. That’s programmer friendly.
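One caveat is worth sketching: the defaults can only resolve when the caller supplies one complete pair. The snippet below repeats the polar definition so that it runs on its own, and uses tryCatch to capture what happens otherwise. The variable names are illustrative.

```r
polar <- function(x = r * cos(theta), y = r * sin(theta),
                  r = sqrt(x*x + y*y), theta = atan2(y, x))
  c(x, y, r, theta)

# Neither pair supplied: every default depends on another default,
# so R has nothing to start from and signals an error.
no_input <- tryCatch(polar(), error = function(e) e)
inherits(no_input, "error")   # TRUE

# Half of each pair supplied: y needs r, and r needs y, so this fails too.
mixed_input <- tryCatch(polar(x = 1, theta = pi / 4), error = function(e) e)
inherits(mixed_input, "error")   # TRUE

conditionMessage(no_input)    # R's own description of the circular reference
```

So the interface adapts to either coordinate system, but it does not try to solve for missing values across the two systems.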
Let’s review. R’s lazy argument evaluation provides cool benefits:
- you can define your own control structures
- you can provide mutually recursive defaults for arguments, which makes smart, flexible interfaces easy
- if you don’t use an argument, you don’t have to pay for R to evaluate it
Split-horizon scoping
R’s scoping rules give passed arguments and default values different perspectives – split horizons, if you will. Passed arguments see what was visible at the time of the call; that is, they are evaluated in the caller’s scope. No biggie here; nearly every language works this way. Default values, on the other hand, are evaluated inside the function, so they see whatever is in scope within the function body at the moment they are first used. That means defaults have access to bound arguments and local variables, which means you can write functions whose defaults rely upon values computed in the function body.
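Here is a minimal sketch of the two horizons, using hypothetical names (limit, report). The same expression, limit * 2, produces different values depending on whether it arrives as a passed argument or as a default.

```r
limit <- 100                       # visible to the caller

report <- function(value = limit * 2) {
  limit <- 1                       # local to the function body
  value                            # the argument is evaluated here, on first use
}

from_default <- report()           # default sees the local limit: 1 * 2
from_caller  <- report(limit * 2)  # passed argument sees the caller's limit: 100 * 2
from_default   # 2
from_caller    # 200
```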
This is a great feature that combines with R’s lazy argument evaluation to eliminate argument-handling logic. For example, a lot of R’s library code takes advantage of the following idiom:
myplot <- function(vals, ymin=bnds$ymin, ymax=bnds$ymax) {
  bnds <- compute.bounds(vals)
  # plot the values, constrained by ymin and ymax ...
}
The myplot function plots the values you pass it in vals. By default the function scales the plot to show all of the values. If you want, however, you can constrain the vertical extent of the plot by passing in ymin and/or ymax arguments. Note the refreshing lack of logic to handle the arguments. The code just gets down to business.
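To make the idiom concrete, here is a runnable sketch under stated assumptions. The post does not define compute.bounds, so the version below is a hypothetical stand-in that returns a list with ymin and ymax. The function myplot_limits is likewise hypothetical and returns the chosen limits instead of drawing anything.

```r
# Hypothetical stand-in for compute.bounds(); assumed to return a list
# with ymin and ymax elements.
compute.bounds <- function(vals) list(ymin = min(vals), ymax = max(vals))

# Same signature as myplot, but it returns the limits it would plot with.
myplot_limits <- function(vals, ymin = bnds$ymin, ymax = bnds$ymax) {
  bnds <- compute.bounds(vals)   # must run before ymin or ymax is first used
  c(ymin = ymin, ymax = ymax)    # a real version would hand these to plot()
}

myplot_limits(c(3, 1, 4, 1, 5))             # ymin 1, ymax 5
myplot_limits(c(3, 1, 4, 1, 5), ymax = 10)  # ymin 1, ymax 10
```

The one ordering constraint is that bnds has to be assigned before the body first touches ymin or ymax, because that first use is the moment the default expression is evaluated.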
For comparison, here is a Ruby version of the function. When it comes to this kind of thing, Ruby is better than most mainstream languages, but it still makes us do about twice the work that R does:
def myplot(vals, ymin = nil, ymax = nil)
  bnds = compute_bounds(vals)
  ymin ||= bnds.ymin
  ymax ||= bnds.ymax
  # plot the values, constrained by ymin and ymax ...
end
To recap, R’s scoping rules, when combined with lazy argument evaluation, let you shave away tedious argument tests and placeholder defaults such as nil. Instead, you can focus on the core logic, letting R take care of the argument-handling burdens. The win might seem small, but when you write a lot of code, the clarity and code reduction add up.
That’s it
So there you have it: a surprisingly sophisticated function-call semantics that does away with argument-handling tedium. That you’ll find it in a statistics system and not in a mainstream programming language makes it a wondrous oddity.
Comments from readers
Hi Tom. As a novice R programmer, I find your posts to be very interesting and informative.
And that said, I have a question.
I am trying to understand the “Split-horizon scoping” you explained at the end of the current post.
I do not understand the function you give. I can’t see any plot command in it, nor do I see where the ymin and ymax arguments are being used inside the function. Could you please explain more?
Thanks, Tal.
Tal Galili wrote:
In the myplot example, the code that does the actual work of making the plot is omitted. Just assume that it’s a block of code that makes use of vals, ymin, ymax, and bnds. The point of that example is that the default values for ymin and ymax can look into the function body to use the value of bnds. There is no need to use placeholder values until the value of bnds is visible; it can be accessed directly.
More generally, in programming languages that support default values for function arguments, you must provide the default values in the form of expressions that are composed of other values. When writing those expressions, you are limited to using only those values that are “visible” to you at the time (i.e., “within scope”).
In most programming languages, default-value expressions can “see” only those values defined outside of the body of the function to which they are attached. They can’t see the values defined inside of the function body. In Ruby, for example, a default-value expression for an argument x cannot make use of a variable y that is defined within the body of the function.
In R, however, default-value expressions can see into the function body (as well as outside).
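A minimal sketch of the R side, with hypothetical names (outside, inside, combine), shows one default expression drawing on both places at once:

```r
outside <- 10                      # defined outside the function

combine <- function(x = inside + outside) {
  inside <- 5                      # defined inside the function body
  x                                # the default is evaluated here, on first use
}

combine()    # 15: the default used both inside and outside
combine(1)   # 1: a passed value replaces the default entirely
```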
So if you are coding a function in R and produce all sorts of interesting values within the body of the function, you can use them as default values for arguments – easily and directly.
Does this explanation clear things up?
thanks for this interesting article. Please do more!
Thanks for that article, nice writing.
We need more of that stuff. The R documentation on the interwebs is far from user-friendly, but the language has some cool stuff inside. However, most of the time I am fiddling with list(..), c(..) and data.frame stuff to get things into the right form, and direct access seems to be difficult for me (how do you iteratively build a list of lists? c() makes it too flat, list() makes it too steep).
One (newbish) thing I was missing here: call by value and call by reference (how does reference work?) and the variable scope inside a function (you can access outer parameters AS LONG as you do not assign to them!).
This is a fantastic article, I found it very informative and easy to read. I hope you do some more on R in the future.