A bright future: security and modern type systems

Posted by Tom Moertel Wed, 15 Aug 2007 20:07:00 GMT

The recent defacement of the United Nations web site is a prime example of why we programmers shouldn’t trust ourselves to write secure code – at least not without our computers’ help. The U.N. web site, according to Slashdot’s coverage of the incident, was defaced by way of a common, well-known attack: SQL injection. What’s interesting is that programmers can render this attack harmless by employing simple, readily available programming tools such as placeholders and prepared statements. Why, then, are so many web sites, including, apparently, the U.N. site, still vulnerable?

Some say it’s because the programmers of these sites are incompetent, but that argument ignores that programmers are human, while the security tools we give them offer meaningful protection only if wielded with inhuman perfection. Having the tools to plug security holes, even if the tools are simple to use and readily available, is not enough to ensure that every single security hole will be identified, let alone plugged. Even the most experienced programmer can be expected to overlook a hole now and then. Unfortunately, one hole is all it takes.

That’s because security is not like other software-quality challenges: its costs are fundamentally asymmetric. For the attacker, the bad guy, the challenge is to find just a single exploitable hole. For us, the good guys, the challenge is to achieve perfection: to plug all of the holes in our code, every single one. That’s because attackers, unlike regular users, can be expected to probe our code until they find a hole to exploit.

How then do we ensure that we have plugged every single hole in our code? Testing isn’t sufficient: we can easily overlook holes when writing tests – a perfectly human error. We could supplement testing with code reviews, painstakingly searching for remaining holes while enforcing the use of hole-preventing best practices, but reviews are expensive and, again, subject to human error. A better approach, both less costly and more reliable, is to delegate this burden to our computers, which can do the job correctly, every single time.

This kind of delegation is possible today with modern static type systems. For example, in A Type-Based Solution to the Strings Problem, I offered a tiny “safe strings” library for the Haskell programming language. The library takes advantage of Haskell’s powerful type system to detect unsafe string interactions at compile time. If we faithfully build our code on top of the library, and our code compiles without error, we can be assured that our code is free – completely free – of SQL-injection (and XSS) holes.

While this result is indeed quite beautiful, it certainly isn’t novel. Researchers have been proving interesting properties via type systems for a long time. As Oleg Kiselyov and Chung-chieh Shan pointed out in a comment on my earlier article, the foundational idea is over three decades old.

More recently, Kiselyov and Shan have extended the idea to guarantee more-interesting properties using a trusted kernel and types that represent lightweight static capabilities. The kernel, which is small enough to be reasoned about and formally verified, carefully hands out capabilities to untrusted application code. The untrusted code, in turn, presents the capabilities back to the kernel to invoke operations, which, thanks to the kernel’s trustworthiness, are guaranteed to be safe. (My safe-string library can be seen as a trivial implementation of this programming style.)

When static type systems are used in this way, they don’t merely catch typos and bugs that good testing would have caught as a matter of course, but offer programmers guarantees that would have been impractical to obtain any other way.[1] If you consider security important, you might bear this fact in mind when choosing languages for your next project.

Going further, the security benefits of rich static type systems are only now starting to trickle into mainstream industry. As libraries like “safe strings” and idioms like static capabilities become more familiar and get woven into future generations of development frameworks, we can expect marked improvements in the security and robustness of our applications.

In the not-too-distant future, perhaps, we might look back in amazement at the days when important security properties were neither free nor guaranteed but expensive and uncertain, underwritten only by the heroic efforts of individual programmers, struggling against impossible odds to achieve inhuman perfection.

Then again, it sure took garbage collection a long time to catch on.


1. How, for example, could you eliminate the possibility of SQL-injection and XSS holes via testing?

I suppose you could do it if you worked at it hard enough. You could augment your string data structures with run-time information about what they represent: this string represents SQL, this string represents plain-old text, and so on. Then you could redefine your string operations and template interpolation systems to assert that their string inputs were compatible. Of course, if these assertions ever failed, they would do so only at run time, when it would be too late to do anything but die rather ungracefully. So you would be forced to augment your code-coverage tools to ensure that every string-path was covered during testing. That way you could catch all potential run-time string failures – indicating holes – during testing and eliminate the holes (and the subsequent need to fail at run time) before you deployed your application for real.

So, yes, you could do it. But to do so would require you, in effect, to write a crude, single-purpose type system that checks types at test time. That says something, doesn’t it?
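
To make the footnote concrete, here is a minimal sketch of the run-time approach it describes, written in Ruby. Everything in it (the TaggedString class, the escape_text_as_html helper, the :text and :html tags) is a hypothetical illustration and not part of the safe-strings library. Note that the checks fire only when the offending line actually executes, which is exactly the weakness the footnote points out.

```ruby
# Assumed escaping rule for turning plain text into an HTML fragment.
HTML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&#39;"
}.freeze

# Hypothetical run-time "tagged string": each value carries the
# language it represents (:text, :html, :sql, ...).
class TaggedString
  attr_reader :lang, :value

  def initialize(lang, value)
    @lang = lang
    @value = value.dup.freeze
  end

  # Concatenation asserts that both operands represent the same language.
  # A mismatch is detected only when this method runs, never earlier.
  def +(other)
    unless other.is_a?(TaggedString) && other.lang == lang
      raise TypeError, "cannot mix #{lang} with #{other.inspect}"
    end
    TaggedString.new(lang, value + other.value)
  end
end

# The only sanctioned way (in this sketch) to move text into HTML.
def escape_text_as_html(text)
  unless text.is_a?(TaggedString) && text.lang == :text
    raise TypeError, "expected plain text"
  end
  TaggedString.new(:html, text.value.gsub(/[&<>"']/, HTML_ESCAPES))
end

name = TaggedString.new(:text, "Tom & <Jerry>")
page = TaggedString.new(:html, "<p>") + escape_text_as_html(name)
# page.value is "<p>Tom &amp; &lt;Jerry&gt;"
# TaggedString.new(:html, "<p>") + name would raise TypeError at run time.
```

A code path that tests never exercise keeps its mistake until production, so the coverage tooling mentioned above would still be needed.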


Don't let password recovery keep you from protecting your users

Posted by Tom Moertel Fri, 09 Feb 2007 20:36:00 GMT

In 2006’s most-read article on my blog, Never store passwords in a database!, I urged web programmers, unsurprisingly, not to store passwords in their user databases. I tried to persuade them to salt and hash the passwords instead: store the salts and hashes in the database and throw the passwords away. The article, posted shortly after the Reddit blog announced the theft of its unprotected user database, generated buckets of comments. Reading over them today, I noticed something that I had missed earlier.

It seems that a decent slice of programmers think that switching to a salted-and-hashed password scheme implies giving up the ability to assist users who have forgotten their passwords. If the passwords are irretrievably hashed away, the programmers reason, there’s no way to recover forgotten passwords and email them to stranded users. Hence those users are screwed.

And that wrinkle, it might seem, is a good reason not to switch to a salted-and-hashed password scheme.

But that wrinkle turns out to be imaginary. Not being able to recover an account’s password does not mean that you can’t recover the account itself. The password, after all, is not the thing of value; the account is. And, as we shall see, we can recover an account without knowing its password.

Recall that the primary benefit of using a hash is that it is a one-way operation. Once you salt and hash a password, there is no practical way to retrieve it. That’s what protects it from would-be attackers. But that also means you can’t get at it, either. Thus sending password reminders to people who have forgotten their passwords is no longer an option.

How, then, can you help your stranded users? One method is to send them account-recovery tokens, which you can think of as one-time, special-purpose passwords. (This method is suitable only if you require no stronger authentication than knowing that your site’s users own the email addresses they claim to own. This is the case for most “low security” sites such as Slashdot, Reddit, and Digg, as well as most blogging systems.)

Here’s how it works. Say Joe has lost his password and can’t log in to your site. He clicks that button that says “I’ve lost my password. Help me!” Now what?

Here’s what you do:

  1. Generate a big, random, unique token and stuff it into Joe’s account record in the database. Stuff the current date and time in there, too.
  2. Send an email to Joe, but instead of enclosing his password (which you can’t recover), tell Joe to click on the enclosed account-recovery link, which includes the random token: http://example.com/recover-account?token=pCIqq1unxntVqc8XtCXg.
  3. Joe receives the email and clicks on the link, which sends his token to your site.
  4. Look up the token in the user database. Is it there?
    1. No? Render a screen that says, “Sorry, bub, that token is no longer valid.” Stop.
    2. Yes? Excellent. Grab the user record associated with the token. (It will, of course, be Joe’s record.)
  5. Is the date and time stamp on that record more than a few hours old?
    1. Yes? Render that screen that says, “Sorry, bub, that token is no longer valid.” Stop.
    2. No? Congratulations. Joe has effectively authenticated himself via his email address.
  6. Render a confirmation screen that explains the following to Joe:
    1. His account password is going to be reset to the following random string: ocZodbew. (Generate a new random string each time.)
    2. If he likes the password, great. If not, he can use the change-password feature immediately after the password is reset.
    3. If he understands the above and wants to continue, he should confirm by clicking the big “Reset My Account Password” button.
  7. Joe clicks the button.
  8. You, in response, do the following:
    1. Delete the recovery token from Joe’s user record in the database. (This prevents somebody from using the old token to steal his account, should, for example, Joe’s email get stolen.)
    2. Replace Joe’s old password with the new, randomly generated password from above. (You will, of course, use the salted-and-hashed method and not store the new password itself.)
    3. Log Joe in.
    4. Render a screen saying, “Joe, please don’t forget that your new password is ocZodbew. If you would like to change it, just visit Change My Password in your account preferences [provide a link]. Otherwise, you’re logged in and ready to go. Enjoy the site!”
  9. And you’re done.
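
As a rough illustration of steps 1, 4, 5, and 8.1, here is a minimal Ruby sketch. It assumes an in-memory hash in place of a real user database, and the Accounts class, its method names, and the three-hour TOKEN_LIFETIME are all hypothetical choices made for the example. Sending the email and resetting the password are left out.

```ruby
require 'securerandom'

TOKEN_LIFETIME = 3 * 60 * 60 # "a few hours", in seconds (assumed value)

# Hypothetical in-memory stand-in for the user table.
class Accounts
  def initialize
    @users = {} # name => { token:, token_issued_at: }
  end

  def add(name)
    @users[name] = { token: nil, token_issued_at: nil }
  end

  # Step 1: store a big, random token together with the current time.
  def issue_token(name, now = Time.now)
    token = SecureRandom.urlsafe_base64(24)
    @users.fetch(name).merge!(token: token, token_issued_at: now)
    token
  end

  # Steps 4 and 5: return the account name for a known, fresh token,
  # or nil if the token is unknown or too old.
  def lookup(token, now = Time.now)
    return nil if token.nil? || token.empty?
    name, record = @users.find { |_, r| r[:token] == token }
    return nil if record.nil?
    return nil if now - record[:token_issued_at] > TOKEN_LIFETIME
    name
  end

  # Step 8.1: delete the token so that it cannot be used a second time.
  def consume_token(name)
    @users.fetch(name).merge!(token: nil, token_issued_at: nil)
  end
end
```

In a real application the token lookup would be an indexed database query, and the link in the email would carry the token as a URL parameter, as in step 2.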

The code required to make it happen is shorter than the explanation above. It’s one of those easier-done-than-said things.

So, if concerns about account recovery have been holding you back from protecting your users’ passwords, you need hold back no longer. It’s time to “do” your due diligence.

Update 2007-09-10: I made clear that the account-recovery method I describe above is suitable only for low-security sites where a valid email address is sufficient to authenticate users.


Never store passwords in a database!

Posted by Tom Moertel Fri, 15 Dec 2006 18:25:00 GMT

Recently, the folks behind Reddit.com confessed that a backup copy of their database had been stolen. Later, spez, one of the Reddit developers, confirmed that the database contained password information for Reddit’s users, and that the information was stored as plain, unprotected text. In other words, once the thief had the database, he had everyone’s passwords as well.

Had the folks at Reddit salted and hashed the passwords, the thief would now be in a very different situation. Instead of holding all the keys to the kingdom, he would face the prospect of a potentially expensive search for each and every user’s password he wanted to extract from the database. The expense of the search would likely have dissuaded him from making the attempt in earnest, given how little exploitable value a Reddit account represents. In short, the passwords would have been secure, even though the database had fallen into the thief’s hands.

Why, then, didn’t Reddit’s programmers salt and hash the passwords before storing them in their database? Because, according to the earlier post by spez, they wanted to be able to send forgotten passwords to users via email. It was a design decision: they weighed the risks of having plain-as-day passwords in the database against the convenience of being able to email users their forgotten passwords and decided that, in the balance, convenience carried more weight. It’s a decision they now regret. (It’s a doubly unfortunate decision because you don’t need to store passwords in your user database in order to offer convenient account recovery.)

The reason I’m writing about this event isn’t to kick the good folks at Reddit while they’re down. Rather, I’m trying to make a point:

If you are storing passwords in a database, you are almost certainly making a mistake.

The guys at Reddit are known for being smart. They thought they had a good reason for storing passwords in their database. They were wrong. If smart programmers can make this mistake, lots of programmers can. Do you think you have a good reason for storing passwords in your database? If so, you’re probably wrong, too.

How can I be so sure? Because, when it comes to web-app authentication, cutting corners doesn’t buy you anything. It doesn’t save you coding time. It doesn’t give your users a better experience. All it does is weaken the security of your web application, needlessly putting your users, your employer, and yourself at risk.

So please let me take this opportunity to ask if you know of (or perhaps work on) any software systems that store passwords as plain, unprotected text in a database. If so, fix your software now:

  • Salt and hash each and every password. (Use an expensive hashing function, such as bcrypt, that was designed for password applications.)
  • Store the salt and hash – not the password – in your database.
  • Throw the password itself away.
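
The three steps can be sketched in a few lines of Ruby. One loud assumption: bcrypt is not in Ruby's standard library, so this sketch swaps in PBKDF2-HMAC-SHA256 from the bundled openssl library purely to stay self-contained. The iteration count, the record layout, and the helper names are likewise assumptions made for illustration, not recommendations.

```ruby
require 'openssl'
require 'securerandom'

ITERATIONS = 200_000 # assumed work factor; tune for your hardware
KEY_LENGTH = 32      # bytes of derived hash to keep

# Stand-in for bcrypt: a deliberately slow, salted key-derivation function.
def derive(password, salt)
  OpenSSL::PKCS5.pbkdf2_hmac(password, salt, ITERATIONS, KEY_LENGTH,
                             OpenSSL::Digest.new("SHA256"))
end

# Build what goes in the database: a fresh random salt and the hash.
# The password itself is never stored.
def make_record(password)
  salt = SecureRandom.random_bytes(16)
  { salt: salt, hash: derive(password, salt) }
end

# Compare two byte strings without stopping at the first difference.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# At login, re-derive the hash from the candidate password and compare.
def password_matches?(record, candidate)
  secure_equal?(record[:hash], derive(candidate, record[:salt]))
end
```

Because every record gets its own salt, two users with the same password end up with different hashes, which is what forces a thief to search for each password separately.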

You’ll be glad you did.

Update: Minor edits for clarity.

Update 2007-02-13: Salting and hashing does not get in the way of account recovery. You do not need to email users their forgotten passwords: there are other account-recovery options that are just as convenient but much more secure. See Don’t let password recovery keep you from protecting your users for more.

Update 2007-10-03: Revised text slightly to emphasize that there is no benefit to be had by implementing a weak password system, and therefore there is no reason not to implement a secure system. Pointed more directly to bcrypt, too.


A type-based solution to the "strings problem": a fitting end to XSS and SQL-injection holes?

Posted by Tom Moertel Thu, 19 Oct 2006 01:40:00 GMT

Even skilled programmers have a hard time keeping their web applications free of XSS and SQL-injection vulnerabilities. And it shows: a sobering portion of web sites are open to some scary security threats.

Why are so many sites vulnerable to these well-known holes? Probably because it’s insanely hard for programmers to solve the fundamental “strings problem” at the heart of these vulnerabilities. The problem itself is easy to understand, but we humans aren’t equipped to carry out the solution. Simply put, we just plain suck at keeping a bazillion different strings straight in our heads, let alone consistently and reliably rendering their interactions safe whenever they cross paths in a modern web application. It’s easy to say, “just escape the little buggers,” but it’s hard to get it right, every single time.

Computers, on the other hand, are pretty good at keeping track of details by the bucket-full. Wouldn’t it be nice, then, if our programming languages gave us the power to delegate this nasty “strings problem” to our computers, which could then devote their unwavering mechanical precision to grinding the problem out of existence? Isn’t that the kind of thing modern programming languages are supposed to be good at?

I’d like to think the answer to that question is a big “you betcha.”

So let’s grab a modern programming language and solve the strings problem.

Let’s solve the strings problem in Haskell

In this article, we will look at one way (among many) to solve the strings problem: by adding Ruby-style string templates to Haskell. These templates support “interpolation” via the usual, convenient #{var} syntax, but here interpolation is type safe. Haskell’s type system will prevent us from inadvertently mixing incompatible string types, and it will detect mistakes at compile time, before they can become live XSS or SQL-injection holes. Further, our solution will offer us these benefits without making us jump through hoops or pay some onerous syntax penalty.

To be more specific, the system offers the following benefits:

  • It provides a string-management kernel that lets you create “safe strings” by certifying a regular string as representing either text or a fragment of a known language.
  • It allows you to conveniently define new language types for any string-based language that you can provide an escaping rule for (e.g., XML, URLs, SQL, untrusted user input).
  • It provides compile-time syntactic sugar (via Template Haskell) that makes working with safe strings as convenient as working with string interpolation in languages like Ruby and Perl.
  • It catches and reports (at compile time) the following commonly made programming errors:
    • failing to escape a plain-old-text string before mixing it into a string that represents a language fragment
    • mixing strings that represent fragments of incompatible languages
    • mixing strings that represent fragments of compatible languages in an ambiguous way (the system will force you to disambiguate)

(This is a long one, so grab an espresso, lean back, and read on in style. Also, if you have a smoking jacket, you might want to get it now.)


Ryan Carson: Building web applications on a budget

Posted by Tom Moertel Wed, 08 Feb 2006 19:25:00 GMT

Via Simon Willison’s post about the 2006 Future of Web Apps Summit, I found notes for Ryan Carson’s talk about building web apps on a budget. In the talk Ryan breaks down the budget for starting up DropSend:

Budget (£)       Need
5,000            Branding & UI design
8,500            Development of web app (developers also given small equity stake)
2,750            Desktop apps (Windows and Mac)
1,600            Building XHTML/CSS
500              Hardware (internal development server)
800 (per month)  Hosting and maintenance
2,630            Legal fees
500              Accounting fees
500              Linux-specialist fees
1,950            Misc. fees (trips, replace broken hardware)
250              Trademark
200              Merchant account
500              Payment processor’s setup fee
25,680           Total

That is about $45K in US dollars. In other words, you can launch a new web application for less than a skilled technology worker’s salary. Or, if you are a skilled technology worker, you can do much of the work yourself and launch a new web application for about $25K.

Got an itch to scratch?


A simple Apache recipe for migrating blog articles to a new host

Posted by Tom Moertel Mon, 06 Feb 2006 22:52:00 GMT

In Everything old is new again, I wrote that I was moving articles from my old blog over to my new Typo-powered blog (here). Now that the process is underway, I need to make sure that people looking for my old articles can find them at their new home. To solve this problem, I am using a simple Apache httpd recipe to redirect requests for the old articles to the corresponding updated articles on my new blog. In case you need to do something similar some day, here is the recipe.

First, set up a mapping file

Create a two-column mapping file that you can use to map each article’s old location to its corresponding new location. If there are any parts of the locations that never change, you can factor them out to reduce clutter.

For example, the article “My New Radio VCR” has the following old and new locations (the constant parts are the prefixes http://community.moertel.com/ss/space/ and http://blog.moertel.com/):

Old = http://community.moertel.com/ss/space/2004-02-20
New = http://blog.moertel.com/articles/2004/02/20/my-new-radio-vcr

Its entry in my mapping file looks like this:

# File: /path/to/conf/old-blog-to-new.txt
# Map articles from old blog to new blog.
#
# OLD LOCATION    NEW LOCATION
# .../ss/space/X  http://blog.moertel.com/Y

...               ...
2004-02-20        articles/2004/02/20/my-new-radio-vcr
...               ...

Second, configure Apache to use the mapping file

Edit the Apache configuration that controls the old locations. Add a set of mod_rewrite rules to match requests for the old locations and redirect them to the corresponding new locations, using the mapping file as a reference. For example, here is my configuration:

# in Apache's configuration for community.moertel.com

RewriteEngine on
RewriteMap blogmap txt:/path/to/conf/old-blog-to-new.txt
RewriteCond ${blogmap:$1|NOT-FOUND} !=NOT-FOUND
RewriteRule ^/ss/space/(.+) http://blog.moertel.com/${blogmap:$1} [R=301,L]

The first line makes sure that mod_rewrite is active.

The second line tells Apache to load the mapping file. Apache will cache the mapping file’s contents for speed, but it is smart enough to reload the file when modified. That means you can add new entries to the mapping file at any time, and Apache will act on them immediately, no restart or reload required. Every time I moved an article over to the new blog, for example, I just edited the mapping file, and the new location “went live,” replacing the old.

The third line says that the recipe is conditional upon there being a matching entry in the mapping file. If no entry exists, the recipe will not apply, and the request will be handled as usual.

The final line defines the rewrite rule. In this example, it tries to match requests that start with “/ss/space/X”, where X is any suffix. (The prefix “http://community.moertel.com” is implied because this configuration is for the community.moertel.com site.) If the request matches, X is stored in the $1 variable. Then – and this is one of those things that makes mod_rewrite seem tricky – the condition defined in the previous line is tested using the current value of $1. If the condition is satisfied, the request is redirected to http://blog.moertel.com/Y, where Y is the corresponding location for X, according to the mapping file.

The [R=301,L] part of the rewrite rule is important. It specifies that redirects should be of the 301-Permanent variety. This advertises to the world that the new locations are intended to replace the old locations. Using permanent redirects also ensures that any Google juice that may have accumulated for my articles follows them to their new home.
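
If you want to sanity-check a mapping file before Apache ever sees it, the lookup can be modeled offline. The following Ruby sketch is only a model of what the rule and condition above decide, not a description of how mod_rewrite is implemented, and the helper names parse_map and redirect_for are made up for the example.

```ruby
# Parse a two-column mapping file: skip blank lines and # comments,
# and treat the first two whitespace-separated fields as key and value.
def parse_map(text)
  text.each_line.with_object({}) do |line, map|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split(/\s+/, 2)
    map[key] = value if value
  end
end

# Model the rule: match /ss/space/X, look X up in the map, and return
# the redirect target, or nil when the request would be handled as usual.
def redirect_for(path, map)
  match = %r{\A/ss/space/(.+)}.match(path)
  return nil unless match
  target = map[match[1]]
  target && "http://blog.moertel.com/#{target}"
end

sample = <<~MAP
  # OLD LOCATION    NEW LOCATION
  2004-02-20        articles/2004/02/20/my-new-radio-vcr
MAP

map = parse_map(sample)
# redirect_for("/ss/space/2004-02-20", map) returns
# "http://blog.moertel.com/articles/2004/02/20/my-new-radio-vcr"
# redirect_for("/ss/space/1999-01-01", map) returns nil
```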

Third, activate the new configuration

This part is easy: restart Apache to make sure it turns on the rewrite engine and activates the new configuration directives.

Finally, test it out

To see if everything is working properly, visit an article’s old location to see if you are redirected to the corresponding new location. For example:

If you request http://community.moertel.com/ss/space/2004-02-20, you should be redirected to http://blog.moertel.com/articles/2004/02/20/my-new-radio-vcr.

And that’s the recipe.


Top-ten weblog usability mistakes: My blog's scorecard

Posted by Tom Moertel Wed, 26 Oct 2005 20:51:00 GMT

Jakob Nielsen’s Alertbox for 17 October 2005 is Weblog Usability: The Top Ten Design Mistakes. In other words, it’s a top-ten list of things not to do on your blog if you care about usability.

Since I care about usability, I decided to test my own weblog against Jakob’s top-ten list. How did I fare?

Let’s see:

Top Ten Weblog Usability Mistakes
  1. No author biography: Fail. Oops, got me there. While I have a bio on the Community Projects site, I don’t even link to it from my blog.
  2. No author photo: Fail. Oops, again. I don’t have a photo of myself on the blog, nor on any other site (that I know of).
  3. Nondescript posting titles: Pass. I generally give my posts informative titles such as How to change symlinks atomically and Simple data formats are not going away. Only rarely do I use cutesy titles such as On the effortless cultivation of humility, where readers are forced to guess what the post is about.
  4. Links don’t say where they go: Pass. As a reader, I find “click here” links to be annoying, and so I avoid the practice in my writing.
  5. Classic hits are buried: Pass. I link to popular topics in the Popular Topics sidebar.
  6. The calendar is the only navigation: Pass. My posts are organized by topic as well as by date. Further, the ever-present live search makes finding posts by content easy.
  7. Irregular publishing frequency: Fail. I don’t have a regular posting schedule. When work gets heavy, for example, I rarely post.
  8. Mixing topics: Semi-Pass. I do mix up topics somewhat, but almost all of my topics fall into the category of “stuff programming geeks like” and in that regard are fairly consistent.
  9. Forgetting that you write for your future boss: Pass. I don’t think there is anything on my blog that a future employer would find troubling or even unprofessional. (Since I am a consultant, I have lots of “employers,” and so far none of them seem to mind what I post. Some – the crazy ones – even enjoy my blog.)
  10. Having a domain name owned by a weblog service: Pass. Since 1996 I have been keeping it real on moertel.com. My blog’s home is blog.moertel.com, which seems like the natural place for it.

In sum, I made three of Jakob’s top-ten weblog usability mistakes:

  1. I don’t have an author bio.
  2. I don’t have an author photo.
  3. I don’t post regularly.

The first two are easy to fix, and I’ll fix them right away. The third – posting regularly – is more difficult, owing to the ever-varying demands of my work load, but I’ll make an effort to pick up the pace. Hopefully, my blog will be 100 percent “Jakob compliant” in the next day or so.

Do you have a weblog? If so, how many weblog usability mistakes do you make? Grab Jakob’s top-ten list and find out.


Google Web Accelerator vs. unsafe linking: Round Two!

Posted by Tom Moertel Tue, 25 Oct 2005 20:12:00 GMT

The good folks at 37signals are once again up in arms about Google Web Accelerator (GWA). David Heinemeier Hansson (DHH), in particular, writes in a recent post to Signal vs. Noise that “[GWA] was evil enough the first time around, but this time it’s downright scary.”

The problem, it seems, is that GWA automatically, silently, and unblockably follows hypertext links to web pages that are linked to by the pages you visit. It does this in order to cache those pages so that if you visit them later, it will have cached copies ready in an instant, thus “accelerating” your web surfing. But some web developers use hypertext links to trigger potentially unsafe actions, such as deleting records in a database, and when GWA automatically follows such links, it triggers the actions.

Oops.

Let’s do the time warp again…

Now, if this story sounds familiar, that’s because half a year ago, the exact same thing happened. GWA was unveiled to the public. People started using it. And some of those people started losing data from their accounts with popular web applications, such as 37signals’ own Backpack. 37signals publicized the problem in their blog and DHH even called for a recall on GWA.

And then the community responses came in. For the most part, the responses could be divided into two camps, based on who was blamed for the problem. The first camp blamed the web designers who used links to trigger unsafe actions (in violation of applicable standards), and the second camp blamed Google for unleashing GWA upon a web where standards aren’t always followed.

Both viewpoints had some merit, but I was in the first camp and thus argued for following the standards and against unsafe linking practices.

What surprised me was that so many people in the second camp argued in defense of unsafe linking practices, which I had thought indefensible. I didn’t have any problem with arguments against Google’s unleashing GWA on an imperfect web, but arguing for the web’s imperfections seemed like an odd way of making the case. The supportive arguments boiled down to the following:

  • Lots of web sites use action-triggering links, so the practice is de facto acceptable.
  • The existing palette of user-interface options is too limited for today’s web applications; thus, designers are justified in breaking the rules.
  • The standards don’t actually prohibit the practice (they say “SHOULD NOT,” not “MUST NOT”); thus, the practice is allowable.

None of the arguments seem to withstand scrutiny. The first argument breaks down like so: That lots of web sites do it only means that those sites get away with it, not that the practice is acceptable. Further, as GWA demonstrates, those sites may not get away with the practice much longer.

The second argument breaks down when one examines the uses of unsafe linking practices. Most of them could be replaced by safe practices through modest UI refactoring. Given that safe alternatives exist, the unsafe practices are not justified by virtue of being the only realistic option.

The third argument breaks down when one actually reads the relevant standards. Then it becomes clear that one should not use links to trigger potentially unsafe actions. The wiggle room created by the use of “SHOULD NOT” instead of “MUST NOT” does not admit the large problems caused by unsafe linking.

Finally, even if there were some justification for unsafe linking, the practice would still be a bad idea: its costs and risks outweigh its benefits. Why hold back the potential of efficient caching architectures for the web? Why risk data loss for your users? It’s not worth it.
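
To show how small the safe alternative can be, here is a toy Ruby dispatcher in which each action is bound to exactly one HTTP method, so a destructive action can never be triggered by following a link. The route table, paths, and action names are hypothetical, and a real framework would of course do more than return a status pair.

```ruby
# Hypothetical route table: retrieval is bound to GET, and anything that
# changes data is bound to POST, which a plain hyperlink cannot issue.
ROUTES = {
  "/notes"        => { method: "GET",  action: :list_notes },
  "/notes/delete" => { method: "POST", action: :delete_note }
}.freeze

# Return [status, body]. A request that uses the wrong method for a
# route is refused with 405 instead of performing the action.
def dispatch(http_method, path)
  route = ROUTES[path]
  return [404, "not found"] unless route
  return [405, "use #{route[:method]}"] unless route[:method] == http_method
  [200, route[:action].to_s]
end

# dispatch("GET", "/notes/delete")  returns [405, "use POST"]
# dispatch("POST", "/notes/delete") returns [200, "delete_note"]
```

With this arrangement, a prefetcher that follows every link on a page can only ever reach retrieval actions.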

Back to the Future

So where are we now? Given how little justification there is for unsafe linking practices, one would hope that we would have abandoned them by now. But, as the recent cries about the second coming of GWA suggest, the web-development community is not yet ready to give up those sexy, action-triggering links.

It’s not that the means aren’t available. Rails, for example, has plenty of support for sane and safe practices for triggering actions. Rather, the problem is cultural. Too many influential people, especially in the Rails community, are unrepentant users of – and, dare I say it, apologists for – action-triggering links. Until this changes, I expect many new web developers to pick up dangerous habits from the very people they respect most.

Fortunately, many other respect-worthy people are pointing toward a better way:

  • Sam Ruby: “I’m on the other side of this debate. While this appears to be a purely philosophical concern, in reality this stuff matters.”
  • Bill de hÓra: “The GWA is back and following GET links again… The technology itself is interesting insofar as we are going to see more and more highly automated robots enter the web over the next few years…. Even more interesting is the kind of outrage holding forth in places like Signal v Noise….”
  • Joe Gregorio : “And now we begin the next chapter in which Pooh discovers that five months after the first time Google turned on GWA that standards still matter.”

I hope that this time around the web-development community answers the wake-up call. It’s time to abandon action-triggering links.


How to change symlinks atomically

Posted by Tom Moertel Mon, 22 Aug 2005 16:00:00 GMT

Many people don’t realize that changing the target of a symbolic link (symlink) is not an atomic operation. “Changing” a symlink really means deleting it and creating a new link with the same file name. For example, if I have a symlink current that points to a directory old, and I want to change it to point to a directory new, I might use the following command:

$ ln -snf new current

Strace shows what really happens when I run the command:

$ strace ln -snf new current 2>&1 | grep link
unlink("current")         = 0
symlink("new", "current") = 0

First, the existing symlink is deleted via the unlink system call. Then a new, identically named symlink is created via the symlink system call. It’s a two-step process, and in between the steps, there is no symlink.
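
To make the gap concrete, here is a minimal Ruby sketch that replays the same two calls and checks for the link in between. The method name naive_symlink_swap is made up for this illustration, and the link targets are left dangling because only the link itself matters here.

```ruby
require 'tmpdir'

# Replays the two system calls that `ln -snf` makes and reports whether
# the link existed between them. Hypothetical helper, for illustration only.
def naive_symlink_swap(target, link)
  File.unlink(link)                     # step 1: unlink("current")
  present_in_gap = File.symlink?(link)  # is anything at this path right now?
  File.symlink(target, link)            # step 2: symlink("new", "current")
  present_in_gap
end

Dir.mktmpdir do |dir|
  link = File.join(dir, 'current')
  File.symlink('old', link)             # dangling target is fine for the demo
  present = naive_symlink_swap('new', link)
  puts "link present between the two steps? #{present}"
  puts "final target: #{File.readlink(link)}"
end
```

Any process that looks up the link at the moment of the middle check finds nothing there, which is exactly the window described above.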

This can be a problem if you expect the symlink to be there always, such as when using the link to point to the active version of a live web site. If you change the symlink while deploying a new version of your site, for example, the web server might try to dereference the link during the small window of time when it doesn’t exist. Oops.

The solution to this problem is to effect the change by creating a new symlink and then renaming it over the old symlink. On Unix-like systems, renaming is an atomic operation, and thus the symlink “change” will be atomic too. By hand, the process looks like this (the -T option, which is specific to GNU mv, keeps mv from treating current as a directory and moving the temporary link into it):

$ ln -s new current_tmp && mv -Tf current_tmp current

In Ruby, I make atomic symlinking available everywhere by extending the Pathname class with a new method atomic_symlink:

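require 'pathname'

class Pathname
  # Atomically makes self a symlink to old, replacing any existing link.
  def atomic_symlink(old)
    # Six random bytes, base64-encoded, with '/' replaced so that the
    # suffix is safe to use within a file name.
    suffix = [Array.new(6){rand(256).chr}.join].pack("m").strip.tr('/','_')
    tmplink = Pathname.new(self.to_s + "_" + suffix)
    tmplink.make_symlink(old)
    begin
      # rename replaces the old link in a single atomic step
      tmplink.rename(self)
    rescue
      # if rename fails, we must remove the temporary link manually
      File.unlink(tmplink.to_s)
      raise
    end
  end
end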

This code is nothing more than a robustified version of the by-hand method. It picks better names for temporary links, and it cleans up after itself, should something go wrong, but otherwise it does the same thing.
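
As a usage sketch, the following exercises the method in a scratch directory. The definition is repeated only so the snippet runs on its own; the one change is that the suffix comes from SecureRandom.hex, a substitution made for this example rather than part of the code above. The directory names are illustrative.

```ruby
require 'pathname'
require 'securerandom'
require 'tmpdir'

class Pathname
  # Same approach as above; only the suffix generation differs (assumption).
  def atomic_symlink(old)
    tmplink = Pathname.new("#{self}_#{SecureRandom.hex(6)}")
    tmplink.make_symlink(old)
    begin
      tmplink.rename(self)        # atomically replaces the existing link
    rescue
      File.unlink(tmplink.to_s)   # clean up the temporary link on failure
      raise
    end
  end
end

Dir.mktmpdir do |dir|
  root = Pathname.new(dir)
  (root + 'old').mkdir
  (root + 'new').mkdir

  current = root + 'current'
  current.make_symlink('old')     # initial deployment points at old
  current.atomic_symlink('new')   # switch to new without a gap

  puts "current now points to: #{current.readlink}"
  puts "directory entries: #{root.children.map { |c| c.basename.to_s }.sort.inspect}"
end
```

After the switch, the scratch directory should hold only old, new, and current; the temporary link is consumed by the rename.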

Given how easy it is to change symlinks atomically, why do it any other way? Life is hard enough without having to worry about another race condition.


The button_to helper is now part of Rails!

Posted by Tom Moertel Thu, 16 Jun 2005 16:00:00 GMT

I am delighted to report that the button_to helper has been added to the Ruby on Rails web-development framework. David applied the patch earlier today, and so button_to will be in the much-anticipated Rails 1.0 release.

David’s change-log entry summarizes the patch well:

“Added button_to as a form-based solution to deal with harmful actions that should be hidden behind POSTs. This makes it just as easy as link_to to create a safe trigger for actions like destroy, although it’s limited by being a block element, the fixed look, and a no-no inside other forms.”

David does a good job of highlighting the helper’s limitations. I’ll take this opportunity to elaborate on each.

It is a block element

The button_to helper creates a small form, which in HTML is considered block content, just like the p, div, and blockquote elements are. Basically, block content cannot be mixed into runs of text. But links can: links are inline content. Thus button_to cannot be used as a drop-in replacement for every occurrence of link_to that might be unsafe; it works only for those occurrences within block-accepting contexts.
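
To picture what “a small form” means here, the plain-Ruby sketch below builds markup of roughly that shape. It is not the Rails helper: sketch_button_to, escape_html, and the example URL are hypothetical names invented for this illustration, and the exact markup Rails emits may differ.

```ruby
# Characters that must be escaped inside HTML attribute values.
HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Minimal escaping helper (hypothetical; stands in for a real HTML escaper).
def escape_html(text)
  text.to_s.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Hypothetical stand-in for illustration only; not Rails' button_to.
# Wraps a single submit button in its own POST form.
def sketch_button_to(label, url)
  %(<form method="post" action="#{escape_html(url)}">) +
    %(<div><input type="submit" value="#{escape_html(label)}" /></div>) +
    %(</form>)
end

puts sketch_button_to('Destroy', '/items/destroy/42')
```

Because the result is a form element rather than an anchor, it carries the block-level restriction described above.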

Luckily for us, when designers use links to trigger unsafe actions, they rarely slip such links into the middle of ordinary-looking text. Naughty uses of link_to almost always occur within contexts that accept block content. In Rails-generated scaffolding code, for instance, the unsafe uses of link_to occur within table cells, and table cells have a flow content model, which accepts both inline and block content. So button_to works great for the default cases in Rails.

It has a fixed look

As its name implies, button_to creates buttons. Buttons don’t look like links and aren’t styled the same way that links are. For some design scenarios, this might be a problem.

(My view is that links should not be used to trigger unsafe actions. In the same way that action-triggering GET requests violate the spirit of the HTTP standards, action-triggering hypertext links violate the spirit of the HTML standards. For this reason, I view this limitation as a feature.)

It is a no-no inside other forms

Forms cannot be nested, and so button_to cannot be used inside of forms.

Fortunately, this limitation usually doesn’t matter because when we are inside of a form, we can use its buttons instead of button_to-created buttons to trigger actions. Still, there are some circumstances where it does matter, such as the “Amazon.com wish list” scenario. In this scenario, we should consider other options.

The bottom line: Pick the low-hanging fruit

While button_to has its limitations, it does provide a simple solution to the unsafe-GET problem for most real-world cases. I am glad that it is now a part of Rails, and I offer a big thank-you to David for accepting the patch.
