Posted by Tom Moertel
Wed, 15 Aug 2007 20:07:00 GMT
The recent defacement of the United Nations web
site is a prime
example of why we programmers shouldn’t trust ourselves to write
secure code – at least not without our computers’ help. The U.N. web
site, according to Slashdot’s coverage of the incident, was defaced by
way of a common, well-known attack: SQL
injection. What’s
interesting is that programmers can render this attack harmless by
employing simple, readily available programming tools such as
placeholders and prepared statements. Why, then, are so many web sites,
including the UN site apparently, still vulnerable?
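To make the placeholder idea concrete, here is a toy sketch in Ruby. It is not how any real database driver works: real placeholders are handled by the driver or the database itself, often by sending the data separately from the query. The helper names `sql_literal` and `bind` are made up for this illustration, and the quoting rule assumes standard SQL string literals (some databases also treat backslashes specially).

```ruby
# Toy model of placeholder binding. Real code should rely on the database
# driver's own placeholder support instead of hand-rolled quoting like this.

# Render a value as a standard-SQL string literal by doubling single quotes.
def sql_literal(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Hypothetical helper: fill each "?" in the template with a quoted literal.
def bind(template, *values)
  unless template.count("?") == values.length
    raise ArgumentError, "placeholder count does not match value count"
  end
  remaining = values.dup
  template.gsub("?") { sql_literal(remaining.shift) }
end

name = "x' OR '1'='1" # hostile input

# Interpolation lets the input rewrite the structure of the query:
unsafe = "SELECT * FROM users WHERE name = '#{name}'"

# Binding keeps the input confined to a single string literal:
safe = bind("SELECT * FROM users WHERE name = ?", name)

puts unsafe
puts safe
```

In the interpolated query the attacker's quote marks become part of the SQL, so the WHERE clause gains an extra OR condition. In the bound query the same characters stay inside one string literal. The catch, of course, is that somebody has to remember to call `bind` every single time.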
Some say it’s because the programmers of these sites are incompetent,
but that argument ignores that programmers are
human, while the security tools we give them offer meaningful
protection only if wielded with inhuman perfection. Having the tools
to plug security holes, even if the tools are simple to use and
readily available, is not enough to ensure that every single security
hole will be identified, let alone plugged. Even the most experienced
programmer can be expected to overlook a hole now and then.
Unfortunately, one hole is all it takes.
That’s because security is not like other software-quality challenges:
its costs are fundamentally asymmetric. For the attacker, the bad
guy, the challenge is to find just a single exploitable hole. For
us, the good guys, the challenge is to achieve perfection: to plug
all of the holes in our code, every single one. That’s because
attackers, unlike regular users, can be expected to probe our code
until they find a hole to exploit.
How then do we ensure that we have plugged every single hole in our
code? Testing isn’t sufficient: we can easily overlook holes
when writing tests – a perfectly human error. We could supplement
testing with code reviews, painstakingly searching for remaining holes
while enforcing the use of hole-preventing best practices, but reviews
are expensive and, again, subject to human error. A better approach,
both less costly and more reliable, is to delegate this burden to our
computers, which can do the job correctly, every single time.
This kind of delegation is possible today with modern static type systems.
For example, in “A Type-Based Solution to the Strings Problem,”
I offered a tiny “safe strings” library for the Haskell programming
language. The library takes advantage of
Haskell’s powerful type system to detect unsafe string interactions at
compile time. If we faithfully build our code on top of the library, and our
code compiles without error, we can be assured that our code is
free – completely free – of SQL-injection (and XSS) holes.
While this result is indeed quite beautiful, it certainly isn’t novel.
Researchers have been proving interesting
properties via type systems for a long time. As Oleg Kiselyov and Chung-chieh Shan pointed out in a comment on my earlier article, the foundational idea is over three decades old.
More recently, Kiselyov and Shan have extended the
idea to guarantee more-interesting properties using a trusted kernel and types that represent
lightweight static
capabilities.
The kernel, which is small enough to be reasoned about and formally
verified, carefully hands out capabilities to untrusted application
code. The untrusted code, in turn, presents the capabilities back to
the kernel to invoke operations, which, thanks to the kernel’s
trustworthiness, are guaranteed to be safe. (My safe-string library
can be seen as a trivial implementation of this programming style.)
When static type systems are used in this way, they don’t merely catch
typos and bugs that good testing would have caught as a matter of
course, but offer programmers guarantees that would have been
impractical to obtain any other way. If
you consider security important, you might bear this fact in mind when
choosing languages for your next project.
Going further, the security benefits of rich static type systems are only now
starting to trickle into mainstream industry. As libraries like “safe
strings” and idioms like static capabilities become more familiar and
get woven into future generations of development frameworks, we can
expect marked improvements in the security and robustness of our
applications.
In the not-too-distant future, perhaps, we might look
back in amazement at the days when important security properties were
neither free nor guaranteed but expensive and uncertain, underwritten
only by the heroic efforts of individual programmers, struggling
against impossible odds to achieve inhuman perfection.
Then again, it sure took garbage collection a long time to catch on.
Posted in programming, web development, security
Tags capabilities, safestrings, security, sqlinjection, types, xss

Posted by Tom Moertel
Fri, 09 Feb 2007 20:36:00 GMT
In 2006’s most-read article on my blog, “Never store passwords in a database!”,
I urged web programmers, unsurprisingly, not to store passwords
in their user databases. I tried to persuade them to salt and hash
the passwords instead: store the salts and hashes in the database and
throw the passwords away. The article, posted shortly after the
Reddit blog announced the theft of its unprotected user
database, generated buckets of comments.
Reading over them today, I noticed something that I had missed
earlier.
It seems that a decent slice of programmers think that switching
to a salted-and-hashed password scheme implies giving up the ability to
assist users who have forgotten their passwords. If the
passwords are irretrievably hashed away, the programmers reason, there’s no way
to recover forgotten passwords and email them to stranded users.
Hence those users are screwed.
And that wrinkle, it might seem, is a good reason not to switch to a
salted-and-hashed password scheme.
But that wrinkle turns out to be imaginary. Not being able to recover
an account’s password does not mean that you can’t
recover the account itself. The password, after all, is not the thing
of value; the account is. And, as we shall see, we can recover
an account without knowing its password.
Recall that the primary benefit of using a hash is that it is a
one-way operation. Once you salt and hash a password, there is no
practical way to retrieve it. That’s what protects it from would-be
attackers. But that also means you can’t get at it, either.
Thus sending password reminders to people who have forgotten their
passwords is no longer an option.
How, then, can you help your stranded users? One method is to send
them account-recovery tokens, which you can think of as one-time,
special-purpose passwords. (This method is suitable only if you
require authentication no stronger than knowing that your site’s users own
the email addresses they claim to own. This is the case for most “low
security” sites such as Slashdot, Reddit, and Digg, as
well as most blogging systems.)
Here’s how it works. Say Joe has lost his password and can’t log in
to your site. He clicks that button that says “I’ve lost my
password. Help me!” Now what?
Here’s what you do:
- Generate a big, random, unique token and stuff it into Joe’s account record in the database. Stuff the current date and time in there, too.
- Send an email to Joe, but instead of enclosing his password (which you can’t recover), tell Joe to click on the enclosed account-recovery link, which includes the random token:
  http://example.com/recover-account?token=pCIqq1unxntVqc8XtCXg
- Joe receives the email and clicks on the link, which sends his token to your site.
- Look up the token in the user database. Is it there?
  - No? Render a screen that says, “Sorry, bub, that token is no longer valid.” Stop.
  - Yes? Excellent. Grab the user record associated with the token. (It will, of course, be Joe’s record.)
- Is the date and time stamp on that record more than a few hours old?
  - Yes? Render that screen that says, “Sorry, bub, that token is no longer valid.” Stop.
  - No? Congratulations. Joe has effectively authenticated himself via his email address.
- Render a confirmation screen that explains the following to Joe:
  - His account password is going to be reset to the following random string: ocZodbew. (Generate a new random string each time.)
  - If he likes the password, great. If not, he can use the change-password feature immediately after the password is reset.
  - If he understands the above and wants to continue, he should confirm by clicking the big “Reset My Account Password” button.
- Joe clicks the button.
- You, in response, do the following:
  - Delete the recovery token from Joe’s user record in the database. (This prevents somebody from using the old token to steal his account, should, for example, Joe’s email get stolen.)
  - Replace Joe’s old password with the new, randomly generated password from above. (You will, of course, use the salted-and-hashed method and not store the new password itself.)
  - Log Joe in.
  - Render a screen saying, “Joe, please don’t forget that your new password is ocZodbew. If you would like to change it, just visit Change My Password in your account preferences [provide a link]. Otherwise, you’re logged in and ready to go. Enjoy the site!”
- And you’re done.
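The token handling in the steps above might be sketched in Ruby roughly as follows. This is only an outline under some assumptions: an array of hashes stands in for the user table, "a few hours" is taken to mean three, and the function names are invented for the example. Emailing, rendering screens, and salting and hashing the new password are left out.

```ruby
require "securerandom"

# "A few hours", in seconds. The exact lifetime is an assumption.
TOKEN_LIFETIME = 3 * 60 * 60

# Generate a token, store it with a timestamp on the account, and return it
# so that it can be embedded in the account-recovery link.
def issue_recovery_token(account, now: Time.now)
  account[:recovery_token] = SecureRandom.urlsafe_base64(24)
  account[:recovery_issued_at] = now
  account[:recovery_token]
end

# Return the account that owns the token, or nil if the token is unknown
# or older than TOKEN_LIFETIME.
def find_account_by_token(accounts, token, now: Time.now)
  return nil if token.nil? || token.empty?
  account = accounts.find { |a| a[:recovery_token] == token }
  return nil unless account
  return nil if now - account[:recovery_issued_at] > TOKEN_LIFETIME
  account
end

# Retire the token so it cannot be reused, and return a fresh random
# password for the caller to salt, hash, and show to the user.
def complete_recovery(account)
  account.delete(:recovery_token)
  account.delete(:recovery_issued_at)
  SecureRandom.alphanumeric(8)
end
```

A request for the recovery link would call `find_account_by_token` and show the "no longer valid" screen whenever it returns nil; only the confirmation button would call `complete_recovery`.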
The code required to make it happen is shorter than the explanation
above. It’s one of those easier-done-than-said things.
So, if concerns about account recovery have been holding you
back from protecting your users’ passwords, you need hold back no
longer. It’s time to “do” your due diligence.
Update 2007-09-10: I made clear that the account-recovery method I
describe above is suitable only for low-security sites where
a valid email address is sufficient to authenticate users.
Posted in web development, security
Tags database, hash, passwords, recovery, risks, salt, security

Posted by Tom Moertel
Fri, 15 Dec 2006 18:25:00 GMT
Recently, the folks behind Reddit.com confessed
that a backup copy of their database had been
stolen. Later, spez, one of the Reddit
developers, confirmed
that the database contained password information for Reddit’s users,
and that the information was stored as plain, unprotected text.
In other words, once the thief had the database, he had everyone’s
passwords as well.
Had the folks at Reddit salted and hashed the
passwords and then stored the salts and resulting hashes in the database instead, the thief would now be in a very different
situation. Instead of holding all the keys to the kingdom, he would
face the prospect of a potentially expensive search for each and every
user’s password he wanted to extract from the database. The expense
of the search would likely have dissuaded him from making the attempt
in earnest, given how little exploitable value a Reddit account
represents. In short, the passwords would have been secure, even
though the database had fallen into the thief’s hands.
Why, then, didn’t Reddit’s programmers salt and hash the passwords? Because, according to the
earlier post by spez, they wanted to be able to send forgotten
passwords to users via email. It was a design decision: they
weighed the risks of having plain-as-day passwords in the database
against the convenience of being able to email users their forgotten
passwords and decided that, in the balance, convenience carried more
weight. It’s a decision they now regret. (It’s a doubly unfortunate
decision because the reasoning behind it is faulty: you don’t need to store passwords in your user database
in order to offer convenient account recovery.)
The reason I’m writing about this event isn’t to kick the
good folks at Reddit while they’re down. Rather, I’m trying to make a point:
If you are
storing passwords in a database, you are almost certainly making a
mistake.
The guys at Reddit are known for being smart. They thought they had a
good reason for storing passwords in their database. They
were wrong. If smart programmers can make this mistake, lots
of programmers can. Do you think you have a good reason for storing
passwords in your database? If so, you’re probably wrong, too.
How can I be so sure? Because, when it comes to web-app authentication,
cutting corners doesn’t buy you anything. It doesn’t save you coding time.
It doesn’t give your users a better experience. All it does is weaken the security of your web site, needlessly putting your users, your employer, and yourself at risk.
So please let me take this opportunity to ask if you
know of (or perhaps work on) any software systems that store passwords
in a database. If so, fix your
software now:
- Salt and hash each and every password (use an expensive hashing function such as bcrypt that was designed for password applications).
- Store the salt and hash – not the password – in your database.
- Throw the password itself away.
You’ll be glad you did.
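Here is a minimal sketch of those three steps in Ruby. One substitution to be aware of: bcrypt is not part of Ruby's standard library, so this example uses PBKDF2-HMAC-SHA256 from the bundled openssl library as a stand-in for the expensive hash. The iteration count, key length, and function names are assumptions made for the example, not recommendations from this article.

```ruby
require "openssl"
require "securerandom"

PBKDF2_ITERATIONS = 200_000 # assumed work factor; tune for your hardware
PBKDF2_KEY_LENGTH = 32      # bytes of derived hash to keep

# Salt and hash a password. Only the returned salt and hash get stored;
# the password itself is thrown away.
def hash_password(password, salt: SecureRandom.random_bytes(16))
  digest = OpenSSL::PKCS5.pbkdf2_hmac(
    password, salt, PBKDF2_ITERATIONS, PBKDF2_KEY_LENGTH,
    OpenSSL::Digest.new("SHA256")
  )
  { salt: salt, hash: digest }
end

# Compare two byte strings without stopping at the first difference.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |diff, (x, y)| diff | (x ^ y) }.zero?
end

# Check a login attempt by re-hashing it with the stored salt.
def password_matches?(password, record)
  candidate = hash_password(password, salt: record[:salt])[:hash]
  secure_equal?(candidate, record[:hash])
end
```

At sign-up you would store the result of `hash_password`; at login you would call `password_matches?` with the stored record. If the bcrypt gem is available, it can take the place of the PBKDF2 call.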
Update: Minor edits for clarity.
Update 2007-02-13: Salting and hashing does not get in the way of account recovery. You do not need to email users their forgotten passwords: there are other account-recovery options that are just as convenient but much more secure. See Don’t let password recovery keep you from protecting your users for more.
Update 2007-10-03: Revised text slightly to emphasize that there is no benefit to be had by implementing a weak password system, and therefore there is no reason not to implement a secure system. Pointed more directly to bcrypt, too.
Posted in web development, security
Tags hash, passwords, reddit, salt, security

Posted by Tom Moertel
Thu, 19 Oct 2006 01:40:00 GMT
Even skilled programmers have a hard time keeping their web
applications free of XSS and SQL-injection vulnerabilities. And it
shows: a sobering portion of web sites are open to some scary security threats.
Why are so many sites vulnerable to these well-known holes? Probably
because it’s insanely hard for programmers to solve the fundamental
“strings problem” at the heart of these vulnerabilities. The problem
itself is easy to understand, but we humans aren’t equipped to carry
out the solution. Simply put, we just plain suck at keeping a
bazillion different strings straight in our heads, let alone
consistently and reliably rendering their interactions safe whenever they
cross paths in a modern web application. It’s easy to say, “just
escape the darn things,” but it’s hard to get it right, every single time.
Computers, on the other hand, are pretty good at keeping track of
details by the bucket-full. Wouldn’t it be nice, then,
if our programming languages gave us the power to delegate this nasty “strings
problem” to our computers, which could then devote their unwavering mechanical precision to grinding the problem out of existence? Isn’t that the kind of thing modern programming languages are supposed to be good at?
I’d like to think the answer to that question is a big “you betcha.”
So let’s grab a modern programming language and solve the strings problem.
Let’s solve the strings problem in Haskell
In this article, we will look at one way (among many) to solve the strings
problem: by adding Ruby-style string templates to Haskell. These
templates support “interpolation” via the usual, convenient #{var}
syntax, but here interpolation is type safe. Haskell’s type system
will prevent us from inadvertently mixing incompatible string types,
and it will detect mistakes at compile time, before they can become
live XSS or SQL-injection holes. Further, our solution will offer
us these benefits without making us jump through hoops or pay some
onerous syntax penalty.
To be more specific, the system offers the following benefits:
- It provides a string-management kernel that lets you create “safe strings” by certifying a regular string as representing either text or a fragment of a known language.
- It allows you to conveniently define new language types for any string-based language that you can provide an escaping rule for (e.g., XML, URLs, SQL, untrusted user input).
- It provides compile-time syntactic sugar (via Template Haskell) that makes working with safe strings as convenient as working with string interpolation in languages like Ruby and Perl.
- It catches and reports (at compile time) the following commonly made programming errors:
  - failing to escape a plain-old-text string before mixing it into a string that represents a language fragment
  - mixing strings that represent fragments of incompatible languages
  - mixing strings that represent fragments of compatible languages in an ambiguous way (the system will force you to disambiguate)
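For readers who want a feel for the idea before diving into the Haskell, here is a rough analogue in Ruby. It is not the library described in this article: Ruby has no compile-time type checking, so this sketch can only raise an error at run time, when the offending line executes. The class and method names are hypothetical, and the escaping rules are deliberately simple.

```ruby
# Run-time sketch of "safe strings". All names here are hypothetical; the
# Haskell library reports the same mistakes at compile time instead.

HTML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;"
}.freeze

# A string certified as a fragment of one particular language.
class SafeString
  attr_reader :to_s

  def initialize(certified)
    @to_s = certified.dup.freeze
  end

  # Only fragments of the same language may be joined.
  def +(other)
    unless other.instance_of?(self.class)
      raise TypeError, "cannot mix #{other.class} into #{self.class}"
    end
    self.class.new(to_s + other.to_s)
  end
end

class Html < SafeString
  # Certify plain text as HTML by escaping it.
  def self.text(plain)
    new(plain.gsub(/[&<>"']/, HTML_ESCAPES))
  end

  # Certify markup that the programmer vouches for.
  def self.fragment(markup)
    new(markup)
  end
end

class Sql < SafeString
  # Certify plain text as a SQL string literal (standard quote doubling).
  def self.text(plain)
    new("'" + plain.gsub("'", "''") + "'")
  end

  # Certify SQL that the programmer vouches for.
  def self.fragment(sql)
    new(sql)
  end
end

comment = "<script>alert(1)</script>"
page = Html.fragment("<p>") + Html.text(comment) + Html.fragment("</p>")
puts page.to_s
```

Joining `Html.fragment("<p>")` with the raw `comment` string, or with an `Sql` value, raises a `TypeError` rather than silently producing a hole. The difference from the Haskell version is when you find out: here, only if a test or a user happens to exercise that line.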
(This is a long one, so grab an espresso, lean back, and read on in
style. Also, if you have a smoking jacket, you might want to get it now.)
Posted in programming, programming languages, haskell, ruby, web development, testing, rails
Tags haskell, ruby, strings, testing, types

Posted by Tom Moertel
Wed, 08 Feb 2006 19:25:00 GMT
Via Simon Willison’s post about the 2006 Future of Web Apps Summit, I found notes for Ryan Carson’s talk about building web apps on a budget.
In the talk Ryan breaks down the budget for starting up DropSend:
| Budget (£) | Need |
| 5,000 | Branding & UI design |
| 8,500 | Development of web app (developers also given small equity stake) |
| 2,750 | Desktop apps (Windows and Mac) |
| 1,600 | Building XHTML/CSS |
| 500 | Hardware (internal development server) |
| 800 | (per month) hosting and maintenance |
| 2,630 | Legal fees |
| 500 | Accounting fees |
| 500 | Linux-specialist fees |
| 1,950 | Misc. fees (trips, replace broken hardware) |
| 250 | Trademark |
| 200 | Merchant account |
| 500 | Payment processor’s setup fee |
| 25,680 | Total |
That is about $45K in US dollars. In other words, you can launch a new web application for less than a skilled technology worker’s salary. Or, if you are a skilled technology worker, you can do much of the work yourself and launch a new web application for about $25K.
Got an itch to scratch?
Posted in web development, business

Posted by Tom Moertel
Mon, 06 Feb 2006 22:52:00 GMT
In “Everything old is new again,”
I wrote that I was moving articles from my old blog over to my new
Typo-powered blog (here). Now that the process is underway, I need to
make sure that people looking for my old articles can find them at
their new home. To solve this problem, I am using a simple Apache httpd
recipe to redirect requests for the old articles to the corresponding
updated articles on my new blog. In case you need to do something
similar some day, here is the recipe.
First, set up a mapping file
Create a two-column mapping file that you can use to map each
article’s old location to its corresponding new location. If there
are any parts of the locations that never change, you can factor them
out to reduce clutter.
For example, the article “My New Radio VCR” has the following old and
new locations (the constant parts are the prefixes
http://community.moertel.com/ss/space/ and http://blog.moertel.com/):
| Old | http://community.moertel.com/ss/space/2004-02-20 |
| New | http://blog.moertel.com/articles/2004/02/20/my-new-radio-vcr |
Its entry in my mapping file looks like this:
# File: /path/to/conf/old-blog-to-new.txt
# Map articles from old blog to new blog.
#
# OLD LOCATION     NEW LOCATION
# .../ss/space/X   http://blog.moertel.com/Y
...                ...
2004-02-20         articles/2004/02/20/my-new-radio-vcr
...                ...
Second, configure Apache to use the mapping file
Edit the Apache configuration that controls the old locations.
Add a set of
mod_rewrite rules to
match requests for the old locations and redirect them to the
corresponding new locations, using the mapping file as a reference. For example,
here is my configuration:
# in Apache's configuration for community.moertel.com
RewriteEngine on
RewriteMap blogmap txt:/path/to/conf/old-blog-to-new.txt
RewriteCond ${blogmap:$1|NOT-FOUND} !=NOT-FOUND
RewriteRule ^/ss/space/(.+) http://blog.moertel.com/${blogmap:$1} [R=301,L]
The first line makes sure that mod_rewrite is active.
The second line tells Apache to load the mapping file. Apache will
cache the mapping file’s contents for speed, but it is smart enough to
reload the file when modified. That means you can add new entries to
the mapping file at any time, and Apache will act on them immediately,
no restart or reload required. Every time I moved an article over to
the new blog, for example, I just edited the mapping file, and the new
location “went live,” replacing the old.
The third line says that the recipe is conditional upon
there being a matching entry in the mapping file. If no entry
exists, the recipe will not apply, and the request will be handled
as usual.
The final line defines the rewrite rule. In this example, it tries
to match requests that start with “/ss/space/X”, where X is any
suffix. (The prefix “http://community.moertel.com” is implied because
this configuration is for the community.moertel.com site.) If the
request matches, X is stored in the $1 variable.
Then – and this is one of those things that makes mod_rewrite seem
tricky – the condition defined in the previous line is tested using the current
value of $1. If the condition is satisfied, the request
is redirected to http://blog.moertel.com/Y, where Y is the
corresponding location for X, according to the mapping file.
The [R=301,L] part of the rewrite rule is important. It
specifies that redirects should be of the 301-Permanent variety. This
advertises to the world that the new locations are intended to
replace the old locations. Using permanent redirects also ensures
that any Google juice that
may have accumulated for my articles follows them to their new
home.
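If the interplay between the condition and the rule still seems murky, it may help to see the same logic written out as ordinary code. The following Ruby sketch mimics what the directives above do; it is not how Apache implements them, and the function names are invented for the illustration. It assumes the mapping file's simple format: comment lines start with #, and the first two whitespace-separated fields on a line are the key and the value.

```ruby
# Parse mapping-file text into a Hash of old suffix => new path.
def parse_rewrite_map(text)
  text.each_line.each_with_object({}) do |line, map|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split
    map[key] = value if value
  end
end

# Decide what to do with a request path on the old site. Returns
# [301, new_url] for a mapped article, or nil to handle the request as usual.
def redirect_for(path, map)
  match = %r{\A/ss/space/(.+)}.match(path) # the RewriteRule pattern
  return nil unless match
  target = map[match[1]]                   # the RewriteCond lookup on $1
  return nil unless target                 # no entry: the rule does not apply
  [301, "http://blog.moertel.com/#{target}"]
end

map = parse_rewrite_map(<<~MAP)
  # OLD LOCATION   NEW LOCATION
  2004-02-20       articles/2004/02/20/my-new-radio-vcr
MAP

p redirect_for("/ss/space/2004-02-20", map)
p redirect_for("/ss/space/no-such-article", map)
```

The first call yields a 301 redirect to the article's new home; the second yields nil, which corresponds to Apache handling the request as usual.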
Third, activate the new configuration
This part is easy: restart Apache to make sure it turns on the
rewrite engine and activates the new configuration directives.
Finally, test it out
To see if everything is working properly, visit an article’s
old location to see if you are redirected to the corresponding
new location. For example:
If you click on this link, you should be redirected to blog.moertel.com.
And that’s the recipe.
Posted in site news, web development

Posted by Tom Moertel
Wed, 26 Oct 2005 20:51:00 GMT
Jakob Nielsen’s Alertbox for 17 October 2005
is Weblog Usability: The Top Ten Design Mistakes. In other words, it’s a top-ten list of things not to do on your blog if you care about usability.
Since I care about usability, I decided to test my own weblog against
Jakob’s top-ten list. How did I fare?
Let’s see:
Top Ten Weblog Usability Mistakes
- No author biography: Fail. Oops, got me there. While I have a bio on
the Community Projects site,
I don’t even link to it from my blog.
- No author photo: Fail. Oops, again. I don’t have a photo of myself on
the blog, nor on any other site (that I know of).
- Nondescript posting titles: Pass. I generally give my posts
informative titles such as
How to change symlinks atomically
and
Simple data formats are not going away.
Only rarely do I use cutesy
titles such as
On the effortless cultivation of humility, where readers are forced
to guess what the post is about.
- Links don’t say where they go: Pass. As a reader, I find “click
here” links to be annoying, and so I avoid the practice in my writing.
- Classic hits are buried: Pass. I link to popular topics in the
Popular Topics sidebar.
- The calendar is the only navigation: Pass. My posts are organized by topic as well as by date. Further, the ever-present live search makes finding posts by content easy.
- Irregular publishing frequency: Fail. I don’t have a regular posting schedule. When work gets heavy, for example, I rarely post.
- Mixing topics: Semi-Pass. I do mix up topics somewhat, but almost all of my topics fall into the category of “stuff programming geeks like” and in that regard are fairly consistent.
- Forgetting that you write for your future boss: Pass. I don’t think there is anything on my blog that a future employer would find troubling or even unprofessional. (Since I am a consultant, I have lots of “employers,” and so far none of them seem to mind what I post. Some – the crazy ones – even enjoy my blog.)
- Having a domain name owned by a weblog service: Pass. Since 1996 I have been keeping it real on moertel.com. My blog’s home is blog.moertel.com, which seems like the natural place for it.
In sum, I made three of Jakob’s top-ten weblog usability mistakes:
- I don’t have an author bio.
- I don’t have an author photo.
- I don’t post regularly.
The first two are easy to fix, and I’ll fix them right away. The
third – posting regularly – is more difficult, owing to the
ever-varying demands of my work load, but I’ll make an effort to pick
up the pace. Hopefully, my blog will be 100 percent “Jakob compliant”
in the next day or so.
Do you have a weblog? If so, how many weblog usability mistakes do
you make? Grab Jakob’s top-ten list and find out.
Posted in site news, web development, usability

Posted by Tom Moertel
Tue, 25 Oct 2005 20:12:00 GMT
The good folks at 37signals are once again up
in arms about Google Web
Accelerator (GWA). David
Heinemeier Hansson (DHH), in particular, writes in a recent post to
Signal vs. Noise that “[GWA] was evil
enough the first time around, but this time it’s downright scary.”
The problem, it seems, is that GWA automatically, silently, and
unblockably follows hypertext links to web pages that are linked to by
the pages you visit. It does this in order to cache those pages so
that if you visit them later, it will have cached copies ready in an
instant, thus “accelerating” your web surfing. But some web
developers use hypertext links to trigger potentially unsafe actions,
such as deleting records in a database, and when GWA automatically
follows such links, it triggers the actions.
Oops.
Let’s do the time warp again…
Now, if this story sounds familiar, that’s because half a year
ago, the exact same thing happened. GWA was unveiled to the public.
People started using it. And some of those people started losing data
from their accounts with popular web applications, such as
37signals’ own Backpack. 37signals
publicized the problem in their blog and DHH even
called for a recall on GWA.
And then the community responses came in. For the most part, the
responses could be divided into two camps, based on who was
blamed for the problem. The first camp blamed the web designers who used
links to trigger unsafe actions (in violation of applicable standards),
and the second camp blamed Google for unleashing GWA upon a web where
standards aren’t always followed.
Both viewpoints had some merit, but I was in the first camp and thus
argued for following the standards and against unsafe linking
practices.
What surprised me was that so many people in the second camp argued in
defense of unsafe linking practices, which I had thought indefensible.
I didn’t have any problem with arguments against Google’s unleashing
GWA on an imperfect web, but arguing for the web’s imperfections
seemed like an odd way of making the case. The supportive arguments
boiled down to the following:
- Lots of web sites use action-triggering links, so the practice is de facto acceptable.
- The existing palette of user-interface options is too limited for today’s web applications; thus, designers are justified in breaking the rules.
- The standards don’t actually prohibit the practice (they say “SHOULD NOT,” not “MUST NOT”); thus, the practice is allowable.
None of the arguments seem to withstand scrutiny. The first argument
breaks down like so: That lots of web sites do it only means that
those sites get away with it, not that the practice is acceptable.
Further, as GWA demonstrates, those sites may not get away with the
practice much longer.
The second argument breaks down when one examines the uses of unsafe
linking practices. Most of them could be replaced by safe practices
through modest UI refactoring. Given that safe alternatives exist,
the unsafe practices are not justified by virtue of being the only realistic option.
The third argument breaks down when one actually reads the relevant
standards. Then it becomes clear that one should not use links to
trigger potentially unsafe actions. The wiggle room created
by the use of “SHOULD NOT” instead of “MUST NOT” does not admit
the large problems caused by unsafe linking.
Finally, even if there were some justification for unsafe linking, the
practice would still be a bad idea: its costs and risks outweigh its
benefits. Why hold back the potential of efficient caching
architectures for the web? Why risk data loss for your users? It’s
not worth it.
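To show what following the standards looks like in practice, here is a small Ruby sketch of a request handler that lets GET and HEAD requests read data but refuses to let them destroy it. It is not taken from Rails or any other framework: the routes, the status messages, and the `handle` function are all made up for the example, and a plain Hash stands in for the database.

```ruby
# Methods that, by convention, must not trigger unsafe actions.
SAFE_METHODS = %w[GET HEAD].freeze

# Handle a request against a Hash of id => note, standing in for a database.
# Returns [status, body-or-location].
def handle(records, http_method, path)
  match = %r{\A/notes/(\d+)(/delete)?\z}.match(path)
  return [404, "Not Found"] unless match
  id = match[1].to_i
  return [404, "Not Found"] unless records.key?(id)

  if match[2].nil?
    # Reading a note is safe, so a link (or a prefetching proxy) may do it.
    return [405, "Method Not Allowed"] unless SAFE_METHODS.include?(http_method)
    [200, records[id]]
  else
    # Deleting is unsafe, so it is reachable only through POST.
    return [405, "Method Not Allowed"] unless http_method == "POST"
    records.delete(id)
    [303, "/notes"]
  end
end

notes = { 1 => "buy milk" }
p handle(notes, "GET", "/notes/1/delete")  # a prefetcher following a link
p handle(notes, "POST", "/notes/1/delete") # a user pressing a button
```

With this arrangement, anything that merely follows links gets a 405 from the delete URL and the note survives; only the POST removes it.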
Back to the Future
So where are we now? Given how little justification there is for
unsafe linking practices, one would hope that we would have abandoned
them by now. But, as the recent cries about the second coming of GWA
suggest, the web-development community is not yet ready to give up those
sexy, action-triggering links.
It’s not that the means aren’t available. Rails, for example, has
plenty of support for sane and safe practices for triggering actions.
Rather, the problem is cultural. Too many influential people,
especially in the Rails community, are unrepentant users of – and, dare
I say it, apologists for – action-triggering links. Until this changes, I
expect many new web developers to pick up dangerous habits from the
very people they respect most.
Fortunately, many other respect-worthy people are pointing toward
a better way:
- Sam Ruby: “I’m on the other side of this debate. While this appears to be a purely philosophical concern, in reality this stuff matters.”
- Bill de hÓra: “The GWA is back and following GET links again… The technology itself is interesting insofar as we are going to see more and more highly automated robots enter the web over the next few years…. Even more interesting is the kind of outrage holding forth in places like Signal v Noise….”
- Joe Gregorio: “And now we begin the next chapter in which Pooh discovers that five months after the first time Google turned on GWA that standards still matter.”
I hope that this time around the web-development community answers
the wake-up call. It’s time to abandon action-triggering links.
Posted in web development
Tags get, gwa, rails, rest, safe, unsafe

Posted by Tom Moertel
Mon, 22 Aug 2005 16:00:00 GMT
Many people don’t realize that changing the target of a symbolic link (symlink) is
not an atomic operation. “Changing” a symlink really means deleting it
and creating a new link with the same file name. For example, if I have a
symlink current that points to a directory old, and I want to change
it to point to a directory new, I might use the following command:
$ ln -snf new current
Strace shows what really happens when I run the command:
$ strace ln -snf new current 2>&1 | grep link
unlink("current") = 0
symlink("new", "current") = 0
First, the existing symlink is deleted via the unlink system
call. Then a new, identically named symlink is created via the symlink
system call. It’s a two-step process, and in between the steps, there
is no symlink.
This can be a problem if you expect the symlink to be there always,
such as when using the link to point to the active version of a live
web site. If you change the symlink while deploying a new version of
your site, for example, the web server might try to dereference the
link during the small window of time when it doesn’t exist. Oops.
The solution to this problem is to effect the change by creating a new
symlink and then renaming it over the old symlink. On Unix-like
systems, renaming is an atomic operation, and thus the symlink
“change” will be atomic too. By hand, the process looks like this:
$ ln -s new current_tmp && mv -Tf current_tmp current
In Ruby, I make atomic symlinking available everywhere by extending
the Pathname class with a new method atomic_symlink:
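require 'pathname'

class Pathname
  # Atomically make this path a symlink to +old+, replacing any existing link.
  def atomic_symlink(old)
    # Random 8-character suffix: Base64 of 6 random bytes, with "/" replaced
    # so that the suffix is safe to use in a file name.
    suffix = [Array.new(6) { rand(256).chr }.join].pack("m").strip.tr('/', '_')
    tmplink = Pathname.new(to_s + "_" + suffix)
    tmplink.make_symlink(old)
    begin
      # Renaming over the old link swaps it out in a single atomic step.
      tmplink.rename(self)
    rescue
      # Clean up the temporary link, then re-raise the original error.
      File.unlink(tmplink.to_s)
      raise
    end
  end
end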
This code is nothing more than a robustified version of the by-hand
method. It picks better names for temporary links, and it cleans up
after itself, should something go wrong, but otherwise it does the
same thing.
Given how easy it is to change symlinks atomically, why do it any
other way? Life is hard enough without having to worry about another
race condition.
Posted in programming, ruby, web development
Tags ruby, safe, symlink

Posted by Tom Moertel
Thu, 16 Jun 2005 16:00:00 GMT
I am delighted to report that the button_to
helper
has been added to the Ruby on Rails
web-development framework. David
applied the patch earlier
today, and so button_to will be in the much-anticipated Rails 1.0
release.
David’s change-log entry summarizes the patch well:
Added button_to as a form-based solution to deal with harmful
actions that should be hidden behind POSTs. This makes it just as
easy as link_to to create a safe trigger for actions like destroy,
although it’s limited by being a block element, the fixed look,
and a no-no inside other forms.
David does a good job of highlighting the helper’s limitations. I’ll
take this opportunity to elaborate on each.
It is a block element
The button_to helper creates a small form, which in HTML is considered
block content, just
like the p, div, and blockquote elements are. Basically, block
content cannot be mixed into runs of text. But links can: links are
inline content. Thus
button_to cannot be used as a drop-in replacement for every
occurrence of link_to that might be unsafe; it works only for those
occurrences within block-accepting contexts.
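To picture the kind of markup involved, here is a stand-alone Ruby sketch that builds a small POST form around a single submit button. It is not the Rails helper and makes no claim to match its exact output or options; the name `simple_button_to` and the escaping helper are hypothetical.

```ruby
BUTTON_HTML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;"
}.freeze

# Escape text for use inside an HTML attribute value.
def escape_html(text)
  text.to_s.gsub(/[&<>"']/, BUTTON_HTML_ESCAPES)
end

# Hypothetical stand-in for a button helper: a one-button form that POSTs
# to the given URL. The form element is what makes the result block content.
def simple_button_to(label, url)
  format(
    '<form method="post" action="%s"><div><input type="submit" value="%s" /></div></form>',
    escape_html(url), escape_html(label)
  )
end

puts simple_button_to("Destroy", "/items/1/destroy")
```

Because the result is a form, it fits in a table cell or a div but not in the middle of a sentence, which is the limitation described above.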
Luckily for us, when designers use links to trigger unsafe actions,
they rarely slip such links into the middle of ordinary looking
text. Naughty uses of link_to almost always occur within contexts
that accept block content. In Rails-generated scaffolding code, for
instance, the unsafe uses of link_to occur within table cells, and
table cells have a flow content
model, which accepts
both inline and block content. So button_to works great for the
default cases in Rails.
It has a fixed look
As its name implies, button_to creates buttons. Buttons don’t look
like links and aren’t styled the same way that links are. For some
design scenarios, this might be a problem.
(My view is that links should not be used to trigger unsafe
actions. In the same way that action-triggering GET requests violate
the spirit of the HTTP standards, action-triggering hypertext links
violate the spirit of the HTML standards. For this reason, I view this
limitation as a feature.)
It is a no-no inside other forms
Forms cannot be nested, and so button_to cannot be used inside of
forms.
Fortunately, this limitation usually doesn’t matter because when we
are inside of a form, we can use its buttons instead of
button_to-created buttons to trigger actions. Still, there are some
circumstances where it does matter, such as the “Amazon.com wish list”
scenario. In this scenario, we should consider other
options.
The bottom line: Pick the low-hanging fruit
While button_to has its limitations, it does provide a simple solution
to the unsafe-GET problem for most real-world cases. I am glad that it
is now a part of Rails, and I offer a big thank-you to David for
accepting the patch.
Posted in web development, rails
Tags get, gwa, link_to, post, rails, safe, unsafe
