Database connection leak in Typo 4.0.3: problem solved

Posted by Tom Moertel Thu, 24 Aug 2006 19:41:00 GMT

In an earlier post I wrote about stability problems that have plagued my blog since upgrading from Typo 4.0.0 to 4.0.3. I have finally traced the problem to its source, and here’s the deal:

If you’re serving Typo up via Mongrel, do not configure ActiveRecord to allow concurrency.

One of the changes between Typo 4.0.0 and 4.0.3 is this addition to the environment.rb file:

config.active_record.allow_concurrency = true

Comment out this line, restart Typo, and the problem is solved. Apply Changeset 1255, and the problem is solved. (See Update 2, below.)

Discussion

When ActiveRecord::Base.allow_concurrency is set to true, AR will give each thread its own database connections and cache them in thread-localized storage. The idea is that, in a multi-threaded environment, this simple policy prevents unsafe interactions between threads and the database. (Imagine what would happen if one thread “borrowed” a connection over which another thread had opened a transaction. Oops, there goes transactional isolation.)

This policy, however, does place a burden on the owner of the threads to make sure that each thread’s local connection cache is cleared when the thread is joined, a burden that is not, it would seem, being carried by Typo under Mongrel. As a result, Typo rapidly chews through the allotment of file descriptors that the operating system kindly had reserved for Mongrel:

Typo 4.0.3 on Mongrel w/ SQLite3 consumes about 1.7 file descriptors per minute when ActiveRecord is configured to allow concurrency

(On my Linux server, the Mongrel process gets an allotment of 1024 file descriptors.)

Lucky for us, this each-thread-gets-its-own-connections policy is unnecessary under Mongrel because Mongrel, while being multi-threaded itself, serializes all access to the Rails-based applications it serves up:

Q: Is [Mongrel] multi-threaded or can it handle concurrent requests?

Mongrel is uses a pool of thread workers to do it’s processing. This means that it is able to handle concurrent access and should be thread safe. This also means that you have to be more careful about how you use Mongrel. You can’t just write your application assuming that there are no threads involved. ...

Ruby on Rails is not thread safe so there is a synchronized block around the calls to Dispatcher.dispatch. This means that everything is threaded right before and right after Rails runs. While Rails is running there is only one controller in operation at a time.

(Source: Mongrel FAQ list)
Thus we can safely turn off (i.e., comment out in Typo’s environment.rb file) ActiveRecord’s allow-currency option without having to worry about nasty concurrency or performance issues:
# the following line is commented out
# config.active_record.allow_concurrency = true

For more on this subject, see Rails ticket #2162 and Rails ticket #2742.

Now, here’s my question: Are there any environments in which Typo can run with the allow-concurrency option enabled and not leak database connections? Inquiring minds want to know.

Update: Upon further investigation, turning off concurrency might not be altogether without risk. Some of the Typo code that handles potentially long tasks, such as making trackbacks and pings, spawns new threads in which to carry out its work. I’m looking further into this risk. Updates to come.

Update 2: Piers Cawley added Changeset 1255, which turns AR’s allow-concurrency flag back off and revises the ping code so that it does not attempt concurrent database access. Apply the patch version of 1255 and restart Typo to get the fix. A tip of the hat to Piers for making the quick fix when he was supposed to be on holiday.

Posted in , ,
Tags , , , ,
8 comments
no trackbacks
Reddit Delicious

Typo-4.0.3 instability and a minor patch for sqlite3-ruby

Posted by Tom Moertel Thu, 24 Aug 2006 04:41:00 GMT

Since I upgraded my blog from Typo 4.0.0 to 4.0.3, it has been somewhat unstable. About once a day it starts responding with “500 Internal Server Error” and stays that way until I restart it.

The root of the problem seems to be the database connection, as evidenced by this exception showing up in the production log:

SQLite3::CantOpenException (could not open database)

Unfortunately, the exception doesn’t provide anything specific to go on.

A quick look at the sqlite3-ruby code suggested that I was not going to get the specifics, either. The Ruby-based wrapper never calls sqlite3_errmsg after a call to sqlite3_open fails on behalf of SQLite3::Database.new.

A quick patch, however, fixed the problem:

--- sqlite3-ruby-1.1.0.orig/lib/sqlite3/database.rb
+++ sqlite3-ruby-1.1.0/lib/sqlite3/database.rb
@@ -109,7 +109,7 @@
       @statement_factory = options[:statement_factory] || Statement

       result, @handle = @driver.open( file_name, utf16 )
-      Error.check( result, nil, "could not open database" )
+      Error.check( result, self, "could not open database" )

       @closed = false
       @results_as_hash = options.fetch(:results_as_hash,false)

(Submitted as Ticket 5504 on RubyForge.)

Before applying the patch, opening a database at a nonexistent path results in a generic error message:

$ ruby -r rubygems -e 'require_gem "sqlite3-ruby";
    SQLite3::Database.new("/no/such/path/db")'

... could not open database (SQLite3::CantOpenException) ...

After applying the patch, we get additional error information:

... could not open database: unable to open database file
    (SQLite3::CantOpenException) ...

With the patch in place, all I have to do is wait for Typo to start acting up again. Then I’ll have some interesting information in the log.

Until then, I’m relying on cron and a short monitoring script to restart Typo when it tips into foolishness:

#!/bin/bash

url=http://blog.moertel.com/admin
addrs=tom@moertel.com

response=$(GET -sd $url 2>&1)

if [ "$response" != "200 OK" ]; then
    { echo "Response was: $response"; echo; service typo restart; } |
    mail -s "Blog site not responding! (Restarting)" $addrs
fi

We’ll see how it goes.

Update: That was fast. The error popped up again and this time the log told me something useful: “unable to open database file.” Now, why couldn’t Typo open the database file, especially since the file is perfectly fine and had been opened successfully (many times) by the very same Typo process earlier? Here’s a hint:
$ ls /proc/28788/fd | wc -l
1023

Seems like there’s a resource leak in Typo 4.0.3 (or Rails 1.1.6). Under some conditions, instead of reusing existing database connections, Typo keeps trying to open new ones. Eventually, it uses up its allotment of file descriptors and the operating system is forced to say, “That’s enough, pal,” (EMFILE).

I’ll look in to it more in the morning.

Update 2: Problem solved.

Posted in , , ,
Tags , ,
1 comment
no trackbacks
Reddit Delicious