Adding Haskell syntax highlighting to the Typo blogging system

Posted by Tom Moertel Wed, 01 Nov 2006 22:01:00 GMT

Last night on #haskell, Don Stewart asked if I had seen HsColour for rendering syntax-highlighted Haskell in HTML. He had used it recently, he noted in passing, to add syntax highlighting to planet.haskell.org.

Now, I can’t be certain about this, but I suspect that Don’s question was cleverly designed to instill in me a subtle case of syntax-highlighting envy. For on my blog, Haskell code snippets were rendered in dreadfully boring uncolored text. But on his blog, the snippets dance in joyous polychromatic splendor.

Thus I was compelled to add Haskell syntax-highlighting to my blog.

Adding Haskell syntax-highlighting to Typo

My blog runs on the Ruby-on-Rails-powered Typo system, which allows for plug-in text filters. One of the included filters, in fact, is a syntax-highlighting filter for snippets of Ruby, XML, and YAML code. This filter is built upon the Ruby Syntax module, which wasn’t exactly designed for Haskell syntax analysis. So I set out to create a new plug-in filter based upon HsColour.

This task turned out to be easy. All I did was duplicate Typo’s existing syntax-highlighting filter and swap out its filtering code for the following:

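IO.popen("HsColour -css", "r+") do |f|
  # write from a forked child so a large snippet cannot deadlock the pipe
  pid = fork { f.write text; f.close; exit! 0 }
  f.close_write        # the parent is done with the write end
  text = f.read        # collect HsColour's HTML output
  Process.waitpid pid  # reap the writer child
end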
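
For readers who would rather not fork by hand, here is a minimal sketch of the same pipe-through-a-filter idea using Ruby's standard Open3 library. The helper name filter_through is hypothetical and is not part of Typo or of my filter; the command is a parameter so the sketch can be tried with any filter program.

```ruby
require "open3"

# Pipe +text+ through an external filter command and return what the
# command writes to standard output. (Hypothetical helper, for illustration.)
def filter_through(command, text)
  # capture2 feeds stdin and reads stdout concurrently, so there is no
  # need for a separate writer process.
  output, status = Open3.capture2(command, stdin_data: text)
  raise "#{command} exited with status #{status.exitstatus}" unless status.success?
  output
end

# With HsColour installed, the call would look like:
#   html = filter_through("HsColour -css", haskell_source)
```

Any program that reads standard input and writes standard output can stand in for HsColour while experimenting.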

I also tweaked the post-processing regular expressions so that they would whittle away the HTML filler before and after the syntax-highlighted output of HsColour:

text.gsub!(/.*<p()re>/m, ...)
text.gsub!(/<\/pre>.*/m, ...)
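
To make the effect of those two patterns concrete, here is a small sketch. It assumes the goal is simply to keep whatever sits between the pre tags; the empty replacement strings and the sample HTML are my own illustration, not the replacements or output used in the real filter.

```ruby
# Keep only the text between <pre> and </pre>.
# Replacing with "" is an assumption made for this illustration.
def strip_html_wrapper(html)
  body = html.dup
  body.sub!(/.*<pre>/m, "")    # greedy: drops everything through the last <pre>
  body.sub!(/<\/pre>.*/m, "")  # drops the first </pre> and everything after it
  body
end

# Invented sample; real HsColour output will differ in its markup.
sample = "<html><body>\n<pre><span class='keyword'>let</span> x = 1\n</pre>\n</body></html>\n"
puts strip_html_wrapper(sample)
```

The /m flag matters here: it lets the dot match newlines, so each pattern can swallow multi-line filler in one pass.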

A few more tweaks and I was done.

Now I can wrap my Haskell code in <typo:haskell> tags and it, too, will dance in joyous polychromatic splendor:

constructTable tspecs = do
    ecolspecs <- during "argument evaluation" $ do
        toNvps . concat =<< mapM splice tspecs
    let names = map fst ecolspecs
    let evecs = map snd ecolspecs
    vecs <- argof nm $ mapM evalVector evecs
    let vlens = map vlen vecs
    if length (group vlens) == 1
        then return . VTable $ mkTable (zip names vecs)
        else throwError $
             "table columns must be non-empty vectors of equal length"
  where
    nm = "table(...) constructor"
    splice (TCol envp)  = return [envp]
    splice (TSplice e)  = do
        val <- eval e
        case val of
            VTable t ->
                return $ zipWith mkNVP (tcnames t) (elems (tvecs t))
            VList gl ->
                liftM (zipWith mkNVP (map name . elems $ glnames gl)) $
                mapM asVectorNull (elems $ glvals gl)
            _ -> throwError $
                "can't construct table columns from (" ++
                show val ++ ")"
    mkNVP n vec = NVP n (mkNoPosExpr . EVal $ VVector vec)
    name ""     = "NA"
    name n      = n

If you want the filter code, here it is: haskell_controller.rb. Just drop it into components/plugins/textfilters and restart Typo. The corresponding CSS styles can be found in my user-styles.css.


Database connection leak in Typo 4.0.3: problem solved

Posted by Tom Moertel Thu, 24 Aug 2006 19:41:00 GMT

In an earlier post I wrote about stability problems that have plagued my blog since upgrading from Typo 4.0.0 to 4.0.3. I have finally traced the problem to its source, and here’s the deal:

If you’re serving Typo up via Mongrel, do not configure ActiveRecord to allow concurrency.

One of the changes between Typo 4.0.0 and 4.0.3 is this addition to the environment.rb file:

config.active_record.allow_concurrency = true

Comment out this line and restart Typo, and the problem is solved. Better yet, apply Changeset 1255, which also solves the problem. (See Update 2, below.)

Discussion

When ActiveRecord::Base.allow_concurrency is set to true, AR will give each thread its own database connections and cache them in thread-localized storage. The idea is that, in a multi-threaded environment, this simple policy prevents unsafe interactions between threads and the database. (Imagine what would happen if one thread “borrowed” a connection over which another thread had opened a transaction. Oops, there goes transactional isolation.)

This policy, however, does place a burden on the owner of the threads to make sure that each thread’s local connection cache is cleared when the thread is joined, a burden that is not, it would seem, being carried by Typo under Mongrel. As a result, Typo rapidly chews through the allotment of file descriptors that the operating system kindly had reserved for Mongrel:
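
Here is a toy model of that burden, a sketch only. A hash keyed by thread stands in for the per-thread connection cache; none of the names below come from ActiveRecord, Typo, or Mongrel.

```ruby
# Toy model: each worker thread caches a "connection" of its own, and the
# entry outlives the thread unless somebody removes it.
def leaked_connections(requests, cleanup:)
  cache = {}
  lock = Mutex.new
  requests.times do
    Thread.new do
      # Each worker thread "opens" a connection and caches it.
      lock.synchronize { cache[Thread.current] = Object.new }
      # Only a well-behaved thread owner releases it when the work is done.
      lock.synchronize { cache.delete(Thread.current) } if cleanup
    end.join
  end
  cache.size   # entries left behind after every thread has finished
end

puts leaked_connections(10, cleanup: false)  # one stale entry per finished thread
puts leaked_connections(10, cleanup: true)   # nothing left behind
```

In the real system each stale entry would hold an open database handle, which is where the file descriptors go.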

[Figure: Typo 4.0.3 on Mongrel w/ SQLite3 consumes about 1.7 file descriptors per minute when ActiveRecord is configured to allow concurrency.]

(On my Linux server, the Mongrel process gets an allotment of 1024 file descriptors.)
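
A back-of-the-envelope calculation shows how quickly that allotment runs out. It uses only the two figures above and ignores the descriptors the process already holds at startup, so treat the result as an upper bound at that leak rate.

```ruby
# Minutes until a process leaking +rate_per_minute+ descriptors
# exhausts an allotment of +limit+ descriptors.
def minutes_until_exhaustion(limit, rate_per_minute)
  limit / rate_per_minute.to_f
end

minutes = minutes_until_exhaustion(1024, 1.7)
puts format("%.0f minutes (about %.1f hours)", minutes, minutes / 60.0)
# prints "602 minutes (about 10.0 hours)"
```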

Lucky for us, this each-thread-gets-its-own-connections policy is unnecessary under Mongrel because Mongrel, while being multi-threaded itself, serializes all access to the Rails-based applications it serves up:

Q: Is [Mongrel] multi-threaded or can it handle concurrent requests?

Mongrel is uses a pool of thread workers to do it’s processing. This means that it is able to handle concurrent access and should be thread safe. This also means that you have to be more careful about how you use Mongrel. You can’t just write your application assuming that there are no threads involved. ...

Ruby on Rails is not thread safe so there is a synchronized block around the calls to Dispatcher.dispatch. This means that everything is threaded right before and right after Rails runs. While Rails is running there is only one controller in operation at a time.

(Source: Mongrel FAQ list)

Thus we can safely turn off (i.e., comment out in Typo’s environment.rb file) ActiveRecord’s allow-concurrency option without having to worry about nasty concurrency or performance issues:

# the following line is commented out
# config.active_record.allow_concurrency = true

For more on this subject, see Rails ticket #2162 and Rails ticket #2742.

Now, here’s my question: Are there any environments in which Typo can run with the allow-concurrency option enabled and not leak database connections? Inquiring minds want to know.

Update: Upon further investigation, turning off concurrency might not be altogether without risk. Some of the Typo code that handles potentially long tasks, such as making trackbacks and pings, spawns new threads in which to carry out its work. I’m looking further into this risk. Updates to come.

Update 2: Piers Cawley added Changeset 1255, which turns AR’s allow-concurrency flag back off and revises the ping code so that it does not attempt concurrent database access. Apply the patch version of 1255 and restart Typo to get the fix. A tip of the hat to Piers for making the quick fix when he was supposed to be on holiday.


Typo-4.0.3 instability and a minor patch for sqlite3-ruby

Posted by Tom Moertel Thu, 24 Aug 2006 04:41:00 GMT

Since I upgraded my blog from Typo 4.0.0 to 4.0.3, it has been somewhat unstable. About once a day it starts responding with “500 Internal Server Error” and stays that way until I restart it.

The root of the problem seems to be the database connection, as evidenced by this exception showing up in the production log:

SQLite3::CantOpenException (could not open database)

Unfortunately, the exception doesn’t provide anything specific to go on.

A quick look at the sqlite3-ruby code suggested that I was not going to get the specifics, either. The Ruby-based wrapper never calls sqlite3_errmsg after a call to sqlite3_open fails on behalf of SQLite3::Database.new.

A quick patch, however, fixed the problem:

--- sqlite3-ruby-1.1.0.orig/lib/sqlite3/database.rb
+++ sqlite3-ruby-1.1.0/lib/sqlite3/database.rb
@@ -109,7 +109,7 @@
       @statement_factory = options[:statement_factory] || Statement

       result, @handle = @driver.open( file_name, utf16 )
-      Error.check( result, nil, "could not open database" )
+      Error.check( result, self, "could not open database" )

       @closed = false
       @results_as_hash = options.fetch(:results_as_hash,false)

(Submitted as Ticket 5504 on RubyForge.)

Before applying the patch, opening a database at a nonexistent path results in a generic error message:

$ ruby -r rubygems -e 'require_gem "sqlite3-ruby";
    SQLite3::Database.new("/no/such/path/db")'

... could not open database (SQLite3::CantOpenException) ...

After applying the patch, we get additional error information:

... could not open database: unable to open database file
    (SQLite3::CantOpenException) ...

With the patch in place, all I have to do is wait for Typo to start acting up again. Then I’ll have some interesting information in the log.

Until then, I’m relying on cron and a short monitoring script to restart Typo when it tips into foolishness:

#!/bin/bash

url=http://blog.moertel.com/admin
addrs=tom@moertel.com

response=$(GET -sd $url 2>&1)

if [ "$response" != "200 OK" ]; then
    { echo "Response was: $response"; echo; service typo restart; } |
    mail -s "Blog site not responding! (Restarting)" $addrs
fi

We’ll see how it goes.

Update: That was fast. The error popped up again and this time the log told me something useful: “unable to open database file.” Now, why couldn’t Typo open the database file, especially since the file is perfectly fine and had been opened successfully (many times) by the very same Typo process earlier? Here’s a hint:

$ ls /proc/28788/fd | wc -l
1023

Seems like there’s a resource leak in Typo 4.0.3 (or Rails 1.1.6). Under some conditions, instead of reusing existing database connections, Typo keeps trying to open new ones. Eventually, it uses up its allotment of file descriptors and the operating system is forced to say, “That’s enough, pal,” (EMFILE).
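
The same check can be made from Ruby. This is a sketch that assumes a Linux-style /proc filesystem; the helper name open_fd_count is my own.

```ruby
# Count the open file descriptors of a process by listing /proc/<pid>/fd.
# Returns nil where that directory is unavailable or unreadable.
def open_fd_count(pid = "self")
  dir = "/proc/#{pid}/fd"
  return nil unless File.directory?(dir)
  Dir.children(dir).size
rescue SystemCallError
  nil
end

# The soft limit is the allotment the process can use before EMFILE.
soft_limit, _hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)
puts "open descriptors: #{open_fd_count.inspect}, soft limit: #{soft_limit}"
```

Logging these two numbers periodically would show a leak long before the limit is reached.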

I’ll look into it more in the morning.

Update 2: Problem solved.


Adding reddit and del.icio.us buttons to articles in Typo

Posted by Tom Moertel Wed, 09 Aug 2006 22:25:00 GMT

Here’s a quick patch I made to my Typo 4.0 installation to add Reddit and del.icio.us buttons to articles. Now one click is all it takes to submit an article to either site. (These buttons appear on my blog at the end of each article.)

If you want to apply the patch, be sure to also place copies of the button images into public/images. You can snag the images from my site or from the Reddit and del.icio.us sites.

Here’s the patch:

--- typo.orig/app/helpers/articles_helper.rb    2006-07-24 11:04:27.000000000 -0400
+++ typo/app/helpers/articles_helper.rb    2006-08-09 17:06:51.000000000 -0400
@@ -73,7 +74,26 @@
       code << tag_links(article)        unless article.tags.empty?
       code << comments_link(article)    if article.allow_comments?
       code << trackbacks_link(article)  if article.allow_pings?
-    end.join("&nbsp;<strong>|</strong>&nbsp;")
+      code << submit_this_article_links(article)
+    end.join("&nbsp;| ")
+  end
+
+  def submit_this_article_links(article)
+    u_url = u(url_of(article, false))
+    u_title = u(article.title)
+    [  # move me into a database table
+      [ "Submit to Reddit.com",
+        "http://reddit.com/submit?url=<URL>&title=<TITLE>",
+        image_tag("reddit.gif", :size => "18x18", :border => 0)
+      ],
+      [ "Save to del.icio.us",
+        "http://del.icio.us/post?v=2&url=<URL>&title=<TITLE>",
+        image_tag("delicious.gif", :size => "16x16", :border => 0)
+      ]
+    ].map do |submit_title, submit_url, image_tag|
+      submit_url = submit_url.gsub(/<URL>/, u_url).gsub(/<TITLE>/, u_title)
+      %(<a href="#{h submit_url}" title="#{h submit_title}: &#x201C;#{h article.title}&#x201D;">#{image_tag}</a>)
+    end.join("&nbsp;")
   end

   def category_links(article)

The code is begging for a little refactoring love, but I’m off for vacation in about twenty minutes, so it will have to wait.


Upgrading my blog to run Typo 4.0

Posted by Tom Moertel Mon, 24 Jul 2006 17:34:00 GMT

If my blog looks a little weird right now, please bear with me. I am in the process of upgrading from Typo 2.6.0 to Typo 4.0, and so far the process has been somewhat painful.

The new Typo installer did not have much luck upgrading my blog to the new version. After fighting and solving a succession of errors and confidence-sapping problems, I decided to abandon the upgrade process. Instead, I changed to the course most likely to result in a stable configuration: to install a new blog and then move my content over to it.

The content-moving process was easier than it might sound. I manually migrated the old blog database to the new database format; dumped it to a SQL file; edited the file to remove all but the INSERT statements for articles, comments, pages, and so on; and then I loaded the statements into the new database.
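
The editing step could also be scripted. The sketch below is not what I ran; it assumes the dump holds one statement per line, and the table names passed in are placeholders that may not match Typo’s actual schema.

```ruby
# Keep only the INSERT statements for the chosen tables from a SQL dump.
# Assumes one statement per line; table names are supplied by the caller.
def content_inserts(dump_lines, tables)
  names = Regexp.union(tables).source
  pattern = /\AINSERT INTO ["`]?(#{names})["`]?[\s(]/i
  dump_lines.select { |line| line.match?(pattern) }
end

dump = [
  %(CREATE TABLE "articles" (id INTEGER, title TEXT);),
  %(INSERT INTO "articles" VALUES(1,'Hello');),
  %(INSERT INTO "sidebars" VALUES(1,'config');),
  %(INSERT INTO "comments" VALUES(1,1,'Nice post');)
]
puts content_inserts(dump, %w[articles comments pages])
```

Requiring whitespace or a parenthesis after the table name keeps a table such as articles from also matching a longer name that merely begins with it.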

I did not copy over my configuration and sidebar information, however, because I figured it would be safer to use the Typo-4.0 defaults, those being the most tested. I also recreated my user account from scratch.

So far the blog seems to be running stably, enough at least for me to restore public access. But I still have more restoration ahead. Next I will work on restoring my espresso theme.

Update 2006-07-26: I have now restored my espresso theme. For a while I was considering using Scribbish, which is delightfully clean by comparison, but it has not yet been updated to support much of Typo 4.0’s goodness. Maybe later.


Improving Typo's spam protection

Posted by Tom Moertel Mon, 16 Jan 2006 06:34:00 GMT

I noticed that my site has been picking up more comment spam recently. Typo has built-in spam protection, but for some reason a few spam comments that ought to have been caught slipped through its filters. Curious, I investigated.

Most spam comments contain links to sites favored by the spammers. The sites are almost always of the form x.domain.com, where domain is one of a few higher-level domains and x is drawn from a large set of values from the realms of gambling, pornography, and male enhancement. It seems that the spammers pay for a few real domains and then create a ton of subdomains under them.

One of the ways to detect comment spam is to find URIs in comments and look up the sites they point to in DNS-based SURBLs, such as multi.surbl.org and bsb.empty.us. The thing is, when SURBLs list a spammy site x.domain.com, sometimes they list it under the full hostname x.domain.com and sometimes they list it under the higher-level domain domain.com. To be safe, Typo looks up both forms when it checks for spam.
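
As a sketch of what “both forms” means in practice, the helper below builds the two names that would be looked up for a given host and list. It is illustrative only: the helper name is my own, and it naively treats the last two labels as the higher-level domain, which is wrong for suffixes such as .co.uk.

```ruby
# Build the DNS names to query in a SURBL for a host: the full hostname
# and its higher-level domain, each with the list's zone appended.
def surbl_query_names(host, rbl)
  labels = host.split(".")
  domain = labels.last(2).join(".")   # naive: the last two labels
  [host, domain].uniq.map { |name| "#{name}.#{rbl}" }
end

puts surbl_query_names("x.domain.com", "multi.surbl.org")
# x.domain.com.multi.surbl.org
# domain.com.multi.surbl.org
```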

Here’s the code it uses:

HOST_RBLS.each do |rbl|
  begin
    if [
        IPSocket.getaddress([host, rbl].join('.')),
        IPSocket.getaddress((domain + [rbl]).join('.'))
       ].include?("127.0.0.2")
      throw :hit, "#{rbl} positively resolved #{domain.join('.')}"
    end
  rescue SocketError
  end
end

The code iterates over the list of SURBLs it has and queries each twice – once for the host and once for the domain in question – saving the results of the queries in an array. Then if the array includes a positive response (127.0.0.2), it throws a “hit” notice to the calling code, which will block the associated comment.

Unfortunately, the code doesn’t quite work as intended. Although a positive response for either the host or the domain should register as a hit, the code requires both queries to return positive responses. As a result, the code yields a lot of false negatives because most lists don’t include both host and domain forms of spammy sites; the required double positive is thus hard to obtain.

The cause of the problem is the attempt to query for both forms of the site before checking either response. The queries are performed by calling IPSocket.getaddress, which performs a DNS query for the “A” record associated with its argument. If the record exists, the call returns it; otherwise, the call raises a SocketError exception.

The exception is what causes the logic to break down. When either the host or domain is not in the queried SURBL, which will almost always be the case for reasons I explained earlier, one of the queries will result in a SocketError exception. The exception will be caught by the rescue clause later in the code, but not before the opportunity to test the other query’s response and throw a “hit” has been lost.
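
The failure is easy to reproduce without touching DNS. In this sketch, fake_getaddress is a stand-in I made up for IPSocket.getaddress, and rbl.example is an invented list that knows only the domain form of the spammy site.

```ruby
require "socket"  # defines SocketError

# Stand-in for IPSocket.getaddress: only names in LISTED "resolve".
LISTED = { "domain.com.rbl.example" => "127.0.0.2" }

def fake_getaddress(name)
  LISTED.fetch(name) { raise SocketError, "no such host: #{name}" }
end

# Same shape as the check above: both lookups inside one array literal.
def hit_with_array?(host, domain, rbl)
  [fake_getaddress("#{host}.#{rbl}"),
   fake_getaddress("#{domain}.#{rbl}")].include?("127.0.0.2")
rescue SocketError
  false   # the first failed lookup aborts the whole expression
end

# For contrast: each lookup is rescued on its own, so a miss cannot hide a hit.
def hit_with_separate_rescue?(host, domain, rbl)
  [host, domain].any? do |name|
    begin
      fake_getaddress("#{name}.#{rbl}") == "127.0.0.2"
    rescue SocketError
      false
    end
  end
end

puts hit_with_array?("x.domain.com", "domain.com", "rbl.example")            # false
puts hit_with_separate_rescue?("x.domain.com", "domain.com", "rbl.example")  # true
```

The host lookup raises before the array is ever built, so the positive answer for the domain is never examined.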

My fix was to replace the above code with a call to a new helper method:

query_rbls(HOST_RBLS, host, domain.join('.'))

The helper, defined later, makes the actual queries:

def query_rbls(rbls, *subdomains)
  rbls.each do |rbl|
    subdomains.uniq.each do |d|
      begin
        response = IPSocket.getaddress([d, rbl].join('.'))
        throw :hit, "#{rbl} positively resolved #{d} => #{response}"
      rescue SocketError
        # NXDOMAIN response => negative:  d is not in RBL
      end
    end
  end
  return false
end

Because some SURBLs don’t use 127.0.0.2 but some other “A” record to indicate a positive response, my helper removes the hard-coded address test.

I also made a few more improvements to the spam-protection code. The full set of changes is available as Patch 657 on the Typo Trac site.


I have moved my blog over to Typo

Posted by Tom Moertel Thu, 25 Aug 2005 20:41:00 GMT

SnipSnap no longer makes me happy, and I am switching my weblog over to the Typo weblog system and moving it to blog.moertel.com. I like Rails, and Typo is Rails-based coding at its finest. And Typo has a future. Besides, I want another opportunity to take a perfectly good website theme and destroy it with my utter lack of design acumen.

As far as content goes, I will move my old posts over as time permits. For now, you can read them on the old site, which will remain up. (The Community Projects are still hosted there.)

If you are subscribed to my old feeds, don’t worry: Apache trickery will automagically redirect your RSS reader to my new feeds here. In fact, if you are reading this, you are already getting the new stuff.

—Tom
