Good stuff: Aldo Coffee Company

Posted by Tom Moertel Tue, 31 Jan 2006 02:57:00 GMT

I love espresso. It’s my favorite way to enjoy coffee. Even so, I almost never order espresso in coffee shops because, here in the United States, very few coffee shops have mastered the exacting process by which espresso is made. Dr. Joseph John of the Josuma Coffee Company writes that “more than 95 percent of North American espresso is poorly made, and, in fact, undrinkable.” My experience with Pittsburgh-area coffee shops in the last decade provides no evidence to refute Dr. John’s claim.

If espresso in the United States is so bad, why do Americans drink enough of it to support a Starbucks on every street corner? The reason is that Americans drink espresso almost exclusively in the form of milk-based beverages: cappuccinos, lattes, and mochas. Milk and flavored syrups are the main attractions. Espresso serves only as a coffee-flavored backdrop in which bitterness, a characteristic of poorly made espresso, complements the abundant sweetness of milk laced with sugar syrups. American coffee-shop owners thus have little incentive to offer better espresso to their customers – bad espresso is good enough.

Because of this sad reality, I have developed through hard experience the following reliable guideline for ordering espresso at American coffee shops: Don’t. The one exception I make is for new coffee shops, at which I will try a double espresso, just to see what I get. Almost always, I get a bad espresso, bitter and watery.

And that is what I had expected back in April 2005, when I spotted the brand-new sign for Aldo Coffee Co. in my home town of Mt. Lebanon, Pennsylvania, located in Pittsburgh’s South Hills. I went in, dragging my wife along, and placed my order.

Then something unusual happened. The barista asked me, somewhat hopefully it seemed, if I drank espresso regularly. When I said yes, she seemed pleased. When she followed up by asking me if I read alt.coffee, I was stunned. When I observed that she was timing my shot, my brain actually shut down for a few seconds while it forcibly recalibrated itself to accommodate the seemingly impossible: that I was standing in a coffee shop in my home town, conversing with a barista about alt.coffee, and mere seconds away from receiving what was very likely to be good espresso.

Night of the long-tailed beast!

Posted by Tom Moertel Tue, 24 Jan 2006 03:39:00 GMT

When I let the dog out this evening, it didn’t take long for her to start barking. Figuring she had cornered the neighbor’s cat, I went outside and called her. Naturally, she ignored my order to come back into the house.

Angrily, I marched up to her, underneath the crabapple tree, and took her by the collar. I made sure to bend low and look her in the eyes, just to let her know that I was not happy about having to walk in the wet grass to fetch her. When I stood up to lead her back to the house, my head reached into the lower branches of the crabapple tree.

And then I saw it, inches from my face, looking right back at me.

Everything old is new again: moving content over from my old blog.

Posted by Tom Moertel Sat, 21 Jan 2006 21:04:00 GMT

As you may know, a few months ago I moved my blog from its old system, powered by SnipSnap, over to a new system, powered by the delightful and easier-to-hack-on Typo. Now that everything has been running comfortably for a few months, I am going to move some of my old blog’s content over to Typo.

At first I planned on writing a program to handle the move for me. It would pull from SnipSnap’s database, convert the markup of the articles, and drop the results into Typo’s database. After reviewing my old content, however, I have changed my plans.

My new plan is to cherry-pick the most interesting stuff and move it over. Some of the old stuff is too out of date or too tied to SnipSnap’s integrated wiki to be sensibly extracted and integrated into the new blog.

I’m starting the move today. If you see a “new” article that has an old date, you’ll know why.

Cheers,
Tom

Wondrous oddities: R's function-call semantics

Posted by Tom Moertel Fri, 20 Jan 2006 23:02:00 GMT

Every so often, I am going to write about wondrous oddities – obscure programming-language features that are so cool they deserve wider notice. Today, in the first installment, I want to show you the function-call semantics of R, a great system for statistical computing.

You might not expect a statistics system to have a first-class programming language at its heart, but if you think about it, it does make sense. The R language, actually a dialect of the S language, is described as “a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.” All true. It gives me the feeling of an infix Lisp or Scheme whose syntax is slanted toward mathematics and vector operations. The language has an object layer, too, but that’s not why we are here.

No, we are here to look at R’s uncommonly interesting function-call semantics, in particular argument binding and evaluation. Let’s dig in.

Mining gold from the Internet Movie Database, part 1: decoding user ratings

Posted by Tom Moertel Wed, 18 Jan 2006 01:59:00 GMT

The Internet Movie Database (IMDb) is a rich source of online movie information. The problem is, the true gold is buried deep beneath the site’s user-friendly exterior and hidden within the database itself. With a little digging, however, we can extract the gold, nugget by nugget, and learn about fun statistical tools for data analysis.

Today, in the first part of our analysis, we will put our intuition about rating systems to the test. We will decode IMDb “user ratings,” those numbers such as 6.1 and 7.8 that summarize how the registered users of the IMDb rated movies on a scale from 1 to 10, typically depicted as a series of stars on the screen:

[Image: sample user rating]

We will extract the collective wisdom of registered IMDb users in order to convert a movie’s user rating into the movie’s standing within the database. This gives us a good indicator of how the movie stacks up against other movies in general, and that’s good information to have when deciding which movies to see in the theater or add to your Netflix list.

Ready to start digging? Let’s go!

Improving Typo's spam protection

Posted by Tom Moertel Mon, 16 Jan 2006 06:34:00 GMT

I noticed that my site has been picking up more comment spam recently. Typo has built-in spam protection, but for some reason a few spam comments that ought to have been caught slipped through its filters. Curious, I investigated.

Most spam comments contain links to sites favored by the spammers. The sites are almost always of the form x.domain.com, where domain is one of a few higher-level domains and x is drawn from a large set of values from the realms of gambling, pornography, and male enhancement. It seems that the spammers pay for a few real domains and then create a ton of subdomains under them.

One of the ways to detect comment spam is to find URIs in comments and look up the sites they point to in DNS-based SURBLs, such as multi.surbl.org and bsb.empty.us. The thing is, when SURBLs list a spammy site x.domain.com, sometimes they list it under the full hostname x.domain.com and sometimes they list it under the higher-level domain domain.com. To be safe, Typo looks up both forms when it checks for spam.
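To make the two lookup forms concrete, here is a minimal sketch of how the query names could be built. The method name surbl_query_names is hypothetical, and taking the last two labels as the "higher-level domain" is a simplifying assumption that would be wrong for names such as example.co.uk.

```ruby
# Hypothetical helper: build the DNS names to look up for one host in one SURBL.
# Assumption: the registered domain is simply the last two labels of the host.
def surbl_query_names(host, rbl)
  domain = host.split('.').last(2).join('.')
  # uniq avoids a duplicate query when the host is already a bare domain
  [host, domain].uniq.map { |name| "#{name}.#{rbl}" }
end

names = surbl_query_names("x.domain.com", "multi.surbl.org")
# names holds the full-hostname form first, then the higher-level-domain form
```

Each resulting name is then looked up as an ordinary "A" record. Typo's own code does the same job a little differently: it carries the domain as an array of labels and joins them at query time.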

Here’s the code it uses:

HOST_RBLS.each do |rbl|
  begin
    if [
        IPSocket.getaddress([host, rbl].join('.')),
        IPSocket.getaddress((domain + [rbl]).join('.'))
       ].include?("127.0.0.2")
      throw :hit, "#{rbl} positively resolved #{domain.join('.')}"
    end
  rescue SocketError
  end
end

The code iterates over the list of SURBLs it has and queries each twice – once for the host and once for the domain in question – saving the results of the queries in an array. Then if the array includes a positive response (127.0.0.2), it throws a “hit” notice to the calling code, which will block the associated comment.

Unfortunately, the code doesn’t quite work as intended. Although a positive response for either the host or the domain should register as a hit, the code requires both queries to return positive responses. As a result, the code yields a lot of false negatives because most lists don’t include both host and domain forms of spammy sites; the required double positive is thus hard to obtain.

The cause of the problem is the attempt to query for both forms of the site before checking either response. The queries are performed by calling IPSocket.getaddress, which performs a DNS query for the “A” record associated with its argument. If the record exists, the call returns it; otherwise, the call raises a SocketError exception.

The exception is what causes the logic to break down. When either the host or domain is not in the queried SURBL, which will almost always be the case for reasons I explained earlier, one of the queries will result in a SocketError exception. The exception will be caught by the rescue clause later in the code, but not before the opportunity to test the other query’s response and throw a “hit” has been lost.

My fix was to replace the above code with a call to a new helper method:

query_rbls(HOST_RBLS, host, domain.join('.'))

The helper, defined later, makes the actual queries:

def query_rbls(rbls, *subdomains)
  rbls.each do |rbl|
    subdomains.uniq.each do |d|
      begin
        response = IPSocket.getaddress([d, rbl].join('.'))
        throw :hit, "#{rbl} positively resolved #{d} => #{response}"
      rescue SocketError
        # NXDOMAIN response => negative:  d is not in RBL
      end
    end
  end
  return false
end

Because some SURBLs indicate a positive response with an “A” record other than 127.0.0.2, my helper drops the hard-coded address test and treats any successful lookup as a hit.
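The difference between the two approaches can be seen without touching real DNS. The sketch below is illustrative only: eager_check and per_lookup_check are hypothetical names that mirror the shape of the old and new code, the resolver lambda stands in for IPSocket.getaddress, and the listing data is made up so that only the domain form is listed.

```ruby
require 'socket'  # provides SocketError

# Old shape: both lookups run while the array is built, so one failed
# lookup raises before either response can be inspected.
def eager_check(resolver, rbl, host, domain)
  responses = [resolver.call("#{host}.#{rbl}"), resolver.call("#{domain}.#{rbl}")]
  throw :hit, "#{rbl} positively resolved #{domain}" if responses.include?("127.0.0.2")
  false
rescue SocketError
  false
end

# New shape: each lookup has its own rescue, and any resolved record is a hit.
def per_lookup_check(resolver, rbls, *names)
  rbls.each do |rbl|
    names.uniq.each do |name|
      begin
        response = resolver.call("#{name}.#{rbl}")
        throw :hit, "#{rbl} positively resolved #{name} => #{response}"
      rescue SocketError
        # Not listed under this name; try the next one.
      end
    end
  end
  false
end

# Made-up listing: the SURBL knows domain.com but not x.domain.com.
listed = { "domain.com.multi.surbl.org" => "127.0.0.2" }
resolver = lambda do |name|
  listed.fetch(name) { raise SocketError, "not listed: #{name}" }
end

old_verdict = catch(:hit) { eager_check(resolver, "multi.surbl.org", "x.domain.com", "domain.com") }
new_verdict = catch(:hit) { per_lookup_check(resolver, ["multi.surbl.org"], "x.domain.com", "domain.com") }
```

With this data, old_verdict comes back false because the failed host lookup aborts the whole check, while new_verdict carries the hit message for domain.com. The calling code is assumed to wrap the check in catch(:hit), as the throw implies.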

I also made a few more improvements to the spam-protection code. The full set of changes is available as Patch 657 on the Typo Trac site.
