Mining gold from the Internet Movie Database, part 1: decoding user ratings

Posted by Tom Moertel Wed, 18 Jan 2006 01:59:00 GMT

The Internet Movie Database (IMDb) is a rich source of online movie information. The problem is, the true gold is buried deep beneath the site’s user-friendly exterior and hidden within the database itself. With a little digging, however, we can extract the gold, nugget by nugget, and learn about fun statistical tools for data analysis.

Today, in the first part of our analysis, we will put our intuition about rating systems to the test. We will decode IMDb “user ratings,” those numbers such as 6.1 and 7.8 that summarize how the registered users of the IMDb rated movies on a scale from 1 to 10, typically depicted as a series of stars on the screen:

[Image: sample user rating]

We will extract the collective wisdom of registered IMDb users in order to convert a movie’s user rating into the movie’s standing within the database. This gives us a good indicator of how the movie stacks up against other movies in general, and that’s good information to have when deciding which movies to see in the theater or add to your Netflix list.
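
To make "standing within the database" concrete, here is a minimal sketch of one way to turn a rating into a percentile. It is not necessarily the method used in the full article. The function name `percentile_standing` and the sample ratings are hypothetical, invented purely for illustration, and the numbers are not real IMDb data.

```python
from bisect import bisect_left, bisect_right


def percentile_standing(rating, all_ratings):
    """Percent of movies rated below `rating`, counting ties as half.

    A hypothetical helper: it ranks one rating against a collection of
    ratings, which is one common way to define a percentile rank.
    """
    if not all_ratings:
        raise ValueError("need at least one rating to compare against")
    ranked = sorted(all_ratings)
    below = bisect_left(ranked, rating)          # ratings strictly lower
    ties = bisect_right(ranked, rating) - below  # ratings exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ranked)


# Made-up ratings for illustration only; not drawn from the IMDb.
sample_ratings = [3.2, 4.5, 5.1, 5.8, 6.1, 6.4, 6.9, 7.2, 7.8, 8.3]

standing = percentile_standing(7.8, sample_ratings)
print(f"A 7.8 stands at about the {standing:.0f}th percentile of this sample")
```

With real data, `sample_ratings` would be replaced by the user ratings of every movie under consideration, so the result depends entirely on which movies are included.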

Ready to start digging? Let’s go!


Good stuff: Foyle's War

Posted by Tom Moertel Thu, 01 Sep 2005 19:20:00 GMT

“Reality” shows have plunged mainstream television into an entirely new depth of stupidity – and for television, that’s saying something. Fortunately for us, some programs defy the downward trend, and Masterpiece Theatre’s Foyle’s War is one of the best.
