<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/stylesheets/rss.css" type="text/css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Tom Moertel's Weblog: Tag movies</title>
    <link>http://blog.moertel.com/articles/tag/movies?tag=movies</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Quality rants on programming theory and stuff geeks like</description>
    <item>
      <title>How to download photos and movies from the Palm Centro to a Linux desktop</title>
      <description>&lt;p&gt;I recently got a Palm Centro smartphone, and so far I love it. Like
most modern cell phones, it has a built-in camera and takes decent
snapshots and even records short movies. It&amp;#8217;s great for
spur-of-the-moment shots when I don&amp;#8217;t have my real camera.  The
trick &amp;#8211; and there&amp;#8217;s always a trick when it comes to cell phones &amp;#8211; is getting
the photos off the camera and onto my computer.&lt;/p&gt;


	&lt;p&gt;To get at my pictures, Sprint would prefer that I sign up for their
ludicrously expensive &amp;#8220;PictureMail&amp;#8221; service.  Leave it to weaselly
telecom execs to come up with another way to squeeze money from
teenagers: charge them $5 each month for the &amp;#8220;privilege&amp;#8221; of sharing
their pictures with friends.  This fee, of course, is in addition to the
fee for &amp;#8220;unlimited&amp;#8221; mobile Internet use.  I guess picture bits are
somehow more expensive to move over the air than other kinds of bits.&lt;/p&gt;


	&lt;p&gt;In any case, my next goal after
getting my &lt;a href="http://blog.moertel.com/articles/2007/10/31/how-to-hotsync-the-palm-centro-with-a-fedora-7-linux-desktop-via-usb"&gt;Centro to hotsync with my Linux
workstation&lt;/a&gt; was to figure out how
to download my photos and movies.&lt;/p&gt;


	&lt;p&gt;After a bit of hacking, I figured out that the Centro stores images in
a typical digital-camera-image (DCIM) hierarchy.  For
example, I have a 4-GB microSD card installed in my Centro, and I
store my photos in the &amp;#8220;Palm&amp;#8221; album on it.  This album ends up stored
in the /DCIM/Palm directory on the card.&lt;/p&gt;


	&lt;p&gt;Using the pilot-xfer program
from the &lt;a href="http://www.pilot-link.org/"&gt;pilot-link&lt;/a&gt; project, I was able
to find the directory and its contents.  The trick was to use the
sparsely documented -D flag to work with the Centro&amp;#8217;s virtual
filesystem.  Here, for example, is how I list the contents of the Palm album:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;$ pilot-xfer -p usb: -D /DCIM/Palm -l

   Listening for incoming connection on usb:... connected!

   Directory of /DCIM/Palm...
        652 Fri Nov  2 08:17:06 2007  Album.db
     292053 Fri Nov  2 09:04:20 2007  Photo_110207_001.jpg
      78493 Fri Nov  2 08:17:06 2007  Video_110207_001.3g2
         20 Wed Oct 31 12:09:20 2007  Thumbnail.db

   Thank you for using pilot-link.
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Here, you can see that I have one photo and one movie in the album.
(Movies are stored in .3g2 files that contain &lt;span class="caps"&gt;MPEG4&lt;/span&gt; video.)&lt;/p&gt;
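
	&lt;p&gt;If you want to script this step, the listing is easy to pick apart.  The Python sketch below assumes the line format shown above (size, date, then file name) and keeps only the .jpg and .3g2 entries, together with their sizes in bytes.  The &lt;code&gt;media_files&lt;/code&gt; helper is hypothetical, written for illustration; it is not part of pilot-link.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;import re

# Assumed line format, as in the listing above:
#      292053 Fri Nov  2 09:04:20 2007  Photo_110207_001.jpg
LISTING_LINE = re.compile(
    r'^\s*(\d+)\s+\w{3} \w{3}\s+\d+ [\d:]+ \d{4}\s+(\S+)[ \t]*$', re.M)

def media_files(listing):
    """Return (name, size) pairs for the photos and movies in a listing."""
    files = []
    for size, name in LISTING_LINE.findall(listing):
        # Keep only photos (.jpg) and movies (.3g2); skip the .db files
        if name.lower().endswith(('.jpg', '.3g2')):
            files.append((name, int(size)))
    return files

sample = '     292053 Fri Nov  2 09:04:20 2007  Photo_110207_001.jpg\n'
print(media_files(sample))  # [('Photo_110207_001.jpg', 292053)]
&lt;/code&gt;&lt;/pre&gt;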


	&lt;p&gt;To download the files, I again turned to pilot-xfer, this time using the
-f (fetch) flag, which takes a list of files to download.
Here, for example, I&amp;#8217;ll fetch the image from the listing above:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;$ pilot-xfer -p usb: -D /DCIM/Palm -f Photo_110207_001.jpg

   Listening for incoming connection on usb:... connected!

   Fetching '/DCIM/Palm' ... (292053 bytes)   285 KiB total.

   Thank you for using pilot-link.
&lt;/code&gt;&lt;/pre&gt;
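
	&lt;p&gt;Because the listing reports each file&amp;#8217;s size, one way to sanity-check a fetch is to compare the downloaded files against those sizes.  Here is a minimal Python sketch of that check.  The &lt;code&gt;verify_downloads&lt;/code&gt; function is a hypothetical helper, not part of pilot-link, and it assumes the files were fetched into &lt;code&gt;dest_dir&lt;/code&gt;.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;import os

def verify_downloads(expected_sizes, dest_dir='.'):
    """Return the names of files that are missing from dest_dir or whose
    size differs from the size reported in the pilot-xfer listing.

    expected_sizes is a dict mapping each file name to its size in bytes.
    """
    problems = []
    for name, size in sorted(expected_sizes.items()):
        path = os.path.join(dest_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) != size:
            problems.append(name)
    return problems

# Size taken from the listing above; an empty list means all is well
print(verify_downloads({'Photo_110207_001.jpg': 292053}))
&lt;/code&gt;&lt;/pre&gt;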

	&lt;p&gt;So that&amp;#8217;s the process.  It&amp;#8217;s kind of clunky, so I wrote a small Python
program to automate it.  (I&amp;#8217;m learning Python.  If you&amp;#8217;re a Pythonista, please
consider critiquing my code. I would be especially thankful if you
could point out any helpful idioms that I may have overlooked.)&lt;/p&gt;


	&lt;p&gt;Here&amp;#8217;s how to use the program:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;$ get-pilot-photos.py --help
Usage: get-pilot-photos.py [options]

Options:
  -h, --help            show this help message and exit
  -s SRCDIR, --srcdir=SRCDIR
                        VFS dir on Palm device from which to fetch images
  -d DESTDIR, --destdir=DESTDIR
                        Where to save the images on your computer
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Both the --srcdir and --destdir options are optional.  If you
omit the first, the program will download photos and movies from the
/DCIM/Palm album.  If you omit the second, the program will save the
downloads to a new, timestamped directory within your home directory.&lt;/p&gt;


	&lt;p&gt;That&amp;#8217;s it.  The code is below.&lt;/p&gt;&lt;pre&gt;&lt;code style="font-size: smaller"&gt;#!/usr/bin/env python

# get-pilot-photos.py -
# Download photos and movies from my Palm Centro via pilot-link
#
# Tom Moertel &amp;lt;tom@moertel.com&amp;gt;
# 2007-11-01
#
# Copyright 2007 Thomas G. Moertel
#
# This program is free software: you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation, either version 3 of
# the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# See &amp;lt;http://www.gnu.org/licenses/&amp;gt; for more.

import optparse
import os
import re
import subprocess
import tempfile   # needed by get_image_dir when $HOME is unset
import time

PILOT_XFER = 'pilot-xfer'
DEFAULT_PALM_IMAGE_DIR = '/DCIM/Palm'

class PhotoImporter(object):

    def __init__(self, src_dir, dest_dir=None):
        self.src_dir = src_dir
        self.dest_dir = dest_dir or self.get_image_dir()

    def run(self):
        print('Finding images in %s on your Palm device.' % self.src_dir)
        print('Begin hotsync now...')
        images = self.get_image_list()
        if not images:
            print('No images were found.  Done.')
            return
        print('Found %d images' % len(images))
        print('Waiting for hotsync to complete...')
        time.sleep(10)  # give the 1st hotsync time to complete
        print('Begin another hotsync now...')
        self.fetch_images(images)
        print('Done.  The images were fetched to the following directory:')
        print(self.dest_dir)

    def get_image_list(self):
        """List src_dir on the device; return the .jpg and .3g2 file names."""
        cmdline = [PILOT_XFER, '-p', 'usb:', '-D', self.src_dir, '-l']
        # universal_newlines=True yields text rather than bytes on Python 3
        proc = subprocess.Popen(cmdline, stdout=subprocess.PIPE,
                                universal_newlines=True)
        listing = proc.communicate()[0]  # read all output, wait for exit
        return re.findall(r'\b\S+\.(?:jpg|3g2)\b', listing)

    def fetch_images(self, images):
        """Fetch all of the named files into dest_dir in one pilot-xfer run."""
        if not os.path.isdir(self.dest_dir):  # e.g., a new --destdir
            os.makedirs(self.dest_dir)
        cmdline = [PILOT_XFER, '-p', 'usb:', '-D', self.src_dir, '-f'] + images
        subprocess.call(cmdline, cwd=self.dest_dir)

    def get_image_dir(self):
        """Create and return a new, timestamped directory under $HOME."""
        root = os.getenv('HOME') or tempfile.mkdtemp()
        now = time.strftime('%Y-%m-%d--%H.%M.%S', time.localtime())
        image_dir = os.path.join(root, 'images', 'unsorted-pix', now)
        os.makedirs(image_dir, mode=0o771)
        return image_dir

def main():
    p = optparse.OptionParser()
    p.add_option('--srcdir', '-s', default=DEFAULT_PALM_IMAGE_DIR,
                 help='VFS dir on Palm device from which to fetch images')
    p.add_option('--destdir', '-d',
                 help='Where to save the images on your computer')
    opts, _ = p.parse_args()
    PhotoImporter(opts.srcdir, opts.destdir).run()

if __name__ == '__main__':
    main()
&lt;/code&gt;&lt;/pre&gt;</description>
      <pubDate>Fri, 02 Nov 2007 15:32:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:92fcb359-e7e8-41dc-a5b9-86caffcde1f5</guid>
      <author>Tom Moertel</author>
      <link>http://blog.moertel.com/articles/2007/11/02/how-to-download-photos-and-movies-from-the-palm-centro-to-a-linux-desktop</link>
      <category>hacks</category>
      <category>movies</category>
      <category>palm</category>
      <category>centro</category>
      <category>hotsync</category>
      <category>images</category>
      <category>download</category>
      <category>python</category>
      <trackback:ping>http://blog.moertel.com/articles/trackback/616</trackback:ping>
    </item>
    <item>
      <title>Netflix vs. Amazon Unbox: Netflix still wins</title>
      <description>&lt;p&gt;When Amazon.com announced &lt;a href="http://www.amazon.com/gp/redirect.html?ie=UTF8&amp;amp;location=http%3A%2F%2Famazon.com%2Fb%3F%255Fencoding%3DUTF8%26node%3D16261631%26pf%5Frd%5Fm%3DATVPDKIKX0DER%26pf%5Frd%5Fs%3Dleft-nav-1%26pf%5Frd%5Fr%3D1XK6EQTX7A3HM1BKCSMN%26pf%5Frd%5Ft%3D101%26pf%5Frd%5Fp%3D283734401%26pf%5Frd%5Fi%3D507846&amp;amp;tag=tommoertesweb-20&amp;amp;linkCode=ur2&amp;amp;camp=1789&amp;amp;creative=9325"&gt;its Unbox video-download service&lt;/a&gt;, I was skeptical.  Compared to the reigning champion &amp;#8211; the &lt;span class="caps"&gt;DVD&lt;/span&gt; &amp;#8211; Unbox looked like a loser:&lt;/p&gt;


	&lt;ul&gt;
	&lt;li&gt;Unbox burdened its customers with &lt;span class="caps"&gt;DRM&lt;/span&gt; and the annoyances that come with &lt;span class="caps"&gt;DRM&lt;/span&gt;&lt;/li&gt;
		&lt;li&gt;Unbox required the use of a Windows-only player application&lt;/li&gt;
		&lt;li&gt;Unbox movies lacked &amp;#8220;standard&amp;#8221; &lt;span class="caps"&gt;DVD&lt;/span&gt; features such as surround sound, alternative audio tracks, commentaries, and bloopers&lt;/li&gt;
	&lt;/ul&gt;


	&lt;p&gt;The first two points were deal-breakers, so I wrote off Unbox and did my
best to ignore it.&lt;/p&gt;


	&lt;p&gt;And then Amazon hooked up with TiVo.  Beaming movies directly into my
TiVo box eliminates the need to deal with &lt;span class="caps"&gt;DRM&lt;/span&gt; and Windows annoyances.
My two big concerns sidestepped, I decided to give Unbox another
look.  I still wouldn&amp;#8217;t want to &lt;em&gt;buy&lt;/em&gt; Unbox-to-TiVo movies because
they lack the typical &lt;span class="caps"&gt;DVD&lt;/span&gt; extras and would tie up storage
space on my TiVo, but Unbox might be a decent way to rent the
occasional movie &amp;#8211; if the price were right.&lt;/p&gt;


	&lt;h3&gt;Is the price right?&lt;/h3&gt;


	&lt;p&gt;That depends on how the price of Unbox compares with the price
of my current rental option of choice, Netflix.  Both services offer immediate
access to good movies: Unbox by on-demand downloads, Netflix by
ensuring that I almost always have a &lt;span class="caps"&gt;DVD&lt;/span&gt; or two in the house.&lt;/p&gt;


	&lt;p&gt;To compare Unbox with Netflix, I had to figure out how much a
rental costs me with each service.  With Unbox the figuring was easy
because each rental has its own price tag, typically $3.99.&lt;/p&gt;


	&lt;p&gt;With Netflix, it&amp;#8217;s a bit trickier because the rental price depends
upon how many DVDs I rent in a month. I pay a monthly fee of $17.99
and can rent as many DVDs as I want, at least until the infamous
&lt;a href="http://www.hackingnetflix.com/2005/02/netflix_custome.html"&gt;Netflix rate
throttle&lt;/a&gt;
kicks in.
To determine how
many DVDs I rent during the typical month, I had to download my
rental history.  (If you&amp;#8217;re a Netflix subscriber, you can get your
history from the &lt;a href="http://www.netflix.com/ReturnedRentals"&gt;Returned
Rentals&lt;/a&gt; page.)
After downloading my history, massaging it into the desired form, and
loading it into &lt;a href="http://www.r-project.org/"&gt;R&lt;/a&gt;, I generated a
stem-and-leaf plot to visualize the number of DVDs I have rented
during each of the 76 months I have been a Netflix subscriber:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;&amp;gt; stem(monthly.rental.counts, scale=2)

  The decimal point is at the |

   1 | 0
   2 | 000
   3 | 0000000
   4 | 00000000000
   5 | 000000000000
   6 | 000000000000000
   7 | 0000
   8 | 000000
   9 | 00000
  10 | 0000
  11 | 0
  12 | 00
  13 | 00
  14 | 00
  15 | 0
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;It looks like I have rented as few as one and as many as fifteen DVDs in a
month.  Most months, however, I rent between three and ten DVDs.  On
average, I rent about 6.4 DVDs per month:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;&amp;gt; summary(monthly.rental.counts)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   4.000   6.000   6.408   8.000  15.000
&lt;/code&gt;&lt;/pre&gt;
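
	&lt;p&gt;Because every leaf in the stem-and-leaf plot is a 0 and each stem is a whole number of DVDs, the plot records the complete distribution of monthly counts.  That makes it possible to double-check the summary in a few lines of Python.  This sketch re-enters the counts by hand from the plot; it is not the R session used above.  It needs Python 3.8 or later for &lt;code&gt;statistics.quantiles&lt;/code&gt;, whose &lt;code&gt;method='inclusive'&lt;/code&gt; setting uses the same interpolation as R&amp;#8217;s default quantiles.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;import statistics

# Number of months with each rental count, read off the stem-and-leaf
# plot above (each "0" leaf is one month; each stem is a DVD count).
MONTHS_WITH_COUNT = {1: 1, 2: 3, 3: 7, 4: 11, 5: 12, 6: 15, 7: 4, 8: 6,
                     9: 5, 10: 4, 11: 1, 12: 2, 13: 2, 14: 2, 15: 1}

# Expand to one entry per month, e.g. [1, 2, 2, 2, 3, ...]
counts = [n for n, months in sorted(MONTHS_WITH_COUNT.items())
          for _ in range(months)]

print(len(counts))                        # 76 months
print(round(statistics.mean(counts), 3))  # 6.408
print(statistics.median(counts))          # 6.0
print(statistics.quantiles(counts, n=4, method='inclusive'))  # [4.0, 6.0, 8.0]
&lt;/code&gt;&lt;/pre&gt;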

	&lt;p&gt;Thus my average rental price is about $2.80 per &lt;span class="caps"&gt;DVD&lt;/span&gt;:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;&amp;gt; 17.99 / 6.4
[1] 2.810937
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Now I can make my Unbox-vs-Netflix price comparison.  For me, it
looks like Unbox is about 40 percent more expensive than
Netflix:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;&amp;gt; 3.99 / 2.81
[1] 1.419929
&lt;/code&gt;&lt;/pre&gt;
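
	&lt;p&gt;The same arithmetic, gathered in one place, also yields a break-even point: the number of rentals per month at which $3.99 Unbox rentals would cost as much as the $17.99 Netflix plan.  This Python sketch uses only the figures given above and considers nothing but price (no rate throttling, no picture quality):&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;NETFLIX_MONTHLY_FEE = 17.99   # dollars per month
UNBOX_RENTAL_PRICE = 3.99     # typical price of one Unbox rental
MEAN_RENTALS_PER_MONTH = 6.4  # rounded mean from summary() above

netflix_per_rental = NETFLIX_MONTHLY_FEE / MEAN_RENTALS_PER_MONTH
unbox_premium = UNBOX_RENTAL_PRICE / netflix_per_rental
# Rentals per month at which the two services cost the same
break_even = NETFLIX_MONTHLY_FEE / UNBOX_RENTAL_PRICE

print('Netflix: $%.2f per rental' % netflix_per_rental)   # $2.81
print('Unbox costs %.2f times as much' % unbox_premium)   # 1.42
print('Break-even: %.1f rentals per month' % break_even)  # 4.5
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;By that reckoning, Unbox comes out cheaper only in months with four or fewer rentals, which, according to the stem-and-leaf plot, describes 22 of the 76 months.&lt;/p&gt;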

	&lt;p&gt;So the price of Unbox is &lt;em&gt;not&lt;/em&gt; right, at least for me.&lt;/p&gt;


	&lt;h3&gt;Testing Unbox-to-TiVo rentals&lt;/h3&gt;


	&lt;p&gt;Because Amazon is offering free $15 credits to TiVo owners, I decided
to give Unbox a test drive.  My test rental was &lt;a href="http://imdb.com/title/tt0443543/"&gt;&lt;em&gt;The Illusionist&lt;/em&gt;&lt;/a&gt;.  Renting the movie was
easy (just one click), and shortly thereafter Unbox automatically
downloaded the movie to my TiVo box.  When I played the movie,
however, I was disappointed with the video quality.  I easily
noticed banding artifacts, which were distracting
at times.  On the whole, the viewing experience was inferior to watching a
&lt;span class="caps"&gt;DVD&lt;/span&gt;.&lt;/p&gt;


	&lt;h3&gt; Netflix still beats Unbox&lt;/h3&gt;


	&lt;p&gt;For me, then, Unbox is still a loser.  It costs more and delivers
less than &lt;span class="caps"&gt;DVD&lt;/span&gt; rentals via Netflix.&lt;/p&gt;


	&lt;h3&gt; A note to my friends at Amazon.com&lt;/h3&gt;


	&lt;p&gt;I would be happy to give you my business, but right now you&amp;#8217;re not
earning it.  If you
want me as an Unbox customer, here is the recipe for winning me over:&lt;/p&gt;


	&lt;ul&gt;
	&lt;li&gt;Let me easily download movie rentals to my TiVo.  (&lt;em&gt;Check.&lt;/em&gt;)&lt;/li&gt;
		&lt;li&gt;Offer true &lt;span class="caps"&gt;DVD&lt;/span&gt; quality or better. (&lt;em&gt;You&amp;#8217;re not there yet.&lt;/em&gt;)&lt;/li&gt;
		&lt;li&gt;Sell the rentals for less than $2.80. (&lt;em&gt;You&amp;#8217;re not there yet.&lt;/em&gt;)&lt;/li&gt;
	&lt;/ul&gt;


	&lt;p&gt;Until then, I&amp;#8217;ll have to give my money to Netflix.&lt;/p&gt;


	&lt;p&gt;Cheers,&lt;br/&gt;
Tom&lt;/p&gt;


&lt;div class="update"&gt;
&lt;strong&gt;Update:&lt;/strong&gt; edits for clarity; added tags.
&lt;/div&gt;</description>
      <pubDate>Sat, 07 Apr 2007 12:20:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:ca850250-6b95-409a-9b1c-18f7a1707576</guid>
      <author>Tom Moertel</author>
      <link>http://blog.moertel.com/articles/2007/04/07/netflix-vs-amazon-unbox-netflix-still-wins</link>
      <category>reviews</category>
      <category>amazon</category>
      <category>netflix</category>
      <category>movies</category>
      <category>unbox</category>
      <category>dvds</category>
      <category>rentals</category>
      <category>tivo</category>
      <trackback:ping>http://blog.moertel.com/articles/trackback/436</trackback:ping>
    </item>
    <item>
      <title>The IMDB Movie Rating Decoder Ring: updated w/ 2 March 2007 data</title>
      <description>&lt;p&gt;If you want to get more out of &lt;a href="http://imdb.com/"&gt;&lt;span class="caps"&gt;IMDB&lt;/span&gt;&lt;/a&gt; movie ratings, check out my
&lt;a href="http://community.moertel.com/ss/space/IMDB+Movie-Rating+Decoder+Ring"&gt;&lt;span class="caps"&gt;IMDB&lt;/span&gt; Movie Rating Decoder Ring&lt;/a&gt;, now updated with fresher data (as of 2 March 2007).&lt;/p&gt;</description>
      <pubDate>Fri, 09 Mar 2007 17:40:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:f75cbc12-2c78-4a30-9863-968dc535d1a3</guid>
      <author>Tom Moertel</author>
      <link>http://blog.moertel.com/articles/2007/03/09/the-imdb-movie-rating-decoder-ring-updated-w-2-march-2007-data</link>
      <category>statistics</category>
      <category>imdb</category>
      <category>movies</category>
      <category>decoder_ring</category>
      <category>ratings</category>
      <category>stars</category>
      <category>data</category>
      <trackback:ping>http://blog.moertel.com/articles/trackback/409</trackback:ping>
    </item>
    <item>
      <title>Mining gold from the Internet Movie Database, part 1: decoding user ratings</title>
      <description>&lt;p&gt;&lt;a href="http://imdb.com/"&gt;The Internet Movie Database&lt;/a&gt; (IMDb) is a rich source
of online movie information.  The problem is, the true gold is buried
deep beneath the site&amp;#8217;s user-friendly exterior and hidden within the
database itself.  With a little digging, however, we can extract the
gold, nugget by nugget, and learn about fun statistical tools for data
analysis.&lt;/p&gt;


	&lt;p&gt;Today, in the first part of our analysis, we will put our intuition
about rating systems to the test.  We will decode IMDb &amp;#8220;user ratings,&amp;#8221; 
those numbers such as 6.1 and 7.8 that summarize how the registered
users of the IMDb rated movies on a scale from 1 to 10, typically
depicted as a series of stars on the screen:&lt;/p&gt;


&lt;div style="text-align: center; margin: 1.5ex; "&gt;
&lt;img src="http://community.moertel.com/~thor/pix/20060114/sample-user-rating.png" title="sample user rating" alt="sample user rating" /&gt;
&lt;/div&gt;

	&lt;p&gt;We will extract the collective wisdom of registered IMDb users in
order to convert a movie&amp;#8217;s user rating into the movie&amp;#8217;s standing
within the database.  This gives us a good indicator of how the movie
stacks up against other movies in general, and that&amp;#8217;s good information
to have when deciding which movies to see in the theater or add to
your Netflix list.&lt;/p&gt;


	&lt;p&gt;Ready to start digging?  Let&amp;#8217;s go!&lt;/p&gt;&lt;h3&gt;Getting to know user ratings: fundamental descriptive statistics&lt;/h3&gt;


	&lt;p&gt;Like most online movie databases, the IMDb encourages its users to
rate movies on a numerical scale, in this case from 1 to 10.  The IMDb
software averages these ratings into a composite &amp;#8220;user rating&amp;#8221; for
each movie.  &lt;a href="http://imdb.com/title/tt0360717/"&gt;King Kong&lt;/a&gt;, for
example, currently has a user rating of 7.8.  &lt;a href="http://imdb.com/title/tt0388482/"&gt;Transporter
2&lt;/a&gt;, on the other hand, has a user
rating of 6.1.&lt;/p&gt;


	&lt;p&gt;Certainly, we have some sense of what these ratings mean.  A 6.1, for
example, is somewhat higher than the midpoint of the 1-to-10 scale.
Thus we might expect a 6.1-rated movie to be somewhat better than the
typical movie.  But is this expectation justified?  Also, we know a
7.8 is better than a 6.1.  But how much better?  Is it 1.7 stars
better?  And, if so, what does that mean?&lt;/p&gt;


	&lt;p&gt;To understand what user ratings mean, we must put them into context.
Let&amp;#8217;s assume that buried within the IMDb is some kind of useful
information that reflects the collective wisdom of the site&amp;#8217;s users.
When a movie is rated 7.8, we will assume that the rating means the
movie is &amp;#8220;better&amp;#8221; than lower-rated movies and &amp;#8220;worse&amp;#8221; than
higher-rated movies.  To what degree, we don&amp;#8217;t know for sure, but
that is what we are about to find out.&lt;/p&gt;


	&lt;p&gt;While we might not know what it means for a movie to be a &amp;#8220;7.8,&amp;#8221; we
probably do have a genuine sense of what it means for a movie to be
among the best, among the worst, or in the middle of the pack.  We have
developed this sense by experience, by watching
movies over our lifetimes.  What we need is some way of converting the
number 7.8 into something that registers with this
hard-earned experience.&lt;/p&gt;


	&lt;p&gt;As a starting point, let&amp;#8217;s examine the most fundamental descriptive
statistics of the IMDb&amp;#8217;s user ratings:&lt;/p&gt;


	&lt;table&gt;
		&lt;tr&gt;
			&lt;th style="text-align:right;"&gt;Count &lt;/th&gt;
			&lt;th style="text-align:right;"&gt;&amp;nbsp;Mean &lt;/th&gt;
			&lt;th style="text-align:right;"&gt;&amp;nbsp;Median &lt;/th&gt;
			&lt;th style="text-align:right;"&gt;&amp;nbsp;St.Dev. &lt;/th&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;23,396 &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;        6.2 &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;          6.4 &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;           1.4 &lt;/td&gt;
		&lt;/tr&gt;
	&lt;/table&gt;




	&lt;p&gt;Breaking them down:&lt;/p&gt;


	&lt;ul&gt;
	&lt;li&gt;&lt;em&gt;count&lt;/em&gt; &amp;#8211; There are 23,396 rated movies in the database.  (There are
  actually more, but to eliminate fringe movies I am considering only
  those movies that have been rated by more than 100 users.)&lt;/li&gt;
		&lt;li&gt;&lt;em&gt;mean&lt;/em&gt; &amp;#8211; The average user rating is 6.2.  While some ratings are
  lower and others higher, if you were to put all of the ratings in
  a blender and purée them into a homogeneous soup, the soup&amp;#8217;s
  overall rating would balance out to 6.2.&lt;/li&gt;
		&lt;li&gt;&lt;em&gt;median&lt;/em&gt; &amp;#8211; The rating that divides the database in half.  Ratings
  higher than 6.4 fall into the better half; ratings lower than 6.4, the
  worse half.&lt;/li&gt;
		&lt;li&gt;&lt;em&gt;standard deviation&lt;/em&gt; &amp;#8211; This is a measure of how spread out the
  ratings are.  Assuming the distribution of the ratings has a
  bell-curve shape, which we will investigate in a moment, about 68
  percent of the ratings will fall within one standard deviation of
  the mean, i.e., in the range 6.2 +/- 1.4 = 4.8 to 7.6.&lt;/li&gt;
	&lt;/ul&gt;
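
	&lt;p&gt;For a true bell curve (a normal distribution), the share of values within one standard deviation of the mean can be computed directly.  This Python sketch (Python 3.8 or later, for &lt;code&gt;statistics.NormalDist&lt;/code&gt;) plugs in the mean and standard deviation from the table above.  Treat the result as an approximation: the IMDb ratings are not perfectly normal, and the mean sitting below the median hints at a longer tail toward low ratings.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;from statistics import NormalDist

MEAN, SD = 6.2, 1.4   # from the table of descriptive statistics

# Share of a standard normal distribution lying between -1 and +1 SD
std_normal = NormalDist()
within_one_sd = std_normal.cdf(1) - std_normal.cdf(-1)

print('%.1f%% of ratings between %.1f and %.1f'
      % (100 * within_one_sd, MEAN - SD, MEAN + SD))
# 68.3% of ratings between 4.8 and 7.6
&lt;/code&gt;&lt;/pre&gt;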


	&lt;p&gt;Another way to examine the ratings is graphically.  The following
chart, called a &lt;em&gt;histogram&lt;/em&gt;, shows how many movies had each possible
user rating:&lt;/p&gt;


	&lt;p&gt;&lt;img src="http://community.moertel.com/~thor/pix/20060114/hist-all.png" title="Histogram of IMDb movie ratings" alt="Histogram of IMDb movie ratings" /&gt;&lt;/p&gt;


	&lt;p&gt;The ratings form a pointy bell curve.  It&amp;#8217;s easy to see that few
movies have ratings lower than 4 or higher than 8; most movies fall in
between.  The movies are most densely packed in the range that is a bit
higher than 6 and a bit lower than 8.  I have plotted the mean
(the triangle) and median (the &amp;#8220;X&amp;#8221;) along the bottom of the chart to put
them into perspective.&lt;/p&gt;
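
	&lt;p&gt;A histogram is simply a count of how many movies share each rating.  This Python sketch prints a crude text version of one; the ratings passed to it are a small hypothetical sample, not IMDb data.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;from collections import Counter

def text_histogram(ratings):
    """Print one row per distinct rating, with one '*' per movie.

    Returns the Counter of movies per rating.
    """
    counts = Counter(ratings)
    for rating in sorted(counts):
        print('%4.1f | %s' % (rating, '*' * counts[rating]))
    return counts

text_histogram([6.1, 6.4, 6.1, 7.8, 3.4, 6.4, 6.1])  # hypothetical ratings
&lt;/code&gt;&lt;/pre&gt;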


	&lt;h3&gt;Exploring the extremes&lt;/h3&gt;


	&lt;p&gt;With this information, we can begin to make crude interpretations of
user ratings.  Say we hear that
&lt;a href="http://imdb.com/title/tt0327554/"&gt;Catwoman&lt;/a&gt; has a user rating of 3.4.
Before we looked at the histogram, we probably could have guessed that
the movie was not good.  (We may even have heard as much from friends.)
But now that we have seen the histogram, we know that very few movies
had a rating lower than 4, let alone 3.4, and so we know the movie is
among the worst ever released.  It is, no pun intended, an outright
dog.&lt;/p&gt;


	&lt;p&gt;On the other side of the spectrum, 
&lt;a href="http://imdb.com/title/tt0372784/"&gt;Batman Begins&lt;/a&gt; has a user
rating of 8.3.  Since we know that few movies rate better than 8,
we know that this movie is probably among the very best.&lt;/p&gt;


	&lt;p&gt;The following histogram shows where both movies stand:&lt;/p&gt;


	&lt;p&gt;&lt;img src="http://community.moertel.com/~thor/pix/20060114/hist-all-catwoman.png" title="Histogram of IMDb movie ratings, augmented" alt="Histogram of IMDb movie ratings, augmented" /&gt;&lt;/p&gt;


	&lt;p&gt;So far, we understand the extremes of the rating system.  Movies lower
than 4 are probably terrible, and movies higher than 8 are probably
great.  No doubt, that is useful information.  But what about the big
lump in the middle, which represents the bulk of movies?  That is where
the real gold is hidden.  To get it, we must dig deeper.&lt;/p&gt;


	&lt;h3&gt;Charting the inner masses&lt;/h3&gt;


	&lt;p&gt;We already know Catwoman is bad, but how bad is it?  One way to
quantify its badness is to count how many movies in the database are
equally bad or worse, and compare that count to the size of the entire
database.  In the database, there are 1,060 movies with Catwoman&amp;#8217;s 3.4
user rating or lower.  The size of the entire database is 23,396
movies.  Dividing the first number by the second gives about 0.045, so
Catwoman is among the worst 5 percent of movies in the database.
It is in the &lt;em&gt;5th percentile.&lt;/em&gt;&lt;/p&gt;
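
	&lt;p&gt;This percentile calculation is a count-and-divide, which a sorted list and Python&amp;#8217;s &lt;code&gt;bisect&lt;/code&gt; module handle neatly.  In this sketch, &lt;code&gt;percentile_of&lt;/code&gt; is a hypothetical helper, and apart from the Catwoman figures taken from the text, the ratings are made up for illustration.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;from bisect import bisect_right

def percentile_of(rating, sorted_ratings):
    """Fraction of movies rated at or below rating (input must be sorted)."""
    return bisect_right(sorted_ratings, rating) / len(sorted_ratings)

# The Catwoman arithmetic: 1,060 of 23,396 movies are rated 3.4 or lower
print(round(1060 / 23396, 3))   # 0.045, i.e., about the 5th percentile

ratings = sorted([3.4, 5.0, 6.1, 6.4, 7.2, 7.8, 8.3, 8.9])  # hypothetical
print(percentile_of(6.1, ratings))  # 0.375
&lt;/code&gt;&lt;/pre&gt;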


	&lt;p&gt;We just turned a 3.4 user rating into a percentage that tells us where
3.4-rated movies stand with respect to all of the movies within the
database.  If we repeat the process for all possible movie ratings and
plot the results, we get a chart like this:&lt;/p&gt;
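
	&lt;p&gt;Repeating that calculation for every distinct rating produces the points of an empirical cumulative distribution function, which is what the chart plots.  A minimal Python sketch, again on hypothetical ratings rather than the real IMDb data:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;from collections import Counter

def ecdf_points(ratings):
    """Return (rating, fraction of movies rated at or below it) pairs,
    one per distinct rating, in increasing order of rating."""
    counts = Counter(ratings)
    total = len(ratings)
    points, seen = [], 0
    for rating in sorted(counts):
        seen += counts[rating]          # running count of movies so far
        points.append((rating, seen / total))
    return points

print(ecdf_points([3.4, 6.1, 6.1, 7.8]))
# [(3.4, 0.25), (6.1, 0.75), (7.8, 1.0)]
&lt;/code&gt;&lt;/pre&gt;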


	&lt;p&gt;&lt;img src="http://community.moertel.com/~thor/pix/20060114/ecdf-all-catwoman.png" title="Empirical cumulative distribution of IMDb movie ratings" alt="Empirical cumulative distribution of IMDb movie ratings" /&gt;&lt;/p&gt;


	&lt;p&gt;Each point on the S-shaped curve relates a movie&amp;#8217;s rating with its
standing in the database.  The circle on the lower portion of the curve,
for example, represents Catwoman.  Its position corresponds to a 3.4
user rating on the horizontal axis and a 0.05 portion (5 percent) on
the vertical axis.  Thus a 3.4-rated movie is in the 5th percentile.
The triangle on the upper portion of the curve corresponds to Batman
Begins, relating the movie&amp;#8217;s 8.3 rating to its glorious standing in the
97th percentile.&lt;/p&gt;


	&lt;p&gt;Because the curve covers all ratings, not just the extremes, we now
have a way to quantify the goodness or badness of middle-ground
movies.  Let&amp;#8217;s return to &lt;a href="http://imdb.com/title/tt0360717/"&gt;King Kong&lt;/a&gt;,
currently rated 7.8, and &lt;a href="http://imdb.com/title/tt0388482/"&gt;Transporter
2&lt;/a&gt;, currently rated 6.1.  Look up
their percentiles on the curve above.  (Try it.)  If you are careful,
you should get close to the actual values of 91 and 42, respectively.&lt;/p&gt;


	&lt;p&gt;This would be a good time to reflect upon our intuition about user
ratings.  Earlier, we thought a 6.1 user rating suggested that a movie
was somewhat better than the typical movie.  Now, however, we see that
a 6.1 is worth somewhat less than is typical.&lt;/p&gt;


	&lt;p&gt;Even though their ratings differ by only 1.7 user-rating units, King
Kong is in the 91st percentile &amp;#8211; very good &amp;#8211; and Transporter 2 is way
down in the 42nd percentile &amp;#8211; not so good.  To look at the difference
another way, about &lt;em&gt;half&lt;/em&gt; of the movies in the database fall in
between Transporter 2 and King Kong: 0.91 &amp;#8211; 0.42 = 0.49.  A small
difference in user ratings can represent a large difference in
standings, which might further challenge our intuition about ratings.&lt;/p&gt;


	&lt;p&gt;Additionally, differences in standings are not proportional to
differences in user ratings.  Catwoman, for example, has a user rating
of 3.4 and falls into the 5th percentile.  Transporter 2, with its 6.1
user rating, is a whole 2.7 user-rating units away from Catwoman, but
only 37 percent of movies stand between them.  Even though Transporter
2 is closer to King Kong in terms of user ratings, it is really closer
to Catwoman in terms of standing.&lt;/p&gt;


	&lt;h3&gt; Movie-rating decoder ring&lt;/h3&gt;


	&lt;p&gt;A chart is great for understanding the relationship between user
ratings and movie standings, but it is not ideal for day-to-day use,
when we just want to figure out where a movie stands before deciding
whether it is worth watching.  For times like that, a lookup table is a
convenient alternative.  The table below, for example, summarizes the
rating-standing relationship in a convenient &amp;#8220;decoder-ring&amp;#8221; format.
Find a movie&amp;#8217;s rating in the left column, and the corresponding entry
in the right column gives the movie&amp;#8217;s standing.&lt;/p&gt;


	&lt;table&gt;
		&lt;tr&gt;
			&lt;th&gt; Rating  &lt;/th&gt;
			&lt;th&gt;Percentile &lt;/th&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   4.00  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;      8     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   5.00  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     19     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   5.25  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     22     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   5.50  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     29     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   5.75  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     33     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   6.00  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     40     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   6.25  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     45     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   6.50  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     55     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   6.75  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     61     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   7.00  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     70     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   7.25  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     76     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   7.50  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     84     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   7.75  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     89     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   8.00  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     94     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   8.25  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     96     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   8.50  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     98     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   8.75  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;     99     &lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;td style="text-align:right;"&gt;   9.00  &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;    100     &lt;/td&gt;
		&lt;/tr&gt;
	&lt;/table&gt;




	&lt;p&gt;Using King Kong as an example again, let&amp;#8217;s look up 7.8.  It turns out
that 7.8 is not in the table, but 7.75 is, and it corresponds to the
89th percentile.  So we can guesstimate that King Kong is a bit above
the 89th percentile, which agrees with the actual value of 91 that we
found earlier.  The decoder ring is not as precise as the
chart, but it is more than good enough for finding a movie&amp;#8217;s
approximate standing quickly &amp;#8211; something that might be handy on
a Friday night.&lt;/p&gt;
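
	&lt;p&gt;The decoder ring also translates directly into code.  The following Python sketch copies the table above and, as in the King Kong example, uses the last row at or below the given rating.  The optional straight-line interpolation between neighboring rows is an extra, not part of the decoder ring itself; for King Kong&amp;#8217;s 7.8 it gives about 90, a little closer to the actual 91.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;from bisect import bisect_right

# (rating, percentile) rows copied from the decoder-ring table above
DECODER_RING = [(4.00, 8), (5.00, 19), (5.25, 22), (5.50, 29), (5.75, 33),
                (6.00, 40), (6.25, 45), (6.50, 55), (6.75, 61), (7.00, 70),
                (7.25, 76), (7.50, 84), (7.75, 89), (8.00, 94), (8.25, 96),
                (8.50, 98), (8.75, 99), (9.00, 100)]
RATINGS = [r for r, _ in DECODER_RING]

def decode(rating, interpolate=False):
    """Approximate percentile for a user rating, using the table rows.

    Returns None for ratings below the table's first row (4.00).
    """
    i = bisect_right(RATINGS, rating) - 1   # last row at or below rating
    if i == -1:
        return None
    r0, p0 = DECODER_RING[i]
    if not interpolate or i == len(DECODER_RING) - 1:
        return p0
    r1, p1 = DECODER_RING[i + 1]
    return p0 + (p1 - p0) * (rating - r0) / (r1 - r0)

print(decode(7.8))                            # 89
print(round(decode(7.8, interpolate=True)))   # 90
&lt;/code&gt;&lt;/pre&gt;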


	&lt;h3&gt; Summary: weighing the gold&lt;/h3&gt;


	&lt;p&gt;What have we dug up so far?  First, we computed a few essential
descriptive statistics of the IMDb&amp;#8217;s user ratings.  We learned that
the average rating is 6.2 and that the median, which divides the
ratings into better and worse halves, is 6.4.&lt;/p&gt;


	&lt;p&gt;Second, we plotted a histogram in order to inspect the ratings
visually.  Right away, we could tell that movies rated lower than 4
are among the very worst, and movies rated higher than 8 are among the
very best.&lt;/p&gt;


	&lt;p&gt;Third, in order to give more meaning to ratings in between those two
extremes, we turned to percentiles.  We computed Catwoman&amp;#8217;s by hand;
it&amp;#8217;s in the 5th percentile &amp;#8211; ouch!  Then we plotted a curve that
represents the relationship between user ratings and percentiles.
Using this curve we determined that King Kong is in the 91st
percentile and Transporter 2 is in the 42nd percentile &amp;#8211; a large
difference in movie standings.&lt;/p&gt;


	&lt;p&gt;Finally, we created a tabular &amp;#8220;decoder ring&amp;#8221; to summarize what the
curve depicted.  It is a quick and easy way to find a movie&amp;#8217;s
standing given its user rating.&lt;/p&gt;


	&lt;p&gt;That concludes our first dig of the Internet Movie Database.  Next
time, we will examine the factors that influence movie ratings.  Are
Documentaries better than Horror flicks?  Are old movies generally
better than new movies?  We will ask those questions and more in the
next part of the series.&lt;/p&gt;


	&lt;p&gt;Until then, enjoy a movie or two.  And don&amp;#8217;t forget your slide-rule.&lt;/p&gt;


	&lt;h3&gt;Acknowledgments&lt;/h3&gt;


	&lt;p&gt;The movie information used in this article is courtesy of &lt;a href="http://www.imdb.com"&gt;The Internet
Movie Database&lt;/a&gt; and used with permission.&lt;/p&gt;


	&lt;p&gt;My analysis was performed with
&lt;a href="http://www.r-project.org/about.html"&gt;R&lt;/a&gt; software from the &lt;a href="http://www.r-project.org/"&gt;R Project
for Statistical Computing&lt;/a&gt;.  R is a great
statistics package.  It&amp;#8217;s Free Software, and it has a great community
around it.  &lt;em&gt;Do&lt;/em&gt; check it out.&lt;/p&gt;</description>
      <pubDate>Tue, 17 Jan 2006 20:59:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:305161727c36a521acc37a1452ee7be2</guid>
      <author>Tom Moertel</author>
      <link>http://blog.moertel.com/articles/2006/01/17/mining-gold-from-the-internet-movie-database-part-1</link>
      <category>movies</category>
      <category>statistics</category>
      <category>R</category>
      <category>imdb</category>
      <trackback:ping>http://blog.moertel.com/articles/trackback/22</trackback:ping>
    </item>
  </channel>
</rss>
