Solving the Google Code Jam "countPaths" problem in Ruby

By
Posted on
Tags: , , ,

Here’s a Ruby version of a dynamic-programming-based solver for the Google Code Jam “countPaths” problem. It is essentially the same as my earlier Haskell-based solution (see Update 2), but much slower. Whereas the Haskell version solves the maximum-size, all-the-same-letter problem in about 0.9 second, the Ruby version requires about 71 seconds. Maybe somebody who understands Ruby’s internals better than I do can come up with some optimizations.

Here’s the code:

# Tom Moertel <tom@moertel.com>
# 2006-08-16
#
# Ruby-based solution to the Google Code Jam problem "countPaths"
# See http://www.cs.uic.edu/~hnagaraj/articles/code-jam/ for more.

class WordPath

  include Enumerable

  def initialize(grid, word)
    @grid, @rword, @counts = grid, word.reverse, {}
  end

  def self.count_paths(grid, word)
    new(grid, word).solve
  end

  def solve
    final_index = @rword.length - 1
    inject(0) { |sum, rc| sum + count_from(final_index, *rc) }
  end

  private

  def count_from(i, r, c)
    @counts[[r, c, i]] ||= begin
      match = @rword[i] == @grid[r][c]
      case
        when i == 0 && match then 1
        when match then subsum_of_neighbors(r, c, i - 1)
        else 0
      end
    end
  end

  def subsum_of_neighbors(r, c, i)
    sum = 0
    rowlen = @grid[0].size
    for nr in [r - 1, r, r + 1]
      next if nr < 0 or nr >= @grid.size
      for nc in [c - 1, c, c + 1]
        next if nc < 0 || nc >= rowlen
        next unless r != nr || c != nc
        if count = count_from(i, nr, nc)
          sum += count
        end
      end
    end
    sum
  end

  def each
    @grid.each_index do |r|
      @grid[0].size.times { |c| yield([r, c]) }
    end
  end

end


# TESTS

if ENV["TEST"] || ENV["BIG_TEST"]

  require "test/unit"

  class TestWordPath < Test::Unit::TestCase

    if ENV["BIG_TEST"]
      def test_big_problem
        assert_equal \
          303835410591851117616135618108340196903254429200,
          WordPath.count_paths(["A"*50] * 50, "A"*50)
      end
    end

    if ENV["TEST"]
      def test_count_paths
        w = WordPath
        assert_equal 1,
          w.count_paths(%w{ABC FED GHI}, "ABCDEFGHI")
        assert_equal 2,
          w.count_paths(%w{ABC FED GAI}, "ABCDEA")
        assert_equal 0,
          w.count_paths(%w{ABC DEF GHI}, "ABCD")
        assert_equal 108,
          w.count_paths(%w{AA AA}, "AAAA")
        assert_equal 56448,
          w.count_paths(%w{ABABA BABAB ABABA BABAB ABABA}, "ABABABBA")
        assert_equal 2745564336,
          w.count_paths(%w{AAAAA AAAAA AAAAA AAAAA AAAAA}, "AAAAAAAAAAA")
        assert_equal 0,
          w.count_paths(%w{AB CD}, "AA" )
        assert_equal 1,
          w.count_paths(%w{A}, "A")
      end
    end

  end

end

Set the BIG_TEST and/or TEST environment variables to run the test suites. For example:

$ <code>TEST=1 ./countpaths.rb</code>
Loaded suite countpaths
Started
.
Finished in 0.02062 seconds.

1 tests, 8 assertions, 0 failures, 0 errors

Unless somebody beats me to it, I’ll whip up a Perl version for comparison.

Update: I managed to speed up my code by a factor of 17. Now the execution time for the maximum-size, all-the-same-letter problem is down to 4.2 seconds, which is comparable with implementations in other languages. Ivan Peev’s Python implementation, for example, is only slightly faster at 2.8 seconds.

A performance killer in the previous version was using a single big hash for my cache. Now I use a 3D array:

counts[[i,r,c]]   # one big hash (slower)
counts[i][r][c]   # 3D-array (faster)

An additional advantage of the 3D-array is that I can peel off slabs as I descend the outer layers of nested loops. For instance, instead of writing:

for i in 0 .. 10
  for j in 0 .. 10
    sum += counts[i][j]
  end
end

I can lift the counts[i] slab out of the inner loop to eliminate j array-indexing operations:

for i in 0 .. 10
  slab = counts[i]
  for j in 0 .. 10
    sum += slab[j]
  end
end

Here’s the new code (sans the unit tests, which haven’t changed):

class WordPath

  A = Array

  def self.count_paths(grid, word)

    rword  = word.reverse
    rowmax = grid.size - 1
    colmax = grid.first.size - 1

    for i in 0 .. rword.size - 1
      letter = rword[i]
      previous_slab, slab = slab, A.new(rowmax+1) { A.new(colmax+1) }
      for r in 0 .. rowmax
        row, line = grid[r], slab[r]
        for c in 0 .. colmax
          line[c] = unless letter == row[c]
            0
          else
            if i == 0
              1
            else
              sum = 0
              clo = c > 0 ? c - 1 : c
              chi = c < colmax ? c + 1 : c
              for nr in (r > 0 ? r - 1 : r) .. (r < rowmax ? r + 1 : r)
                for nc in clo .. chi
                  sum += previous_slab[nr][nc] if nr != r || nc != c
                end
              end
              sum
            end
          end
        end
      end
    end

    sum = 0
    for r in 0 .. rowmax
      for c in 0 .. colmax
        sum += slab[r][c]
      end
    end

    sum

  end
end

Update 2: I tweaked the code snippet above to remove a variable that I just noticed wasn’t actually doing anything.