<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/stylesheets/rss.css" type="text/css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Tom Moertel's Weblog: Category sysadmin</title>
    <link>http://blog.moertel.com/articles/category/sysadmin</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Quality rants on programming theory and stuff geeks like</description>
    <item>
      <title>A couple of tips for writing Puppet manifests</title>
      <description>&lt;p&gt;I recently started using &lt;a href="http://reductivelabs.com/trac/puppet"&gt;Puppet&lt;/a&gt;
to automate my server-build processes.  The basic idea behind Puppet
is that you create &amp;#8220;manifests&amp;#8221; that declare
a directed graph of &amp;#8220;resources&amp;#8221; that represents the desired state of
your machines.  Puppet-managed machines on your network then query a
master server to obtain the latest copy of the graph, which they then
reconcile with their current states to make whatever changes are
necessary to bring themselves up to date.&lt;/p&gt;


	&lt;p&gt;For the most part, everything works well.  I have encountered a couple
of snags when writing manifests, however, so I&amp;#8217;m going to explain them
here as a reminder until I get the time to fix them in the Puppet code and send
patches upstream.&lt;/p&gt;


	&lt;p&gt;First, don&amp;#8217;t use hyphens in class names.  While hyphens are legal
in class names, they are not allowed in qualified variable names, so
variables defined within hyphen-named classes are inaccessible
from outside those classes.&lt;/p&gt;
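

	&lt;p&gt;A quick way to catch the first problem is to search the manifests for
offending class declarations.  What follows is only a minimal sketch: it assumes
each class is declared on its own line with the &lt;code&gt;class&lt;/code&gt; keyword,
and the function name &lt;code&gt;find_hyphenated_classes&lt;/code&gt; is made up for
this example.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;#!/bin/bash

# Print every line in the given manifest files that declares a class
# whose name contains a hyphen, prefixed with file name and line number.
# The exit status is non-zero when nothing is found (grep's convention).
find_hyphenated_classes() {
    grep -EHn '^[[:space:]]*class[[:space:]]+[A-Za-z0-9_:]*-' "$@"
}
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Pass it whatever manifest files you have, for example
&lt;code&gt;find_hyphenated_classes manifests/*.pp&lt;/code&gt; (the path is an
assumption), and rename any class it reports.&lt;/p&gt;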


	&lt;p&gt;Second, and this one is both tricky and important, Puppet handles
prerequisites for definitions by silently passing those prerequisites on
to all of the resources within the definitions.  Definitions, in
effect, don&amp;#8217;t really have their own prerequisites; they just pass them on to
their children.  But &amp;#8211; and here&amp;#8217;s the problem &amp;#8211; if those child
resources declare their own prerequisites, those prerequisites will
&lt;em&gt;overwrite the passed-on prerequisites&lt;/em&gt;, effectively causing them to
be ignored.&lt;/p&gt;


	&lt;p&gt;This problem bit me hard when trying to create a definition for
installing Ruby Gems from a local cache of gems:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;define local_gem($gem) {
    $path = "/var/local/local-gems/$gem" 
    file { $path:
        ensure  =&amp;gt; present,
        source  =&amp;gt; "puppet://puppet/files/gems/$gem",
        require =&amp;gt; File["local-gems-dir"],
        owner   =&amp;gt; root,
        group   =&amp;gt; root,
        mode    =&amp;gt; 0664,
    }
    package { $title:
        ensure   =&amp;gt; installed,
        provider =&amp;gt; "gem",
        require  =&amp;gt; [ Package["rubygems"], File[$path] ],
        source   =&amp;gt; $path,
    }
}
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;The intent was to be able to declare a local gem like so:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;local_gem { "sqlite3-ruby":
    gem     =&amp;gt; "sqlite3-ruby-1.2.1.gem",
    require =&amp;gt; Package["sqlite-devel"]
}
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Thus the &amp;#8220;sqlite3-ruby&amp;#8221; local gem has the single prerequisite of the
&amp;#8220;sqlite-devel&amp;#8221; package &amp;#8211; or at least that&amp;#8217;s what I expected.  What
happened on deployment was that the prerequisite was ignored because
when it was passed on to the inner file and package resources, those
resources had their own &lt;em&gt;require&lt;/em&gt; parameters, and those parameters
overwrote the passed-on prerequisite.&lt;/p&gt;


	&lt;p&gt;The work-around is somewhat hacky.  I augmented the definition with a do-nothing resource
that has no &lt;em&gt;require&lt;/em&gt; parameter of its own.  This
resource does nothing but capture the passed-on prerequisites.  Then I made
all of the other resources in the definition include the do-nothing
resource as one of their prerequisites.  Thus they are made to inherit the
passed-on prerequisites.&lt;/p&gt;


	&lt;p&gt;My final definition looks like this:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;define local_gem($gem) {

    # dummy exec to propagate requires from local_gem
    exec { $name: command =&amp;gt; "/bin/true" }

    $path = "/var/local/local-gems/$gem" 
    file { $path:
        ensure  =&amp;gt; present,
        source  =&amp;gt; "puppet://puppet/files/gems/$gem",
        require =&amp;gt; [ Exec[$name], File["local-gems-dir"] ],
        owner   =&amp;gt; root,
        group   =&amp;gt; root,
        mode    =&amp;gt; 0664,
    }
    package { $title:
        ensure   =&amp;gt; installed,
        provider =&amp;gt; "gem",
        require  =&amp;gt; [ Exec[$name], Package["rubygems"], File[$path] ],
        source   =&amp;gt; $path,
    }
}
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Notice how the file and package resources both require the dummy exec resource.
That&amp;#8217;s the trick that allows them to require the prerequisites passed on from
the local_gem definition.&lt;/p&gt;


	&lt;p&gt;It&amp;#8217;s not pretty, but it works.  &lt;a href="http://mail.madstop.com/pipermail/puppet-users/2007-March/001953.html"&gt;See this email on the puppet-users mailing list&lt;/a&gt; for more on the problem.&lt;/p&gt;</description>
      <pubDate>Thu, 15 Nov 2007 02:30:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:4fc6dc28-ae38-4d72-ac05-d9a65d6653f8</guid>
      <author>Tom Moertel</author>
      <link>http://blog.moertel.com/articles/2007/11/15/a-couple-of-tips-for-writing-puppet-manifests</link>
      <category>sysadmin</category>
      <category>rails</category>
      <category>puppet</category>
      <category>manifests</category>
      <category>gems</category>
      <trackback:ping>http://blog.moertel.com/articles/trackback/622</trackback:ping>
    </item>
    <item>
      <title>Typo-4.0.3 instability and a minor patch for sqlite3-ruby</title>
      <description>&lt;p&gt;Since I upgraded my blog from &lt;a href="http://typosphere.org/"&gt;Typo&lt;/a&gt; 4.0.0 to
4.0.3, it has been somewhat unstable.  About once a day it starts
responding with &amp;#8220;500 Internal Server Error&amp;#8221; and stays that way until I
restart it.&lt;/p&gt;


	&lt;p&gt;The root of the problem seems to be the database
connection, as evidenced by this exception showing up in the
production log:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;SQLite3::CantOpenException (could not open database)
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Unfortunately, the exception doesn&amp;#8217;t provide anything specific
to go on.&lt;/p&gt;


	&lt;p&gt;A quick look at the
&lt;a href="http://rubyforge.org/projects/sqlite-ruby/"&gt;sqlite3-ruby&lt;/a&gt; code
suggested that I was not going to get the specifics, either.  The Ruby-based wrapper
never calls &lt;a href="http://www.sqlite.org/capi3ref.html#sqlite3_errmsg"&gt;sqlite3_errmsg&lt;/a&gt; after a call to &lt;a href="http://www.sqlite.org/capi3ref.html#sqlite3_open"&gt;sqlite3_open&lt;/a&gt; fails on behalf of SQLite3::Database.new.&lt;/p&gt;


	&lt;p&gt;A quick patch, however, fixed the problem:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;--- sqlite3-ruby-1.1.0.orig/lib/sqlite3/database.rb
+++ sqlite3-ruby-1.1.0/lib/sqlite3/database.rb
@@ -109,7 +109,7 @@
       @statement_factory = options[:statement_factory] || Statement

       result, @handle = @driver.open( file_name, utf16 )
-      Error.check( result, nil, "could not open database" )
+      Error.check( result, self, "could not open database" )

       @closed = false
       @results_as_hash = options.fetch(:results_as_hash,false)
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;(Submitted as &lt;a href="http://rubyforge.org/tracker/index.php?func=detail&amp;#38;aid=5504&amp;#38;group_id=254&amp;#38;atid=1043"&gt;Ticket 5504&lt;/a&gt; on &lt;a href="http://rubyforge.org/"&gt;RubyForge&lt;/a&gt;.)&lt;/p&gt;


	&lt;p&gt;Before applying the patch, opening a database at a nonexistent path results in
a generic error message:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;$ ruby -r rubygems -e 'require_gem "sqlite3-ruby";
    SQLite3::Database.new("/no/such/path/db")'

&lt;/code&gt;... could not open database (SQLite3::CantOpenException) ...
&lt;/pre&gt;

	&lt;p&gt;After applying the patch, we get additional error information:&lt;/p&gt;


&lt;pre&gt;... could not open database: unable to open database file
    (SQLite3::CantOpenException) ...
&lt;/pre&gt;

	&lt;p&gt;With the patch in place, all I have to do is wait for Typo to start
acting up again.  Then I&amp;#8217;ll have some interesting information in the
log.&lt;/p&gt;


	&lt;p&gt;Until then, I&amp;#8217;m relying on &lt;a href="http://en.wikipedia.org/wiki/Crontab"&gt;cron&lt;/a&gt;
and a short monitoring script to restart Typo when it tips into
foolishness:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;#!/bin/bash

url=http://blog.moertel.com/admin
addrs=tom@moertel.com

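# GET (lwp-request): -s prints the response status, -d discards the body
response=$(GET -sd "$url" 2&amp;gt;&amp;#38;1)

# anything other than a clean 200 triggers a restart and an email report
if [ "$response" != "200 OK" ]; then
    { echo "Response was: $response"; echo; service typo restart; } |
    mail -s "Blog site not responding! (Restarting)" $addrs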
fi
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;We&amp;#8217;ll see how it goes.&lt;/p&gt;


&lt;div class="update"&gt; &lt;strong&gt;Update:&lt;/strong&gt; That was fast.  The error popped up
again and this time the log told me something useful: &amp;#8220;unable to open
database file.&amp;#8221;  Now, why couldn&amp;#8217;t Typo open the database file,
especially since the file is perfectly fine and had been opened
successfully (many times) by the very same Typo process earlier?  Here&amp;#8217;s
a hint:

&lt;pre&gt;&lt;code&gt;$ ls /proc/28788/fd | wc -l
&lt;/code&gt;1023
&lt;/pre&gt;

	&lt;p&gt;Seems like there&amp;#8217;s a resource leak in Typo 4.0.3 (or Rails 1.1.6).
Under some conditions, instead of reusing existing database
connections, Typo keeps trying to open new ones.  Eventually, it uses
up its allotment of file descriptors and the operating system is forced
to say, &amp;#8220;That&amp;#8217;s enough, pal,&amp;#8221; (&lt;a href="http://www.wlug.org.nz/EMFILE"&gt;&lt;code&gt;EMFILE&lt;/code&gt;&lt;/a&gt;).&lt;/p&gt;
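

	&lt;p&gt;A leak like this can be spotted before it becomes fatal by counting
a process&amp;#8217;s open descriptors from time to time.  Here is a rough sketch,
assuming a Linux-style &lt;code&gt;/proc&lt;/code&gt; filesystem; the function names
and the threshold argument are invented for illustration.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;#!/bin/bash

# Count the file descriptors that process $1 currently holds open.
# Assumes a Linux-style /proc filesystem.
fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Succeed (exit status 0) when process $1 holds at least $2 descriptors.
fd_near_limit() {
    [ "$(fd_count "$1")" -ge "$2" ]
}
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;For example, &lt;code&gt;fd_near_limit 28788 900&lt;/code&gt; would succeed for
the process shown above.  The threshold of 900 is arbitrary; pick one comfortably
below whatever &lt;code&gt;ulimit -n&lt;/code&gt; reports for the process in question.&lt;/p&gt;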


	&lt;p&gt;I&amp;#8217;ll look into it more in the morning.&lt;/p&gt;


&lt;strong&gt;Update 2:&lt;/strong&gt; &lt;a href="http://blog.moertel.com/articles/2006/08/24/database-connection-leak-in-typo-4-0-3-problem-solved"&gt;Problem solved&lt;/a&gt;.
&lt;/div&gt;</description>
      <pubDate>Thu, 24 Aug 2006 00:41:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:2e527a1f-3415-4322-9f0f-244b45a3b695</guid>
      <author>Tom Moertel</author>
      <link>http://blog.moertel.com/articles/2006/08/24/typo-4-0-3-instability-and-a-minor-patch-for-sqlite3-ruby</link>
      <category>ruby</category>
      <category>typo</category>
      <category>rails</category>
      <category>sysadmin</category>
      <category>sqlite3</category>
      <trackback:ping>http://blog.moertel.com/articles/trackback/163</trackback:ping>
    </item>
    <item>
      <title>How to make sure your servers come back up after an extended power outage</title>
      <description>&lt;p&gt;If an extended power outage drains your &lt;span class="caps"&gt;UPS&lt;/span&gt;, and your servers are
forced to shut down, will they automatically start up again when the
power is eventually restored?  It&amp;#8217;s a good question, especially
if your servers are in some distant, unattended server room.
Unless you&amp;#8217;ve tested your servers, don&amp;#8217;t assume that the answer
is Yes.&lt;/p&gt;


	&lt;p&gt;Many servers offer a &lt;span class="caps"&gt;BIOS&lt;/span&gt; configuration option that forces them to
automatically power on when they receive line voltage.  If your
servers have this option, just set it and you&amp;#8217;re done.&lt;/p&gt;


	&lt;p&gt;Unfortunately, some servers, including a Dell PowerEdge 1600SC
that I&amp;#8217;m using, lack this configuration option.  When these servers
turn themselves off as the final step of a &lt;span class="caps"&gt;UPS&lt;/span&gt;-controlled
shutdown, they don&amp;#8217;t start up again when the power is restored.
Because they were shut down &lt;em&gt;before&lt;/em&gt; the power was cut off, they think
they are supposed to remain off when the power is restored.  That is,
they remember their on/off status across power outages.&lt;/p&gt;


	&lt;p&gt;Fortunately, there is a way to make sure these servers automatically
power on: shut them down without powering them off; halt them
instead.  That way, when the &lt;span class="caps"&gt;UPS&lt;/span&gt; finally cuts off the supply voltage,
the servers will still be in their &amp;#8220;on&amp;#8221; state, and they will remember
this state across the outage. Later, when the power is restored, the servers
will automatically restore their pre-outage state and power up.&lt;/p&gt;


	&lt;p&gt;With Fedora Core Linux and &lt;a href="http://www.networkupstools.org/"&gt;Network &lt;span class="caps"&gt;UPS&lt;/span&gt;
Tools&lt;/a&gt;, it&amp;#8217;s not difficult to make
sure the servers are halted instead of powered off, but the implementation
isn&amp;#8217;t obvious.  To spare you the digging, here are the
important bits.&lt;/p&gt;


&lt;ol&gt;

&lt;li&gt;When the power fails and the &lt;span class="caps"&gt;UPS&lt;/span&gt;-monitoring software decides that
the batteries are almost depleted, it will initiate a server shutdown
using the command defined in the &lt;code&gt;/etc/ups/upsmon.conf&lt;/code&gt;
file.  The default command is this:

&lt;pre&gt;&lt;code&gt;SHUTDOWNCMD "/sbin/shutdown -h +0" 
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;

&lt;li&gt;The shutdown command will tell the &lt;code&gt;init&lt;/code&gt; process
to enter runlevel 0, which is the prepare-to-halt-the-system runlevel.&lt;/li&gt;

&lt;li&gt;The &lt;code&gt;init&lt;/code&gt; process will stop all of the running
services in an orderly fashion, and then, as the last step, invoke the
final script in the shutdown process:
&lt;code&gt;/etc/rc.d/rc0.d/S01halt&lt;/code&gt;.&lt;/li&gt;

&lt;li&gt;The final lines of the &lt;code&gt;S01halt&lt;/code&gt; script will
power off the server.  Unless, that is, the file &lt;code&gt;/halt&lt;/code&gt; is
present, in which case the script will halt the server instead.&lt;/li&gt;

&lt;/ol&gt;
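
	&lt;p&gt;The decision in the last step boils down to a single file test.  The
following is a simplified sketch of that logic, not the actual contents of
&lt;code&gt;S01halt&lt;/code&gt;; the function name &lt;code&gt;final_action&lt;/code&gt; is
hypothetical, and the marker path is a parameter so the logic can be tried
without touching the root directory.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;#!/bin/bash

# Report which action would end the shutdown, given the path of the
# marker file (the real script looks at /halt).
final_action() {
    if [ -f "$1" ]; then
        echo halt        # marker present: stop the OS, leave the power on
    else
        echo poweroff    # marker absent: stop the OS and cut the power
    fi
}
&lt;/code&gt;&lt;/pre&gt;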

	&lt;p&gt;Thus the trick is to make sure that the &lt;code&gt;/halt&lt;/code&gt;
file &lt;em&gt;does&lt;/em&gt; exist.  That turns out to be easy to pull off.
Just redefine the shutdown command in &lt;code&gt;/etc/ups/upsmon.conf&lt;/code&gt;:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;SHUTDOWNCMD "/bin/touch /halt; /sbin/shutdown -h +0" 
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;And that&amp;#8217;s all there is to it!&lt;/p&gt;</description>
      <pubDate>Wed, 09 Aug 2006 00:35:00 -0400</pubDate>
      <guid isPermaLink="false">urn:uuid:488c24a8-dcb3-4015-8d8e-09f6267e6051</guid>
      <author>Tom Moertel</author>
      <link>http://blog.moertel.com/articles/2006/08/09/how-to-make-sure-your-servers-come-back-up-after-an-extended-power-outage</link>
      <category>linux</category>
      <category>hardware</category>
      <category>sysadmin</category>
      <category>ups</category>
      <category>fedora</category>
      <category>nut</category>
      <category>power</category>
      <category>shutdown</category>
      <category>halt</category>
      <trackback:ping>http://blog.moertel.com/articles/trackback/151</trackback:ping>
    </item>
  </channel>
</rss>
