====== How to be reboot ready ======

There are two ways to have your applications start automatically at boot time on Solaris: SMF and legacy init scripts. Let's examine init scripts first.

===== init scripts =====

Legacy init scripts are just what the name suggests: the good ol' RC scripts you've used forever on System V based UNIX. They typically look something like this:

  $ cat /etc/init.d/acct
  #!/sbin/sh
  state="$1"
  case "$state" in
  'start')
          echo 'Starting process accounting'
          /usr/lib/acct/startup
          ;;
  'stop')
          echo 'Stopping process accounting'
          /usr/lib/acct/shutacct
          ;;
  *)
          echo "Usage: $0 { start | stop }"
          exit 1
          ;;
  esac
  exit 0

As you can see in the example above, a legacy init script is typically just a case statement that handles at least two arguments: start and stop. Very often these scripts also accept other arguments such as "status", "restart", and "refresh".

These scripts are stored in /etc/init.d. They are then symlinked into the RC directories, one directory per run-level, the most commonly used being **/etc/rc2.d** and **/etc/rc3.d**. Init scripts symlinked into these directories are prefixed with either a capital S for "Start" or K for "Kill", followed by two digits. When a system boots, the scripts in these run-level directories that start with an S are run sequentially in order of the two digits, so S00whatever is run first, then S01something, and so on. This is why user-added scripts tend to be named S99something, to ensure that they run last. If you have two scripts with the same digits (S99apache and S99bind), that's fine.

So let's say you want **/etc/init.d/cswapache2** to start at boot. You'd do this:

  # ln -s /etc/init.d/cswapache2 /etc/rc3.d/S99cswapache2

In like manner, if something is already set to run at boot but you don't want it to, you can either rename the symlink so it doesn't begin with a capital S or just delete the symlink.

===== So why not init scripts =====

This is the simple and long accepted way of starting things.
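Before moving on, the run-order convention from the previous section can be tried out safely. The sketch below is only a simulation, assuming a throwaway temp directory and made-up script names (S01network, S50database and friends are hypothetical). It is not a copy of Solaris's own rc scripts, but it shows why the S prefix and the two digits decide what runs and when.

```shell
#!/bin/sh
# Simulation only: nothing here touches /etc. All script names are made up.

# make_fake_rc_dir: create a temp directory holding fake S*/K* scripts
# that just print their own name and argument; prints the directory path.
make_fake_rc_dir() {
    dir=$(mktemp -d) || return 1
    for name in S99bind S01network K20oldthing S50database S99apache; do
        printf '#!/bin/sh\necho "%s $1"\n' "$name" > "$dir/$name"
    done
    echo "$dir"
}

# run_start_scripts DIR: run every S* script in DIR with "start".
# The shell expands S* in sorted order, so the two digits set the sequence
# and scripts sharing the same digits run in name order. K* is skipped.
run_start_scripts() {
    for script in "$1"/S*; do
        if [ -f "$script" ]; then
            sh "$script" start
        fi
    done
}

rcdir=$(make_fake_rc_dir)
run_start_scripts "$rcdir"
rm -rf "$rcdir"
```

Run as-is, it should call the four S scripts in the order S01network, S50database, S99apache, S99bind, and never call K20oldthing.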
Why bother with something new? Several reasons, actually:

  - Init scripts are stupid. If your application or daemon stops running, nothing notices that it stopped and restarts it. The long-time workaround was either to use HA software or to start it from ///etc/inittab//, which will restart a failed process but isn't smart enough to know that at some point it should give up; it'll just keep restarting it forever.
  - Init scripts start sequentially, not in parallel. Any script may depend on any script run earlier, and since init doesn't know what those dependencies are, it can't judge what could safely run at the same time. Consequently you get longer boot times.
  - The system has no idea what's broken, much less why. This goes back to "init is stupid". If a script runs and errors out for some reason, the init system doesn't care; it just keeps going down the list of things to start.
  - Because init scripts have no sense of dependencies, if restarting Apache, for instance, requires restarting OpenLDAP as well, it doesn't know to do that unless you specifically modify the script. It'd be nice if, when something was restarted, anything else affected by it were restarted as well.
  - And on and on....

===== SMF =====

We've lived with these flaws far too long. Several solutions are now available, and one of the most powerful is Solaris's Service Management Facility, otherwise known as SMF. SMF is dependency aware, human friendly, and very smart.

With SMF we start to look at our applications and daemons as services. Using the //svcs// command we can view running services; if you add the "-a" option it'll show you all services, running or not.
  $ svcs
  STATE          STIME    FMRI
  legacy_run     Nov_19   lrc:/etc/rc2_d/S20sysetup
  legacy_run     Nov_19   lrc:/etc/rc2_d/S72autoinstall
  legacy_run     Nov_19   lrc:/etc/rc2_d/S73cachefs_daemon
  legacy_run     Nov_19   lrc:/etc/rc2_d/S85cswsaslauthd
  legacy_run     Nov_19   lrc:/etc/rc2_d/S89PRESERVE
  legacy_run     Nov_19   lrc:/etc/rc2_d/S98deallocate
  legacy_run     Nov_19   lrc:/etc/rc3_d/S50cswapache2
  online         Nov_19   svc:/system/svc/restarter:default
  online         Nov_19   svc:/system/filesystem/root:default
  online         Nov_19   svc:/network/loopback:default
  ...
  online         10:20:09 svc:/network/nfs/cbd:default
  online         10:20:09 svc:/network/nfs/nlockmgr:default
  online         11:25:07 svc:/application/postgres:default

Here we can see the legacy init scripts that have been run and a few of the SMF services that are online, as well as when they last changed state (i.e. started). You'll notice that each service has an identifying "FMRI" (Fault Management Resource Identifier), which is also used by other Solaris frameworks such as the Fault Management Architecture (FMA).

Dealing with services is easy. We can use the //svcadm// command to "enable", "disable", "refresh", "restart", or otherwise change the state of a given service.

  $ svcs -a | grep -i mysql
  disabled       Nov_17   svc:/network/cswmysql5:default
  $ svcadm enable svc:/network/cswmysql5:default
  $ svcs -a | grep -i mysql
  online         16:54:36 svc:/network/cswmysql5:default
  $ svcadm restart svc:/network/cswmysql5:default
  $ date
  Tue Nov 21 16:55:26 PST 2006
  $ svcs -a | grep -i mysql
  online         16:55:27 svc:/network/cswmysql5:default

In the example above I looked for any MySQL services and found //network/cswmysql5//, so I enabled it, verified that it was online, then restarted it and checked again. Notice that the time at which it was started is displayed.

Now let's see one way in which SMF is superior to legacy init scripts. When SMF starts something it holds a "contract" for that service. That contract keeps track of what's running for any given service.
Using the "-p" option we can see which processes are part of a service's contract and take advantage of that intelligence.

  $ svcs -p network/cswmysql5
  STATE          STIME    FMRI
  online         16:55:27 svc:/network/cswmysql5:default
                 16:55:27    28938 mysqld_safe
                 16:55:27    29004 mysqld
  $ kill -9 29004
  $ svcs -p network/cswmysql5
  STATE          STIME    FMRI
  online*        17:00:01 svc:/network/cswmysql5:default
                 16:55:27    28938 mysqld_safe
                 17:00:01    29228 mysqld
  $ mysql -u mysql
  ...
  mysql> \q
  Bye

Notice here that I used //svcs -p// to list the processes associated with my MySQL5 service. Then I brutally killed mysqld, and faster than I can blink the process was restarted! You can see that reflected in the "STIME" for mysqld. The asterisk ("online*") indicates that the service is currently in a transition state, in this case transitioning to online, but as you can see MySQL is already back in action.

But SMF isn't restarting things in brain-dead mode like inittab; we can define thresholds regarding restarts. For instance, if SMF restarts a service more than 3 times in 60 seconds, something is probably very wrong and it should stop trying. At that point it'll put the service in the "maintenance" state, and it will stay that way until you clear the state with //svcadm clear some/service//.

Let's look at an example of something broken trying to start. I'm going to break MySQL and then try to start it...

  $ mv /opt/csw/mysql5/var/ /opt/csw/mysql5/xxx-var/
  $ svcadm enable network/cswmysql5
  $ svcs network/cswmysql5
  STATE          STIME    FMRI
  maintenance    17:29:01 svc:/network/cswmysql5:default
  $ svcs -vx
  svc:/network/cswmysql5:default (?)
   State: maintenance since Tue Nov 21 17:29:01 2006
  Reason: Restarting too quickly.
     See: http://sun.com/msg/SMF-8000-L5
     See: /var/svc/log/network-cswmysql5:default.log
  Impact: This service is not running.

So I moved MySQL's data directory; obviously it can't start without it. When I enable the service it ends up in "maintenance".
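The give-up rule that landed the service in maintenance can be illustrated with a toy model. This is only a sketch of the "more than 3 restarts in 60 seconds" idea from above: the function name should_give_up and its arguments are made up for illustration, and it is not how SMF's restarter is actually implemented.

```shell
#!/bin/sh
# Toy model of a restart threshold. NOT how SMF itself is implemented.
#
# should_give_up WINDOW MAX NOW T1 T2 ...
#   WINDOW  length of the sliding window, in seconds
#   MAX     number of restarts tolerated inside the window
#   NOW     current time, in seconds
#   T1...   timestamps (seconds) of earlier restarts
# Exit status 0 means "give up" (maintenance), 1 means "keep restarting".
should_give_up() {
    window=$1 max=$2 now=$3
    shift 3
    recent=0
    for t in "$@"; do
        # Count only restarts that happened inside the window.
        if [ $((now - t)) -le "$window" ]; then
            recent=$((recent + 1))
        fi
    done
    [ "$recent" -gt "$max" ]
}

# Four restarts within the last minute: more than 3, so give up.
if should_give_up 60 3 100 50 60 70 80; then
    echo "maintenance"
else
    echo "restart"
fi

# Only two of these restarts are recent, so keep trying.
if should_give_up 60 3 1000 10 20 950 990; then
    echo "maintenance"
else
    echo "restart"
fi
```

The point of the window is the difference from inittab: a process that dies once a week keeps getting restarted, while one that dies four times in a minute is left alone until a human looks at it.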
Using SMF's most magical command, //svcs -vx//, we can see a listing of all services that failed to start, why they failed, some information about them, the log location, any dependent services that can't start as a result, and even a URL to a page that'll tell us more!

Now let's resolve the issue and bring the service back online:

  $ mv /opt/csw/mysql5/xxx-var/ /opt/csw/mysql5/var/
  $ svcs network/cswmysql5
  STATE          STIME    FMRI
  maintenance    17:29:01 svc:/network/cswmysql5:default
  $ svcadm clear network/cswmysql5
  $ svcs network/cswmysql5
  STATE          STIME    FMRI
  online         17:32:57 svc:/network/cswmysql5:default

The usefulness of the //svcs -vx// command cannot be overstated. It is the first thing I run when logging into any Solaris 10 or OpenSolaris machine.

**So how do you actually use SMF with your own service?**

SMF services are defined in XML manifests. These **manifests** describe how to start, stop, restart, and refresh (reload the configuration of) your application, what dependencies it has, various thresholds, and other metadata that may be useful, such as the man pages that apply to that service. In addition to the manifest, you can use scripts just like your legacy init scripts, which we call **methods** or "method scripts".

Service configuration changes can be made with the //svccfg// ("Service Config") tool. The most common uses of this command are to import or export a manifest. For instance, I'm curious what the manifest for that MySQL5 service looks like:

  $ svccfg export network/cswmysql5

That probably looks really intimidating at first glance, but it's really not so bad if you just break it down. First we define our dependencies; for instance, MySQL is dependent on the network loopback service and the local filesystems service. There are 3 "exec_methods" which define the methods for start, stop, and restart.
If you can start your app or daemon with just a single command line then you don't need an external method script, but this service opts for a script. Notice that the "stop" method uses an SMF shortcut which just kills the processes rather than using a script or command. This is only a very simple example; you can put a lot more information in there, but it's pretty simple XML once you break it down.

When you create a new SMF manifest, you simply put the XML in a file and use //svccfg import my_service.xml// to import it.

This is only a taste of what SMF can do for you. For more information, please check out the following resources:

  - [[http://opensolaris.org/os/community/smf/|The OpenSolaris SMF Community]]
  - [[http://docs.sun.com/app/docs/doc/817-1985/6mhm8o5rl?a=view|System Administration Guide: Basic Administration; Chapter 14. Managing Services]]
  - [[http://www.cuddletech.com/blog/pivot/entry.php?id=182|The Cuddletech SMF Cheat-Sheet]]
  - [[http://www.blastwave.org/smf/|The Blastwave SMF Repository (Share Your Manifests!)]]
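As a closing sketch of the "put the XML in a file and import it" step, the script below writes a minimal manifest for a hypothetical service and validates it when //svccfg// is available. Everything specific here is an assumption for illustration: the service name application/myapp, the daemon path /opt/myapp/bin/myappd, and the timeouts are made up, and this is not the cswmysql5 manifest. It only mirrors the pieces described above: two dependencies, a start method, and a stop method using the kill shortcut.

```shell
#!/bin/sh
# Sketch only: "application/myapp" and /opt/myapp/bin/myappd are hypothetical.

# write_manifest FILE: write a minimal SMF manifest to FILE.
write_manifest() {
    cat > "$1" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="manifest" name="myapp">
  <service name="application/myapp" type="service" version="1">
    <create_default_instance enabled="false"/>
    <single_instance/>
    <!-- Do not start until loopback networking and local filesystems are up. -->
    <dependency name="loopback" grouping="require_all" restart_on="none" type="service">
      <service_fmri value="svc:/network/loopback"/>
    </dependency>
    <dependency name="local-filesystems" grouping="require_all" restart_on="none" type="service">
      <service_fmri value="svc:/system/filesystem/local"/>
    </dependency>
    <!-- A single command line is enough here, so no method script is used. -->
    <exec_method type="method" name="start" exec="/opt/myapp/bin/myappd" timeout_seconds="60"/>
    <!-- ":kill" is the shortcut that signals the service's processes. -->
    <exec_method type="method" name="stop" exec=":kill" timeout_seconds="60"/>
  </service>
</service_bundle>
EOF
}

manifest=${TMPDIR:-/tmp}/myapp.xml
write_manifest "$manifest"

# Only validate; importing is left as a deliberate manual step.
if command -v svccfg >/dev/null 2>&1; then
    if svccfg validate "$manifest"; then
        echo "manifest is valid; import it with: svccfg import $manifest"
    fi
else
    echo "svccfg not found; manifest written to $manifest"
fi
```

The service is created disabled on purpose, so after importing you would still bring it up yourself with //svcadm enable application/myapp//.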