One key to making a web services business successful is keeping the service up all the time. For almost as many years as I have run my web hosting business, I have relied on Big Sister to monitor my network performance and some specific metrics and to notify me of problems when they arise. I recently switched to the Trustix Linux Sunchild distribution, and I had problems getting Big Sister set up there. On top of Trustix, I run hosting management software called H-Sphere, which has some components with which I am not as familiar as I would like.
As I worked through the installation, I ran into and solved a couple of different problems, most of which related to the constraints I mentioned above. In the rest of this post I will provide a series of instructions that a systems administrator can use as a starting point to set up Big Sister, more or less out-of-the-box, to monitor their own cluster of H-Sphere servers. Once you have finished these instructions, you will still want to learn much more about monitoring so you can adjust and refine the scenario described here to meet your own needs.
Big Sister Screen monitor home
To verify the information for this post, I did another install on a fresh Trustix “server”, which was set up using the “minimal” option. If you are interested, I was able to make it run on a Pentium or Pentium 2 with 64MB of RAM. Maybe if you monitor a whole bunch of servers then you will need to get a bigger display server; I’m not there yet.
You should understand the steps I describe before wildly following my instructions because if some of your training wheels fall off in the process, I assume no responsibility for any slips or falls you may take. (On the other hand, if you impress your boss, you can take the credit too!)
A Big Sister Network Monitor setup has two different parts: the agent and the display server. An agent runs on each monitored server. The Big Sister agent is a program that runs in the background and queries a computer (or even remote devices, like switches or routers). The agent then reports this information to the display server. The agent process on Linux is named uxmon.
With regard to H-Sphere software, I do not see any reason you cannot set up the display server on one of the web servers in your H-Sphere cluster, but I have not tried it. I have a separate server outside of the cluster that I use for my own testing and other server needs.
Trustix platform preparation
Once you have completed a minimal Trustix install on your Big Sister display server, you need to install some additional packages that Big Sister will need.
swup --install apache perl rrdtool-perl
This will also install some other prerequisite modules. One of the reasons I like Sunchild, even though Trustix seems to be floundering, is the distribution’s philosophy of not installing anything you didn’t ask for… not even perl.
rrdtool and rrdtool-perl are found in the contributed section of the Trustix distribution. If that section is not already in your list of repositories, you must follow the steps found in /etc/swup/swup.conf to add it and also add its key to your swup keys.
The Big Sister RPMs expect a user and a group, both named bigsis, to exist. The install script will actually create these if they do not exist, but I had better luck specifying the options to create the user before installing the RPMs. As we shall note later, Trustix is very secure (one might say pretty anti-social) in some of its default settings. By telling adduser the home directory and group of the user and overriding the Trustix defaults, we will be compatible with the “RedHat-ish” directory layout of the RPMs. These commands are needed on both the display server and any servers that only run the agent, uxmon.
adduser bigsis -d /var/lib/bigsister -g bigsis -m
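Before installing the RPMs, it may be worth confirming that the account really exists. The following is only a sketch: the helper name account_exists is mine, not part of Big Sister, and it assumes the standard id utility is available.

```shell
#!/bin/sh
# Return success if the named account exists on this machine.
# account_exists is a hypothetical helper, not a Big Sister tool.
account_exists() {
    id -u "$1" > /dev/null 2>&1
}

if account_exists bigsis; then
    # id prints the uid, gid and groups, so the group can be checked too.
    echo "bigsis user found: $(id bigsis)"
else
    echo "bigsis user missing; create it before installing the RPMs"
fi
```

Run it on the display server and on each monitored server, since the account is needed on both.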
The display server needs to create files that the web server can read. I guess there are a few ways to do this, but the most direct seems to be to change the default user umask of 077 (see what I mean by anti-social?) to 022. So I added the line umask 022 to the file /var/lib/bigsister/.bash_profile so that files created by the user are world readable. This step is not required on the servers that you will be monitoring, only on the display server.
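To see why the umask matters, here is a small sketch that creates one file under each setting in a temporary directory and prints the resulting permissions. The file names are placeholders and nothing here touches the bigsis account.

```shell
#!/bin/sh
# Show how the umask changes the permissions of newly created files.
tmpdir=$(mktemp -d)

# Restrictive default described above: owner-only access.
( umask 077; : > "$tmpdir/private.html" )

# Relaxed setting: owner read/write, everyone else read-only.
( umask 022; : > "$tmpdir/public.html" )

# Keep only the ten permission characters from the ls listing.
private_mode=$(ls -l "$tmpdir/private.html" | cut -c1-10)
public_mode=$(ls -l "$tmpdir/public.html" | cut -c1-10)

echo "umask 077: $private_mode"
echo "umask 022: $public_mode"

rm -rf "$tmpdir"
```

With 077 the web server, which runs as a different user, cannot read the file; with 022 it can.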
Next I grab the files I need from the download site. For each monitored server, you should install:
and on the display server, you must also install:
Configuring the agent
The agent configuration file is /etc/bigsister/uxmon-net, and the basic one provided will work if you just turn on bigsister. There are three main sections to a simple uxmon-net file, and I have just a few changes I need to make because I am running Trustix.
At the top of the uxmon-net file are the default settings that are in effect. None of these need to be changed to set up basic monitoring. In the DEFAULT line, the value 5 for frequency and perf (performance) means that the agent collects data every 5 minutes and sends performance data to the server every 5 minutes. If you wish to monitor something with SNMP, such as a switch or router, then your community will probably be something other than public. I did not have to change any of the DEFAULT lines in the supplied files to make Big Sister work.
DEFAULT community=public frequency=5 perf=5 ALL
In each uxmon-net file you must define, or describe, the host(s) you wish to monitor. The description consists of listing the system “features” and then defining a name for the system you will monitor. Most of the time, you will monitor the localhost.
DESCR features=unix,linux localhost
Sometimes you may wish to monitor a remote switch or router via SNMP. That might look more like this in the uxmon-net file:
DESCR features=remote cat2948g-01.some.com
The hostname of the remote device has to resolve to an IP address in this case.
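A quick way to check name resolution before starting the agent might look like the sketch below. It assumes a Linux system with the getent utility; the function name resolves is my own, and localhost is only a stand-in for your switch, router or display server name.

```shell
#!/bin/sh
# Report whether a hostname resolves, using the system resolver via getent.
# resolves is a hypothetical helper, not part of Big Sister.
resolves() {
    getent hosts "$1" > /dev/null 2>&1
}

# "localhost" is a placeholder; list the names used in your uxmon-net file.
for host in localhost; do
    if resolves "$host"; then
        echo "$host resolves"
    else
        echo "$host does NOT resolve"
    fi
done
```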
Configuring the test(s) you want the agent to perform
Finally, in the configuration file /etc/bigsister/uxmon-net, you will tell the agent which tests to perform.
localhost proc=sshd procs
localhost proc=httpd procs
In the case of the display server, I need to be sure I can get to the box to work on it via the sshd process, and I want to be able to see the web interface to Big Sister, so the httpd process should also be monitored. The configuration can also specify a minimum or maximum number of instances of the process that should be running. More information can be found in the documentation for each test.
There are a number of tests in addition to simply checking for running processes, for instance:
localhost load memory network cpuload
The cpuload test output
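Putting the pieces so far together, the uxmon-net file on the display server could look roughly like the fragment below. It contains only the lines already quoted in this post, in the same order; the section that tells the agent where to send the data is not included here.

```text
DEFAULT community=public frequency=5 perf=5 ALL

DESCR features=unix,linux localhost

localhost proc=sshd procs
localhost proc=httpd procs
localhost load memory network cpuload
```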
Telling the agent where to send the data
The uxmon process communicates with the display server. It needs to be told where the display server is; that configuration looks like this:
Again, the hostname of the Big Sister display server must resolve to an IP address in this case.
Some things cannot be monitored on a Linux system by an ordinary user. For instance, I do not think “df” works right for regular users. So uxmon must be able to run a limited number of tests as the super user. These tests are placed in the file /etc/bigsister/uxmon-asroot. The format of this file is the same as uxmon-net. In my case this file contains the disk tests and some ping tests.
And a couple of notes about H-Sphere
On my CP I wanted to monitor the control panel process, SiteStudio, PostgreSQL and named. For the moment I have settled for this in the uxmon-net file on that server:
localhost proc=httpd procs
localhost proc=java procs
localhost proc=postmaster procs
localhost proc=named procs
I don’t know a whole lot about qmail, so I am not sure what normal looks like there either, but this is what I have in uxmon-net on my cluster’s mail server:
localhost item=mail proc=qmail-send procs
localhost item=mail proc=qmail-lspawn procs
localhost item=mail proc=qmail-rspawn procs
localhost item=mail proc=qmail-clean procs
localhost item=mail proc=qmail-todo procs
localhost item=mail proc=qmail-clean procs
localhost item=mail proc=spamd procs
localhost item=mail proc=clamd procs
item=mail instructs the display server to show these tests grouped under the heading “mail”, instead of the default column for the processes (procs) test which is procs. This column heading is shown on the example screen capture at the top of this post.
Starting and stopping; and checking it all out
To start and stop Big Sister, use the service command:
service bigsister start
service bigsister stop
When you install the bigsister-server RPM, it will modify your httpd.conf file. It adds a line to include /etc/bigsister/httpd.conf. You will have to restart your web server the same way.
service httpd stop
service httpd start
Finally, if all went well, you should be able to see the results of monitoring after a few minutes. Some tests provide data to the display server more rapidly than others do. Whenever you change your uxmon-* files, you must restart Big Sister for the changes to take effect. To view the results, browse to the name of your Big Sister display server:
More obligatory screen captures
Getting email notification when a test fails
Just as with uxmon, the default notification setup on each Big Sister monitored server needed very little adjustment to make it work correctly. In my case I want changes in status to be emailed to me, and the file /etc/bigsister/bb_event_generator.cfg controls where and how notifications are sent. In fact, I looked in this file and created an alias on the localhost to direct the default address, alarm, eventually to the address where I can be alerted.
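As an illustration only, on a mail system that uses a sendmail-style /etc/aliases file (Postfix does, for example), an alias for the alarm address could be a single line like the one below. The destination address is a placeholder, and your mail software may handle aliases differently.

```text
alarm: you@example.com
```

On such systems you would normally run newaliases afterwards so the mail system picks up the change.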
Firewall configuration and similar considerations
Big Sister (uxmon) communicates with the display server over TCP port 1984. You should take precautions to secure that port on the bigsister display server; I only allow traffic which comes from hosts I expect to monitor. You should also be aware that Big Sister does not encrypt the data. There are limited instructions for tunneling Big Sister communications in SSH in the old documentation, and maybe in the new.
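One way to express that restriction is with iptables. The sketch below only prints the rules so you can review them before applying anything; the agent addresses are placeholders from the documentation range 192.0.2.0/24, and the variable and function names are my own. Your existing firewall setup may call for a different chain or rule order.

```shell
#!/bin/sh
# Print (not apply) iptables rules that limit TCP port 1984 to known agents.
# The addresses are placeholders; replace them with your monitored hosts.
BIGSIS_PORT=1984
AGENT_HOSTS="192.0.2.10 192.0.2.11"

print_rules() {
    for host in $AGENT_HOSTS; do
        echo "iptables -A INPUT -p tcp -s $host --dport $BIGSIS_PORT -j ACCEPT"
    done
    # Everything else aimed at the Big Sister port is dropped.
    echo "iptables -A INPUT -p tcp --dport $BIGSIS_PORT -j DROP"
}

print_rules
```

Because the rules are appended in order, the ACCEPT lines for the agents must come before the final DROP.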
You should also use .htaccess or another method to control access to the Big Sister display server. There is an administrative interface that would allow malicious users to do bad things without much difficulty.
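A minimal .htaccess sketch using Apache basic authentication might look like this. The password file path and the realm name are assumptions of mine, and Apache must allow overrides (AllowOverride AuthConfig) for the directory where you place the file.

```apache
AuthType Basic
AuthName "Big Sister"
AuthUserFile /etc/bigsister/htpasswd
Require valid-user
```

The password file itself can be created with Apache's htpasswd utility. Keep in mind that basic authentication sends credentials essentially in the clear unless the site is served over SSL.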
More to do with Big Sister
Although one can get started in a few hours with the defaults, Big Sister can monitor large networks and do so with a great degree of flexibility and power. I know there are networks where Big Sister monitors hundreds of servers and devices. In my own little corner of the world, I also monitor Cisco switches and Pix. (Pixs??? Pixes??? Pixi?) Any SNMP network devices can be monitored. There are modules for Windows servers, and a lot more detail that one can configure into Big Sister’s monitoring.
Getting more help
The new and old documentation are the first places to look. I would also like to improve this post, so if you have questions you may leave them in the comments by using the form below. I will reply to those that I can. There is a small but helpful community of users, and they have a mailing list which you can subscribe to. Among other places, you can find bigsister-general archived on my company web site. On the chance someone reading this has a company in need of help in Scotland or Switzerland, the Big Sister website mentions two firms in those countries who provide professional level services relating to the software.