Computing stuff tied to the physical world

System status

In Software on Sep 6, 2011 at 00:01

Yesterday, the WordPress server dropped out. Looks like some joker decided to grab the whole site, no throttling, nothing. Oh well, I guess that’s what fat pipes lead to.

In principle, WordPress should be able to handle such things, but I think it caused the underlying MySQL to run out of memory and abort.

A restart of the WordPress VM fixed it, once I found out. But this gives me an excuse to describe some things I’ve been starting to set up here.

One of of things I did this summer, was set up a collectd process on most of the machines running here – collectd is a bit Linux-centric, but one of the things it can do is grab various system statistics and broadcast it on the LAN over UDP. There are loads of plugins, and you can interface to it with scripts to feed it anything you like.

The nice thing about UDP broadcasts, is that senders don’t need to know anything about receivers. As with the RF12 driver, you just kick stuff into the network and then it’s up to others to decide what to do with it, if anything.

Collectd is also extremely low overhead, both in terms of CPU & memory load and in terms of network load. I’ve enabled it permanently on five systems, sending out 50..100 system parameters each, once every 30 seconds.

Here’s a partial view of the information being collected:

Screen Shot 2011 09 05 at 12 57 52

This was generated in the browser using JeeMon and jstree, as an exercise (I’m looking for a more convenient JavaScript widget to adjust a live tree using JSON).

Broadcasting is very convenient, as the collectd tool and the RF12 driver have shown, because there’s no setup. In theory, packets can get lost without acknowledgements and re-transmissions. In practice, some do indeed get lost, but very rarely. No problem at all for periodic / repetitive data collection.

I created a small script to display system status on my development Mac, using a tool called GeekTool. It can put information right on the desktop, i.e. as unobtrusively as you want, really. Here’s what I see when there are no windows obscuring the desktop background:

Screen Shot 2011 09 05 at 12 42 27

On the left is a periodic “ps“, the output on the right is from “ssh myserver jeemon overview.tcl“. This is a remote call to the server using a small script to pull info out of a text file and format it. The output shows that there are 5 systems online (running collectd, that is), and the script last ran at 12:41 in 39 ms.

Perhaps more interesting is that the server also collects this data from the LAN. To do this, I just launch JeeMon next to this “main.tcl” script:

    History group * 1w/5m 1y/6h
    collectd listen sysinfo

The first line is optional – it defines the parameters for historical round-robin data storage. So what I end up with, is the last 7 days of information, with details every 5 minutes, as well as up to one year of information aggregated per 6 hours.

The second line starts listening for collectd packets on the standard UDP port, tagging the results with the “sysinfo” prefix. As side effect, JeeMon will update a “stored/state.map” text file once a minute with the current values of all variables plus some other details. This is what was used to generate the above GeekTool on-screen status. There’s a built-in web server to query and access the saved historical data.

There’s no further setup. It’s up to the clients to decide what to collect and how often to send it. Data storage on the server is fixed, since this is stored in round-robin fashion (like rrdtool). It’s now collecting nearly 600 800 variables and using about 100 125 Mb of disk space. Data collection adds 2% load on a 2.66 GHz CPU.

This could also be used to collect home sensor data.

  1. JC, I’m sure I’m preaching to the converted, but if you have an iptables base firewall between the outside world and your wordpress server, it’s relatively easy to implement limits. You can cap bandwidth per connection, number of simultaneous connection for each remote IP, max number of connections they can make per second etc etc.

    • Actually, I never went into iptables… might be a good reason to start looking into it!

      Yes, the front-end is a VM running Linux and nginx (really nice setup: extremely low resource usage for serving static sites, and having a reverse proxy as switchboard and “URL tweaker” is very flexible).

  2. I’ve looked at collectd in the past, but never quite figured out the server end of things — I got bogged down in Catci, et al.

    By the way, I also stumbled onto the fact that the rsyslogd on each node can send selected info out via UDP which is also very nice.

Comments are closed.