Yesterday, the WordPress server dropped out. Looks like some joker decided to grab the whole site, no throttling, nothing. Oh well, I guess that’s what fat pipes lead to.
In principle, WordPress should be able to handle such things, but I think it caused the underlying MySQL to run out of memory and abort.
A restart of the WordPress VM fixed it, once I found out. But this gives me an excuse to describe some things I’ve been starting to set up here.
One of of things I did this summer, was set up a collectd process on most of the machines running here – collectd is a bit Linux-centric, but one of the things it can do is grab various system statistics and broadcast it on the LAN over UDP. There are loads of plugins, and you can interface to it with scripts to feed it anything you like.
The nice thing about UDP broadcasts, is that senders don’t need to know anything about receivers. As with the RF12 driver, you just kick stuff into the network and then it’s up to others to decide what to do with it, if anything.
Collectd is also extremely low overhead, both in terms of CPU & memory load and in terms of network load. I’ve enabled it permanently on five systems, sending out 50..100 system parameters each, once every 30 seconds.
Here’s a partial view of the information being collected:
Broadcasting is very convenient, as the collectd tool and the RF12 driver have shown, because there’s no setup. In theory, packets can get lost without acknowledgements and re-transmissions. In practice, some do indeed get lost, but very rarely. No problem at all for periodic / repetitive data collection.
I created a small script to display system status on my development Mac, using a tool called GeekTool. It can put information right on the desktop, i.e. as unobtrusively as you want, really. Here’s what I see when there are no windows obscuring the desktop background:
On the left is a periodic “
ps“, the output on the right is from “
ssh myserver jeemon overview.tcl“. This is a remote call to the server using a small script to pull info out of a text file and format it. The output shows that there are 5 systems online (running collectd, that is), and the script last ran at 12:41 in 39 ms.
Perhaps more interesting is that the server also collects this data from the LAN. To do this, I just launch JeeMon next to this “
History group * 1w/5m 1y/6h collectd listen sysinfo
The first line is optional – it defines the parameters for historical round-robin data storage. So what I end up with, is the last 7 days of information, with details every 5 minutes, as well as up to one year of information aggregated per 6 hours.
The second line starts listening for collectd packets on the standard UDP port, tagging the results with the “sysinfo” prefix. As side effect, JeeMon will update a “
stored/state.map” text file once a minute with the current values of all variables plus some other details. This is what was used to generate the above GeekTool on-screen status. There’s a built-in web server to query and access the saved historical data.
There’s no further setup. It’s up to the clients to decide what to collect and how often to send it. Data storage on the server is fixed, since this is stored in round-robin fashion (like rrdtool). It’s now collecting nearly
600 800 variables and using about 100 125 Mb of disk space. Data collection adds 2% load on a 2.66 GHz CPU.
This could also be used to collect home sensor data.