Archive for January, 2010

Production behaviour testing

Thursday, January 28th, 2010

I’ve been thinking about monitoring production sites and applications for a while now. Network equipment hands out more stats than you can shake a stick at over SNMP, and that’s been the basis of many a successful monitoring setup.

Unfortunately the state of the network does not directly help if you’re really trying to understand issues that an end-user may be having. So we all got on board with monitoring the state of the machines the apps are running on, tracking memory, CPU, interrupts and so on.

Bad app, good system

Oddly enough, this doesn’t help too much either. One of the first profiling jobs I did for a customer consisted of collecting months’ worth of stats from an ICL machine running SVR4, and throwing it all into a spreadsheet to analyse. Luckily it graphed easily, and we could see memory maxing out as users came online, recovering partially at lunchtime, and completely at the end of the day. Swap space followed the same pattern, so the customer bought more memory (which wasn’t a cheap option in those days). That prevented the machine from dipping into swap, but didn’t help the application performance. Other indicators seemed to show that the system was in a reasonable state, except that load average was high (which in those days was basically IO wait related).

Eventually I called on friends in the ICL performance centre; they knew the application in question to be troublesome, as it had been naïvely ported from another platform and was basically just inefficient with resources; from memory I think it was distrusting the filesystem to do its job and repeatedly opening & closing files for frequent small writes. No tuning on the host could accommodate that (although these days I’d be tempted to give the application a virtual filesystem, or something).

Of course, we had insufficient resources (tools, time, money & indeed knowledge) to examine this further, leaving the customer’s problems unresolved.

When do we debug an app?

Now, with access to the source and a debugger, or possibly these days with just DTrace in hand, we can prod and poke at the internals of a running application to see just what is happening. But we only do this when enough users are unhappy enough to complain. What we should be doing is monitoring the performance of the applications themselves, as they run on real production systems. That way we can see when acceptable performance turns into unacceptable performance, as the real environment around the app changes.

Given that a good application release should include simulated load testing, our interest should lie in the actual user-experienced performance on the live production systems. That performance is affected by all sorts of factors that are not under the control of the app: database delays, other processes on the server hogging resources …

Web application performance use-case testing

As an extension of the current thinking about Behaviour Driven Infrastructure testing (go and read up on Lindsay Holmwood’s thoughts on this; thanks for your presentation at LCA2010), we should be looking at driving use-case tests through the production infrastructure, treating it as a black box (the way the user does), and gathering performance stats on each step.

I’ve been looking at cucumber-nagios as a way to get started on this, given that Nagios is the canonical status monitoring tool, and the domain-specific language that cucumber uses is highly business-readable. However, although Nagios is often good-enough as a performance collecting service, I need more granularity than a single test scenario. I want per-step timings …

Per-step performance

A typical cucumber test run of a web application looks like this :-

$ cucumber inodepages.feature
Feature: inode.co.nz
  It should be up
  And the home page should introduce Inode
  And the Contact page should have my PGP key fingerprint

  Scenario: Visiting home and contact pages                                                          # webhome.feature:6
    When I go to http://inode.co.nz                                                                  # _steps.rb:1
    Then I should see "Inode is a small IT consultancy & services company based in Dunedin NZ. " # _steps.rb:1

  Scenario: Visiting the contact page via the home page                                                                           # webhome.feature:11
    When I go to http://inode.co.nz                                                                                               # _steps.rb:1
    And I follow "Contact"                                                                                                        # _steps.rb:9
    Then I should see "B50F\302\240BE3B\302\240D49B\302\2403A8A\302\2409CC3 8966\302\2409374\302\24082CD\302\240C982\302\2400605" # _steps.rb:1

2 scenarios (2 passed)
6 steps (6 passed)
0m0.790s

This gives a total time value, which fits in with performance targets, but does not show us which steps have changed over time. Looking into the guts of the cucumber steps (which inherently are unique per project), it seems reasonable that we can record our own timing values. Here’s a part of a steps file from a cucumber-nagios project :-

When /^I go to (.*)$/ do |path|
  visit path
end

When /^I submit the form named "(.*)"$/ do |name|
  submit_form(name)
end

The visit path and submit_form calls are methods of Webrat. It’s not especially difficult to wrap each of these calls with some extra code to collect the delta time for each call into Webrat, and then to spit that out somewhere.
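A minimal sketch of that wrapping, assuming a hypothetical helper called timed and a hypothetical STEP_TIMINGS list (neither is part of cucumber-nagios or Webrat). It only shows the timing mechanics; the commented-out step shows where the Webrat call would go.

```ruby
# Hypothetical names: timed and STEP_TIMINGS are inventions for this sketch,
# not part of cucumber-nagios or Webrat.
STEP_TIMINGS = []

# Run the block, record how long it took under the given label, and hand
# back the block's own return value so the step behaves as it did before.
# The timing is recorded even if the block raises.
def timed(label)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
ensure
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  STEP_TIMINGS << { label: label, seconds: elapsed }
end

# In a steps file the Webrat calls would be wrapped like this:
#
#   When /^I go to (.*)$/ do |path|
#     timed("visit #{path}") { visit path }
#   end
#
#   When /^I submit the form named "(.*)"$/ do |name|
#     timed("submit_form #{name}") { submit_form(name) }
#   end

# Stand-alone demonstration with a trivial block in place of a Webrat call.
value = timed("demo") { 21 * 2 }
STEP_TIMINGS.each do |timing|
  puts format("%-12s %.6fs", timing[:label], timing[:seconds])
end
```

A monotonic clock is used rather than Time.now so that a clock adjustment on the monitoring host cannot produce a negative or inflated delta.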

Of course it would take someone more aware of what’s going on inside the various modules to know whether this is the best place to take such action, but to my mind it’s the beginnings of great performance data … now we have to decide what to do with it. I’m thinking it will make an interesting plugin for collectd.
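One possible route into collectd, sketched here under assumptions, is its exec plugin, which reads PUTVAL lines in collectd’s plain-text protocol from a script’s standard output. The plugin name “cucumber”, the use of the generic gauge type, the host name “web1” and the helper names slug and putval_line are all placeholders for illustration, not an existing plugin.

```ruby
# Assumptions for this sketch: the "cucumber" plugin name, the generic
# "gauge" type and the host name "web1" are placeholders.

# Reduce free text (scenario or step names) to characters that are safe
# inside a collectd identifier.
def slug(text)
  text.downcase.gsub(/[^a-z0-9]+/, "_").gsub(/\A_+|_+\z/, "")
end

# Build one PUTVAL line: host/plugin-instance/type-instance, followed by
# the interval option and "N:<value>", where N means "now".
def putval_line(host, scenario, step, seconds, interval = 60)
  identifier = "#{host}/cucumber-#{slug(scenario)}/gauge-#{slug(step)}"
  format('PUTVAL "%s" interval=%d N:%.3f', identifier, interval, seconds)
end

# Flush each line immediately so the reader on the other end of the pipe
# sees values as they are produced.
$stdout.sync = true

line = putval_line("web1", "Contact via home", 'I follow "Contact"', 0.1234)
puts line
```

Using the scenario as the plugin instance and the step as the type instance keeps one series per step, which is the granularity that a single Nagios check result does not give.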

Wordpress on Debian 5 with nginx, apache2 and admin under HTTPS

Saturday, January 16th, 2010

This Wordpress instance is being served by a combination of Nginx and Apache2, with the admin pages under HTTPS using the Admin SSL plugin. It’s all done with the Debian 5 packaged versions and some nifty configuration …

Nginx

The public webserver is Nginx, which is the high-performance front-end webserver you should be using. Nginx is listening for both HTTP and HTTPS. It is directly serving all static content it can, based on a regexp to detect the types.

As added complications, the /wp-uploads/ files are in a different location, and need to have the requested file name rewritten. And since there’s no point wasting our time sending static content over HTTPS, we’re going to redirect all of it to HTTP. This will trigger browser warnings regarding mixed content, but that’s not a big problem.

location ~ \.(html|css|js|png|gif|jpg|svg|ico|txt)$ {
    if ($scheme = https) {
        # Refuse to serve static content under HTTPS
        rewrite ^/(.*)$ http://$host/$1 break;
    }
    # Intercept and serve static files directly
    if ($request_uri ~ /wp-uploads/.*$) {
        rewrite ^/wp-uploads/(.*)$ /$1 break;
        root /var/www/wp-uploads/inode.co.nz;
    }
    if ($request_uri !~ /wp-uploads/.*$) {
        root /var/www/inode.co.nz;
    }
}

Everything else is proxied down to an Apache running mod_php, on localhost. This is pretty straightforward, except the Apache itself is running HTTP and HTTPS (using the same certificates as Nginx), so I have to choose which proxy to talk to :-

# Choose the proxy based on the current encryption scheme
if ($scheme = http) {
    proxy_pass http://127.0.1.1:80;
}
if ($scheme = https) {
    proxy_pass https://127.0.1.1:443;
}

We are also setting up some new headers so that Apache can use mod_rpaf, enabling it to log real end-user IP addresses in its logfiles.

Apache

The application webserver is Apache, running mod_php (libapache2-mod-php5), mod_rewrite (which is not enabled by default on Debian) and mod_rpaf (libapache2-mod-rpaf). Apache is listening to both HTTP and HTTPS, but will look out for requests to /wp-admin/ and /wp-login.php and make sure that they are redirected to HTTPS. This is basically the same setup as recommended by the Wordpress Codex.

# Bounce sensitive requests to https
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteRule ^/wp-admin(.*) https://inode.co.nz/wp-admin$1 [QSA,L]
    RewriteRule ^/wp-login\.php(.*) https://inode.co.nz/wp-login.php$1 [QSA,L]
</IfModule>

The HTTPS server will also rewrite non-admin page requests, and direct you back to the HTTP version.

# Bounce non-sensitive requests to http
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteRule !^/wp-admin(.*) - [C]
    RewriteRule !^/wp-login\.php(.*) - [C]
    RewriteRule ^/(.*) http://inode.co.nz/$1 [QSA,L]
</IfModule>

Admin SSL

This is almost enough, as most of the admin URLs presented by Wordpress are relative, and will therefore be HTTPS if the requested page was HTTPS. But for Wordpress versions below 2.6 (the standard Debian 5 version is 2.5.1) you will need to install the Admin SSL plugin. The default configuration is fine; just install and enable the plugin (specifically, we are not using Shared SSL).

Conclusion

Here is a diagram showing what we have achieved :-

[Diagram: Wordpress under nginx & apache]

HTTPS will be automatically invoked when accessing admin pages, and will be automatically switched off at all other times. Only requests that need PHP will be passed on to Apache; everything else will be handled by nginx.
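As a rough model of the combined behaviour, the routing decisions described in this post can be written as a small Ruby function. This is a sketch of my reading of the configuration, not something either server runs; the route function and its return symbols are made up for illustration, and it looks only at the scheme and the path (no query string).

```ruby
# Hypothetical model of the request flow; names and symbols are illustrative.
STATIC_EXT = /\.(html|css|js|png|gif|jpg|svg|ico|txt)\z/
ADMIN_PREFIXES = ["/wp-admin", "/wp-login.php"].freeze

# Returns one of:
#   :nginx           static file served directly by nginx over HTTP
#   :redirect_http   client is sent to the HTTP version of the URL
#   :redirect_https  client is sent to the HTTPS version of the URL
#   :apache          request is proxied to Apache and handled there
def route(scheme, path)
  # nginx looks at the request first: static content never stays on HTTPS.
  if path =~ STATIC_EXT
    return scheme == "https" ? :redirect_http : :nginx
  end

  # Everything else is proxied to Apache on the same scheme.
  admin = ADMIN_PREFIXES.any? { |prefix| path.start_with?(prefix) }
  if scheme == "http"
    admin ? :redirect_https : :apache
  else
    admin ? :apache : :redirect_http
  end
end

[
  ["http",  "/"],
  ["http",  "/wp-admin/"],
  ["https", "/wp-login.php"],
  ["https", "/2010/01/"],
  ["https", "/logo.png"]
].each do |scheme, path|
  puts "#{scheme} #{path} -> #{route(scheme, path)}"
end
```

Because the static check comes first, a stylesheet requested from an HTTPS admin page is still bounced to HTTP, which is where the mixed-content warnings mentioned earlier come from.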

Why Wordpress?

Sunday, January 10th, 2010

Why have I chosen Wordpress as the CMS for http://inode.co.nz/?

Normally, I wouldn’t choose to rely on anything written in PHP … but more than that, I wanted something that was supported from the Debian repository. And while the significantly more interesting Plone is in there as well, the development environment they prefer doesn’t like being wedged into The Debian Way — I couldn’t install simple themes, and the recommendation was “ignore the packaged version, install from source”. Sorry, that violates the Inode philosophy of system administration.

So, Wordpress is enough of a CMS to be used as a manager for a simple website, which this is. It’s provided in a workable form from the Debian repository, so it will be security tracked and updated. And addons (such as themes) work the way they are supposed to, easily.