Multi Host Trac using NGINX

Sunday, 2010-01-10 02:52, 1263091927 seconds since Unix epoch

So, after using NGINX as my primary web server for over six months, I’m quite happy with it. The sites I’ve migrated have all been running without any real practical problems whatsoever. One of the most useful aspects of NGINX is its use of assignable variables in the configuration files. Where I needed to write the same twenty-something lines of configuration for every similar virtual host using Apache, NGINX allows me to replace all that with a single write-once, works-for-all virtual host. Trac is one of the things for which this comes in handy. I’m hosting a dozen Trac sites, all requiring their own Apache Location directive. I’ve replaced all of that with a few lines of NGINX config and a single init script. I’m going to show you how.

Before we start, I have to tell you I’ve only tested this using Trac 0.11.6 and NGINX 0.7.64 on Debian GNU/Linux. It’ll probably work everywhere except on Windows, where Trac’s FastCGI simply won’t run.

NGINX hasn’t got mod_python, just like it hasn’t got mod_php. Which is a Good Thing™. Trac supports running every site in FastCGI mode since 0.9, making it entirely NGINX-compatible. There’s even a config sample on the wiki, which we are not going to use.

After creating your Trac environment using trac-admin, you’ll have to deploy the site. Since I’m using Debian, it’s going to end up somewhere under /var/www/. I’ve put the Trac environments themselves in /var/trac/, just to make things a little more complicated. Put your files wherever you fancy; I’ll keep using these paths. Say we’re about to host the Trac page of a project called helloworld.

trac-admin /var/trac/helloworld initenv
mkdir -p /var/www/trac && chown www-data:www-data /var/www/trac
trac-admin /var/trac/helloworld deploy /var/www/trac/helloworld
chown -R www-data:www-data /var/www/trac/helloworld /var/trac/helloworld

Replace www-data with whatever user your NGINX is using. This will initialize your Trac environment and deploy the site-specific static files to the webroot. It will also provide you with the FastCGI server, which in turn can be started using Lighty’s spawn-fcgi. As always, I’ll supply the init script you can use to automate this. This time though, I’ve simplified things a bit. Since all of the Trac FastCGI processes are the same anyway, we can use symlinks and the init script’s basename to unify our configuration. The only thing you have to do to start the site’s FastCGI daemon during system boot is to copy the fcgi-trac-base script to /etc/init.d/ and run the following.

ln -s /etc/init.d/fcgi-trac-base /etc/init.d/fcgi-trac-helloworld
update-rc.d fcgi-trac-helloworld defaults
/etc/init.d/fcgi-trac-helloworld start
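
The init script itself isn’t reproduced in this post. As a minimal sketch of the basename trick only, and not the actual fcgi-trac-base script, this is how a script could work out which project it belongs to from the name of the symlink it was called through. The helper name is hypothetical; the socket path is the one the NGINX config in this post connects to.

```shell
#!/bin/bash
# Sketch only, not the fcgi-trac-base script from this post.
# trac_project_from_name is a hypothetical helper.

trac_project_from_name() {
    local name
    name=$(basename "$1")
    # Drop everything up to and including "fcgi-trac-", which also takes
    # care of rc.d prefixes such as S20
    echo "${name##*fcgi-trac-}"
}

# A real init script would pass "$0" here instead of a fixed path
PROJECT=$(trac_project_from_name "/etc/init.d/fcgi-trac-helloworld")
SOCKET="/var/run/trac-fastcgi-$PROJECT.sock"   # the socket NGINX is pointed at

echo "$PROJECT $SOCKET"   # prints: helloworld /var/run/trac-fastcgi-helloworld.sock
```

With the project name and socket path in hand, the start action only has to hand them to spawn-fcgi, so one script can serve every symlink.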

Now that we’ve got Trac itself running, it’s time to get NGINX to actually serve the site. I’ve chosen, because of SSL limitations, to host Trac sites under https://www.domain.tld/trac/project instead of a subdomain. I think the config’s easy enough to change if you want different behavior. The first thing we want to do is to make sure static content, like CSS and images, is served directly by NGINX instead of tunneled through FastCGI. All the following configuration goes, in order, into your domain’s server { } block.

location ~ ^/trac/([0-9a-zA-Z\-_]*)/chrome(.*)$ {
    alias /var/www/trac/$1/htdocs$2;
}

All of the static content will now be caught before Trac’s even touched. This increases the speed of serving this content drastically. Next up, calling Trac itself. It becomes a little tricky from here, since we’re juggling with regular expressions and variables. Once you’ve set them up correctly though, you shouldn’t have to edit a single line anymore when adding extra Trac sites. First, we want to store which Trac site we’re accessing. In this case, we want the helloworld part from our Trac URI.

if ($uri ~ ^/trac/([0-9a-zA-Z\-_]*).*$) {
    set $trac_host $1;
}
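
If you want to see what that capture group picks up without reloading NGINX every time, the same pattern can be tried in bash. This is only a sketch: trac_host_from_uri is a made-up helper and the sample URIs are examples, not anything NGINX or Trac provides.

```shell
#!/bin/bash
# Sketch: mimic the capture group NGINX stores in $trac_host.
# trac_host_from_uri is a hypothetical helper, not part of NGINX or Trac.

trac_host_from_uri() {
    # Same character class as the NGINX regex, with the dash moved to the
    # end of the bracket expression so it stays literal in a POSIX ERE
    local re='^/trac/([0-9a-zA-Z_-]*)'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    fi
}

trac_host_from_uri "/trac/helloworld/wiki/WikiStart"           # prints: helloworld
trac_host_from_uri "/trac/hello-world_2/chrome/site/logo.png"  # prints: hello-world_2
```

Anything outside letters, digits, dashes and underscores ends the match, so stick to those characters in project names.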

We’ll make good use of this variable. The next step is to call the right Trac FastCGI server. Because we’ve used a self-configuring init script, we can safely assume the location of the listening UNIX socket. We can even use this variable to point to the right authentication file, if you wish to secure your Trac sites. The right way to create this authentication file is using Apache’s htpasswd.

mkdir /etc/nginx/trac
htpasswd -c /etc/nginx/trac/trac.helloworld.passwd johndoe
chmod 400 /etc/nginx/trac/trac.helloworld.passwd
chown www-data /etc/nginx/trac/trac.helloworld.passwd

I’ll explain the following block in parts, because there are quite a few gotchas hidden between the lines.

location ~ ^/trac {
    auth_basic            "Trac";
    auth_basic_user_file  /etc/nginx/trac/trac.$trac_host.passwd;

I’ve used a regular expression match in the location instead of a regular location definer because otherwise the PHP interpreter out of my previous NGINX post would try to parse anything that ends with .php, including PHP files in Trac’s browse source functionality. It would fail of course, but Trac will also fail to produce the pretty syntax highlighted source code. You do have to make sure Trac’s configuration comes before PHP’s.

    fastcgi_split_path_info ^(/trac/[0-9a-zA-Z\-_]*[/]*)(.*)$;

It took me a while, and some help, to figure this out. To get the right PATH_INFO FastCGI variable, you can’t just use the regular expression from the previous if statement. It will work, except for URIs with URL-encoded characters in them, like spaces. NGINX keeps these %something sequences, while FastCGI’s PATH_INFO expects them to be supplied decoded. The fastcgi_split_path_info directive corrects this, and supplies the correct value in the $fastcgi_path_info variable. You’ll need NGINX 0.7.31 or later for this to work.

    fastcgi_pass   unix:/var/run/trac-fastcgi-$trac_host.sock;

Now we can pass everything to our eagerly waiting UNIX socket, which has been set up by the fcgi-trac-base init script. As you can see, it’s important the project name throughout the config matches exactly. Otherwise, some components might fail to find the right locations.

    fastcgi_param  HTTPS            on;
    fastcgi_param  QUERY_STRING     $query_string;
    fastcgi_param  CONTENT_TYPE     $content_type;
    fastcgi_param  CONTENT_LENGTH   $content_length;
    fastcgi_param  SCRIPT_NAME      /trac/$trac_host;
    fastcgi_param  PATH_INFO        /$fastcgi_path_info;
    fastcgi_param  AUTH_USER        $remote_user;
    fastcgi_param  REMOTE_USER      $remote_user;
    fastcgi_param  REQUEST_METHOD   $request_method;
    fastcgi_param  SERVER_NAME      $server_name;
    fastcgi_param  SERVER_PORT      $server_port;
    fastcgi_param  SERVER_PROTOCOL  $server_protocol;
}

Finally, we can add the right variables to get FastCGI the information needed to serve the Trac pages. Remove the HTTPS variable if you don’t use HTTPS for your Trac sites. Also, remove the *_USER variables if you don’t use NGINX’s HTTP authentication for Trac.
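
To make the PATH_INFO business a bit more concrete, here’s a small bash sketch of the two things discussed above: what the split regular expression puts in its second group, and what “decoded” means for a URL-encoded space. Both helpers are hypothetical and only illustrate the idea; they say nothing about how NGINX implements it.

```shell
#!/bin/bash
# Sketch: the split pattern and URL decoding, outside of NGINX.
# split_path_info and urldecode are hypothetical helpers.

split_path_info() {
    # Same pattern as the fastcgi_split_path_info line, as a POSIX ERE;
    # the second group is what ends up as the path info
    local re='^(/trac/[0-9a-zA-Z_-]*/*)(.*)$'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[2]}"
    fi
}

urldecode() {
    # Turn every %XX into \xXX and let printf expand the escapes
    printf '%b\n' "${1//%/\\x}"
}

raw=$(split_path_info "/trac/helloworld/wiki/Hello%20World")
echo "$raw"         # prints: wiki/Hello%20World
urldecode "$raw"    # prints: wiki/Hello World
```

Note that the first group swallows the slash after the project name, which is why the config puts a slash back in front of $fastcgi_path_info.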

Now you can easily add new Trac sites by following the next steps.

trac-admin /var/trac/$PROJECT initenv
trac-admin /var/trac/$PROJECT deploy /var/www/trac/$PROJECT
chown -R www-data:www-data /var/trac/$PROJECT /var/www/trac/$PROJECT
htpasswd -c /etc/nginx/trac/trac.$PROJECT.passwd $USER
chmod 400 /etc/nginx/trac/trac.$PROJECT.passwd
chown www-data /etc/nginx/trac/trac.$PROJECT.passwd
ln -s /etc/init.d/fcgi-trac-base /etc/init.d/fcgi-trac-$PROJECT
update-rc.d fcgi-trac-$PROJECT defaults
/etc/init.d/fcgi-trac-$PROJECT start
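
Since the project name ends up in regular expressions, socket paths and file names, it pays to check it before typing all that. The following is a sketch of a dry-run helper that validates the name and prints the steps for review instead of running them. The function name is hypothetical, and it assumes the www-data user and the paths used throughout this post.

```shell
#!/bin/bash
# Sketch: print (not run) the steps for a new project, after checking the
# name only uses characters the NGINX regular expressions accept.
# new_trac_site_steps is a hypothetical helper.

new_trac_site_steps() {
    local project=$1 user=$2
    local re='^[0-9a-zA-Z_-]+$'
    if [[ ! $project =~ $re ]]; then
        echo "invalid project name: $project" >&2
        return 1
    fi
    # htpasswd -c creates the password file for the new project
    cat <<EOF
trac-admin /var/trac/$project initenv
trac-admin /var/trac/$project deploy /var/www/trac/$project
chown -R www-data:www-data /var/trac/$project /var/www/trac/$project
htpasswd -c /etc/nginx/trac/trac.$project.passwd $user
chmod 400 /etc/nginx/trac/trac.$project.passwd
chown www-data /etc/nginx/trac/trac.$project.passwd
ln -s /etc/init.d/fcgi-trac-base /etc/init.d/fcgi-trac-$project
update-rc.d fcgi-trac-$project defaults
/etc/init.d/fcgi-trac-$project start
EOF
}

new_trac_site_steps helloworld johndoe
```

Printing the commands rather than executing them keeps the helper harmless; initenv asks questions interactively anyway, so you’ll want to run that one by hand.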

I’ve combined all of the config into a single file you can place in your NGINX config directory, which you can include inside any server {} block.

Play MKV on the PS3 For Free

Monday, 2009-11-23 20:01, 1259006487 seconds since Unix epoch

Okay, I’m getting just about sick of all of these half-assed solutions out there. I’ve done some research and I’ve made a script that usually works. It converts an H.264/AC3 MKV file to H.264/AAC. I’ve also downsized the AC3 to 192 kbit/s AAC stereo, because almost nobody has proper 5.1.

The script does make some assumptions, but it’ll work for almost 99% of the content out there. You can download it here or copy-paste from this piece of code. Oh, you’ll need some tools as well. Those would be mkvtoolnix, gpac and ffmpeg on Debian. Can’t find the packages you need? Debian-Multimedia might help.

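#!/bin/bash
# Remux an H.264/AC3 MKV into an MP4 with stereo AAC audio for the PS3.

if [ -z "$1" ] || [ -z "$2" ]; then
  echo "Usage: $0 movie.mkv movie.mp4"
  exit 1
fi

# Take the frame rate from the first "fps" line mkvinfo prints
FPS=$(mkvinfo "$1" | grep -m 1 fps | awk '{ print $6 }' | sed 's/(//')

echo "Detected $FPS fps first stream"

# Assumes track 1 is the H.264 video and track 2 the AC3 audio
mkvextract tracks "$1" 1:/tmp/mkv2ps3.264 2:/tmp/mkv2ps3.ac3
# Downmix the AC3 to 192 kbit/s stereo AAC
ffmpeg -i /tmp/mkv2ps3.ac3 -ab 192k -ac 2 -acodec libfaac /tmp/mkv2ps3.aac
# Mux video and audio into the MP4 container
MP4Box -new "$2" -add /tmp/mkv2ps3.264 -add /tmp/mkv2ps3.aac -fps "$FPS"
rm /tmp/mkv2ps3.{264,ac3,aac}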
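
The frame rate detection is the most fragile assumption in there, so here’s that pipeline on its own, run against a sample line instead of a real file. The sample line is my assumption of what mkvinfo prints for a video track; check the output of your own version if the script detects nothing. The function name is made up for this sketch.

```shell
#!/bin/bash
# Sketch: the frame rate extraction from the script, fed a sample line.
# The sample is an assumption about mkvinfo's output format.

sample='|  + Default duration: 41.708ms (23.976 fps for a video track)'

fps_from_mkvinfo() {
    # First line mentioning fps, sixth whitespace-separated field,
    # minus the opening parenthesis
    grep -m 1 fps | awk '{ print $6 }' | sed 's/(//'
}

echo "$sample" | fps_from_mkvinfo    # prints: 23.976
```

If your mkvinfo words that line differently, the field number in the awk call is the thing to adjust.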

Edit: If you’re having trouble with double free or memory corruption errors at the end of the script, you’re using a broken gpac. Here’s a little something to get yourself a functional MP4Box. Run as root:

cd /usr/src/
cvs -z3 -d:pserver:anonymous@gpac.cvs.sourceforge.net:/cvsroot/gpac co -P gpac
cd gpac
chmod +x configure
./configure --static-mp4box
make
cp bin/gcc/MP4Box /usr/local/bin

Debian PHP Session Sharing Stopped Working

Thursday, 2009-08-27 15:27, 1251386831 seconds since Unix epoch

The preferred method of sharing sessions between (sub)domains has always been the session.cookie_domain PHP setting. For example, if I want to keep my blog’s session in the photography pages, I simply set PHP’s session.cookie_domain to ".jrrzz.net".

But all of a sudden, all of this stopped working. Visiting one of the domains while having an active session on another destroyed all active sessions altogether. After searching through the docs, and several angry users later, I found the culprit.

The suhosin security patch encrypts the session data using the DocumentRoot string. Since this usually varies between sub domains, you’ll have to disable this in suhosin’s own configuration file. Simply set the PHP directive suhosin.session.cryptdocroot to off.

How to Migrate from Apache to NGINX

Friday, 2009-06-26 00:10, 1245975012 seconds since Unix epoch

Just imagine you’ve got a few web sites to look after, like me, have a tendency to over-engineer things, like me, want the best out of your hardware, like me, and have some free time on your hands, unlike me. What do you do? Right! Migrate from Apache to NGINX. “Why NGINX?” you might ask. Well, Apache eats too much RAM for starters. If a website is lagging because of a faulty database, Apache will prefork itself to death, claiming all of your precious RAM in the process. NGINX is fast. Really fast. It uses fewer resources while doing way more useful work. It has its shortcomings too, of course. It can’t handle that many configuration options. You can throw anything HTTP related at Apache and it’ll have some kind of module that understands it. NGINX can understand HTTP. That’s about it. But on the other hand, that’s all I want a web server to do. And finally, the logo. Whereas Apache has a purple feather, NGINX is the People’s Server of the Great Soviet Union. I mean, how cool is that? In Soviet Russia, NGINX serves you!

So first, we’ll need NGINX. My machines are running Debian GNU/Linux amd64, so it should just be an apt-get install. The latest stable NGINX is stuck in experimental, and because I didn’t want to be working with the legacy version, I rolled my own package. It’s available from the WasdaPuntEnEl apt repository if you’re too lazy to build your own. I’ve also uploaded the source, so you’re welcome to port it to your own platform.

The first thing you’ll need for a proper migration is a second IP address. You should know how to configure your OS to get that effect. If you can’t get a second external IP address, I’d suggest reading SSH or OpenVPN documentation. With this dual-IP method you can test your web sites on both Apache and NGINX, and verify the absence of difference. Well, except for the obviously enhanced speed that is. Reconfigure your Apache to only listen at the first IP address, the one you’re already using to serve from. Check for Listen 80 and friends. On Debian you can usually find these directives in /etc/apache2/ports.conf. Change it to only listen on your primary IP address like this: Listen 1.2.3.4:80.

The Debian package I’ve forked shipped with a decent default configuration. It has, just like Debian’s Apache, the sites-available and sites-enabled directories, a conf.d and sane defaults. The /etc/nginx/nginx.conf file is quite understandable.


user www-data;
worker_processes  2;

error_log  /var/log/nginx/error.log;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    access_log  /var/log/nginx/access.log;

    sendfile        on;

    keepalive_timeout  65;
    tcp_nodelay        on;

    gzip  off;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

We only need two worker processes; that’s really enough to completely saturate the 100Mbit/s pipe this particular 2-core server is connected to. It’s basically a rule of thumb to take one worker process for each available CPU, with a limit of six or so. Unless you’re serving 1×1 gifs, which NGINX can do insanely fast from RAM by the way, you’ll be fine. You can increase the worker_connections value to compensate for increased traffic. With two workers at 1024 connections each, this config can serve up to 2048 parallel connections, which is enough for this particular server.
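
The connection limit is simply worker_processes times worker_connections. As a quick sketch, this hypothetical helper reads both values from a config on standard input and multiplies them; it assumes each directive sits on its own line, like in the file above.

```shell
#!/bin/bash
# Sketch: upper bound on parallel connections for an nginx.conf on stdin.
# max_connections is a hypothetical helper.

max_connections() {
    awk '$1 == "worker_processes"   { gsub(";", "", $2); w = $2 }
         $1 == "worker_connections" { gsub(";", "", $2); c = $2 }
         END { print w * c }'
}

printf 'worker_processes  2;\nevents {\n    worker_connections  1024;\n}\n' \
    | max_connections    # prints: 2048
```

On a live box you’d run it as max_connections < /etc/nginx/nginx.conf.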

Next up: virtual hosts. Just like Apache, you can set up a virtual host in a file located in /etc/nginx/sites-available and symlink it into /etc/nginx/sites-enabled to enable that virtual host. The configuration of the virtual host is almost like Apache’s, with a few gotchas.


server {
        listen   87.253.149.111:80;
        server_name jbisc.org www.jbisc.org;

        access_log  /var/log/nginx/jbisc.access.log;

        if ($http_host !~ www.jbisc.org) {
                rewrite ^(.*)$ http://www.jbisc.org$1;
                break;
        }

        set $webroot /var/www/jbisc;
        location / {
                root   $webroot;
                index  index.html;
        }
}

This little blurb configures NGINX to serve JBISC’s site from /var/www/jbisc/, the same place Apache reads it from. The rewrite syntax is more readable than Apache’s. Any code monkey can read, and understand, the www-forcing code. You can also use variables in configuration files, something that you can’t do without once you’re used to it. I’ve set $webroot just for good practice, because you’ll need its location in more places than one in complex configurations.

Most of the sites hosted on my machines are written in PHP. Including this blog. So we’d need some kind of PHP support. Apache has mod_php, which includes the PHP interpreter into the Apache process. It’s the fastest way to communicate to PHP from the web server, but it just doesn’t scale that well. Every Apache process will have this interpreter on board, even if it’s sending you a 1×1 gif image. The best way, that I know of, to serve PHP from NGINX is to use FastCGI. Unlike CGI, which starts an interpreter for every request, FastCGI has a number of interpreter processes running, listening on a socket. Luckily PHP has a built-in FastCGI method. First, you’ll need PHP listening on a socket. To get this thing running we’ll use spawn-fcgi from Lighty. It’s a nice little wrapper program making it easier to manage the PHP processes. I’ve modified an init-script that I found somewhere on the web to start a few PHP processes, which will listen on a Unix socket for incoming FastCGI requests. It will also drop PHP’s privileges to Debian’s www-data user. Copy it to /etc/init.d, make it executable and (if you wish) add it to your boot using update-rc.d. Don’t forget to start it before adding the following config to NGINX.


location ~ \.php$ {
        fastcgi_pass   unix:/var/run/php-fastcgi.sock;
        fastcgi_index  index.php;
        fastcgi_param  SCRIPT_FILENAME   $webroot$fastcgi_script_name;
        include fastcgi_params;
}

This bit of configuration should go into your existing server { } section. It will match the URL for the PHP extension, and send the request to the eagerly waiting PHP FastCGI socket. The fastcgi_params include is another configuration file listing all of the default fastcgi_param directives. It’s just a shorthand. Don’t forget to add index.php to your index directive. Otherwise you’ll end up with a 403.

You won’t miss mod_rewrite at all. I’ve easily migrated all of the mod_rewrite configuration to NGINX rewrites. They both share basically the same regular expression syntax. You should get rid of, or block, the .htaccess files. NGINX doesn’t support those.


if (-f $request_filename) {
        break;
}

if (!-e $request_filename) {
        rewrite ^(.+)$ /index.php?q=$1 last;
}

These two should go in the location / { } part of your virtual host. These rewrites are used to serve WordPress for instance. Everything that can be translated to a file name gets served directly, otherwise it becomes the q argument for index.php.

There’s one final little piece of config I’d like to share. I love mod_userdir. It’s just a nice way of putting stuff online without difficult configuration. The following should go into a file, which can be included in the server { } part of your virtual host. It realizes the same behavior, including PHP support. This won’t work server-wide, just for the domains you include this file for.


location ~ /~([^/]+)(.*\.php)$ {
        alias /home/$1/public_html$2;
        fastcgi_pass   unix:/var/run/php-fastcgi.sock;
        fastcgi_param  SCRIPT_FILENAME   $request_filename;
        include fastcgi_params;
}
location ~ /~([^/]+)(.*)$ {
        autoindex on;
        index index.html index.htm index.php;
        alias /home/$1/public_html$2;
}

I’ve also managed to get both Rails and Django running, using mostly the same technique. The only difference between those frameworks and PHP scripts is that the frameworks ship with either their own web server you can proxy (like Thin for Rails) or a FastCGI server (like manage.py runfcgi for Django).

You can test your migrated sites by changing your local DNS resolving so the hostname resolves to the second IP address. You can either change your local hosts file, or make your local DNS caching server apply the changes. If everything checks out you can reconfigure both NGINX and Apache to respectively listen on the primary IP address and stop listening on it. A quick restart of both of them finishes your migration.
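
For a quick check that doesn’t involve touching DNS or the hosts file at all, you can also talk to each IP address directly and send the Host header by hand. That’s a different technique from the one described above, and the helper below is only a sketch: compare_backends is a made-up name and the addresses in the example are placeholders.

```shell
#!/bin/bash
# Sketch: fetch the same URL from both servers and diff the response bodies.
# compare_backends is a hypothetical helper; the IPs below are placeholders.

compare_backends() {
    local host=$1 path=$2 apache_ip=$3 nginx_ip=$4
    # -s silences the progress meter, -H sets the virtual host to ask for
    diff <(curl -s -H "Host: $host" "http://$apache_ip$path") \
         <(curl -s -H "Host: $host" "http://$nginx_ip$path")
}

# Example (needs both servers up, so it's left commented out):
# compare_backends www.jbisc.org /index.html 1.2.3.4 5.6.7.8 && echo identical
```

No output and a zero exit status means both servers returned the same body. Pages with timestamps or session tokens in them will always differ, so try it on static pages first.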

This entire server has been migrated in a little under four hours, with some relatively complex web sites. It’s also still running Apache to host the Subversion and Trac sites. Subversion over HTTP is one of those things NGINX just isn’t designed for (yet).

Blog Fixed

Sunday, 2009-06-21 22:35, 1245623752 seconds since Unix epoch

As probably nobody noticed, heck, I didn’t even notice it until recently, this site was severely broken. The theme didn’t survive the WordPress update. I’ve got a new (customized) theme now, which I like. You should like it too.

I’ll add some real content some time soon. I promise.

Bye MailScanner

Friday, 2009-06-05 15:36, 1244216215 seconds since Unix epoch

Because of this:

Mail Queue

I’ve replaced MailScanner with a plain ol’ spamassassin postfix content filter. And believe it or not, spam is caught way more efficiently now. No more delays, double file extension nags and 2k line config files.

PHP APD Completely Useless

Saturday, 2009-05-23 14:25, 1243088739 seconds since Unix epoch

We all know PHP isn’t a very well thought out programming language. It tries to do a lot of things, but fails to do most of those correctly.

PHP has a nice little extension, the Advanced PHP Debugger. APD for short. It allows you to analyze and alter PHP’s internals. It also supplies the rename_function function. Apart from its reversed name (it should be function_rename, like function_exists etc), it’s quite useful. It gives PHP some aspect-oriented features, like allowing you to rename mysql_query to add transparent logging.

But PHP wouldn’t be PHP if they didn’t fuck this up. It’s okay to rename PHP’s own functions, but don’t try to rename your own.


jorrizza@shoebox:/tmp$ cat > balls.php
<?php
function foo() {
    return 0;
}
rename_function('foo', 'original_foo');
var_dump(function_exists('original_foo'));
original_foo();
?>
^D
jorrizza@shoebox:/tmp$ php balls.php
bool(true)
Segmentation fault
jorrizza@shoebox:/tmp$

Kapow! Segfault! It’s even better when you’re using Apache mod_php. An apache child will segfault, making other mod_php processes behave quite strangely all of a sudden. This bug has been known since 2007, but nobody seems to care. It’s a Zend extension, we can’t support that, oh no.

OpenBSD GCC Fun

Monday, 2009-05-18 16:08, 1242662893 seconds since Unix epoch

First some good news. The cluster I’m building is finally able to stream H264/AVC using RTSP. It’s also able to chain RTSP links in between nodes in order to support tree-shape content distribution within the cluster. All of this functionality is easily accessible from a neat little Ruby API.


require 'vlm'
vlm = VLM.new('127.0.0.1', 4212, 'admin')
vlm.broadcasts.each do |broadcast|
  vlm.play(broadcast)
end

I had to fix some things in VLC to make it at least workable for my setup. Some really weird assumptions and race conditions still plague the code base. The weirdest error was the following, while dynamically linking Live555.

undefined symbol '__gxx_personality_v0'

This symbol is used by GCC in some Java related internal stuff. What the hell is this error doing in VLC’s output?

Because VLC can’t be built (anymore) using the OpenBSD standard GCC 3 compiler, I had to use GCC 4 from the ports collection. Apparently, when dynamically linking GCC 3 compiled binaries with a GCC 4 product, some symbols are lost in translation. The solution was shipping my own GCC 4 compiled Live555 library.

Monitoring DRBD using Munin

Tuesday, 2009-02-24 23:33, 1235518411 seconds since Unix epoch

Maybe some of you have noticed: quite a few wasda.nl machines went down today. The cause of this problem was a disconnected DRBD, which went unnoticed for quite some time. This has been solved after some kill-dash-nines, hard resets, blood, sweat and tears.

Because network admins don’t really like zombie killing parties, a solution had to be found. We’ve had Munin running for quite some time now. It does a great job at warning the right people if shit is about to hit the fan. So I’ve written a small script which graphs DRBD network and disk usage. It also sends out an error if a DRBD device gets disconnected.

You can grab it here. The configuration is fairly simple. Just copy this into your plugin directory and symlink it with these names:

  • drbd_net_N: Network traffic by /dev/drbdN
  • drbd_disk_N: Disk usage by /dev/drbdN

One note of caution though. This plugin uses API and Proto version 86 of DRBD. I haven’t tested it with any other version. The code is easy enough so you’re welcome to edit it.

Rescue Lost Directories using Debian aptitude and dpkg

Thursday, 2008-09-25 20:01, 1222372860 seconds since Unix epoch

We all do stupid things sometimes. I’ve just deleted the contents of my /bin/ directory. Luckily we’ve got the Debian package manager to save the day. Because it has super cow powers. Mooh!

Okay, with a lost /bin/ directory we haven’t got our shell anymore so we’ll have to boot using a Debian rescue image. I’ve used bootp to boot my machine into Debian rescue, but you can use any medium your machine can boot from. I’ve tarred a working /bin/ directory from another box into an archive and put it back, so the machine could be used again. I’ve used wget for this, but some kind of USB storage device will also work. Make sure the donor box is running the same architecture you’re using.

Now we can boot back into our broken Debian system. But we’re not quite there yet. These binaries in /bin/ might be Slackware or Gentoo binaries, who knows. Well, you do, but you still might miss some binaries. We want all of our own Debian binaries back. Here’s a quick three-line way to refill your /bin/ with Debian’s own files. You can replace /bin with any directory you’d like, and dpkg will restore it for you.

cd /tmp
for package in `dpkg -S /bin |awk -F":" '{print $1}'`; do aptitude download `echo $package | sed -e 's/\,//'`; done
dpkg -i *.deb
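
In case the loop looks like line noise: dpkg -S prints one line for a directory, with every owning package separated by commas, followed by a colon and the path. Here’s the parsing step on its own as a sketch, fed a made-up package list instead of real dpkg output. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: turn the line dpkg -S prints for a directory, e.g.
#   bash, coreutils, grep: /bin
# into one package name per line. The sample package list is made up.

packages_from_dpkg_search() {
    # Keep the part before the colon, drop the commas, one name per line
    awk -F: '{ print $1 }' | tr -d ',' | tr -s ' ' '\n'
}

echo 'bash, coreutils, grep: /bin' | packages_from_dpkg_search
```

Each name that comes out is what gets handed to aptitude download in the loop above.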

Tomorrow: how to make dpkg order a pizza for you while doing the laundry.