Thoughts and ramblings

Deploying Moodle - continued

2013-07-11T14:26:00.001+01:00

One thing that I didn't really touch upon in my previous blog post was the method by which we actually deploy Moodle to our live servers.

Our production Moodle system is... well, complicated. We only have 14,000 students at present, but we're trying to really push use of Moodle and the last thing that we want to happen is for it to go down.

To the same end, we want to be able to patch servers without any downtime, and be able to take servers out of our load balancing pools so that we can prepare updates without our end users noticing.

I will try and improve this blog posting a bit more when I have some time...

Systems

Architecture

We run our deployment on a purely virtualised infrastructure; currently that's VMware vSphere 5.1. We have a separate infrastructure team who provide that infrastructure. The VMs sit on a pair of fully redundant and replicated Storage Area Networks (SANs), and our moodledata is served over NFS by a NAS (Network Attached Storage) server.

Rather than having one beefy server to handle all of the load, we've found it's more efficient to have lots of smaller servers. Our VMware specialists (Matt, and Graham) inform me that it's far better for scheduling if a VM has fewer processor cores rather than too many. If we have more than 4 processor cores on a VM, then VMware has to try a lot harder to allocate the resources, and it's harder to migrate those VMs around the various blades.

At present, our architecture looks something like this (sorry, it's a little out of date, but should give you a fair idea):

Web servers

We have five live web servers (moodle-web[0,1,2,3,4]). These currently have two cores allocated, and 3GB RAM. They only need a small disk (20GB). Looking at our statistics throughout the year, we're likely to relinquish 1GB of this memory on each VM as it's largely going unutilised.

We went for five servers because we want to be able to theoretically lose a whole blade which may have a couple of web servers on it, and not lose service. Theoretically VMware should handle this automagically, but we've seen cases where this hasn't happened as it should.

Furthermore, we frequently pull one or two of these servers out of the pool to perform maintenance. I'll be doing this next week in our Moodle 2.5 upgrade. I'll take two of the servers out of the pool, prepare them with the upgrade and make sure that everything is all funky dory, and then I'll perform the upgrade, swing over to the new servers, and upgrade the old ones.

These web servers serve their content using Apache2 and we currently use mod_php5 rather than fastcgi. We didn't find any particularly stunning performance improvements with any of the CGI methods, but this may be something we re-evaluate in the future.

We currently use APC, but we may consider switching to PHP Opcache when we upgrade to PHP 5.5 at some point. We follow Debian releases so we're unlikely to see PHP 5.5 for a couple of years yet.

Load balancers

To handle all of these web servers, we have a pair of load balancers. I should point out that we're Debian nuts and we love Open Source. We do our load balancing in software, on two VMs.

These are also low-powered with 2 cores, and 1GB RAM (actually, one of them has 2GB but we intend to reduce this back to 1GB).

We currently terminate our SSL connections on these load balancers with nginx. When we made this decision, we were in two minds as to whether it was the 'right' thing to do, but in retrospect it has worked very well for us. It's a toss-up between being able to scale vertically at the web server, or at the load balancer. The web servers don't need public IP addresses, whilst the load balancers do. However, our web servers cost more (in terms of resources), and Lancaster University is fortunate enough to have access to an entire slash-16 address range with 65534 globals.

In addition to terminating the SSL with nginx, we also do an amount of caching. Any images and JavaScript served by certain intelligent endpoints in Moodle are cached more aggressively in memory on the load balancer. This way, we don't even hit the load balancing software to serve them, and we don't hit the web servers either. Handy!

We also use X-Accel-Redirect to serve many of the files from our NFS server using nginx directly, rather than having PHP buffer them from disk. This is much more efficient and we saw a huge drop in our CPU usage on the web servers with only a minor increase in CPU usage on our frontends. Basically, web serving software is designed for serving bytes off disk, whilst PHP is not. Again, it may seem a bit strange to serve the files on the load balancer rather than the web servers, and this is something that we may change in due course, but at present we are forced to use Apache on our web servers, and the mod_xsendfile module for Apache2, which does the same thing as X-Accel-Redirect for nginx, is much less mature.

After traffic has been terminated, and cached content retrieved and served, it's then passed to our load balancing software, haproxy.

haproxy is an awesome little tool, which supports a range of really powerful features including different allocation methods, session stickiness, and it is also protocol aware for some protocols. It also has handy logging.

I'll briefly mention that we use keepalived to manage the VRRP layer of our stack. Each of our load balancers has a dedicated management IP, and a virtual service IP. The management IP never changes and reflects the name of the server providing service (e.g. moodle-fe0.lancs.ac.uk). Meanwhile, the VIPs are free to fly wherever they need.

We also use round-robin DNS to direct traffic to the two load balancers. I'd consider something like anycast, but in our current environment all of our servers are in the same pair of datacentres, which are only a mile apart and use the same IP range as everything else on campus. There's really very little point at this time.

At any point, we can take a load balancer out of service for maintenance. We frequently do so and our end users shouldn't notice at all. They'll still get sent to the same web servers that were handling their request before.

Software distribution

As I mentioned before, we're Debian nuts. We love Debian. We use it for pretty much everything (I think we have one Ubuntu box for BigBlueButton, but that's a Debian derivative anyway).

Server configuration

We also have a configuration management suite called configutil. It was written by a former employee, Chris Allen, and was originally based on Oxford University's configtool. However, we've pretty much rewritten it now and it does some pretty cool stuff. This includes distributed iptables, user management, package management, service management, and file deployment. A large chunk of this is actually handed over to puppet, but we build the puppet manifest with configbuild, and deploy the files with configutil/luns-config.

We keep all of our server configuration in git too (did I mention, we like git), and build the configuration using configbuild. Servers have a deployment tool called luns-config which syncs against the configuration server.

In addition to liking Moodle, Mahara, Debian, and git, we also like security. In fact, we really really like security.

I'm not just talking about security in getting onto our systems (all of our servers are behind our corporate firewall and have strict iptables, and we then enforce ssh keys on all servers). We also like our configuration to be safe. Our configuration is served over SSL, with client-side key verification too using our internal certificate authority. We generate revocation lists frequently and if the list goes out of date (6 monthly IIRC), we stop serving any configuration. A server can only retrieve configuration for the server named in its configuration management certificate. On our package management server, we employ the same type of client-side certificate requirement so only systems with a valid SSL certificate and key-pair can access our packages.

Software deployment

So now we've got all of that out of the way... I did mention that we really like Debian right? Right, good. Because we deploy all of our software in the form of Debian Packages. I mean all of it.

We've gone down this route for a number of reasons, some of them theoretical advantages, and some of them learned from experiences. They really come down to these though:

we want to be sure that we know what software is on a server;
we want to be sure that each server in a group is identical;
we want to be able to install a new server quickly and easily;
we want the ability to roll back to a previous version if we screw up; and
we have a tendency to twitch if we come across things out of place.

Basically, what it comes down to, is that we want to be able to quickly and easily build replacement servers, add new servers, re-install servers, etc. Most of our servers are entirely disposable. We try to keep all data on dedicated storage. As I mentioned before, our moodledata is on NFS. In reality, most of our data across all servers is stored on our NAS and served over NFS.

So if we discover that we're breaching the limits of our server configuration, we can scale horizontally (that is the right one isn't it?) and have a server built with a known configuration in a very short period of time (typically about an hour).

To this end, we package all of our software. So each Moodle installation is a separate Debian package. Debian packages are awesome.

When we upgrade Moodle, we update the code using git (see thamblings.blogspot.com/2013/07/upgrading-moodle-from-git.html for my post on that topic). Once we've done that, we merge our new deployment branch into a new git branch - luns-moodle-lu_2.5.

This has the debian packaging information in it and this is where we create our package from.

Why not just keep our packaging data in our LUVLE-2.5-deployment branch? Well, we could do, but we feel that this is cleaner. I mean cleaner both in terms of separation of processes, and history.

For example, if we discover a bug in our package (like a missing dependency), then we want to make that change on our packaging branch. We don't want that change mixed up with the history of our Moodle codebase.

If you're interested in our package skeleton, I've put it in a gist at https://gist.github.com/andrewnicols/ae439676d116e9a6582f. These files all go into the debian directory, and then you can run dch --create to create an empty Changelog. You will, of course, need to update the control file to reflect your package name.

So once we've made our changes, we merge the deployment branch into our packaging branch; we increment the version number; build the package; add it to our package server; and deploy to each of the frontends. Here's a summary of that process:
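
Roughly speaking, and as a sketch rather than our exact commands, it goes something like the following. The branch names are the ones used in this post; the run wrapper is a hypothetical helper which only prints each command unless DRY_RUN=0 is set, and the dch and dpkg-buildpackage flags are one reasonable choice rather than a record of what we pass. Adding the package to the package server is site-specific, so it is left as a comment.

```shell
#!/bin/sh
# Hypothetical helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Merge the deployment branch into the packaging branch, bump the package
# version and build the .deb. Stops at the first step that fails.
build_moodle_package() {
    deployment_branch=$1   # e.g. LUVLE-2.5-deployment
    packaging_branch=$2    # e.g. luns-moodle-lu_2.5

    run git checkout "$packaging_branch" &&
    run git merge "$deployment_branch" &&
    run dch -i "Merge $deployment_branch" &&
    run git commit -m "Bump package version" debian/changelog &&
    run dpkg-buildpackage -us -uc -b
    # Adding the resulting .deb to the package server is site-specific
    # and deliberately left out of this sketch.
}

build_moodle_package LUVLE-2.5-deployment luns-moodle-lu_2.5
```

Here dch -i adds a changelog entry with an incremented version, and -us -uc -b asks dpkg-buildpackage for an unsigned, binary-only build.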

Hmm - that looks very long winded, but in many ways it's just lots of small repetitive tasks which separate concerns, and make our lives easier in the long-run.

Now we've done that, to deploy on our five web servers, and our cron server we just run:

sudo apt-get update && sudo apt-get upgrade

Nice, and easy.

Upgrading Moodle from Git

2013-07-06T22:31:00.002+01:00

Upgrading Moodle with Git

Background

I've been working on a new deployment of Moodle for Lancaster University for the past two years or so. Our project started out with Moodle 2.1, and we upgraded for our initial pilot to Moodle 2.2.
Since then, we've upgraded from Moodle 2.2 to 2.3; and we're now planning the upgrade from 2.3 to 2.5.
We manage all of our upgrades with git, and our deployment using Debian packages.
I've been asked a couple of times to write about our upgrade methodology and reasoning so hopefully others will find this useful.
We use a variety of git features, but new features are added to git all the time which change our deployment methodology from time-to-time. At present we use:

branches

Branches, Tags, and Remotes

We have quite a few of these, but they really do make our life easier. Git is a fantastic tool, and if used to its full extent, having a large number of branches actually makes your life much easier, and less complicated. Having multiple remotes helps to separate various concerns too so you can ensure that it's harder to inadvertently publish your institution's IP.

Branches

In summary, we have one branch per feature, hack, or change in core Moodle - no matter how small. These are named in a (hopefully) sensible naming scheme to help identify them from one another easily and quickly. The name describes the project/customer (usually LUVLE in our case), the version, the type of the change, and the frankenstyle name for that change. For some changes, we have an optional short name to describe the branch further. Our naming scheme works out as:


        {customer}-{major-version}-{change type}-{frankenstyle}[-{shortname}]

Where we have several related features which must co-exist and cannot be used without one another, we use a custom frankenstyle name of set_{name}.
As an example, these are some of the branches for our impending Moodle 2.5.0 upgrade:

LUVLE-2.5-feature-mod_ouwiki
LUVLE-2.5-feature-block_panopto
LUVLE-2.5-feature-local_luassignment
LUVLE-2.5-feature-set_bigbluebuttonbn
LUVLE-2.5-hack-mod_resource-singlefiles

All of these branches are based on the same upstream tag - in this case, v2.5.0 for the 2.5.0 version of Moodle. We always use this tag. Even when 2.5.32 has come out we will still use 2.5.0 (though hopefully we won't ever get that far behind!). This may seem a touch strange at first, but when it comes to merging all of our features and changes together into a single testing or deployment branch, we want to avoid any merge conflicts created by different versions. It's also much easier when it comes to subsequent newer versions of Moodle in the future.
By having each feature in its own branch, we're able to develop, and test that branch entirely in isolation.
In the rare cases that we are working with a feature which needs a minimum release version which includes a minor increment (e.g. 2.3.1), we check out from that tag instead, but we try to avoid this to make things simpler.
In addition to all of the feature and hack branches, we also have a range of testing and deployment branches. Generally, we have a main test branch which contains the same branches as our deployment environment. This is updated frequently when we want to test an upgrade to a whole branch in combination with the rest of our installation, or to test an upgrade to Moodle. Meanwhile, we typically only have a single deployment branch - LUVLE-2.5-deployment. This is to avoid any confusion and potential for dropped branches.
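
As a small illustration of the naming scheme, here is a sketch of a helper which builds a branch name from its parts. The function name branch_name is made up for this example; the commented git checkout line shows how a branch would then be started from the upstream tag.

```shell
#!/bin/sh
# Build a branch name of the form
#   {customer}-{major-version}-{change type}-{frankenstyle}[-{shortname}]
# The fifth argument (the short name) is optional.
branch_name() {
    name="$1-$2-$3-$4"
    if [ -n "${5:-}" ]; then
        name="$name-$5"
    fi
    echo "$name"
}

branch_name LUVLE 2.5 feature mod_ouwiki
branch_name LUVLE 2.5 hack mod_resource singlefiles

# Every branch starts from the same upstream tag, for example:
#   git checkout -b "$(branch_name LUVLE 2.5 feature mod_ouwiki)" v2.5.0
```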

Remotes

As I mentioned before, one great reason for multiple remotes is to give you a separation of concerns. There are times where you don't wish to push some of your branches to the public, other times where you're working on bug fixes you don't really need to push to an internal repository, and all manner of other reasons besides.

I have the following remotes to make my life easier:

origin - git.moodle.org/moodle.git - my main upstream;
integration - git.moodle.org/integration.git - the moodle.org integration branch. Useful when fixing issues that crop up during integration;
public - github.com/andrewnicols/moodle.git - the repository I push any bug fixes and new features for the community to; and
cis - ciggit.lancs.ac.uk/moodle.git - our main internal repository.

As a general policy, and from experience of making oopsies, I've found it best to have each remote start with a different letter - it also makes tab completion much less frustrating.
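
That policy is easy to check mechanically. The following is only a sketch, and the function name unique_initials is made up: it reads remote names on standard input and succeeds only if no two of them start with the same letter.

```shell
#!/bin/sh
# Succeed only if every remote name read from stdin starts with a
# different character.
unique_initials() {
    dupes=$(cut -c1 | sort | uniq -d)
    [ -z "$dupes" ]
}

# The four remotes listed above pass the check.
printf '%s\n' origin integration public cis | unique_initials && echo "ok"

# In a real checkout the names would come from git itself:
#   git remote | unique_initials || echo "two remotes share an initial"
```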

The process

When a new release of Moodle comes out, we've typically taken a bit of a laborious approach to things. Whilst there are a lot of steps, I feel that in the long run they've been less frustrating than trying to resolve any merge conflicts; and we've saved time scratching our heads trying to work out where this change, or that whitespace conflict came from.

Initial set-up

Once we've checked out a new branch for every single feature, and hack, we begin to bring them all together. That's nice and easy when you're just starting out - just git merge a lot:
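
Something along these lines, as a sketch rather than our exact commands. The function name build_testing_branch and the testing branch name LUVLE-2.5-testing are made up for the example; the feature branch names are the ones listed earlier.

```shell
#!/bin/sh
# Create a testing branch from the upstream tag and merge each local branch
# into it, stopping at the first merge that fails.
build_testing_branch() {
    testing_branch=$1
    upstream_tag=$2
    shift 2
    git checkout -b "$testing_branch" "$upstream_tag" || return 1
    for branch in "$@"; do
        git merge "$branch" || return 1
    done
}

# Example call (LUVLE-2.5-testing is a made-up name):
#   build_testing_branch LUVLE-2.5-testing v2.5.0 \
#       LUVLE-2.5-feature-mod_ouwiki \
#       LUVLE-2.5-feature-block_panopto \
#       LUVLE-2.5-hack-mod_resource-singlefiles
```
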
And hey-presto... we should have our Moodle 2.5 installation ready for testing and deployment. Once we're happy with our installation, we usually then create a deployment branch from that testing branch.

Grabbing a fix from upstream

We frequently come across issues which have already been fixed in upstream Moodle, or which we ourselves have helped to fix. Sometimes we also backport features from newer branches onto our production branch if we really really want them.

We do all of this with the fantastic git cherry-pick command which allows you to pick a commit, or a number of commits, and apply them to your current branch.
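
A minimal sketch of that, assuming the fix lives on the origin remote. The function name grab_fix is made up, and which local branch should carry the fix is up to you.

```shell
#!/bin/sh
# Apply one or more upstream commits to a local branch.
# The -x flag records the original commit hash in the new commit message,
# which makes it easier to see later where a change came from.
grab_fix() {
    target_branch=$1
    shift
    git fetch origin &&
    git checkout "$target_branch" &&
    git cherry-pick -x "$@"
}

# Example call, where <commit> is the hash of the upstream fix:
#   grab_fix LUVLE-2.5-deployment <commit>
```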

Updating our local branches

For updating one of our feature branches, we simply make our changes to that specific branch, and then merge them back in again:
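
In sketch form, assuming the changes have already been committed on the feature branch (the function name update_feature is made up for the example):

```shell
#!/bin/sh
# Merge an updated feature branch back into a testing or deployment branch.
update_feature() {
    feature_branch=$1
    target_branch=$2
    git checkout "$target_branch" &&
    git merge "$feature_branch"
}

# Example call:
#   update_feature LUVLE-2.5-feature-mod_ouwiki LUVLE-2.5-deployment
```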

Upgrading Moodle - Minor releases

In reality, a minor update to Moodle is just the same as an update to one of our local branches. If anything, it's probably simpler:
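
As a rough sketch, assuming the new release is merged straight into the testing or deployment branch by its upstream tag (the function name upgrade_minor and the v2.5.1 tag in the example call are illustrative):

```shell
#!/bin/sh
# Fetch the latest upstream tags and merge a minor release into a branch.
upgrade_minor() {
    target_branch=$1
    release_tag=$2
    git fetch --tags origin &&
    git checkout "$target_branch" &&
    git merge "$release_tag"
}

# Example call:
#   upgrade_minor LUVLE-2.5-deployment v2.5.1
```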

Upgrading Moodle - Major releases

This is where things get much more complicated, and where the number of steps and the complication rapidly increases. That said, in my opinion they also reduce the confusion later on.

Externally provided code

We start by grabbing the latest version of the externally provided plugins and starting brand new branches for them. There's usually very little point in keeping the history for those branches as it doesn't contain any of the upstream commit messages.
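
One way to do that, as a sketch only: start a fresh branch from the new tag, copy the unpacked plugin into place and commit it. The function name import_plugin and the paths in the example call are hypothetical.

```shell
#!/bin/sh
# Start a brand new branch from an upstream tag and commit a freshly
# downloaded copy of a plugin into it.
import_plugin() {
    branch=$1   # new branch name
    tag=$2      # upstream tag to start from
    src=$3      # directory holding the unpacked plugin
    dest=$4     # plugin path inside the Moodle tree, e.g. mod/ouwiki
    git checkout -b "$branch" "$tag" &&
    mkdir -p "$dest" &&
    cp -R "$src"/. "$dest"/ &&
    git add "$dest" &&
    git commit -m "Import $dest for $branch"
}

# Example call (the source path is a placeholder):
#   import_plugin LUVLE-2.5-feature-mod_ouwiki v2.5.0 /tmp/ouwiki mod/ouwiki
```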

Local branches which need updating

For our local branches, we want to preserve the history of our changes. We also want to remove any confusion with merges to newer versions to keep the history as clear as possible.

To do so, we use the wonderful git rebase --onto command.

With a normal rebase command, git takes every commit since your branch diverged from the new upstream, and attempts to replay each of them on top of the new head.

The --onto option tells rebase where to replay the commits, separately from the point they are taken from. That's to say, that if you only have one commit since you branched from the old tag, it grabs that commit, and immediately tries to replay it on top of the target version. Ordinarily, it would attempt to reapply your commit on top of every other commit.
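
A sketch of how that might look for a single branch, using the three-argument form git rebase --onto <newbase> <upstream> <branch>. The function name move_branch and the 2.3 branch and tag names in the example call are assumptions; the new branch is created first so that the old one is left untouched.

```shell
#!/bin/sh
# Copy a branch and replay only the commits made since old_tag on top of
# new_tag.
move_branch() {
    old_branch=$1
    new_branch=$2
    old_tag=$3
    new_tag=$4
    git checkout -b "$new_branch" "$old_branch" &&
    git rebase --onto "$new_tag" "$old_tag" "$new_branch"
}

# Example call:
#   move_branch LUVLE-2.3-feature-mod_ouwiki LUVLE-2.5-feature-mod_ouwiki \
#       v2.3.0 v2.5.0
```
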
That's it. It's really simple, but it needs to be done for each and every branch that you have.

Finally, once all branches have been updated, we merge them into a new testing branch and begin our testing phase.

Tracking things

In order to make sure that we don't lose track of anything, or forget a branch during an upgrade, we make use of our issue tracking software Redmine.
When we start to put a version together, for example Moodle 2.2, we create a new task for the next upgrade - in this case, 2.3.
As we include each of our feature branches, we create a new subtask under the 2.3 task.
When we come to upgrade to Moodle 2.3, we then go through each of those subtasks and make any relevant notes from the upgrade process. We also create a new task for the subsequent upgrade (e.g. Moodle 2.5).
If we are no longer including a branch because it is now redundant or we have decided to change the functionality offered, we also note this in the relevant issue.
This all helps to ensure that we don't forget an issue and that we keep a record of all changes.

Rapid Moodle development using Git

2011-10-19T10:39:00.001+01:00

Recently I've been doing a lot of Moodle development, and every time I start a feature, or work on a bug, I've been creating a new branch. I often also create a new database, and a fresh install just to make sure that there's nothing fruity going on. All this has meant that I've had branches and databases coming out of my ears.

Since I've just taken possession of a new desktop for work, I've taken the opportunity to start afresh with my Moodle branches and I'm trying to be more organised in my branch creation. To that end, I've got the following system going:

Generic moodle bug: MDL-<bug number>-<revision>
Version for master: MDL-<bug number>-master-<revision>
Version for 2.0: MDL-<bug number>-MOODLE_20_STABLE-<revision>
Version for 2.1: MDL-<bug number>-MOODLE_21_STABLE-<revision>

This allows me to start work on a bug, and have relevant revisions to my patches in a sane and reasonably sensible (if a touch long) fashion.

To make life simpler still, I've added the following to my moodle/config.php. This selects my database (and optionally database username which is sometimes handy) based on my branch name.

<?php

$branch = exec("git branch --no-color | grep '^* '| sed 's/^* //'");
$dbuser = 'moodle';

// Fall back to the raw branch name if none of the patterns below match,
// so that $newbranch is always defined.
$newbranch = $branch;

// First check for generic branch parents
if (preg_match('/master/', $branch)) {
    $newbranch = 'master';
} else if (preg_match('/MOODLE_20_STABLE/', $branch)) {
    $newbranch = 'MOODLE_20_STABLE';
} else if (preg_match('/MOODLE_21_STABLE/', $branch)) {
    $newbranch = 'MOODLE_21_STABLE';
} else if (preg_match('/MDL-/', $branch)) {
    // Any remaining MDL- matches which don't specify a branch will be
    // assumed to be on master
    $newbranch = 'master';
}

// And now more specific parents
switch ($branch) {
    case 'example':
        $dbuser = 'some-other-dbuser';
        $newbranch = 'master';
        break;
    default:
        break;
}

$branch = $newbranch;

...

$CFG->dbname = 'moodle-' . $branch;
$CFG->dbuser = $dbuser;

I guess I'll see how it goes, but so far it's working well and I intend to replicate this with Mahara too.

Mahara 1.4 Cookbook Review

2011-10-09T16:00:00.000+01:00

A short while ago I was asked to review the Mahara 1.4 Cookbook, written by Ellen Marie Murphy. I was quite excited to see what suggestions she had to offer on the many differing ways to use Mahara.

The book is split into eight chapters covering different types of user and use-cases with each chapter being made of a number of recipes (well, it is a cookbook!). There are suggestions utilising many of the features of Mahara, plus in depth steps on how to carry out each recipe. Although the book does assume that you've got a little prior knowledge on how to use Mahara, there are some very basic recipes for those who have not.

The first chapter, Mahara for the Visual Arts, focuses primarily on visual arts and makes some great suggestions, such as combining the Collections feature with Audacity to create an audio-guided tour, which I particularly liked. I was especially pleased to see discussion of Creative Commons licensing -- a subject which more authors and artists should be aware of!

The second chapter, Literature and Writing, gives some good ideas for using Pages to present journals in a variety of different ways. I liked the combination of RSS feeds and Journals within a Page to present a newspaper page (A Daily Gazette) complete with topical and up-to-date external content such as YouTube.

I was also pleased to see that the book covered ePortfolio for professional use and not just for students, with the inclusion of chapter three, The Professional Portfolio. I liked the way that Secret URLs were suggested as a way to give different potential employers access to a personalised Curriculum Vitae. The suggestions of uploading HTML and Copying Pages were very useful to avoid duplicating effort and I'm glad that they were included.

Chapters four and five focused more on using Mahara as a teaching tool than for users wanting to create portfolios and I was surprised to see them in the middle of the book rather than at the end. Chapter four, Working with Groups, gives ideas on different ways of using groups - primarily for collaboration, but also for assessment. It gives an introduction to the possible uses of groups, and details some of the basic operations (adding and removing members), and also goes on to detail some ideas on how to use groups to engage students more. Primary education and teaching is a topic I'm not overly familiar with and I was surprised by the number of recipes in chapter five, The Primary Education Portfolio. I was intrigued by many of the ideas - I'd previously assumed that primary school aged students wouldn't typically work heavily with Mahara. I liked the recipe on creating a reading list with book reports.

I felt that chapter six, The Social Portfolio, gave lots of good ideas to help users to organise their profile page and include all sorts of external content (e.g. Twitter, and external blogs) and it was good to see mention of the RSS export features of Mahara.

Chapters seven and eight were more focused on higher education and the recipes focused on using features of Mahara to exhibit work and information for college applications. Several of the recipes suggested similar ideas but used the techniques in different ways. I liked that the topic of archiving portfolio content was covered, though I was disappointed that LEAP2A was not actually discussed.

Overall, I found the book very interesting and it gave me some thoughts on how others might be using Mahara. I was a little disappointed that some of the tips at the end of the book weren't included earlier (notably the ability to upload a zip file and unzip it in Mahara, rather than uploading each file individually -- this wasn't touched upon until chapter 7), but I don't think that this detracts from the book as a whole as they were still covered.

I think that this book would be ideally suited to users wanting to be able to make their work stand out and be seen, but also to teachers and advisors looking for ideas to give their students. I shall definitely be recommending this book to others.

Mahara, Mahara, and more Mahara

2011-09-15T10:59:00.000+01:00

Just a quick post really, I've been working really hard on all sorts of bits and pieces so not had much time to blog about several of the things I've been planning to blog on. Very frustrating but something I intend to rectify.

I've been a core Mahara developer for about 18 months, and working heavily with it for about two years now -- time flies when you're having fun! It's good fun working for a Mahara Partner and it's really good to be allowed time for open source contributions.

Sadly, I've been really busy at work recently so not had a huge amount of time for OSS work, but in my spare time I've been working on some new features for Mahara which should be really uber cool. First off, I've been working with a patch that Penny Leach wrote about 2 years ago but which never got integrated. It's to add phpunit unit testing to Mahara. This should really help us to produce much better code and the plan is to run the unit tests on upload of every single patch, for both MySQL and Postgres.
I'm still trying to iron out some issues with MySQL and this, but hopefully soon I'll have that finished.

In a similar vein, I've then written a Command Line Interface (CLI) library for Mahara which should allow for much easier creation of CLI tools. I've then used that to create a CLI installer for Mahara, and a CLI upgrade script. Although this probably won't actually affect many end users, I do think that there is a certain group of users who will use this. There's also software such as cPanel which includes Mahara - but has typically had issues installing it for users without a CLI.
The main reason for writing these was again for testing purposes. Not only does it make life much easier for those of us developing Mahara (I have a different database for every Mahara branch so installing Mahara at the drop of a hat is much appreciated), but it also means that we can machine test both installation, and upgrade of Mahara as part of patch submission.

I've also been asked to review the upcoming Mahara 1.4 Cookbook (published by Packt Publishing). Looking forward to reading it and seeing what suggestions other people have for using it. I'll be doing a quick demo of Mahara to a group of users within Lancaster University so I plan to use some of its suggestions.

Server Distributions...

2010-12-29T21:12:00.002+00:00

I've been thinking quite a bit over the past couple of weeks as to what is the 'best' server distribution. We use Debian at work and I think that it's healthy to re-evaluate decisions once in a while, and see if they still make sense.

I'm a strong believer in using a distribution which is well supported, has plenty of software pre-packaged in a sensible format, but still allows you to roll your own packages without pulling teeth. It shouldn't include X11 as standard, but should include useful things such as LVM and the software RAID stack.

In my mind, the key players of the Linux distribution world, which most people seem to consider for servers are:

Debian;
Ubuntu;
RHEL;
CentOS;
Fedora; and
SUSE.

Of course, there are many, many other distributions which people use for their servers, but I think I'll limit my thoughts to these six. I've excluded Solaris and BSD for this post - perhaps I'll cover them in another. I'm not going to look at SUSE for the moment - partially because I've had some frustrating experience with it, but partially because it's a similar model to CentOS with both its support model and its packaging model.

RHEL, Fedora and CentOS

RHEL

While it can't be denied that RHEL is well supported, it's hardly an affordable level of support. You buy support on a per-processor and virtual-guest level with prices from around $399/year. Of course, there will doubtless be a variety of discounts, bulk discounts, educational discounts, yadda yadda, etc, etc, but it hardly makes for an affordable model if you're running servers which aren't mission critical. If you're a bank, and downtime costs you serious money, then I'm sure that it's worth buying, but for most, it's very difficult to justify.

Fedora

As a result, many people seem to use Fedora. My main reason against using Fedora is its bias towards a desktop system. I'm not someone who wants their servers to have X11 installed, or all of the junk that you get with a desktop environment such as Gnome, or KDE. Fedora loses my vote primarily for this reason. The other real gripe that I have with Fedora is its lack of support. The release cycle for Fedora is every six months, with that release being supported until your release + 2 has been out for a month. That means that, given the 6-monthly release cycle, you have security support for approximately 13 months if you install that server on day 1 of the release. Sure, you can jump from release to release, upgrading constantly, but the upgrade 'process' (if you can really call it that) is somewhat convoluted, and it's not really polite to cause deliberate and avoidable downtime for all of your users every 6-12 months just because you chose to use a desktop distribution for your server, is it?

CentOS

I imagine that many users who are perfectly happy running Fedora on their desktops, but don't want to pay out for the exorbitant cost of RHEL support, therefore run CentOS. CentOS seems to be the mythical beast which perfectly encapsulates the issues I've already raised. It's a free system, with community support; and the support life cycle for each major version is seven years with releases made available 4-8 weeks after Red Hat publish the Source RPMs for RHEL. Minor releases seem to be made available approximately every 5-10 months.

So CentOS seems to be a really viable solution. I do have a couple of issues with it, mostly related to how they handle packages, or rather how they just don't seem to exist! Of course, a number of core packages do exist - things like Postgres, Apache, Perl, etc. But centos.org doesn't have a method for searching the list of available packages. It doesn't have many Perl modules available (as far as I can tell) for example.

Of course, if you want to run software which isn't available out of the standard distribution packages, you probably want to roll your own packages anyway. You can, of course, build your own RPMs but again, it doesn't seem to be something that many people do. Instructions are available from Fedora and are valid for CentOS too.

Debian and Ubuntu

Debian

I should admit now that I'm already a Debian convert. I use it for my work desktops, and all of my servers. The release cycle has recently changed to use a freeze every two years, with the distribution released once it is considered stable. Support for a release is available for about 1 year after it's moved to old-stable and, thus far, the support has been pretty good IMO. DSAs tend to be addressed pretty quickly, with packages generally released quickly too (sorry, no stats for this) and few regressions caused by these.

On the packaging side, over 25,000 packages are available and cover a wide variety of software. Perl modules are well-catered for, as are Python modules. You may only become an official Debian developer after going through a pretty stringent process involving having your GPG key signed by other developers, and having a period of sponsorship by another developer. All packages are signed and verified, which adds that warm-fuzzy feeling too.

Creating packages is also pretty well supported and very well documented and really is a breeze.

Ubuntu

Ubuntu is a derivative of Debian which was created back in 2004 when Debian wasn't releasing frequently enough for many. It still uses many of the same packages and much of the work done feeds back into the Debian project. Releases are every 6 months, but a Long-Term Support option is available which is released every second year and has support for five years.

Packages are the same as in Debian, and often newer versions of packages are available than in Debian.

However, I have a few niggles with Ubuntu which do put me off it a little. Only the core repository is supported, and only some of these packages are themselves supported. There have also been a fair few regressions in Ubuntu security releases which concern me.

My Summary

I think that, in summary, I'm pretty happy with my current distribution choice of Debian. I think that of the distributions I've looked at while writing this post, it best meets my requirements for security support and release lifecycle. The availability of packages is very good (in my opinion) and it's really pretty easy to roll your own packages.

Debugging Mahara

2010-10-29T16:42:00.004+01:00

As I've mentioned before, I do quite a bit of development work on Mahara.
There are some really handy debugging features which it's worth knowing about.

Configuration
As Mahara ships, debug, info, warn and environ error messages are sent to your error log.
Environ messages are also sent to screen.
The default settings for logging are in lib/config-defaults.php under the Logging Configuration section.
I tend to have my log levels set to LOG_TARGET_SCREEN | LOG_TARGET_ERRORLOG as below:

$cfg->log_dbg_targets     = LOG_TARGET_SCREEN | LOG_TARGET_ERRORLOG;
$cfg->log_info_targets    = LOG_TARGET_SCREEN | LOG_TARGET_ERRORLOG;
$cfg->log_warn_targets    = LOG_TARGET_SCREEN | LOG_TARGET_ERRORLOG;
$cfg->log_environ_targets = LOG_TARGET_SCREEN | LOG_TARGET_ERRORLOG;

Functions
You can then use the various log functions to log anything. Variables are printed in a sane format which is easy to read. It's really quite a breath of fresh air when you compare it to something as primitive as var_dump:

log_debug('hello');
log_info($object);
log_warn($othervar);
log_environ($yetanothervar);

Notes

Mahara uses JavaScript form submission all over the place and if you're trying to debug a form which is submitted using JavaScript, the log messages won't be printed to screen until you next load the page.

As a result, it's often really helpful to tail the Apache error log while you're debugging.
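As a rough sketch of what I mean by tailing the log: the snippet below builds a small, made-up log file so that it can be run anywhere, then picks out the lines containing a marker string. The sample lines and the 'hello' marker are purely illustrative and are not Mahara's real log format. On a real server you'd point LOG at your Apache error log instead (the path varies by distribution).

```shell
# Hypothetical sample log so the sketch is self-contained; on a real server
# set LOG to the path of your Apache error log instead.
LOG="$(mktemp)"
printf '%s\n' \
  '[error] example: unrelated PHP notice' \
  '[error] example: hello' \
  '[error] example: another unrelated line' > "$LOG"

# Take the most recent lines and keep only those mentioning our marker.
matches="$(tail -n 50 "$LOG" | grep 'hello')"
echo "$matches"
```

When debugging live, swap tail -n 50 for tail -f so that new messages appear as each request comes in.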

The power of git - splitting one file into multiple commits

2010-08-20T14:36:00.001+01:00

Another really handy thing with Git, which I do use regularly, is its ability to split lots of changes to the same file into separate logical commits.

As far as I know, none of the other VCSs I've used in the past have supported this, and the only sane way I know of doing so would be to copy the file away, and use vimdiff to copy the logical changes across one-by-one, committing each in turn.

Thankfully, git makes this really easy with the git add command.

Set up an example repository

# Create a new git repository for this example
cd /tmp/
mkdir git-add-pi
cd git-add-pi/
git init

# Make our first commit
echo "Example line" > example-file
git add example-file
git commit -m "First part of the example file"

Make some changes to our example file
Make a change to the beginning of the file:

(echo "Some big sweeping change to the file"; cat example-file) > tmp; mv tmp example-file

and make a change to the end of the file:

echo "Another big sweeping change to the same file" >> example-file

The magic
We actually wanted to split those changes into two logical commits. With Subversion, that would be a pain, but with git it's really easy with git add -pi:

523 git-add-pi:master> git add -pi example-file
diff --git a/example-file b/example-file
index d503a0c..82d4c17 100644
--- a/example-file
+++ b/example-file
@@ -1 +1,3 @@
+Some big sweeping change to the file
 Example line
+Another big sweeping change to the same file
Stage this hunk [y/n/a/d/s/?]?

Because the lines are so close to one another, git hasn't automatically split them up. Specify s to split them:

Stage this hunk [y/n/a/d/s/?]? s
Split into 2 hunks.
@@ -1 +1,2 @@
+Some big sweeping change to the file
 Example line
Stage this hunk [y/n/a/d/j/J/?]?

Well, we want to commit this hunk, so hit y

Stage this hunk [y/n/a/d/j/J/?]? y
@@ -1 +2,2 @@
 Example line
+Another big sweeping change to the same file
Stage this hunk [y/n/a/d/K/?]? n

We didn't want this hunk as part of this logical commit, so choose n

Checking what we've done
We can double check what we've added so far:

524 git-add-pi:master> git diff --cached
diff --git a/example-file b/example-file
index d503a0c..8bca897 100644
--- a/example-file
+++ b/example-file
@@ -1 +1,2 @@
+Some big sweeping change to the file
 Example line

And commit as normal:

525 git-add-pi:master> git commit -m "First change"
Created commit ad01f32: First change
 1 files changed, 1 insertions(+), 0 deletions(-)

What about the rest of the file?
You can repeat the add -pi as much as you like, or since we only have one more line to commit:

526 git-add-pi:master> git add .
527 git-add-pi:master> git commit -m "final commit"
Created commit 16722ed: final commit
 1 files changed, 1 insertions(+), 0 deletions(-)
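If you'd like to replay the whole example without typing the answers by hand, here's a minimal sketch which pipes the s, y and n responses into git add -p. The temporary directory and the throwaway identity variables are my own additions so that it runs on a machine with no git configuration. It assumes that git reads its answers from standard input when it isn't attached to a terminal.

```shell
# Throwaway identity so that "git commit" works without any git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Work in a scratch repository rather than /tmp/git-add-pi.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# The first commit, as in the example above.
echo "Example line" > example-file
git add example-file
git commit -q -m "First part of the example file"

# One change at the start of the file and one at the end.
(echo "Some big sweeping change to the file"; cat example-file) > tmp
mv tmp example-file
echo "Another big sweeping change to the same file" >> example-file

# Answer the prompts: s (split), y (stage first hunk), n (skip second hunk).
printf 's\ny\nn\n' | git add -p example-file > /dev/null
git commit -q -m "First change"

# Whatever is left goes into the final commit.
git add example-file
git commit -q -m "final commit"

# List the resulting commit subjects, newest first.
git log --format=%s
```

This is only really useful for demonstrating the idea. In day-to-day work you'll want to read each hunk before answering.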

The power of git - splitting a commit

2010-08-19T21:31:00.007+01:00

Every blog has to have a customary first post - or at least, almost every blog seems to. Rather than mine just saying that I'm starting a new blog, and I'll try and keep it up to date and blah blah blah, I thought I'd actually post some content.

For the past 9 months or so, I've been fairly heavily involved in an open source project called Mahara. Mahara is an ePortfolio system and I'm sure that I'll post about it more in the near future so I won't go into lots of detail now. Suffice it to say that the work I've been doing on it has involved a variety of customisations, and any sane developer stores, manages and tracks their work using a version control system or VCS. The Mahara project uses git.

Git is probably the most powerful version control system I've come across and enables you to manage your source and commits with minimal effort and confusion - it's probably the only VCS I've ever used which works for you rather than making you work around it. That said, some of the really powerful features of git take a bit of understanding.

Earlier today I was telling someone how you can use git in some really cool and interesting ways. One of the things I do frequently is to take a commit and split it into a series of logical commits and I thought that I'd share this.

Setting up a quick git repository
If you want to try and follow with my examples, I've pushed my sample repository to gitorious.org. You can skip the first part by cloning it with:

git clone git://gitorious.org/thamblings/git-split-commits.git

Say that you've created a quick git repository with a few files in it:

# Create a new git repository
cd /tmp
mkdir git-example
cd git-example/
git init

# Commit the first file
echo "First File" >> one 
git add one 
git commit -m "First file committed"

# Now add a few more commits/files
echo "Second File" >> two 
echo "Third File" >> three
git add .
git reset
git add two 
git commit -m "Second File"
git add three
git commit -m "Third File"

# Now add two files in the same commit
echo "Fourth File" >> four
echo "Fifth File" >> five
git add .
git commit -m "Fourth and Fifth Files"

# And another file
echo "Sixth File" >> six 
git add .
git commit -m "Sixth File"

# Let's see what we've done
git log

But we made a mistake...
But wait a second - files four and five should have been in different commits. Let's split them out.

Let's rewrite history
First we'll check out a new branch, and then we'll do an interactive rebase to edit the commit:

git checkout -b fixcommits
git rebase -i HEAD~2

We should get something like:

pick 0d3abd7 Fourth and Fifth Files
pick feaeea8 Sixth File

We want to edit commit 0d3abd7, so change that to an edit - you can use 'e' instead:

edit 0d3abd7 Fourth and Fifth Files
pick feaeea8 Sixth File

That takes us to the point after the commit 'Fourth and Fifth Files' and before 'Sixth File'.
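We can then use git reset to undo that commit while keeping the files in our working directory. HEAD moves back by one commit, and four and five are left uncommitted: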

git reset HEAD~1

We can then re-add and re-commit each file:

git add four
git commit -m "Fourth File"
git add five
git commit -m "Fifth File"

We can then continue our rebase and we'll be left with our finished article:

git rebase --continue
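For the curious, the same steps can be scripted from start to finish. This is only a sketch under a few assumptions of my own: commit_file is a hypothetical helper, the identity variables are throwaway values, and GIT_SEQUENCE_EDITOR is used with GNU sed to change the first pick into an edit instead of opening an editor (BSD sed needs a different -i syntax).

```shell
# Throwaway identity so that "git commit" works without any git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical helper: commit_file <message> <file>...
commit_file() {
  msg="$1"; shift
  for f in "$@"; do echo "$f file" > "$f"; done
  git add "$@"
  git commit -q -m "$msg"
}

# Recreate the history from the example, including the combined commit.
commit_file "First file committed" one
commit_file "Second File" two
commit_file "Third File" three
commit_file "Fourth and Fifth Files" four five
commit_file "Sixth File" six

git checkout -q -b fixcommits

# Mark the first commit in the todo list for editing (assumes GNU sed).
GIT_SEQUENCE_EDITOR="sed -i -e '1s/^pick/edit/'" git rebase -i HEAD~2 > /dev/null 2>&1

# Undo the combined commit but keep both files in the working directory.
git reset -q HEAD~1
git add four
git commit -q -m "Fourth File"
git add five
git commit -q -m "Fifth File"

# Replay the remaining commit; GIT_EDITOR=true guards against an editor opening.
GIT_EDITOR=true git rebase --continue > /dev/null 2>&1

# List the commit subjects, oldest first.
git log --reverse --format=%s
```

The log should now show six commits, with four and five each in a commit of their own.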

Let's get things back to master
And if we're happy with what we have, we can bring it back in to our master branch with another rebase:

git checkout master
git rebase fixcommits

The same also works for other git additions. So you could add parts of a file with an interactive git add (git add -pi) for example.