One thing that I didn't really touch upon in my previous blog post was
the method in which we actually deploy Moodle to our live servers.
Our production Moodle system is... well, complicated. We only have
14,000 students at present, but we're trying to really push use of
Moodle, and the last thing that we want is for it to go down.
To the same end, we want to be able to patch servers without any
downtime, and to be able to take servers out of our load-balancing pools
so that we can prepare updates seamlessly for our end users.
I will try and improve this blog posting a bit more when I have some time...
Systems
Architecture
We run our deployment on a purely virtualised infrastructure; currently
that's VMware vSphere 5.1, provided by a separate infrastructure team.
The VMs sit on a pair of fully-redundant, replicated Storage Area
Networks (SANs), and our moodledata is served over NFS by a NAS
(Network Attached Storage) server.
Rather than having one beefy server to handle all of the load, we've
found it's more efficient to have lots of smaller servers. Our VMware
specialists (Matt, and Graham) inform me that it's far better for
scheduling to have too few processor cores per VM than too many. If we
have more than 4 processor cores on a VM, then VMware has to try a lot
harder to allocate the resources, and it's harder to migrate those VMs
around the various blades.
At present, our architecture looks something like this (sorry, it's a little out of date, but should give you a fair idea):
Web servers
We have five live web servers (moodle-web[0,1,2,3,4]). These currently
have two cores allocated, and 3GB RAM. They only need a small disk
(20GB). Looking at our statistics throughout the year, we're likely to
relinquish 1GB of this memory on each VM as it's largely going unused.
We went for five servers because we want to be able to theoretically
lose a whole blade which may have a couple of web servers on it, and
not lose service. Theoretically VMWare should handle this
automagically, but we've seen cases where this hasn't happened as it
should.
Furthermore, we frequently pull one or two of these servers out of the pool to
perform maintenance. I'll be doing this next week in our Moodle 2.5
upgrade. I'll take two of the servers out of the pool, prepare them
with the upgrade and make sure that everything is hunky-dory, and
then I'll perform the upgrade, swing over to the new servers, and
upgrade the old ones.
These web servers serve their content using Apache2 and we currently
use mod_php5 rather than fastcgi. We didn't find any particularly
stunning performance improvements with any of the CGI methods, but this
may be something we re-evaluate in the future.
We currently use APC, but we may consider switching to PHP OPcache when
we upgrade to PHP 5.5 at some point. We follow Debian releases so we're
unlikely to see PHP 5.5 for a couple of years yet.
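For what it's worth, our APC tuning is nothing exotic. A sketch of the sort of settings involved (the values here are illustrative, not our exact ones):

; /etc/php5/conf.d/apc.ini - illustrative values only
apc.enabled = 1
apc.shm_size = 128M    ; shared memory for the opcode cache
apc.stat = 1           ; re-stat files on each request, so code deploys are picked up
apc.max_file_size = 2M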
Load balancers
To handle all of these web servers, we have a pair of load balancers. I
should point out that we're Debian nuts and we love Open Source. We do
our load balancing in software, with two dedicated VMs.
These are also low-powered with 2 cores, and 1GB RAM (actually, one of
them has 2GB but we intend to reduce this back to 1GB).
We currently terminate our SSL connections on these load balancers with
nginx. When we made this decision, we were in two minds as to whether
it was the 'right' thing to do, but in retrospect it has worked very
well for us. It's a toss-up between being able to scale vertically at
the web server, or at the load balancer. The web servers don't need
public IP addresses, whilst the load balancers do. However, our web
servers cost more (in terms of resources), and Lancaster University is
fortunate enough to have an entire /16 address range with 65,534
globally-routable addresses.
In addition to terminating the SSL with nginx, we also do a fair amount
of caching. Any images and JavaScript served by certain intelligent
endpoints in Moodle are cached more aggressively in memory on the load
balancer. This way, we don't even hit the load balancing software to
serve them, and we don't hit the web servers either. Handy!
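If you're curious, the nginx side of that looks roughly like the sketch below. The server names, paths, cache sizes and the backend address that haproxy listens on are all illustrative, and the 'intelligent' endpoints I'm matching are the usual Moodle suspects (theme/image.php, theme/yui_combo.php, lib/javascript.php):

# Illustrative nginx sketch - SSL termination plus caching on the frontend
proxy_cache_path /var/cache/nginx/moodle levels=1:2 keys_zone=moodle_static:64m
                 max_size=1g inactive=7d;

server {
    listen 443 ssl;
    server_name moodle.example.ac.uk;

    ssl_certificate     /etc/ssl/certs/moodle.pem;
    ssl_certificate_key /etc/ssl/private/moodle.key;

    # Cache the revision-stamped Moodle endpoints aggressively on the frontend,
    # so repeat requests never reach haproxy or the web servers.
    location ~ ^/(theme/image\.php|theme/yui_combo\.php|lib/javascript\.php) {
        proxy_cache       moodle_static;
        proxy_cache_valid 200 7d;
        proxy_pass        http://127.0.0.1:8080;   # haproxy (address is an assumption)
    }

    # Everything else goes straight through to haproxy.
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header X-Forwarded-For   $remote_addr;
        proxy_set_header X-Forwarded-Proto https;
    }
}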
We also use X-Accel-Redirect to serve many of the files from our NFS
server using nginx directly; rather than having PHP buffer them from
disk. This is much more efficient: we saw a huge drop in our CPU
usage on the web servers with only a minor increase in CPU usage on our
frontends. Basically, web serving software is designed for serving
bytes off disk, whilst PHP is not. Again, it may seem a bit strange to
serve the files on the load balancer rather than the web servers, and
this is something that we may change in due course, but at present
we are forced to use Apache on our web servers, and the mod_xsendfile
module for Apache2, which does the same thing as X-Accel-Redirect for
nginx, is much less mature.
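For anyone wanting to do the same, there are only two moving parts: Moodle is told to emit an X-Accel-Redirect header rather than streaming the file itself, and nginx has an internal-only location that maps that header back onto the NFS mount. The paths below are placeholders, and you should check config-dist.php for the exact option names in your Moodle version:

# In Moodle's config.php (option names as per config-dist.php; check your version):
#   $CFG->xsendfile        = 'X-Accel-Redirect';
#   $CFG->xsendfilealiases = array('/dataroot/' => $CFG->dataroot);

# And on the nginx frontend, an internal-only mapping back onto the NFS-mounted
# moodledata (mount point is illustrative):
location /dataroot/ {
    internal;                # only reachable via an X-Accel-Redirect header
    alias /mnt/moodledata/;
}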
After traffic has been terminated, and cached content retrieved and
served, it's then passed to our load balancing software, haproxy.
haproxy is an awesome little tool which supports a range of really
powerful features, including different balancing algorithms, session
stickiness, and protocol awareness for some protocols. It also has
handy logging.
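Our haproxy configuration boils down to something like the following sketch; the addresses, ports and health-check URL are illustrative rather than our real values:

# Illustrative haproxy sketch
frontend moodle_http
    bind 127.0.0.1:8080              # nginx proxies to us here
    mode http
    option httplog
    default_backend moodle_web

backend moodle_web
    mode http
    balance roundrobin
    # Session stickiness: pin each browser to the web server that first served it.
    cookie SERVERID insert indirect nocache
    option httpchk GET /login/index.php
    server web0 10.0.0.10:80 check cookie web0
    server web1 10.0.0.11:80 check cookie web1
    server web2 10.0.0.12:80 check cookie web2
    server web3 10.0.0.13:80 check cookie web3
    server web4 10.0.0.14:80 check cookie web4

Pulling a web server out of the pool for maintenance is then just a case of disabling its server line and reloading haproxy.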
I'll briefly mention that we use keepalived to manage the VRRP layer of
our stack. Each of our load balancers has a dedicated management IP,
and a virtual service IP. The management IP never changes and reflects
the name of the server providing service (e.g. moodle-fe0.lancs.ac.uk).
Meanwhile, the VIPs are free to fly wherever they need.
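The keepalived side of things is tiny; a sketch of one VRRP instance (interface, router ID and addresses all made up) looks like:

# Illustrative keepalived sketch on moodle-fe0
vrrp_instance MOODLE_VIP0 {
    state MASTER              # the peer load balancer runs this instance as BACKUP
    interface eth0
    virtual_router_id 51
    priority 150              # higher than the peer, so the VIP normally lives here
    advert_int 1
    virtual_ipaddress {
        10.0.1.100/24         # the service VIP; the management IP never moves
    }
}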
We also use round-robin DNS to direct traffic to the two load
balancers. I'd consider something like multicast DNS, but in our
current environment all of our servers are in the same pair of
datacentres and are only a mile apart and use the same IP range as
everything else on campus. There's really very little point at this
time.
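The round-robin part is nothing cleverer than publishing both service VIPs under the one name in the zone file (addresses illustrative):

; illustrative zone file snippet - both load balancer VIPs answer for the service name
moodle    IN  A  10.0.1.100
moodle    IN  A  10.0.1.101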
At any point, we can take a load balancer out of service for
maintenance. We frequently do so and our end users shouldn't notice at
all. They'll still get sent to the same web servers that were handling
their request before.
Software distribution
As I mentioned before, we're Debian nuts. We love Debian. We use it for
pretty much everything (I think we have one Ubuntu box for
BigBlueButton, but that's a Debian derivative anyway).
Server configuration
We also have a configuration management suite called configutil. It
was written by a former employee, Chris Allen, and was originally based
on Oxford University's configtool. However, we've pretty much rewritten
it now and it does some pretty cool stuff. This includes distributed
iptables, user management, package management, service management, and
file deployment. A large chunk of this is actually handed over to
puppet, but we build the puppet manifest with configbuild, and deploy
the files with configutil/luns-config.
We keep all of our server configuration in git too (did I mention we
like git?), and build the configuration using configbuild. Each server
has a deployment tool called luns-config which syncs against the
configuration server.
In addition to liking Moodle, Mahara, Debian, and git, we also like
security. In fact, we really really like security.
I'm not just talking about security in getting onto our systems (all of
our servers are behind our corporate firewall, have strict iptables
rules, and enforce SSH keys). We also like our configuration to be
safe. Our configuration is served over SSL, with client-side
certificate verification using our internal certificate authority. We
generate revocation lists frequently, and if the list goes out of date
(6-monthly IIRC) we stop serving any configuration. A server can only
retrieve the configuration for the server named in its configuration
management certificate. On our package management server, we employ the
same client-side certificate requirement, so only systems with a valid
SSL certificate and key pair can access our packages.
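If you fancy doing the same sort of client-certificate checking, and you happen to front your configuration server with nginx (which I'm assuming here purely for illustration; paths and names are placeholders), the relevant knobs look roughly like this:

# Illustrative sketch of client-certificate verification in nginx
server {
    listen 443 ssl;
    server_name config.example.ac.uk;

    ssl_certificate        /etc/ssl/certs/config-server.pem;
    ssl_certificate_key    /etc/ssl/private/config-server.key;

    ssl_client_certificate /etc/ssl/certs/internal-ca.pem;   # our internal CA
    ssl_crl                /etc/ssl/crl/internal-ca.crl;     # reject revoked certificates
    ssl_verify_client      on;                               # no valid certificate, no configuration

    location / {
        # Pass the certificate's subject DN upstream so the application can restrict
        # each server to the configuration named in its own certificate.
        proxy_set_header X-Client-DN $ssl_client_s_dn;
        proxy_pass http://127.0.0.1:8081;
    }
}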
Software deployment
So now we've got all of that out of the way... I did mention that we
really like Debian right? Right, good. Because we deploy all of our
software in the form of Debian Packages. I mean all of it.
We've gone down this route for a number of reasons, some of them
theoretical advantages, and some of them learned from experiences. They
really come down to these though:
- we want to be sure that we know what software is on a server;
- we want to be sure that each server in a group is identical;
- we want to be able to install a new server quickly and easily;
- we want the ability to roll back to a previous version if we screw up; and
- we have a tendency to twitch if we come across things out of place.
Basically, what it comes down to, is that we want to be able to quickly
and easily build replacement servers, add new servers, re-install
servers, etc. Most of our servers are entirely disposable. We try to
keep all data on dedicated storage. As I mentioned before, our
moodledata is on NFS. In reality, most of our data across all servers
is stored on our NAS and served over NFS.
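On each web server the moodledata share is just a plain NFS mount; the fstab entry is along these lines (hostname, export and mount point are made up):

# Illustrative /etc/fstab entry - moodledata served from the NAS over NFS
nas.example.ac.uk:/export/moodledata  /var/moodledata  nfs  rw,hard,intr,tcp  0  0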
So if we discover that we're breaching the limits of our server
configuration, we can scale horizontally (that is the right one isn't
it?) and have a server built with a known configuration in a very short
period of time (typically about an hour).
To this end, we package all of our software. So each Moodle
installation is a separate Debian package. Debian packages are awesome.
When we upgrade Moodle, we update the code using git (see thamblings.blogspot.com/2013/07/upgrading-moodle-from-git.html
for my post on that topic). Once we've done that, we merge our new
deployment branch into a new git branch - luns-moodle-lu_2.5.
This has the debian packaging information in it and this is where we
create our package from.
Why not just keep our packaging data in our LUVLE-2-5.deployment
branch? Well, we could do, but we feel that this is cleaner. I mean
cleaner both in terms of separation of processes, and history.
For example, if we discover a bug in our package (like a missing
dependency), then we want to make that change on our packaging branch.
We don't want that change mixed up with the history of our Moodle
codebase.
If you're interested in our package skeleton, I've put it in a gist at
https://gist.github.com/andrewnicols/ae439676d116e9a6582f.
These files all go into the debian directory, and then you can run dch
--create to create an empty Changelog. You will, of course, need to
update the control file to reflect your package name.
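To give a flavour of the skeleton, the control file only needs a handful of lines; something like the sketch below (the package name, maintainer and dependencies are illustrative, so do look at the gist for the real thing):

# Illustrative debian/control sketch
Source: our-moodle-package
Section: web
Priority: optional
Maintainer: Your Name <you@example.ac.uk>
Build-Depends: debhelper (>= 8)
Standards-Version: 3.9.4

Package: our-moodle-package
Architecture: all
Depends: ${misc:Depends}, apache2, php5, php5-mysql
Description: Our packaged Moodle installation
 The Moodle codebase plus local customisations, packaged for deployment.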
So once we've made our changes, we merge the deployment branch into our
packaging branch; we increment the version number; build the package;
add it to our package server; and deploy it to each of the frontends.
Here's a summary of that process:
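(The branch names below are ours; the build and repository commands are illustrative, assuming a reprepro-style repository for the package server, and the package filename and distribution codename are placeholders.)

# On the packaging machine (illustrative sketch)
git checkout luns-moodle-lu_2.5          # the packaging branch
git merge LUVLE-2-5.deployment           # bring in the freshly-upgraded Moodle code
dch -i                                   # increment the version and describe the change
dpkg-buildpackage -us -uc                # build the .deb
# Publish it to the internal package server (assuming a reprepro-managed repository):
reprepro -b /srv/apt includedeb wheezy ../our-moodle-package_2.5.0_all.deb
# ...and then each frontend just picks it up with apt (see below).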
Hmm - that looks very long-winded, but in many ways it's just lots of
small repetitive tasks which separate concerns and make our lives
easier in the long run.
Now we've done that, to deploy to our five web servers and our cron
server we just run:
sudo apt-get update; sudo apt-get upgrade
Nice, and easy.