Friday 29 April 2011

Ultimate Ubuntu Build Server Guide, Part 3

Groundwork, Phase 3: Calling Names
Note from future self: Although this setup guide is now superseded by cloud-based tools, certain elements are still useful simply as good practice, such as the "groundwork" in the early stages of this guide. As such, this article has been spared the chop.

This is Part 3 of my Ultimate Ubuntu Build Server Guide.


A productive network needs a comprehensible naming convention and a reliable mechanism for dishing out and looking up these names.


Naming Thine Boxen

On my first day at my current workplace I asked what the URL of the Wiki was. "wiki" was the reply. That kind of smile-inducing, almost-discoverable name is exactly the kind of server name we are looking for. At the other end of the scale is the completely-unmemorable, partly-implemented corporate server-farm naming scheme that means your build box is psal04-vic-virt06-r2 and your wiki lives on 12dev_x64_v227a. Ugh.


DNS and DHCP

The cornerstone of being able to talk to machines "by name" is, of course, DNS. You need your own DNS server somewhere on the network. I know what you're thinking - that means a big, noisy, power-sucking Unix box and bind and resolv.conf and ... argh!


Not necessarily.


Unless you've been living under a rock for the last 10 years, you'll have noticed that there are now rather a lot of small networked devices around, all running some flavour of Linux. Your DSL router is almost certainly one of them, but while it probably offers DHCP services, it most likely can't serve up local DNS entries (aside from proxying DNS requests upstream). That's OK. There are other small Linux boxen that will do the job.


I speak of NAS devices. I'm personally using a Synology DS209, which is the kind of web-configured, one-box-solution, Linux-powered network überdevice I could have only dreamt about 10 years ago. In addition to storing mountains of media files and seamlessly acting as a Time Capsule for my MacBook, this neat little unit also runs SynDnsMasq, a port of the amazing dnsmasq DHCP/DNS server.


dnsmasq

A simple, elegant and functional tool that runs off one superbly-commented configuration file, dnsmasq will make your local network much more navigable thanks to local DNS addresses - ssh user@buildbox is much better than ssh user@10.12.14.16, don't you think?


Having full control of your DHCP server (as opposed to the primitive on/off on most domestic routers) also allows you to set up effectively-permanent address allocations based on MAC addresses. This gives you all of the advantages of static IP addresses for servers, but allows you to have a centralised repository of who-is-who, and even change things on the fly, if a server goes offline for example.


By running this software on my NAS, I get all of these features, plus I save scads of power as it's a low-power unit AND it switches itself on and off according to a Power Schedule so it's not burning any juice while I'm asleep. I've configured the DHCP server to actually tell clients about two DNS servers - the primary being the NAS itself, the secondary being my ADSL router. That way, if I start using a client at 10.55pm, I can keep surfing the web after the NAS goes to sleep at 11pm - the client will just "fail over" to the main gateway.
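
To give a flavour of what this looks like in practice, here's a minimal, illustrative dnsmasq configuration fragment (the MAC address, IP addresses and names are made up, not my actual config):

  # Hand out DHCP leases from this pool, with a 12-hour lease time
  dhcp-range=10.240.0.100,10.240.0.199,12h

  # Effectively-permanent addresses, keyed on MAC address - all the benefits
  # of static IPs, managed in one central place
  dhcp-host=00:11:22:33:44:55,cheetah,10.240.0.10

  # Tell DHCP clients about two DNS servers: the NAS first, the ADSL router second
  dhcp-option=6,10.240.0.2,10.240.0.1

  # A local DNS name served directly by dnsmasq, so "ssh user@wiki" just works
  address=/wiki/10.240.0.3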


Name Your Poison

The actual names you use for servers and clients are of course a very personal choice. One of the best schemes I've used was based on animals, with increasing levels of maturity and/or decreasing domesticity based on their function. A table's worth a thousand words here I think!


  Function          Species   Dev     Test    Prod
  App Server        Canine    Puppy   Dog     Wolf
  Web Server        Equine    Foal    Horse   Zebra
  Database Server   Capra     Kid     Goat    Ibex

While this caused much hilarity amongst non-technical people ("Horse will be down until we bounce the Goat" is not something you hear in many offices!), it actually worked very well.


The scheme I'm using at home is a simple "big cat" scheme - lion, tiger, cheetah, leopard etc - but I've taken the opportunity to "overload" the names in my dnsmasq configuration - so buildbox currently resolves to the same machine as cheetah - but of course should that duty ever change, it's just a one-line change on the NAS to fix it.
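
In dnsmasq terms, that kind of overloading really is a one-liner (assuming a reasonably recent dnsmasq, and that cheetah is already a name it knows about, e.g. via its DHCP lease):

  # buildbox is merely an alias - repoint it if build duty ever moves to another machine
  cname=buildbox,cheetah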

Tuesday 26 April 2011

Ultimate Ubuntu Build Server Guide, Part 2

Groundwork, Phase 2: Sensible IP Addresses
Note from future self: Although this setup guide is now superseded by cloud-based tools, certain elements are still useful simply as good practice, such as the "groundwork" in the early stages of this guide. As such, this article has been spared the chop.

This is Part 2 of my Ultimate Ubuntu Build Server Guide.


First things first. If you're running a development network on a 192.168.x.y network, you're going to want to change that, stat. Why? Because you simply don't (or won't) have enough numbers to go around.

Yes, you're going to hit your very own IPv4 address exhaustion crisis. Maybe not today, maybe not tomorrow, but consider the proliferation of WiFi-enabled devices and virtualised machines in the last few years. If you've only got 250-odd addresses, minus servers (real and virtualised), minus workstations (real and virtualised), minus bits of networking equipment, and each developer has at least one WiFi device in her pocket, you're probably going to have to start getting pretty creative to fit everyone in. And I haven't even mentioned the possibility of being a mobile shop and having a cupboard full of test devices!


To me, it makes much more sense to move to the wide open spaces of the 10.a.b.c local network range. Then not only will you have practically-unlimited room for expansion, but you can also start encoding useful information into machine addresses. Allow me to demonstrate with a possible use of the bits in the a octet:

 7 6 5 4 3 2 1 0
 | | | |
 | | | \- "static IP"
 | | \--- "wired"
 | \----- "local resource access OK"
 \------- "firewalled from internet"


Which leads to addresses like:

  Address       Meaning                            Example Machine Type
  10.240.b.c    fully-trusted, wired, static-IP    Dev Servers
  10.224.b.c    fully-trusted, wired, DHCP         Dev Workstations
  10.192.b.c    fully-trusted, WiFi, DHCP          Known Wireless Devices
  10.128.b.c    partly-trusted, WiFi, DHCP         Visitor Laptops etc
  10.48.b.c     untrusted, wired, static-IP        DMZ


You've still got scads of room to create further subdivisions (dev/test/staging for example in the servers group) and access-control is as simple as applying a suitable netmask.


In the above case, sensitive resources could require clients to have a /10 (trusted, firewalled) IP address. Really private stuff might additionally require access from a wired network - i.e. a /11. Basically, the more secure the resource, the more leading bits of the a octet you require to match.
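
As a purely illustrative sketch (assuming iptables on the machine hosting the resource, with made-up port numbers), that access control could be expressed as:

  # Sensitive service: only trusted, firewalled clients (a octet 11xxxxxx) may connect
  iptables -A INPUT -p tcp --dport 8080 -s 10.192.0.0/10 -j ACCEPT
  iptables -A INPUT -p tcp --dport 8080 -j DROP

  # Really private service: trusted, firewalled AND wired clients only (a octet 111xxxxx)
  iptables -A INPUT -p tcp --dport 22 -s 10.224.0.0/11 -j ACCEPT
  iptables -A INPUT -p tcp --dport 22 -j DROP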


It might be a bit painful switching over from your old /24 network but I think in the long term it'll be well worth it.


Next time, we'll look at how to name all these machines.

Friday 22 April 2011

Ultimate Ubuntu Build Server Guide, Part 1

Groundwork, Phase 1: A Quality Local Network
Note from future self: Although this setup guide is now superseded by cloud-based tools, certain elements are still useful simply as good practice, such as the "groundwork" in the early stages of this guide. As such, this article has been spared the chop.

OK, so for the next I-don't-know-how-long I'm going to devote this blog to a comprehensive step-by-step guide to setting up an absolutely rock-solid Continuous-Integration/Continuous Delivery-style build server.
Fundamentally it'll be based around the latest release of Ubuntu Server, but there are a number of steps that we need to address before we even introduce an ISO to a CD-ROM.


Possibly the dullest, but potentially the most useful, is Getting Our Local Network Sorted Out. What do I mean by that?

  • Working out a sensible IP addressing scheme
  • Maintaining a comprehensible naming convention for machines
  • Dedicating an (almost-)always-on box to do DNS lookups and DHCP handouts
  • Having as few statically-addressed machines as feasible
  • Keeping DHCP clients on as stable an IP address as possible
  • Having good bandwidth where it's needed
  • Using the simplest-possible network infrastructure
  • Offering simple options for both backed-up and transient file storage


You might be surprised how many "proper" system-administered networks fall at at least one of these hurdles; so while we wait patiently for the Natty Narwhal, I'm going to take the next few weeks to go through them.

Tuesday 19 April 2011

The Joy of CSS

Why It's Hard to Be a CSS Rockstar

Writing CSS is hard. Writing good CSS is really hard.


I've been using CSS in anger for about 4 years now, and I'd rate my skills at no more than 5 out of 10. I'm the first to admit I spend a lot of time Googling when I'm tweaking CSS, and I'm sure I'm not the only one. I've only ever come across one Java-world developer who could craft elegant, cross-browser CSS solutions with appropriate use of semantic HTML, and he was an exceptional front-end guy who could make PhotoShop walk around on its front paws, and write clean, performant JavaScript while balancing a ball on his head. Seriously.


So why do otherwise-competent software developers find it so hard to produce good CSS?


  • width: doesn't really mean width: The W3C box model might be intuitive to some, but ask a typical software developer to draw a box with width: 100px; border: 5px; and I can virtually guarantee that the overall width of the drawing will be 100 pixels while the internal (or "content", in W3C-speak) width will be 90 pixels - whereas CSS actually gives you a 110-pixel-wide box with a 100-pixel content area (see the snippet after this list). Given this, it becomes slightly easier to forgive Microsoft for their broken box model in IE5.

  • Inconsistent inheritance: As OO developers, we've come to expect every property of an object to be inherited by its children. This is not the case in CSS, where only some properties are inherited, which can lead to an uncomfortable, non-DRY feeling.

  • It's a big API: Although there is a lot of repetition (e.g. border; border-top; border-top-width; border-top-color; border-left; border-left-style; etc. etc.), there are also tons of tricky shorthands which behave dramatically differently depending on the number of "arguments" used (see the annotated snippet after this list). Compare border-width: thin thick;

    to border-width: thin thin thick;

    to border-width: thin thin thin thick;

  • You can't debug CSS Selectors: The first move of most developers when they have to change some existing styling is to whack a background-color: red; into the selector they think should be "the one" - and then they have to hunt around a bit more when their target div doesn't turn red ...

  • Semantic, understandable and succinct?!?!: Most developers understand that using CSS classes with names like boldface is not cool, and nor is using identifiers called tabbedNavigationMenuElementLevelTwo - but even getting the damn thing working is hard enough without also having to wonder whether the Gods of HTML would sneer at your markup...
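
To make the box-model and shorthand points concrete, here's a small illustrative snippet (the class names are invented for the example):

  /* W3C ("content") box model: width sets the CONTENT width, so this box
     actually occupies 100 + 5 + 5 = 110 pixels on screen, not 100 */
  .newsPanel {
    width: 100px;
    border: 5px solid black;
  }

  /* The border-width shorthand changes meaning with the number of values */
  .two   { border-width: thin thick; }           /* top/bottom thin, left/right thick */
  .three { border-width: thin thin thick; }      /* top thin, left/right thin, bottom thick */
  .four  { border-width: thin thin thin thick; } /* top, right, bottom, left - only the left is thick */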

Friday 15 April 2011

Safely Ignored

After attempting for almost two weeks to get through to the Australian Taxation Office's main phone line, I completed a 2-minute automated process and had the dubious satisfaction of being told that "future request letters can be safely ignored".


Which got me thinking about extending that metaphor to humans. So without further ado, allow me to present my list of People who can be Safely Ignored:


  • Search Engine Optimization "experts" - Let's be honest, by "Search Engine" you mean Google, and if you use said search engine to search for improve google pagerank you'll get Google's OWN ADVICE. Now use the money you saved on an SEO expert to make your content worth looking at.

  • Excel-holics - People who insist on copying the up-to-date data from a web page in order to have a local copy that will get stale, just so they can be in their familiar rows-and-columns walled garden. It's madness, and we all know the typical error rates in hacked-together spreadsheets ...

  • iPhone Bandwagoneers - The "Jesus Phone" as it's known over at The Register is a competent and capable smartphone. That's all. On several occasions I have been amused by a flustered hipster desperately asking "does someone have an iPhone I can borrow?" - meaning "I need to go to a website (but I'm scared to be seen using a non-Apple product)". Sad.

  • Microsoft - They've jumped the software shark, with nothing worthwhile on the desktop since Windows XP and a mobile OS that manages to make high-end hardware perform like a low-end $29 outright phone from the post office.

Tuesday 12 April 2011

Sophistication via Behavioural Chaining

A Nice Pattern

A few years ago I had the pleasure of working with a truly excellent Java developer in the UK, Simon Morgan. I learnt a lot from looking at Simon's code, and he was a terrific guy to boot. One of the really eye-opening things he was doing was obtaining very sophisticated behaviour by stringing together simple evaluation modules.


This is just like how programmers have always solved problems - breaking them down into manageable chunks - but goes much further. We're not just solving a problem here (well, we are, but the solution sort of falls out the end rather than being explicit); rather, we are approximating the sophistication of an expert human being's behaviour when solving the problem. Wow.


Simon was using a hybrid of two patterns: Chain of Responsibility and Strategy. The basic approach was to iterate over an injected list of Strategy implementations, where the Strategy interface would normally be as simple as:

  Operand applyTo(Operand op);

BUT instead of returning a possibly-modified Operand, he defined a Scorer interface that looked like this:

  float determineScore(Scenario scenario);

Individual Scorers can be as simple or as complicated as required. For Simon's particular case, each one tended to inspect the database, looking for a particular situation/combination, and arrive at a score based on how close that was to the "ideal". For this, it made sense to have an AbstractDatabaseAccessingScorer which every Scorer extended.


The float that each scorer returned was multiplied into a running total that started at 1.0. At the end of a scoring run, a possible Scenario would have a score somewhere from 0.0 to 1.0. Some aspect of the Scenario would then be tweaked, and the score calculated again. At the end of the evaluation run, the highest-scoring Scenario would be selected as the optimal course of action.


While this worked very well, Simon realised that in developing his Scorers he'd unwittingly assigned some of them lower importance, by getting them to return scores only in the range 0.0 to 0.5, for example. He went on to refactor this out: each Scorer was now required to return a score in the range 0.0 to 1.0, and was assigned a weight multiplier, so that some Scorers could be given greater power in influencing the choice of Scenario. This really boosted the power and subtlety of the system - to the extent that he started logging his scoring runs profusely in order to get some understanding of how his home-grown "neural net" was coming up with its results.
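
A minimal sketch of what such a weighted evaluator chain might look like (everything beyond the Scorer interface is my own invention for illustration, not Simon's actual code):

  import java.util.List;

  /** Placeholder for whatever domain object is being evaluated. */
  interface Scenario { }

  interface Scorer {
      /** Return a score between 0.0 and 1.0 for the given scenario. */
      float determineScore(Scenario scenario);
  }

  /** Pairs a Scorer with the weight multiplier introduced by the refactoring. */
  class WeightedScorer {
      private final Scorer scorer;
      private final float weight;

      WeightedScorer(Scorer scorer, float weight) {
          this.scorer = scorer;
          this.weight = weight;
      }

      float score(Scenario scenario) {
          return weight * scorer.determineScore(scenario);
      }
  }

  /** Runs every scorer over a scenario, multiplying into a running total that starts at 1.0. */
  class EvaluatorChain {
      private final List<WeightedScorer> scorers;   // injected

      EvaluatorChain(List<WeightedScorer> scorers) {
          this.scorers = scorers;
      }

      float evaluate(Scenario scenario) {
          float total = 1.0f;
          for (WeightedScorer ws : scorers) {
              total *= ws.score(scenario);
          }
          return total;
      }

      /** Pick the highest-scoring candidate Scenario - the optimal course of action. */
      Scenario selectBest(List<Scenario> candidates) {
          Scenario best = null;
          float bestScore = -1.0f;
          for (Scenario candidate : candidates) {
              float score = evaluate(candidate);
              if (score > bestScore) {
                  bestScore = score;
                  best = candidate;
              }
          }
          return best;
      }
  }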


Often, the choice of winning scenario was a matter of choosing between final scores of 0.00000123 versus 0.00000122 - when dealing with such close decisions, it was worthwhile flagging the situation to allow a human to examine it and possibly tweak some weight modifiers to get the optimal outcome. In time, this would lead to even better approximation to an expert human's selection behaviour.


We never came up with a name for this pattern, but it has always stuck in my mind as a nice one (albeit with limited applications). Evaluator Chain seems to sum it up fairly well, and I'm working on a library that will give a convenient, templated API for domain-specific implementations, the first of which will be the selection of a winning sports team based on past performance data.


So if this is my last blog post, you'll know I've cracked it and made my fortune in sports betting ...

Friday 8 April 2011

NetVista Update

"Hey, how's that ultra-small, ultra-quiet, ultra-low-power app server going?" I don't hear you ask.


Awesome, thanks for not asking!

  root@netvista1 # uptime
    20:12:01 up 166 days, 3:11, load average: 0.10, 0.03, 0.02
  root@netvista1 #

Yep, that little black box has been a great success. Any effect on the household electricity bill has been negligible, and it's been a fantastic platform to host public-facing (if untrafficked) Java experiments.


I'm in the process of bringing up netvista2, emphatically not as a load-balanced production server but rather as a public-facing "staging" server so I can begin A/B production environment switching as part of a Continuous Delivery approach.


Something else I want to look at is using Java SE for Embedded to give a bit more RAM space for webapps. Stay tuned.

Tuesday 5 April 2011

Whoooosh! Part 2

Making it happen

YSlow gave this site a solid 'B' for performance, but I knew it could be better.


The main culprit was the lack of expiry headers on resources, meaning the browser had to re-download them on each page visit. Dumb. I was shocked to find that Tomcat had no easy way to set them, but this has now been rectified in Tomcat 7 with its Expires Filter.
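
For the curious, that Tomcat 7 Expires Filter is wired up in web.xml along these lines - the MIME types and expiry periods below are illustrative rather than exactly what I used:

  <filter>
    <filter-name>ExpiresFilter</filter-name>
    <filter-class>org.apache.catalina.filters.ExpiresFilter</filter-class>
    <init-param>
      <param-name>ExpiresByType image/png</param-name>
      <param-value>access plus 1 month</param-value>
    </init-param>
    <init-param>
      <param-name>ExpiresByType text/css</param-name>
      <param-value>access plus 1 month</param-value>
    </init-param>
    <init-param>
      <param-name>ExpiresDefault</param-name>
      <param-value>access plus 10 minutes</param-value>
    </init-param>
  </filter>

  <filter-mapping>
    <filter-name>ExpiresFilter</filter-name>
    <url-pattern>/*</url-pattern>
    <dispatcher>REQUEST</dispatcher>
  </filter-mapping>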


Next up was optimising my images. YSlow can helpfully slurp your site through Yahoo's smush.it service which saved me a good 25% in download cruft. Win.


"Sprited" CSS images were something I knew I wanted to do, but had yet to find a nice tool to make it happen. Now that SmartSprites is a reality, I've been using it to both optimise the images and generate the CSS that controls them. I'm also planning on using JAWR (which utilises SmartSprites) as part of my standard toolchain for Java web development, pushing out minified, optimised, spritified HTML/CSS/JS content whenever the build target is not "dev".


Things were going well, and I was up to 'A' Grade, but my "favicon" was still being re-downloaded each-and-every page visit. I had to tell Tomcat that it was an image, and then the Expires Filter could deal with it as per all the others:

  (in conf/web.xml):

  <mime-mapping>
    <extension>ico</extension>
    <mime-type>image/x-icon</mime-type>
  </mime-mapping>


The final step was getting in under Pebble's covers and tweaking the page template, such that:

  • Javascript sits at the bottom of the page; and
  • Practically-useless files (like an empty print.css) are not mentioned


And with that, I'm currently scoring 95%. I'm losing the 5% for not gzipping my content (which will probably have been remedied by the time you read this), and having too many separate JavaScript files (a fact I need to investigate further - where is this blog even using JavaScript?) - but I'm happy; my little corner of the web is a snappier place.

Friday 1 April 2011

Whoooosh! Part 1

When was the last time a computer really amazed you with its speed?


So much power, all those cores, so many Gigahertz, tens of megabits per second and yet we still spend a lot of our time watching spinners/throbbers/hourglasses/progress meters. Why?


I'd have to say the last bastion of impressive speed is Google. Nobody else is doing breathtaking displays of performance any more, and it saddens me.


Growing up in the golden age of 8-bittery, my first experiences of computers involved painful cassette-based loads (not to mention hugely-unreliable saves) that took tens of minutes. Next stop was an Apple IIe, and the speed and reliability of its (single-sided) 5.25" drive made the frequent PLEASE INSERT SIDE 2 operations seem worthwhile.


My first hard-disk experience was at the other end of a high-school Farallon PhoneNet and it was barely quicker than the floppy drive in the Mac Plus I was accessing it from, but it was definitely an improvement.


Next stop was my Dad's 486 with a whopping 120MB of Western Digital Caviar on its VESA Local Bus. What a beast - that thing flew through Write on Windows 3.11! Moore's Law (or at least the layman's interpretation of it - that raw speed doubles every 18 months) seemed to be in full effect; the world was an exciting place.


But then it ran Windows 95 like a dog. My Pentium 60 was definitely better (especially after I upped its RAM from 8 to 40MB), but that snappiness was gone again when Windows 98 got its claws into it.


Now the internet's come along and we've gone through the whole Intel giveth, Microsoft taketh away cycle again - but this time it's ADSL connections and Apache that give, while poor HTML compliance, bloated, inefficient JavaScript and whopping images take it away, making it feel like the dial-up days are still here.


When I first started hacking together web pages, I would copy them in their entirety onto a floppy disk (remember those?) and load them into the browser from there for a taste of dialup speed. It worked really well for spotting places where I could get content to the eyeball faster.


If you are putting anything on the web, please do the modern-day equivalent and run YSlow against your stuff. And let's get that whoooosh back on the web!