Friday 17 December 2010

Taking A Break

I won't be posting for a while, as I have three good reasons to break my little Monday & Thursday pattern:


  • I'm off on honeymoon for the next eight days

  • Then there's Christmas

  • And then finally there's a new contract to sink my teeth into

So although it could well be a month before I get back here, I should at least be refreshed and inspired. Or something ...

Monday 13 December 2010

A Big Push For Git

There is a better way

For about 12 months now, I've been running a Git source code repository on my Ubuntu server, and it's been working out great.


I had a small Subversion repository prior to that but decided to completely nuke it and start from scratch, so I can't talk about migration issues. What I can talk about is my fellow developers' attitudes when they found out that I was running a DVCS in a development shop with one seat, and a conventional client-server continuous-integration environment.


Hilarity:

  • "DVCSes are for big teams!"
  • "DVCSes are for people with laptops who get disconnected!"


Incredulity:

  • "But there's nothing wrong with Subversion!"
  • "Does it even work with a CI server?"


I understand why the new generation of version-control systems (Mercurial, Git, et al) feel the need to differentiate themselves from the previous generation, with that big 'D'. And yes, they do support a "highly-distributed" mode of operation very well. But this emphatically does not rule out using them in small, well-connected environments. Here's the silver bullet that shoots down the first 3 objections listed above:


With a DVCS, I can commit as many times as I like without those changes:

  • Being seen by anyone else
  • Triggering a build on my CI server


It doesn't sound like a big deal, but when you're working on a big change, it's really nice to be able to save incremental commits - possibly even (gasp!) broken ones - without breaking the CI build. Your commit comment can summarise where you are in the refactoring, and rolling back a failed experiment is trivially easy. When you're completely happy, just push the changes to the build box.
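
By way of illustration, a typical session looks something like the sketch below - the file name, commit messages and branch are all invented, so adjust to taste:

  # Hack away locally - nobody else sees these commits, and the CI server stays quiet
  git add src/com/example/WidgetService.java
  git commit -m "WIP: extracting WidgetValidator - tests not green yet"

  # ...more hacking, more local commits...
  git commit -a -m "Refactoring complete, all tests passing"

  # A failed experiment is trivially easy to throw away
  git reset --hard HEAD~1    # careful - this discards the last commit entirely

  # Happy? Push to the shared repository and let Hudson do its thing
  git push origin master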


Please don't knock DVCSes until you've experienced the liberating yet confidence-inspiring feeling of local-commits combined with a continuous-integration server running a full build-and-test on "pushed" code. It really is the best of both worlds.


And as for toolchain support, Hudson's Git support works great, and I note that Atlassian's super-powerful Bamboo has added Git support. On the client side, I'm using the EGit plugin with good results - operationally, it feels almost exactly the same as Subversion, just with that extra all-important push option when it's time to show the build-box some new code.


As an aside, I plan on documenting the entire procedure for setting up a Git code repository and CI (Hudson) build server on the Ubuntu Server platform - for my reference as much as anything. Watch this space.

Friday 10 December 2010

Old-skool, Nu-skool

I think it's fair to say that software developers are pretty particular creatures. We like both our physical and virtual operating environments to be just so, and the slightest variation or disturbance in those environments is hugely detrimental to our efficiency.


It struck me the other day that despite the constant advances in so many areas of software (and as an aside: how many other non-academic professions all but require this level of attentiveness to the latest developments in the field? Certain areas of medicine?), a number of the things developers like best simply have not changed in 15 years or more!


What am I talking about?


  • UNIX-style servers - once you're away from the .NET world, you just won't see anyone hosting anything on a Windows box

  • Command-lines - closely related to the first point, but developers will often still choose to type rather than click, even on their own desktop. Cygwin is clear evidence that a powerful command-line is a powerful developer tool.

  • vi[m] - it just never goes away. Developers who probably first used it at university - and what a learning curve that is! - never forget those basic keystrokes, no matter how long it's been between <Esc>:wq's. And how quickly the power-moves come back. It's a text-editing supercharger.

Despite their age, these aspects of the ideal developer environment are actually cherished. Held up as shining examples of refined excellence. Not everything old is gold though ...

If you hear the unmistakable shriek of a dot-matrix printer, in an office, here in the second decade of the twenty-first century, JUST RUN.

Tuesday 7 December 2010

A New Flavour of ORM

Object-Reality Mapping

Despite the best efforts of many, the Object-Oriented vs. Relational Database "impedance mismatch" is still with us. Don't get me wrong; the ORM projects and specifications behind those efforts have done sterling work abstracting away tons of tedious, error-prone, boilerplate rubbish so that we can concentrate on our domain model.


But there lies the problem.


Far too often, domain models are so elaborate, so needlessly baroque and over-engineered, that actually squeezing them down into two-dimensional database tables takes quite extraordinary levels of hackery. Now that the persistence-layer tools are so good, some developers are failing to stop and think "hang on, just because we can have a seven-level inheritance tree and persist it, doesn't mean we should".


Here's a trivially simple example. The good-old User class. Just about every webapp has one. Let's look at the (imagined) requirements:

  • A User should have a login name, real name, email address and (hashed and salted) password
  • There should be a special class of User called Admin who has greater power to configure (read: mess up) the system

At this point, 8 out of 10 OO developers will get tremendously excited - they've spotted an actual situation where inheritance can be applied! And we could even stick a Person on top for people who aren't yet users:


  Person
    ^
    |
   User
    ^
    |
  Admin

Gosh how exciting!


Except do we really need to model people who don't use the system? Will we ever? And is an Admin really a specialisation of a User? Isn't it actually a role that a user can play? We could persist that far more easily with a simple enumerated type. Or, if we're really doing the simplest thing that can possibly work, couldn't we just have a boolean isAdmin flag?
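
For illustration (the field names are mine), the flattened-out version could be as simple as this - one class, one table, nothing to squeeze:

  public class User {
      // One class, one table - no inheritance tree to persist
      public enum Role { USER, ADMIN }

      private String loginName;
      private String realName;
      private String emailAddress;
      private String passwordHash;    // hashed and salted long before it gets here

      // Persists as a single column. If ADMIN really is the only special case
      // we'll ever need, a plain boolean isAdmin flag is simpler still.
      private Role role = Role.USER;

      public boolean isAdmin() {
          return role == Role.ADMIN;
      }
  }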


Another example. Consider the classic Parent-Child relationship; let's use Course and Student as our concrete types. Traditional object modeling would have Course holding a collection of Student, with Student holding a reference back to Course. But now stop and think. If all this data is being persisted anyway, how often would we actually use that collection from Java-land? If we really need that information, it's a trivial DAO method away - the parent reference in the child maintains the referential integrity, so why use a bidirectional relationship?
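
Sketched out (the names are purely illustrative, and the DAO body is left to your persistence tool of choice), the unidirectional version looks like this:

  import java.util.List;

  class Course {
      private Long id;
      private String title;
      // Deliberately no Collection<Student> here
  }

  class Student {
      private Long id;
      private String name;
      private Course course;    // the single link - this is what keeps referential integrity
  }

  // The "trivial DAO method away" - called only on the rare occasions we actually need it
  interface StudentDao {
      List<Student> findByCourse(Course course);
  }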


With just a few small optimisations like those suggested above, a lot of the supposed "impedance" drops to zero. So the next time your square object model won't fit through the round RDBMS hole, cut a few corners!

Friday 3 December 2010

Warning Shots Across The Bow

A Ten-Step Program

As a software contractor, you would be extremely unlikely to have any say in your host company's hiring. Unfortunately, this means that the people who have arguably seen the "most action" (to use a military expression) have the least input into this critical component of a successful company.


As such, we generally get introduced to new faces with no background information - how much experience do they claim to have? Are they fresh to the programming language? Software development? The country?


I find it very educational to examine a new starter's first few code checkins. Probably not the absolute first, because they will be on their best behaviour, and possibly pair-programming. About the fourth or fifth submission, ideally a brand-new source file. You can tell a hell of a lot from the quality of this, and you can be pretty objective in scoring it. A new person starts with 10 points; knock off a point for each of these mistakes:


  • No comment block at the top of the code - a developer should be proud of their work and be happy to stand behind it/explain it/answer questions about it

  • Unused members - every IDE on earth will warn about this. The fact that they are checking in code without paying attention to such warnings (or worse, have turned them off) is a concern

  • Raw generic types - they've been in the core APIs for six years now - no-one should be writing code with "naked" Lists and Classes (there's a small before-and-after sketch at the end of this list)

  • Pointless boxing - returning a Boolean from a method that just contains if statements could indicate a lack of understanding about the costs of autoboxing

  • Copy/paste coding - yes, it's still out there. Any reluctance to write reusable code shows a disregard for future maintainability

  • Concrete parameters and return types - returning ArrayList is a warning that the correct use of interfaces is a foreign concept to the developer. This can be a hard habit to break for some.

  • if-y code - no, I don't mean code exhibiting dubious qualities! (Although it probably is) - I mean code with if statements nested more than one level deep. At best, it warns of a reluctance to refactor into small, clean methods. At worst, it could indicate an inability to spot good targets for object-orientation.

  • Failure to use standard libraries - a lack of familiarity with any in-house utility classes can be excused. But there is no excuse for reinventing methods that belong in Commons StringUtils et al

  • Inefficient/awkward program flow - Are unchanging conditions being repeatedly re-evaluated? Is there an expensive calculation in the termination condition of a loop? We may program in very high-level languages, but efficient use of the CPU should always be a consideration.

  • Wasteful data representation - this can occur both in redundant member variables and in the choices made when serializing data for transmission "over the wire". The latter in particular is another warning sign that low-level efficiency was not a consideration.
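
To put some flesh on a few of those bones - raw types, pointless boxing, concrete return types and if-y code - here's a made-up before-and-after; the class and method names are invented purely for illustration:

  import java.util.ArrayList;
  import java.util.List;

  class Customer {
      private final boolean active;
      Customer(boolean active) { this.active = active; }
      boolean isActive() { return active; }
  }

  // Before: this checkin drops four of its ten points
  class RawCustomerFinder {
      public ArrayList findActive(ArrayList customers) {      // raw generics, concrete parameter/return types
          ArrayList result = new ArrayList();
          for (Object o : customers) {
              Customer c = (Customer) o;
              if (c != null) {
                  if (c.isActive()) {                          // if-y code: nested more than one level deep
                      result.add(c);
                  }
              }
          }
          return result;
      }

      public Boolean hasActive(ArrayList customers) {          // pointless boxing of a simple true/false answer
          return !findActive(customers).isEmpty();
      }
  }

  // After: same behaviour, none of the warning signs
  class CustomerFinder {
      public List<Customer> findActive(List<Customer> customers) {
          List<Customer> result = new ArrayList<Customer>();
          for (Customer c : customers) {
              if (c != null && c.isActive()) {
                  result.add(c);
              }
          }
          return result;
      }

      public boolean hasActive(List<Customer> customers) {
          return !findActive(customers).isEmpty();
      }
  }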

So what do you do when your new colleague scores a 4? I'm still formulating my thoughts about that! It's not an easy situation, that's for sure, and will be the subject of a future post.