The Millhouse Group Blog: facepalm


Sunday, 8 September 2024

UnSAFe at any speed

I first encountered the [Scaled Agile Framework for enterprise](https://scaledagileframework.com/) (from here on referred to as SAFe) in 2014, when the large enterprise I was working for decided it was what was needed in order to ship solutions faster. (*Spoiler:* it wasn't; it didn't help in the slightest, and it made us considerably slower, at considerable expense.)

I'll let you peruse their website at your leisure, but before you go, remember the tenets of the [Agile Manifesto](https://agilemanifesto.org/) (emphasis mine):

- Individuals and interactions over *processes* and tools
- Working software over comprehensive *documentation*
- Customer collaboration over contract negotiation
- Responding to change over *following a plan*

Now look at this, the ***SAFe 6.0*** I-don't-even-know-what-to-call-it diagram:

All I see is prescriptive *processes*, *documentation* and *plans*. You don't "do" agile development by signing up for expensive certifications that basically allow you to continue to ship 4-times-a-year but call yourself an agile workplace when you're recruiting. You also won't fool *anyone* with half a clue during the recruitment process.

Just one more example. This is grabbed from a PDF from [Accenture](https://sai2.wpengine.com/wp-content/uploads/delightful-downloads/2020/01/Key_Accenture_Learning_on_Scaled_and_Distributed_Agile_August-18-for-SAFe.pdf), who are [fully on-board with SAFe](https://scaledagile.com/case_study/accenture/), I suspect because:

- Acronym
- Includes the word Agile
- You can get certified for it (at great expense)
- It seems to make sense if you don't look at it too closely
- It actually makes you go slower than before (more billable hour$)

Ready? Here we go. This is how easy it is to "integrate" agile and waterfall:

So simple! It's super-cool that they like, don't have any interdependencies whatsoever! So clean!

Tuesday, 21 May 2024

Facepalm: Vanity email, insanity-email

Wow. Long time no [facepalm](https://blog.themillhousegroup.com/search?q=facepalm). Guess I must be in the right job! This was a good one though.

So at work when a new customer signs up, one of the *many* things we do is create an [Auth0](https://auth0.com/) account for them. It's really just a "shell", with nothing of any value in it, but it gives them a stable identity to build other stuff off. To create such a shell account we just need their email address, and we conjure up a [random UUID](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID) to use as their password. This has worked flawlessly for *tens of thousands of customers*.

Then, today, it didn't. Auth0 gave us: `PasswordNoUserInfoError: Password contains user information`

I'm sorry, what? A certain amount of back-and-forth ensued with the devs who feed-and-water Auth0. It turns out there's a rule in Auth0 that is trying to avoid users including part of their username in their password. You know, how Granny likes her credentials to be `grangran@hotmail.com / grangran`.

So this *particular* customer had a custom "vanity" domain (which I will change for the sake of privacy) and was using a single letter as their email address; e.g.: `d@dangermouse.com` *(not their real address)*

And the Auth0 check was thus exploding if it found ***any instance of `d` in the random UUID password***. A [quick check](https://stackblitz.com/edit/node-yab2dv?file=index.js) shows that *~85% of UUIDs* generated by [Node's `crypto/randomUUID`](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID) will contain a `d`.

**Facepalm.**
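For the curious, that ~85% figure can be sanity-checked without the linked StackBlitz. The sketch below is mine, not the original quick check: it uses the JVM's `UUID.randomUUID()` as a stand-in for Node's generator (both produce version-4 UUIDs), and the object and method names are made up for illustration. A version-4 UUID has 32 hex digits, but the version digit is always `4` and the variant digit is one of `8`, `9`, `a` or `b`, so only 30 digits can ever come out as `d`.

```scala
import java.util.UUID

object UuidOdds {
  // Only 30 of the 32 hex digits in a version-4 UUID are free to be 'd':
  // the version digit is always '4' and the variant digit is 8, 9, a or b.
  val freeDigits = 30

  // Probability that at least one of the free digits is 'd'.
  val analytic: Double = 1.0 - math.pow(15.0 / 16.0, freeDigits)

  // Empirical check: fraction of freshly generated UUIDs containing the character.
  def empirical(samples: Int, needle: Char = 'd'): Double = {
    val hits = Iterator
      .fill(samples)(UUID.randomUUID().toString)
      .count(_.exists(_ == needle))
    hits.toDouble / samples
  }

  def main(args: Array[String]): Unit = {
    println(f"analytic:  ${analytic * 100}%.1f%%")
    println(f"empirical: ${empirical(100000) * 100}%.1f%%")
  }
}
```

The analytic value works out to roughly 85.6%, which lines up with the quick check.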

Sunday, 14 March 2021

Fixing internet access failures after upgrading to MacOS 10.14 Mojave

I innocently upgraded my 2012 MacBook Pro to Mojave (10.14) from High Sierra (10.13) purely because Homebrew was refusing to let me install any new goodies on such an old OS. The upgrade itself was painless and everything seemed to be in good shape until I tried to access the wider Internet (access to my own local network was completely fine).

Now Internet access problems are extra-infuriating, because "Googling for the solution" instantly becomes far more of a palaver. An interesting quirk of this particular issue was that using my iPhone's "Personal Hotspot" worked perfectly with the Mac. Only accessing the internet directly through the home WiFi was problematic, and other devices were of course completely fine.

So to recap the scenario (and hopefully give this post some Google-ju to help anyone else in this situation), here's what we had:

- MacBook Pro was running OSX High Sierra (10.13) with no prior network problems
- Network has been carefully set up and tuned for "meshed" operation
- Local DNS services provided by DNSMasq running on a Raspberry Pi
- DHCP services provided by DNSMasq, handing out fixed addresses to "known" devices like the MacBook Pro
- All local network services working fine - e.g. accessing router admin webpages
- Internet access when tethered to an iPhone works fine (Mac gets assigned 172.20.10.2, iPhone is 172.20.10.1)
- Internet access through wired Ethernet (via a Thunderbolt adapter) works fine
- Internet access on home WiFi completely fails; no DNS, no ping, no browsing

After a frustrating day, I decided to blow through the Mojave upgrade in case it had been fixed in Catalina (10.15) ... it hadn't.

I went through probably a dozen different iterations/versions of the "recreate your network stack"/reboot/reset PRAM and/or SMC cycle; nothing was helping, and I was still mystified by how the wired Ethernet was working immaculately while the WiFi failed to get "out" of the local network.

Comparing settings with my wife's MacBook that had perfect WiFi yielded nothing, so just for the hell of it, I decided to check the gateway router's settings one last time to see if for some reason I had a special route or firewall rule set for the MacBook Pro's WiFi IP address (10.240.0.72) only, which would explain why the wired connection (10.240.0.77) had no problems. It wouldn't normally be something I'd do, but given the output of traceroute was showing that the hop to the gateway was working in all cases, I had to start suspecting that the macOS internal routing of packets was not faulty ...

And lo and behold, WHAT IS THIS? On my (ISP-provided) TP-Link router's Security/DoS prevention page, a place I have rarely if ever visited ... My laptop's Wifi IP address has been automatically added to a blocklist for attempted DoSing of my network ... from the inside ?!?!

I removed the blocklist entry and instantly everything worked perfectly. Oh dear. I have no idea when my laptop started absolutely hammering pings/UDP/TCP SYNs but it must have been at some point during or immediately after the upgrade to Mojave, and the router did what it was configured (completely by default) to do when it saw more than 3600 packets/sec coming from my laptop. Wow.

Reading the rather scant documentation for this router feature indicates that it uses up (extremely precious) router CPU so I've turned this feature totally off - internal DoS prevention seems like a waste of time to me.

Addendum: Disabling SPI and DoS detection/prevention has massively sped up using the router's web UI so I can only imagine it's working wonders for the overall performance. So I guess it's been a win ... overall 😕

Saturday, 29 June 2019

Things I will never do again

Again reflecting on twenty years in the software development industry, there are a number of things I can be pretty certain I will never do again. Either because "we don't do things like that any more" or because I will simply refuse. Sadly, it's often the latter!

In no particular order:

- Work on a codebase with tens/hundreds of thousands of lines of code and no unit tests
- Deploy production code to a Windows Server that faces the Internet
- Build production code on my development machine
- Work alongside someone whose title is XML Architect
- Use JBoss, or indeed any "Application Server"
- Do meaningful work on a project without signing a contract and/or being paid for at least some portion of it
- Have to wear a suit and tie
- Have to log into Jenkins slave machines to delete files because they've run out of disk
- Configure builds in Jenkins by tediously pointing and clicking
- Work in a building in the centre of a big city, where an entire floor is devoted to hosting and running racks of servers, storage, switches etc
- Have to ensure that a website works on a browser with JavaScript disabled because "accessibility"
- Write code that has no tangible benefit to users, but will "game" an executive's KPIs in order to achieve a bonus

Monday, 11 January 2016

Facepalm 2016

The newest entry in my (very) occasional series of career facepalm moments comes from this new year. My current project is using Scala, Play, MongoDB and the Pac4J library for authentication/authorization with social providers like Google, Facebook, Twitter etc. It's a good library that I've used successfully on a couple of previous projects, but purely in the "select a provider to auth with" mode. For this project, I needed to use the so-called HTTP Module to allow a traditional username/password form to also be used, for people who (for whatever reason) don't want to use social login. As an aside, this does actually seem to be a reasonably significant portion of users, even though it is actually placing more trust in an "unknown" website than delegating off to a well-known auth provider like Facebook. But users will be users; I digress.

Setup for Failure

The key integration point between your existing user-access code and pac4j's form handling is your implementation of the UsernamePasswordAuthenticator interface which is where credentials coming from the input form get checked over and the go/no-go decision is made. Here's what it looks like:

public interface UsernamePasswordAuthenticator 
    extends Authenticator<UsernamePasswordCredentials> {

    /**
     * Validate the credentials. 
     * It should throw a CredentialsException in case of failure.
     *
     * @param credentials the given credentials.
     */
    @Override
    void validate(UsernamePasswordCredentials credentials);
}

An apparently super-simple interface, but slightly lacking in documentation, this little method cost me over a day of futzing around debugging, followed by a monstrous facepalm.

Side-effects for the lose

The reasons for this method being void are not apparent, but such things are not as generally frowned-upon in the Java world as they are in Scala-land. Here's what a basic working implementation (that just checks that the username is the same as the password) looks like as-is in Scala:

object MyUsernamePasswordAuthenticator 
    extends UsernamePasswordAuthenticator {

  val badCredsException = 
    new BadCredentialsException("Incorrect username/password")

  def validate(credentials: UsernamePasswordCredentials):Unit = {
    if (credentials.getUsername == credentials.getPassword) {
      credentials.setUserProfile(new EmailProfile(credentials.getUsername))
    } else {
      throw badCredsException
    }
  }
}

So straight away we see that on the happy path, there's an undocumented incredibly-important side-effect that is needed for the whole login flow to work - the Authenticator must mutate the incoming credentials, populating them with a profile that can then be used to load a full user object. Whoa. That's three pretty-big no-nos just in the description! The only way I found out about this mutation path was by studying some test/throwaway code that also ships with the project.

Not great. I think a better Scala implementation might look more like this:

object MyUsernamePasswordAuthenticator 
    extends ScalaUsernamePasswordAuthenticator[EmailProfile] {

  val badCredsException = 
    new BadCredentialsException("Incorrect username/password")

  /** Return a Success containing an instance of EmailProfile if  
   * successful, otherwise a Failure around an appropriate 
   * Exception if invalid credentials were provided
   */
  def validate(credentials: UsernamePasswordCredentials):Try[EmailProfile] = {
    if (credentials.getUsername == credentials.getPassword) {
      Success(new EmailProfile(credentials.getUsername))
    } else {
      Failure(badCredsException)
    }
  }
}

We've added strong typing with a self-documenting return-type, and lost the object mutation side-effect. If I'd been coding to that interface, I wouldn't have needed to go spelunking through test code.
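To make that idea a little more concrete, here is a minimal self-contained sketch of what such a trait could look like, plus a bridge back to a void-style contract. Everything here is hypothetical: `ScalaUsernamePasswordAuthenticator` does not exist in pac4j, and `Credentials`, `EmailProfile`, `BadCredentialsException` and `runVoidStyle` are simplified stand-ins written only so the example compiles on its own.

```scala
import scala.util.{Failure, Success, Try}

object TryAuthenticatorSketch {
  // Stand-ins for the pac4j types, so the sketch compiles on its own.
  final case class EmailProfile(emailAddress: String)
  final class Credentials(val username: String, val password: String) {
    var userProfile: Option[EmailProfile] = None // stands in for setUserProfile
  }
  final class BadCredentialsException(msg: String) extends RuntimeException(msg)

  // Hypothetical trait: the outcome is the return value, not a side effect.
  trait ScalaUsernamePasswordAuthenticator[P] {
    def validate(credentials: Credentials): Try[P]
  }

  // Bridge back to a void-style contract: mutate on success, throw on failure.
  def runVoidStyle(auth: ScalaUsernamePasswordAuthenticator[EmailProfile],
                   credentials: Credentials): Unit =
    auth.validate(credentials) match {
      case Success(profile) => credentials.userProfile = Some(profile)
      case Failure(e)       => throw e
    }

  // Same toy rule as above: the password must equal the username.
  object SameAsUsername extends ScalaUsernamePasswordAuthenticator[EmailProfile] {
    def validate(c: Credentials): Try[EmailProfile] =
      if (c.username == c.password) Success(EmailProfile(c.username))
      else Failure(new BadCredentialsException("Incorrect username/password"))
  }
}
```

The mutation and the throw still happen, but in exactly one place (the bridge), rather than being an unwritten obligation on every implementer.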

But this wasn't my facepalm.

Race to the bottom

Of course my real Authenticator instance is going to need to hit the database to verify the credentials. As a longtime Play Reactive-Mongo fan, I have a nice little asynchronous service layer to do that. My UserService offers the following method:

class UserService extends MongoService[User]("users") {
  ...

  def findByEmailAddress(emailAddress:String):Future[Option[User]] = {
    ...
  }
}

I've left out quite a lot of details, but you can probably imagine that plenty of boilerplate can be stuffed into the strongly-typed MongoService superclass (as well as providing the basic CRUD operations) and subclasses can just add handy extra methods appropriate to their domain object.
The signature of the findByEmailAddress method encapsulates the fact that the query both a) takes time and b) might not find anything. So let's see how I employed it:

def validate(credentials: UsernamePasswordCredentials):Unit = {
  userService.findByEmailAddress(credentials.getUsername).map { maybeUser =>

    maybeUser.fold(throw badCredsException) { u =>
      if (!User.isValidPassword(u, credentials.getPassword)) {
        logger.warn(s"Password for ${u.displayName} did not match!")
        throw badCredsException
      } else {
        logger.info(s"Credentials for ${u.displayName} OK!")
        credentials.setUserProfile(new EmailProfile(u.emailAddress))
      }
    }
  }
}

It all looks reasonable right? Failure to find the user means an instant fail; finding the user but not matching the (BCrypted) passwords also results in an exception being thrown. Otherwise, we perform the necessary mutation and get out.

So here's what happened at runtime:

- A valid username/password combo would appear to get accepted (log entries etc) but not actually be logged in
- Invalid combos would be logged as such but the browser would not redisplay the login form with errors

Have you spotted the problem yet?

The return type of findByEmailAddress is Future[Option[User]] - but I've completely forgotten the Future part (probably because most of the time I'm writing code in Play controllers where returning a Future is actually encouraged). Because the surrounding method is declared to return Unit, Scala happily discards the Future that the body evaluates to instead of flagging it. So my method ends up returning nothing almost-instantaneously, which makes pac4j think everything is good. Any exception thrown inside the map is captured by the Future rather than thrown to the caller, which is why invalid combos never made it back to the login form. Then pac4j tries to use the UserProfile of the passed-in object to actually load the user in question, but of course the mutation code hasn't run yet so it's null - we're almost-certainly still waiting for the result to come back from Mongo!

**Facepalm**

An Await.ready() around the whole lot fixed this one for me. But I think I might need to offer a refactor to the pac4j team ;-)
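The race is easy to reproduce without pac4j or Mongo in the picture. This is a stripped-down sketch, not my real code: `User`, `Credentials`, the 200 ms sleep and the hard-coded `d@example.com` account are all stand-ins invented for the demonstration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureRaceSketch {
  // Stand-ins only; none of these are pac4j or ReactiveMongo classes.
  final case class User(emailAddress: String, password: String)
  final class Credentials(val username: String, val password: String) {
    @volatile var profile: Option[String] = None
  }
  final class BadCredentialsException
      extends RuntimeException("Incorrect username/password")

  // Simulated asynchronous lookup: the answer only arrives after a delay.
  def findByEmailAddress(email: String): Future[Option[User]] = Future {
    Thread.sleep(200)
    if (email == "d@example.com") Some(User(email, "secret")) else None
  }

  // The check itself, still asynchronous: mutate on success, fail otherwise.
  private def check(c: Credentials): Future[Unit] =
    findByEmailAddress(c.username).map {
      case Some(u) if u.password == c.password => c.profile = Some(u.emailAddress)
      case _                                   => throw new BadCredentialsException
    }

  // Broken: the Future is silently discarded because the return type is Unit,
  // so this returns before the lookup finishes and never throws.
  def validateBroken(c: Credentials): Unit = {
    check(c)
  }

  // Fixed: block until the lookup completes; Await.result rethrows a failure.
  def validateBlocking(c: Credentials): Unit =
    Await.result(check(c), 5.seconds)
}
```

Calling `validateBroken` with good credentials returns with `profile` still empty, and calling it with bad credentials throws nothing at all, which matches both symptoms above. Note that the sketch uses `Await.result` rather than `Await.ready`: `Await.ready` only waits for the Future to complete, whereas `Await.result` also rethrows its failure in the calling thread, which is what a throw-on-bad-credentials contract needs.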

Tuesday, 12 October 2010

Facepalm Moments, Part One

Self-defeating updates

My gig at the satellite-vehicle-tracking company was an interesting experience. A great deal of stuff was seat-of-the-pants, looks-like-it-works-ship-it kinda software because the CEO had put his house on the line and every week we didn't have a solution put him a week closer to being homeless.

Corners were cut. Unit and integration testing was non-existent. As soon as a (flaky as hell) Version 1.0 was in the wild, we operated in what The Daily WTF calls a Developmestruction environment. Our production server was a Windows Server 2003 box that was infected with so many trojans that Internet Explorer took minutes to start up - but we couldn't afford to clean it and reboot in case it never came back.

Good times, good times ;-)

Anyway, the subject of this classic *facepalm* moment from my career is actually the embedded code that ran in the little boxes that got installed underneath car dashboards. It was pretty standard embedded C code - malloc() was banned, circular buffers ruled and everything ran off a gigantic while(1) loop in main.

The firmware actually worked pretty well. I'd managed to iron out all the tiny leaks that caused it to die when left to run overnight, and it dealt nicely with all of the weird and wonderful failure modes the attached GPRS modem could throw at it. (We had a simple serial link to the modem and treated it just like a standard dialup device, firing AT commands to it, setting up a PPP link and resetting it whenever the dreaded NO CARRIER message arrived - which was a lot, GPRS was pretty sketchy in Australia way back in '03). We'd gone through a couple of major version increments, adding some good features and desperately trying to tidy up the code as we went along, when we got to The Big One. Over-The-Air Firmware Upgrades. I had privately always thought that this feature should have been in the code since version one, but apparently there were always more lucrative things to add. I digress.

The process was pretty simple. We'd point the server-side code at a binary file and it would slice it up into packets using our proprietary protocol that ran on top of UDP, firing them one-by-one to the target device. Upon reception of a packet, the device would run a rudimentary checksum over it and ACK or NAK the segment - a NAK triggering a resend. If the segment had arrived intact, it would be copied out to flash, as there wasn't enough RAM to hold the entire firmware image. Once this extremely tedious process was complete (we'd typically push through about one packet per second, and the download used 200+ packets, with probably one-in-ten needing a resend), the box would run another checksum over the entire firmware image before firing off a special processor-specific machine code incantation that caused it to reflash itself and reboot.
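For anyone who prefers code to prose, the slice/checksum/ACK-or-NAK loop looks roughly like the sketch below. It is an illustration only, written in Scala rather than the embedded C we actually used: the real protocol was proprietary and ran over UDP, and the packet layout, the 16-bit additive checksum and every name here are assumptions made up for the example.

```scala
object OtaTransferSketch {
  final case class Packet(seq: Int, payload: Vector[Byte], checksum: Int)

  // Assumed checksum: sum of the bytes modulo 2^16 (the real one is not described).
  def checksum(bytes: Vector[Byte]): Int =
    bytes.foldLeft(0)((acc, b) => (acc + (b & 0xff)) & 0xffff)

  // Server side: slice the firmware image into numbered packets.
  def slice(image: Array[Byte], packetSize: Int): Vector[Packet] =
    image.grouped(packetSize).zipWithIndex.map { case (chunk, i) =>
      val payload = chunk.toVector
      Packet(i, payload, checksum(payload))
    }.toVector

  // Device side: ACK (true) only if the checksum matches, then append to "flash".
  final class Device {
    val flash = Vector.newBuilder[Byte]
    def receive(p: Packet): Boolean = {
      val ok = checksum(p.payload) == p.checksum
      if (ok) flash ++= p.payload
      ok
    }
  }

  // Send each packet until it is ACKed; `link` models the unreliable connection.
  // Returns the total number of transmissions, resends included.
  def send(image: Array[Byte], device: Device, packetSize: Int,
           link: Packet => Packet): Int = {
    var transmissions = 0
    slice(image, packetSize).foreach { p =>
      var acked = false
      while (!acked) {
        transmissions += 1
        acked = device.receive(link(p))
      }
    }
    transmissions
  }
}
```

Passing a `link` function that occasionally corrupts a packet reproduces the resend behaviour; the final whole-image checksum and the reflash step are left out.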

So I'm attempting the first-ever over-the-air download with the aforementioned CEO (who designed the process on the back of a napkin) and things are going pretty well until we get to packet 189 of 220, at which point the device seems to lose all connection with the outside world and reboot its GPRS modem. "Oh well," we say "these things happen, let's kick it off again". Four-plus minutes later, the download dies again at packet 189. We go over the obvious possible causes in the code. There's nothing special about the number that would cause an overflow. There's plenty of flash memory left over. All of the circular buffer pointers look sensible. But because the flash-writing process is somewhat finicky, we're loath to attach a debugger and possibly introduce more weirdness. We're working almost entirely in the dark.

On a whim, I decide to fire up Ethereal (as it was then, now Wireshark) and inspect the packets my development box was sending to the device. And sure enough, we find the culprit lurking in packet 189:

    ...                                                            
    63 64 41 54 45 30 41 54 48 30 41 54 56 31 41 54    cdATE0ATH0ATV1AT
    4D 30 4E 4F 20 43 41 52 52 49 45 52 41 54 44 54    M0NO CARRIERATDT
    ...

Yes, we'd got to the symbol table in the firmware binary, and the device code that was constantly scanning the receive buffer for "NO CARRIER" found exactly what it was looking for, promptly resetting the GPRS modem.

*Facepalm*
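The failure mode is classic in-band signalling: control text and payload share one byte stream, and the scanner can't tell them apart. Here's a small sketch of it (in Scala for brevity, not the original firmware's C), fed with the second row of the capture above. The function names are invented, and the stricter matcher is just one possible mitigation, not what we actually shipped; it relies on modems in verbose mode wrapping result codes in CR LF.

```scala
import java.nio.charset.StandardCharsets.ISO_8859_1

object InBandSignallingSketch {
  // ISO-8859-1 maps every byte to exactly one char, so binary data survives.
  private def asText(bytes: Array[Byte]): String = new String(bytes, ISO_8859_1)

  // Naive scan: match the modem's message anywhere in the receive buffer,
  // whether it came from the modem itself or from payload data.
  def naiveSeesNoCarrier(rx: Array[Byte]): Boolean =
    asText(rx).contains("NO CARRIER")

  // Stricter scan (an assumption, not the original fix): require the CR LF
  // framing that verbose-mode result codes arrive with.
  def framedSeesNoCarrier(rx: Array[Byte]): Boolean =
    asText(rx).contains("\r\nNO CARRIER\r\n")

  def fromHex(hex: String): Array[Byte] =
    hex.trim.split("\\s+").map(h => Integer.parseInt(h, 16).toByte)

  // Second row of the packet 189 capture shown above.
  val captureRow: Array[Byte] =
    fromHex("4D 30 4E 4F 20 43 41 52 52 49 45 52 41 54 44 54")
}
```

The naive scan fires on `captureRow`; the framed one does not. Even the framed version only shrinks the odds of a false positive, since a binary image could still contain the full framed string.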