Monday, 23 April 2018

Green Millhouse - Fixing the OpenHAB BroadLink Binding (part 2)

Development of the Broadlink OpenHAB binding has been going very well, with just one scenario (arguably the most common one) not working reliably: with the OpenHAB server up and running, and the Broadlink binding installed and configured for a particular (but currently disconnected) device, turn the device on.

The copious logging I've added to the binding told me that this was resulting in an error with code -7 when the device was first detected, and I was a little stumped as to how to proceed. As many before me have done, I started entering random semi-related strings into Google to see what might turn up. And I hit gold. The author of the Python Broadlink library has done some great work documenting the Broadlink protocol, and as I perused it, a particular sentence jumped out at me:

You must obtain an authorisation key from the device before you can communicate.

Of course! When the binding first boots it calls authenticate() for each device it knows about, and it does the same when a device setting is changed. But it does no such thing when, after finding a newly-booted device, it tries to get that device's status!

I added an authenticated boolean in the base Handler class, only setting it when authenticate() succeeds and clearing it whenever we lose contact with the device. Any attempt to communicate with the device first checks the boolean and does the authenticate() dance if necessary. And it works like a charm. We're very nearly there now - I just want to DRY up some of the payload-handling methods which seem to have been copy-pasted, add a 'presence' channel that mirrors the internal state of the binding, and remove the heavy dependence on IP addresses, and I think it's good to go.
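
Sketched in isolation, the guard looks something like this (a minimal sketch: the authenticated boolean and authenticate() are as described above, while the class and other member names are just illustrative, not the binding's actual API):

public abstract class AuthenticatingHandler {
    // Do we currently hold a valid authorisation key for this device?
    private boolean authenticated = false;

    // The Broadlink key-exchange handshake; returns true on success
    protected abstract boolean authenticate();

    // Every attempt to talk to the device goes through this guard first
    protected boolean ensureAuthenticated() {
        if (!authenticated) {
            authenticated = authenticate();
        }
        return authenticated;
    }

    // Called whenever we lose contact with the device
    protected void onContactLost() {
        authenticated = false;
    }
}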

Saturday, 31 March 2018

Green Millhouse - Fixing the OpenHAB BroadLink Binding (part 1)

You can follow along at Github - my rebuild of the Broadlink OpenHAB binding is nearing completion.

I've been building and testing locally with my A1 Air Quality Sensor, and since fixing some shared-state issues in the network layer, haven't yet experienced any of the reliability problems that plagued the original binding.

For reasons that aren't clear (because I'm working from a decompiled JAR file), the original binding was set up like this in the base Thing handler (from which all the Broadlink Thing handlers inherit):
public class BroadlinkBaseThingHandler extends BaseThingHandler {
    private static DatagramSocket socket = null;
    static boolean commandRunning = false;
   ...
}

As soon as I saw those static members, alarm bells started ringing in my head. Combined with an inheritance model, you've got a definite "fragile base class" problem at compile time, and untold misery at runtime when multiple subclass instances start accessing the socket as if it were their exclusive property!
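
To make the runtime misery concrete, here's a contrived sketch of the failure mode (my own illustration - SharedSocketHandler and receive() are invented names, not the binding's code):

import java.net.DatagramPacket;
import java.net.DatagramSocket;

public class SharedSocketHandler {
    // ONE socket, shared across ALL handler instances
    private static DatagramSocket socket = null;

    public byte[] receive() throws Exception {
        if (socket == null || socket.isClosed()) {
            socket = new DatagramSocket();
        }
        byte[] response = new byte[1024];
        socket.receive(new DatagramPacket(response, response.length));
        // If instance A gets here while instance B is still blocked in
        // receive() above, B's receive will abort with a SocketException
        socket.close();
        return response;
    }
}

With two Things polling concurrently, it's pure luck whose datagram arrives on whose receive() call - when it arrives at all.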

Presumably an attempt to mitigate the race conditions that must have abounded, the `commandRunning` boolean only complicated matters:
    public boolean sendDatagram(byte message[])
    {
        try
        {
            if(socket == null || socket.isClosed())
            {
                socket = new DatagramSocket();
                socket.setBroadcast(true);
            }
            InetAddress host = InetAddress.getByName(thingConfig.getIpAddress());
            int port = thingConfig.getPort();
            DatagramPacket sendPacket = new DatagramPacket(message, message.length, new InetSocketAddress(host, port));
            commandRunning = true;
            socket.send(sendPacket);
        }
        catch(IOException e)
        {
            logger.error("IO error for device '{}' during UDP command sending: {}", getThing().getUID(), e.getMessage());
            commandRunning = false;
            return false;
        }
        return true;
    }

    public byte[] receiveDatagram()
    {
        try {
            socket.setReuseAddress(true);
            socket.setSoTimeout(5000);
        } catch (SocketException se) {
            commandRunning = false;
            socket.close();
            return null;
        }

        if(!commandRunning) {
            logger.error("No command running - device '{}' should not be receiving at this time!", getThing().getUID());
            return null;
        }

        try
        {
            if(socket != null)
            {
                byte response[] = new byte[1024];
                DatagramPacket receivePacket = new DatagramPacket(response, response.length);
                socket.receive(receivePacket);
                response = receivePacket.getData();
                commandRunning = false;
                socket.close();
                return response;
            }
        }
        catch (SocketTimeoutException ste) {
            if(logger.isDebugEnabled()) {
                logger.debug("No further response received for device '{}'", getThing().getUID());
            }
        }
        catch(Exception e)
        {
            logger.error("IO Exception: '{}", e.getMessage());
        }

        commandRunning = false;
        return null;
    }

So we had a pseudo-semaphore that tried to detect getting into a bad state (caused by shared state), but was itself shared state, and hence suffered from exactly the same unreliability.

Here's what the new code looks like:
public class BroadlinkBaseThingHandler extends BaseThingHandler {
    private DatagramSocket socket = null;
    ...

    public boolean sendDatagram(byte message[], String purpose) {
        try {
            logTrace("Sending " + purpose);
            if (socket == null || socket.isClosed()) {
                socket = new DatagramSocket();
                socket.setBroadcast(true);
                socket.setReuseAddress(true);
                socket.setSoTimeout(5000);
            }
            InetAddress host = InetAddress.getByName(thingConfig.getIpAddress());
            int port = thingConfig.getPort();
            DatagramPacket sendPacket = new DatagramPacket(message, message.length, new InetSocketAddress(host, port));
            socket.send(sendPacket);
        } catch (IOException e) {
            logger.error("IO error for device '{}' during UDP command sending: {}", getThing().getUID(), e.getMessage());
            return false;
        }
        logTrace("Sending " + purpose + " complete");
        return true;
    }

    public byte[] receiveDatagram(String purpose) {
        logTrace("Receiving " + purpose);

        try {
            if (socket == null) {
                logError("receiveDatagram " + purpose + " for socket was unexpectedly null");
            } else {
                byte response[] = new byte[1024];
                DatagramPacket receivePacket = new DatagramPacket(response, response.length);
                socket.receive(receivePacket);
                response = receivePacket.getData();
//                socket.close();
                logTrace("Receiving " + purpose + " complete (OK)");
                return response;
            }
        } catch (SocketTimeoutException ste) {
            logDebug("No further " + purpose + " response received for device");
        } catch (Exception e) {
            logger.error("While {} - IO Exception: '{}'", purpose, e.getMessage());
        }

        return null;
    }
}

A lot less controversial, I'd say. The key changes:
  • Each subclass instance (i.e. Thing) gets its own socket
  • No need to track commandRunning - an instance owns its socket outright
  • The socket gets configured just once, instead of being reconfigured between Tx- and Rx-time
  • Improved diagnostic logging that always outputs the ThingID, and the purpose of the call


The next phase is stress-testing the binding with a second, different type of device - my RM3 Mini IR-blaster (sadly I don't have another A1, which would have been great for further tests). I'll be trying to add and remove the devices at various times, to see if I can trip the binding up. The final step will be making sure the Thing discovery process (which is the main reason to upgrade to OpenHAB 2, and is brilliant) is as good as it can be. After that, I'll be tidying up the code to meet the OpenHAB guidelines and hopefully getting this thing into the official release!

Sunday, 25 February 2018

Green Millhouse - Temp Monitoring 2 - Return of the BroadLink A1 Sensor!

So, after giving up on the BroadLink A1 Air Quality Sensor a year ago, I'm delighted to report that it is back in my good books after some extraordinary work from some OpenHAB contributors. Using some pretty amazing techniques, they have been able to reverse-engineer the all-important crypto keys used by the Broadlink devices, thus "opening up" the protocol to API usage.

Here's the relevant OpenHAB forum post - it includes a link to a Beta-quality OpenHAB binding, which I duly installed on my Synology's OpenHAB 2 setup, where it showed itself to be pretty darn good. Both my A1 and my new BroadLink RM3 Mini (a wifi-controlled IR blaster) were discovered immediately and worked great "out of the box".

However, I discovered that after an OpenHAB reboot (my Synology turns itself off each night and restarts each morning to save power) the BroadLink devices didn't come back online properly; the binding was also unreliable when polling multiple devices, and there were other niggly little issues identified by other forum members in the above thread. Worst of all, the original developer of the binding (one Cato Sognen) has since gone missing from the discussion, with no source code published anywhere!

Long story short, I've decided to take over the development of this binding - 90% of the work has been done, and thanks to the amazing JAD Decompiler, I was able to recover the vast majority of the source as if I'd written it myself. At the time of writing I am able to compile the binding and believe I have fixed the multiple-device problems (the code was using one shared static Socket instance and a shared mutable Boolean to try and control access to it...) and am looking at the bootup problems. And best of all, I'm doing the whole thing "in the open" over on Github - everyone is welcome to scrutinise and hopefully improve this binding, with a view to getting it included in the next official OpenHAB release.

Thursday, 25 January 2018

OpenShift - the 'f' is silent

So it's come to this.

After almost exactly four years of free-tier OpenShift usage for Jenkins purposes, I have finally had to throw up my hands and declare it unworkable.

The first concern came earlier in 2017 when, with minimal notice, they announced the end-of-life of the OpenShift 2.0 platform that had been serving me so well. Simultaneously, they dropped the number of nodes available to free-tier customers from 3 to 1 - a move I would have been fine with if there had been any way for me to pay them down here in Australia, a fact I lamented almost 2 years ago.

Then, in the big "upgrade" to version 3, OpenShift disposed of what I considered to be their best feature - having the configuration of a node held under version control in Git; push a change, the node restarts with the new config. Awesome. Instead, version 3 handed us a complex new ecosystem of pods, containers, services, images, controllers, registries and applications, administered through a labyrinth of somewhat-complete and occasionally-buggy web pages. Truly a downgrade from my perspective.

The final straw was the extraordinarily fragile and flaky nature of the one-and-only node (or is it a "pod"? Or an "application"? I can't even tell any more) that I have running as a Jenkins master. Now, this is hardly a taxing thing to run - I have a $5-per-month Vultr instance acting as a slave and doing the real work - yet it seems unable to stay up reliably while doing such simple tasks as changing a job's configuration. It also makes "continuous integration" a bit of a joke if pushing to a repository doesn't actually end up running tests and building a new artefact because the node was unresponsive to the webhook from Github/Bitbucket. Sigh.

You can imagine how great it is to be greeted by an error page when you've just hit "save" on the meticulously-detailed configuration for a brand new Jenkins job...

So, in what I hope is not a taste of things to come, I'm de-clouding my Jenkins instance and moving it back to the only "on-premises" bit of "server hardware" I still own - my Synology DS209 NAS. Stay tuned.

Friday, 29 December 2017

The best feeling in software development?

I've been spending a lot of time in Javascript-land these days and while it's pretty exciting, I sure do miss some of the luxuries that really emphasise the maturity and elegance of statically-typed Scala.
Here's an example. It's incredibly simple, but it's one of the most satisfying things in software development, in my opinion. First, a quick definition:
trait Insertable {
  val insertedAt: Long
} 

And now, the source of all the joy:
def findMostRecentlyInserted(xs:Set[Insertable]):Option[Insertable] = {

}

Yep, that's it. An empty function. It won't even compile.

But this moment, right here, with type signatures locked down and the cursor flashing in the right spot, is where your brain can finally shift up from boilerplate mode into full power. How am I going to get from that set of candidates to the right one? What will be the fastest way? Will it read well? How should I test this?

If I've got my "good boy" hat on I'll stop here and write a few tests, which will of course fail. But sometimes, I'll just let the Scala compiler guide me - a function this simple is a great example of where strong typing really helps prevent errors. Anyone else with me on this?

Thursday, 30 November 2017

bloop - Initial Thoughts

So the Scala Center just announced bloop - a tool completely focused on making the Scala edit/compile/test cycle as fast as possible.

This is awesome

SBT is a beast, but most of the time its immense powers lie idle. I am totally happy to fire up one tool for checking/fetching dependencies, publishing to repositories and other such infrequent operations, and a different one that is totally focused on coding.

bloop completely rings true with the UNIX philosophy (a tool should do one thing and do it well), which has time and again been shown to be the best way to build systems; the key thing being composability of elements. I'm very excited about this new development, which shows that Scala is truly a developer-focused language. Go Scala!

Sunday, 29 October 2017

Stack Evolution part 2

Referring back to my go-to stack from part 1 of this series:
Layer            Technologies
Javascript       JQuery, Moment.js, etc
"Presentation"   JSON/AJAX, HTML/LESS
Controllers      Play (Reactive, Async)
Services         Play - RESTful API calls, Mondrian
Persistence      MongoDB (via ReactiveMongo)

I am simply delighted with the performance, scalability, maintainability and reliability of the entire stack from the Controllers layer down - i.e. Scala, Play and Mongo. (Incidentally, I've been running these apps on Heroku with MongoDB provided by MLab, and they have been similarly excellent). So that will not be changing any time soon.

What is no longer tenable is the mixture of HTML (including form submissions), LESS and per-page Javascript. At the top of the table (i.e. the front-end technologies), there is just too much awesomeness happening in this space to ignore. To me, React.js is the current culmination of the best thinking in the front-end world. The way every concept is the most reduced-down thing that could work (as opposed to the competition's kitchen-sink approach) really makes it a pleasure to learn and use.

Currently I'm absolutely loving Create-React-App as a brilliant bootstrapper that continues to add value even once you're up and running. It's got finely-honed and sensible defaults for things like Webpack, is upgradeable in-place, is beautifully documented and is almost psychic in always offering good suggestions or output as to what it's just done, or what can be done next. I currently have no plans to "eject" Create-React-App from any of the front-end projects I'm working on - it's just too useful to keep around.

Into this mix I've also added React Cosmos - this is a "component showcase" system that allows a super-rapid way to see all of the possible "states" of a given React component. The React props that a component needs are specified in fixture files, and Cosmos supplies a nice web UI to browse around and check that changes made to a component are working well and looking good in all of its potential states. It works excellently with the hot-reloading facilities of Create-React-App and really helps nail down component interfaces.

Another element I'm using to try and keep front-end complexity in check is Styled Components. Go have a read of their Github page for the full run-down, but basically it gives me the best of both worlds: global CSS where appropriate, keeping things DRY, together with individual components whose styles won't mess with each other. It also massively helps avoid the "mental CSS selector" problem during refactoring, as observed by Ryan Florence. Extremely cool.

So to summarise, here's my 2017-and-beyond software stack:

Layer            Technologies
Javascript       React.js (with Cosmos)
"Presentation"   JSON/AJAX, JSX/CSS/Styled Components
Controllers      Play (Reactive, Async)
Services         Play - RESTful API calls, Mondrian
Persistence      MongoDB (via ReactiveMongo)