Wednesday 7 November 2018

Green Millhouse - The Broadlink Binding part 3

Apologies to people who were interested in my Broadlink binding for OpenHAB 2.0 ... I migrated my code to the 2.4.0 family and things apparently stopped working. I was short on spare time, so I assumed something major had changed in the OpenHAB binding API and didn't investigate. Turns out it was actually my own stupid fault: while refactoring to clean up the code, I'd managed to cut out the "happy path" that actually updated OpenHAB with values from the device <facepalm />.

Anyway, back into it now with a couple of extra things rectified as well:
This version also keeps track of the ScheduledFuture that is used to periodically poll the Broadlink device, and correctly calls cancel() on it if the Broadlink binding is getting disposed. This should put an end to those "...the handler was already disposed" errors flooding the logs.
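For anyone curious what that looks like, here is a minimal sketch of the idea. The class and method names (PollingHandler, initialize, isPolling) are made up for illustration and are not the binding's real code; only ScheduledFuture and cancel() come from the description above.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a Thing handler; not the binding's actual class.
class PollingHandler {
    private final ScheduledExecutorService scheduler;
    // Remembered so that dispose() has something to cancel.
    private ScheduledFuture<?> pollingJob;

    PollingHandler(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    // Start polling the device at a fixed delay.
    void initialize(Runnable poll, long intervalSeconds) {
        pollingJob = scheduler.scheduleWithFixedDelay(poll, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    // Stop polling; safe to call more than once.
    void dispose() {
        if (pollingJob != null) {
            pollingJob.cancel(true);
            pollingJob = null;
        }
    }

    boolean isPolling() {
        return pollingJob != null && !pollingJob.isCancelled();
    }
}
```

Without the cancel() call, the scheduler keeps firing the poll task against a handler that no longer exists, which is where the log spam came from.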

This version finally addresses the "reconnect" issues; i.e. when a device unexpectedly falls off the network, re-authentication with it would, more often than not, fail. The fix was to reset most of the state associated with the device once we lose contact with it; in particular, the packet counter, deviceId and deviceKey variables that we obtained from the previous authentication response.

As a result of this, my A1 Environmental Sensor seems to be working in all scenarios. Next up is long-term robustness testing while sorting out device discovery, making sure my RM3 Mini works, and checking that multiple heterogeneous devices will all play nicely together.

This version reflects my testing with my RM3 Mini IR blaster. This device seems more sensitive to certain parameters in the authentication procedure - and it turned out that resetting the packet counter (as introduced in BETA-3) is not needed and can actually cause re-authentication to fail for this device. So there's actually less code in this release, but more functionality - which is always nice.
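Roughly, the reset-on-lost-contact logic now has this shape. This is only a sketch: DeviceSession and its methods are hypothetical, and using null to mean "not authenticated yet" is my assumption for the example. Only the names packetCounter, deviceId and deviceKey come from the description above.

```java
// Hypothetical holder for per-device session state; not the binding's real class.
class DeviceSession {
    int packetCounter;
    byte[] deviceId;   // null until authentication succeeds (assumption of this sketch)
    byte[] deviceKey;  // null until authentication succeeds (assumption of this sketch)

    // Store the values handed back by a successful authentication response.
    void onAuthenticated(byte[] id, byte[] key) {
        deviceId = id;
        deviceKey = key;
    }

    int nextPacketCounter() {
        return ++packetCounter;
    }

    // Forget the credentials from the previous authentication,
    // but deliberately leave the packet counter alone.
    void onContactLost() {
        deviceId = null;
        deviceKey = null;
    }

    boolean isAuthenticated() {
        return deviceId != null && deviceKey != null;
    }
}
```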
Device discovery is working now too. From the OpenHAB Paper UI, go to
Configuration -> Things -> '+' -> Broadlink Binding and wait 10 seconds for the network broadcast to complete. If you have several Broadlink devices active on your network it may take a couple of scans to pick them all up.

Small refactors for code cleanliness, e.g. grouping all networking functions together. Also added specific polling code for the SP3 smart switch device; until now this was using the same polling code as an SP2 device, but based on comments on this post from Jorg, it has its own function now, with extra diagnostic logging to hopefully pinpoint what is going on. I suspect that it doesn't reply in the same way as an SP2, so this is an "investigative" release; expect BETA-6 to follow with (hopefully) fixed support for the SP3. I may have to resort to buying one myself if we can't sort it out over the internet ;-)

Fixed misidentification of an SP3 switch as an SP2 (appears to have been a typo in the original JAR). After investigating Jorg's incorrect polling issue, I took a look over at the python-broadlink module code for inspiration, and lo and behold, there was more going on there. This module seems to have had a lot of love (42 contributors!) and seems to have addressed all the quirks of these devices, so I had no hesitation in implementing the changes; namely that testing payload[4] == 1 is not sufficient. The Python code also checks against 3 and 0xFD. Looks like the least-significant bit of that byte is the one that matters, but if this works, that's fine!
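In Java terms the check ends up something like the sketch below. SwitchState and isPoweredOn are illustrative names, not the binding's real ones; the three values (1, 3 and 0xFD) are the ones mentioned above.

```java
// Hypothetical helper; class and method names are illustrative only.
final class SwitchState {
    private SwitchState() {
    }

    // True if byte 4 of the payload holds one of the known "on" values.
    static boolean isPoweredOn(byte[] payload) {
        // Java bytes are signed, so 0xFD arrives as -3; mask to get 0..255.
        int state = payload[4] & 0xFF;
        return state == 1 || state == 3 || state == 0xFD;
    }
}
```

The mask is the part that is easy to get wrong: comparing a raw Java byte against 0xFD can never be true, because the byte is read as -3. All three "on" values have their lowest bit set, which fits the hunch about the least-significant bit, but the sketch sticks to the explicit list.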

Added a ton more weird-and-wonderful variants of the RM2, courtesy of the aforementioned Python library. Broadlink seem to be constantly updating their device names (RM2, RM2 Pro Plus, RM2 Pro Plus R1, RM2 Pro Plus 2, etc etc), giving a new identification code to each one. I can't fathom a method to their coding scheme, so for the moment, we just have to play catch-up. In the case where we can't identify what seems to be a Broadlink device, I'm now logging (at ERROR level) the identification code we found, so that people out there can feature-request the support for their device.

With thanks to excellent GitHub contributor FreddyFox, querying SP2/SP3 switch state should now be working. The old code attempted to decode the state of the switch from the encrypted payload, when of course it needs to be decrypted first. Thanks again FreddyFox!

The main feature of this version is support for Dynamic IP Addresses.
There is a new switch available under the advanced Thing properties (open the SHOW MORE area in PaperUI) - it’s marked “Static IP” and it defaults to ON. If you are on a network where your Broadlink device might periodically be issued a different IP address, move it to the OFF position.
Now, if we lose communications with this Thing, rather than mark it OFFLINE, we’ll go into a mini-Discovery mode, and attempt to find the device on the network again, using its MAC address (which never changes). If we find the device, we update its current IP address in its Thing config, and everything continues without a hiccup. If we fail to find it, it goes OFFLINE as normal.
I should give credit to the LIFX binding which uses a very similar scheme - by chance I happened to come across it and realised it would be great for Broadlink devices too. As an extra bonus, it prompted me to redesign the entire discovery system and make it asynchronous; as a result it is now MUCH more reliable at finding devices (particularly if you have multiple Broadlink devices on your network) without having to scan repeatedly.
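The decision logic described above boils down to something like this sketch. Everything here is hypothetical naming (DeviceConfig, Reconnector, findIpByMac); in particular the lookup function stands in for the mini-Discovery scan, which is not shown.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical Thing configuration; field names are illustrative.
class DeviceConfig {
    String macAddress;
    String ipAddress;
    boolean staticIp = true; // matches the "Static IP" switch defaulting to ON
}

enum Outcome { ONLINE, OFFLINE }

final class Reconnector {
    private Reconnector() {
    }

    // Called when communication with the device is lost.
    // findIpByMac is a placeholder for a network scan keyed on MAC address.
    static Outcome onContactLost(DeviceConfig config, Function<String, Optional<String>> findIpByMac) {
        if (config.staticIp) {
            return Outcome.OFFLINE; // static IP: no rediscovery attempt
        }
        Optional<String> found = findIpByMac.apply(config.macAddress);
        if (found.isPresent()) {
            config.ipAddress = found.get(); // remember the new address and carry on
            return Outcome.ONLINE;
        }
        return Outcome.OFFLINE; // could not find it: offline as normal
    }
}
```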

This version fixes a few small issues:
  • RM3 devices can now have a polling frequency specified (this was an omission from the thing-types.xml configuration file)
  • Thing logging extracted to its own class and improved: Device status ONLINE/OFFLINE/undetermined shown in logs as ^/v/?
  • Each ThingHandler's network socket is now explicitly closed when we lose contact with the device. This seems to help subsequent reconnections.

This version attempts to improve general reliability. The Broadlink network protocol always acknowledges every command sent to a device, so logically, the sendPacket() and receivePacket() functions have been coalesced to sendAndReceivePacket(). This in turn allowed for a simple retry mechanism to be implemented. If we time out waiting for a response from the device, we immediately retry, sending the packet again. Together with some improved logging, this should hopefully be enough to fix (or at least understand) devices prematurely being marked "offline" on unreliable networks.
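As a rough sketch of the retry idea (not the binding's actual code): the PacketExchange interface and the maxAttempts parameter are assumptions I've introduced here, and a SocketTimeoutException is assumed to be how a timeout shows up. Only the sendAndReceivePacket() name comes from the description above.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical abstraction over "send one packet, wait for its acknowledgement".
interface PacketExchange {
    byte[] exchange(byte[] request) throws IOException;
}

final class Retrying {
    private Retrying() {
    }

    // Sends the request and returns the response, resending immediately on timeout.
    static byte[] sendAndReceivePacket(PacketExchange link, byte[] request, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SocketTimeoutException lastTimeout = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return link.exchange(request);
            } catch (SocketTimeoutException e) {
                lastTimeout = e; // no reply in time: loop round and send again
            }
        }
        throw lastTimeout; // every attempt timed out
    }
}
```

Because every command is acknowledged, a single combined call is a natural place to hang the retry; with separate send and receive functions there is no obvious owner for it.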

Here's the latest JAR anyway; put it into your addons directory and let me know in the comments how it goes for you: