Saturday 31 March 2018

Green Millhouse - Fixing the OpenHAB BroadLink Binding (part 1)

You can follow along at Github, but my rebuilding of the Broadlink OpenHAB binding is nearing completion.

I've been building and testing locally with my A1 Air Quality Sensor, and since fixing some shared-state issues in the network layer, haven't yet experienced any of the reliability problems that plagued the original binding.

For reasons that aren't clear (because I'm working from a decompiled JAR file), the original binding was set up like this in the base Thing handler (which all Broadlink Things inherit from):
public class BroadlinkBaseThingHandler extends BaseThingHandler {
   private static DatagramSocket socket = null;
   static boolean commandRunning = false;
   ...
}

As soon as I saw those static members, alarms started ringing in my head - especially when combined with an inheritance model, you've got a definite "fragile base class" problem at compile-time, and untold misery at runtime when multiple subclasses start accessing the socket like it's their exclusive property!

An attempt to mitigate the race-conditions which must have abounded, the commandRunning boolean only complicated matters:
    public boolean sendDatagram(byte message[])
    {
        try
        {
            if(socket == null || socket.isClosed())
            {
                socket = new DatagramSocket();
                socket.setBroadcast(true);
            }
            InetAddress host = InetAddress.getByName(thingConfig.getIpAddress());
            int port = thingConfig.getPort();
            DatagramPacket sendPacket = new DatagramPacket(message, message.length, new InetSocketAddress(host, port));
            commandRunning = true;
            socket.send(sendPacket);
        }
        catch(IOException e)
        {
            logger.error("IO error for device '{}' during UDP command sending: {}", getThing().getUID(), e.getMessage());
            commandRunning = false;
            return false;
        }
        return true;
    }

    public byte[] receiveDatagram()
    {
        try {
            socket.setReuseAddress(true);
            socket.setSoTimeout(5000);
        } catch (SocketException se) {
            commandRunning = false;
            socket.close();
            return null;
        }

        if(!commandRunning) {
            logger.error("No command running - device '{}' should not be receiving at this time!", getThing().getUID());
            return null;
        }

        try
        {
            if(socket != null)
            {
                byte response[] = new byte[1024];
                DatagramPacket receivePacket = new DatagramPacket(response, response.length);
                socket.receive(receivePacket);
                response = receivePacket.getData();
                commandRunning = false;
                socket.close();
                return response;
            }
        }
        catch (SocketTimeoutException ste) {
            if(logger.isDebugEnabled()) {
                logger.debug("No further response received for device '{}'", getThing().getUID());
            }
        }

        catch(Exception e)
        {
            logger.error("IO Exception: '{}", e.getMessage());
        }

        commandRunning = false;
        return null;
    }

So we got a pseudo-semaphore that is trying to detect getting into a bad state (because of shared-state), but itself is shared-state, thereby experiencing the same unreliability.

Here's what the new code looks like:
public class BroadlinkBaseThingHandler extends BaseThingHandler {
    private DatagramSocket socket = null;
    ...

    public boolean sendDatagram(byte message[], String purpose) {
        try {
            logTrace("Sending " + purpose);
            if (socket == null || socket.isClosed()) {
                socket = new DatagramSocket();
                socket.setBroadcast(true);
                socket.setReuseAddress(true);
                socket.setSoTimeout(5000);
            }
            InetAddress host = InetAddress.getByName(thingConfig.getIpAddress());
            int port = thingConfig.getPort();
            DatagramPacket sendPacket = new DatagramPacket(message, message.length, new InetSocketAddress(host, port));
            socket.send(sendPacket);
        } catch (IOException e) {
            logger.error("IO error for device '{}' during UDP command sending: {}", getThing().getUID(), e.getMessage());
            return false;
        }
        logTrace("Sending " + purpose + " complete");
        return true;
    }

    public byte[] receiveDatagram(String purpose) {
        logTrace("Receiving " + purpose);

        try {
            if (socket == null) {
                logError("receiveDatagram " + purpose + " for socket was unexpectedly null");
            } else {
                byte response[] = new byte[1024];
                DatagramPacket receivePacket = new DatagramPacket(response, response.length);
                socket.receive(receivePacket);
                response = receivePacket.getData();
//                socket.close();
                logTrace("Receiving " + purpose + " complete (OK)");
                return response;
            }
        } catch (SocketTimeoutException ste) {
            logDebug("No further " + purpose + " response received for device");
        } catch (Exception e) {
            logger.error("While {} - IO Exception: '{}'", purpose, e.getMessage());
        }

        return null;
    }    


A lot less controversial I'd say. The key changes:
  • Each subclass instance (i.e. Thing) gets its own socket
  • No need to track commandRunning - an instance owns its socket outright
  • The socket gets configured just once, instead of being reconfigured between Tx- and Rx-time
  • Improved diagnostic logging that always outputs the ThingID, and the purpose of the call


The next phase is now stress-testing the binding with a second heterogeneous device (sadly I don't have another A1, that would be great for further tests), my RM3 Mini IR-blaster. I'll be trying adding and removing the devices at various times, seeing if I can trip the binding up. The final step will be making sure the Thing discovery process (which is the main reason to upgrade to OpenHAB 2, and is brilliant) is as good as it can be. After that, I'll be tidying up the code to meet the OpenHAB guidelines and hopefully getting this thing into the official release!