Sunday, 28 April 2019

Whose Turn Is it? An OpenHAB / Google Home / now.sh Hack (part 3)

In this third part of my mini-series on life-automation via hacking home-automation, I want to show how I was able to ask our Google Home whose turn it was for movie night, and have "her" respond with an English sentence.

First, a quick refresher on what we have so far. In part 1, I set up an incredibly basic text-file-munging "persistence" system for recording the current person in a rota via OpenHAB. We can query and rotate (both backwards and forwards) the current person, and there's also a cron-like task that rotates the person automatically once a week. The basic pattern can be (and has been!) repeated for multiple weekly events.
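Just to make that concrete, here's roughly the idea in TypeScript - purely illustrative, not the actual OpenHAB rule from part 1, and the extra names and the file path are made up:

    import { readFileSync, writeFileSync } from "fs";

    // Illustrative only: the real version lives in an OpenHAB rule plus a text file.
    const people = ["Charlotte", "Alice", "Bob"];        // made-up rota members
    const stateFile = "/etc/openhab2/movienight.state";  // made-up path

    function current(): string {
      return readFileSync(stateFile, "utf8").trim();
    }

    function rotate(direction: 1 | -1): string {
      const i = people.indexOf(current());
      const next = people[(i + direction + people.length) % people.length];
      writeFileSync(stateFile, next);
      return next;
    }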

In part 2, I exposed the state of the MovieNight item to the "outside world" via the MyOpenHAB RESTful endpoint, and then wrote a lambda function that translates a Google Dialogflow webhook POST into a MyOpenHAB GET for any given "intent", resulting in the following architecture:

Here are the pertinent screens in Dialogflow where things are set up.

First, the "training phrases" which guide Google's machine-learning into picking the correct Intent:

The Fulfillment tab is where I specify the URL of the now.sh webhook handler and feed in the necessary auth credentials (which it proxies through to OpenHAB):
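To make the "proxying" concrete, here's roughly what sits at the other end of that URL - a sketch only, with guessed intent/item names rather than the exact code from part 2, and assuming the MyOpenHAB REST path for reading an item's state. Dialogflow POSTs the matched intent (with the credentials configured here in the request headers), the lambda turns that into a GET against MyOpenHAB, and replies with the fulfillmentText JSON that Dialogflow expects:

    import { IncomingMessage, ServerResponse } from "http";
    import fetch from "node-fetch";

    // Guessed mapping from Dialogflow intent names to OpenHAB item names.
    const ITEM_FOR_INTENT: { [intent: string]: string } = {
      MovieNight: "MovieNight",
      TakeAway: "TakeAway",
    };

    function readBody(req: IncomingMessage): Promise<string> {
      return new Promise(resolve => {
        let data = "";
        req.on("data", chunk => (data += chunk));
        req.on("end", () => resolve(data));
      });
    }

    export default async function handler(req: IncomingMessage, res: ServerResponse) {
      const body = JSON.parse(await readBody(req));
      const intent = body.queryResult.intent.displayName;
      const item = ITEM_FOR_INTENT[intent];

      // Proxy the Basic Auth credentials that Dialogflow was configured with
      // straight through to MyOpenHAB, and read the item's state as plain text.
      const state = await fetch(`https://myopenhab.org/rest/items/${item}/state`, {
        headers: { Authorization: req.headers.authorization || "" },
      }).then(r => r.text());

      // The Google Assistant speaks whatever ends up in fulfillmentText.
      res.setHeader("Content-Type", "application/json");
      res.end(JSON.stringify({ fulfillmentText: `It's ${state}'s turn` }));
    }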

Integrations -> Google Assistant -> Integration Settings is where I "export" the Intents I want to be usable from the Google Home:

The final piece of the puzzle is invoking this abomination via a voice command. Within the Dialogflow console it is very straightforward to test your 'fulfillment' (i.e. your webhook functionality) by typing into the test panel on the side, but actually "going live" so you can talk with real hardware requires digging in a little deeper. There's a slightly odd relationship between the Google Actions console (which is primarily concerned with getting an Action into the Actions Directory) and the Dialogflow console (which is all about having conversations with "agents"). They are aware of each other to a pretty good extent (as you'd hope for two sibling Google products), but they are also a little confusing to get straight in your head and/or to get working together.

You need to head over to the Actions Console to actually "release" your helper so that a real-life device can use it. An "Alpha" release makes sure random people on the internet can't start using your private life automation software!

I really wanted to be able to ask the Google Assistant in a conversational style - "Hey Google, whose turn is it for Movie Night this week?" - in the same way one can request a Spotify playlist. But it turns out to be effectively impossible to have a non-publicly-released "app" work in this way.

Instead, the human needs to explicitly request to talk to your app. So I renamed my app "The Marshall Family Helper" to make it sound as natural as possible. A typical conversation now looks like this:

Human: "Hey Google, talk to The Marshall Family Helper"

Google: "Okay, loading the test version of The Marshall Family Helper"

(Short pause)

{beep} "You can ask about Movie Night or Take-Away"

"Whose turn is it for movie night?"

(Long pause)

"It's Charlotte's turn"

{beep}

Some things to note. The sentence after the first {beep} is what I've called my "Table of Contents" intent - it is invoked automatically when the Marshall Family Helper loads, since discovery is otherwise a little difficult.

The "short pause" is usually less than a second, and the "long pause" around 3-4 seconds. That delay is a function of the various latencies you can see in the system diagram above, and it's something I'm going to work on tuning. At the moment now.sh automatically selects the Sydney point-of-presence to host my webhook lambda, which would normally be excellent, but since the lambda is called from Google and in turn calls MyOpenHAB, I might spend some time finding out where those endpoints actually live geographically and locating the lambda more appropriately.
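If I do move it, now.sh supports pinning a deployment to particular regions via now.json - something along the lines of the snippet below (the region ID here is just an illustration; the right choice depends on where Google's webhook callers and the MyOpenHAB endpoint actually live):

    {
      "version": 2,
      "regions": ["sfo1"]
    }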

But, it works!