Saturday, 26 November 2022

AWS Step Functions - a pretty-good v1.0

I've been using Amazon's Step Functions functionality a fair bit at work recently, as a way to orchestrate and visualise a migration process that involves some Extract-Transform-Load steps and various other bits, each one being an AWS Lambda.

On the whole, it's been pretty good - it's fun to watch the process chug along with the flowchart-like UI automagically updating (I can't show you any screenshots unfortunately, but it's neat). There have been a couple of reminders, however, that this is a version 1.0 product, namely:

Why can't I resume where I failed before?

With our ETL process, frequently we'll detect a source data problem in the Extract or Transform stages. It would be nice if after fixing the data in place, we could go back to the failed execution and just ask it to resume at the failed step, with all of the other "state" from the execution intact.

Similarly, if we find a bug in our Extract or Transform lambdas themselves, it's super-nice to be able to monkey-patch them right there and then (remembering of course to update the source code in Git as well) - but it's only half as nice as it could be. If we could fix the broken lambda code and then re-run the execution that uncovered the bug, the cycle time would be outstanding.

Why can't you remember things for me?

Possibly related to the first point is the disappointing discovery that Step Functions has no "memory" (or "context", if you prefer) where you can stash a variable for use later in the pipeline. That is, you might expect to be able to declare three steps like this:

    Extract Lambda
      Inputs:
        accountId
      Outputs: 
        pathToExtractedDataBucket
        
    Transform Lambda
       Inputs:
         pathToExtractedDataBucket
       Outputs:
         pathToTransformedDataBucket
         
    Load Lambda
       Inputs:
         accountId
         pathToTransformedDataBucket
       Outputs:
         isSuccessful
  
But unfortunately that simply will not work (at time of writing, November 2022). The above pipeline will fail at runtime because accountId has not been passed through the Transform lambda in order for the Load lambda to receive it!
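
To make the failure concrete, here is a minimal sketch in plain Python that models the behaviour described above, where each step's output becomes the next step's entire input. It is not the Step Functions API: the three functions stand in for the lambdas, and the bucket paths and the `run_pipeline` and `try_pipeline` helpers are hypothetical names invented for illustration.

```python
def extract(event):
    # Needs accountId, but only emits the extracted-data path.
    account_id = event["accountId"]
    return {"pathToExtractedDataBucket": f"extracted/{account_id}"}


def transform(event):
    # Knows nothing about accountId, so it is not in the output.
    source = event["pathToExtractedDataBucket"]
    target = source.replace("extracted", "transformed", 1)
    return {"pathToTransformedDataBucket": target}


def load(event):
    # Raises KeyError here: accountId was dropped two steps ago.
    account_id = event["accountId"]
    target = event["pathToTransformedDataBucket"]
    return {"isSuccessful": bool(account_id and target)}


def run_pipeline(steps, execution_input):
    # Assumed model: the output of one step replaces the state
    # wholesale and is handed to the next step as its input.
    state = execution_input
    for step in steps:
        state = step(state)
    return state


def try_pipeline(execution_input):
    # Report which key was missing instead of crashing.
    try:
        return run_pipeline([extract, transform, load], execution_input)
    except KeyError as missing:
        return {"failedOnMissingKey": missing.args[0]}


result = try_pipeline({"accountId": "12345"})
print(result)
```

Under this model, the Load step is the one that fails, and the missing key it reports is `accountId`.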

For me, this really makes a bit of a mockery of the reusability and composability of lambdas with step functions. To fix the situation above, we have to make the Extract Lambda emit accountId, and make the Transform Lambda aware of accountId and pass it through, even though it has no interest in, or need for, it. That is:

    Extract Lambda
      Inputs:
        accountId
      Outputs: 
        accountId
        pathToExtractedDataBucket
        
    Transform Lambda
       Inputs:
         accountId
         pathToExtractedDataBucket
       Outputs:
         accountId
         pathToTransformedDataBucket
         
    Load Lambda
       Inputs:
         accountId
         pathToTransformedDataBucket
       Outputs:
         isSuccessful
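
The pass-through version can be sketched the same way. Again, this is plain Python modelling the data flow rather than real Step Functions or Lambda code; the function names, the bucket paths and the `run_pipeline` helper are hypothetical, and the only assumption is that each step's output becomes the next step's whole input.

```python
def extract(event):
    account_id = event["accountId"]
    return {
        "accountId": account_id,  # echoed only so later steps can see it
        "pathToExtractedDataBucket": f"extracted/{account_id}",
    }


def transform(event):
    source = event["pathToExtractedDataBucket"]
    return {
        # Transform never uses accountId; it just hands it on.
        "accountId": event["accountId"],
        "pathToTransformedDataBucket": source.replace("extracted", "transformed", 1),
    }


def load(event):
    account_id = event["accountId"]
    target = event["pathToTransformedDataBucket"]
    return {"isSuccessful": bool(account_id and target)}


def run_pipeline(steps, execution_input):
    # Assumed model: each step's output is the next step's input.
    state = execution_input
    for step in steps:
        state = step(state)
    return state


result = run_pipeline([extract, transform, load], {"accountId": "12345"})
print(result)
```

The pipeline now completes, but only because `transform` carries a key it never reads, which is exactly the clutter complained about here.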
  
That's really not good in my opinion. It clutters up otherwise reusable lambdas with arguments that they don't care about, just because some other entity needs them. Fingers crossed this will be rectified soon, as I'm sure I'm not the first person to have been very aggravated by this design.