Akka.NET and Microsoft Orleans: a good fit for Programmable Voice?
Programmable Voice as a Service (offered by the likes of Twilio and various competitors) has been around for a long time. It allows you, as a developer, to make and receive phone calls and then control them programmatically via an API. Azure Communication Services (ACS) is a relatively new entrant compared to Twilio and others.
As part of a project I’m working on at the moment, we need to make phone calls to a large cohort of people, ask them a series of questions and then process their responses. For this project we’re going to be using ACS.
I proposed that we use state machines to model the flow of the conversation. One of the things I like about state machines is how expressive they are: at a quick glance you can see the basic flow you would expect to happen, i.e. the path through the state machine.
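To illustrate the expressiveness point, here is a minimal sketch in Python. It is not the project's actual design. The states and event names are hypothetical and cover a single question with a confirmation step. The whole conversation flow is readable from one transition table, and any event that is not listed for the current state is rejected.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical states for a short automated survey call."""
    DIALLING = auto()
    ASKING = auto()
    CONFIRMING = auto()
    FINISHED = auto()


# (current state, event) -> next state: the whole conversation flow in one table.
TRANSITIONS = {
    (State.DIALLING, "call_connected"): State.ASKING,
    (State.DIALLING, "call_failed"): State.FINISHED,
    (State.ASKING, "answer_received"): State.CONFIRMING,
    (State.ASKING, "no_response"): State.ASKING,     # re-prompt the same question
    (State.CONFIRMING, "confirmed"): State.FINISHED,
    (State.CONFIRMING, "rejected"): State.ASKING,    # ask the question again
}


def next_state(state: State, event: str) -> State:
    """Return the next state, rejecting events that are not valid in the current state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None


def run(events: list[str], start: State = State.DIALLING) -> State:
    """Feed a sequence of events through the state machine and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

For example, run(["call_connected", "answer_received", "confirmed"]) walks DIALLING, ASKING, CONFIRMING and ends in FINISHED.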
A long time ago I was part of a small team that built a state-machine-powered system for SMS-based workflows. It worked really well (and is still running to this day). One of the biggest differences between SMS and voice is the real-time nature of a phone call. If it takes a few seconds to process an inbound SMS and work out what the response should be, it doesn't matter too much. On a phone call, however, that delay is noticeable to the person on the line, so at scale a highly performant system is critical.
This post, however, isn't about state machines (maybe I'll do that another day), but about Akka.NET and Orleans.
Overview of an ACS phone call
The following diagram illustrates a typical call flow orchestrated by Azure Communication Services (ACS). In this example, the call is initiated by our application making a Create Call Connection request to ACS. However, ACS can also handle inbound calls, where a person dials the ACS phone number.
When a call is created, we tell ACS which phone number to call, and in response ACS provides a unique CallID for this connection. ACS then rings the phone number. When the person answers, ACS notifies our application that the call is connected. The Automated Conversation loop represents the core interaction:
- The app sends instructions to ACS (such as playing audio prompts), which ACS relays to the person.
- The person’s responses (DTMF tones, speech, etc.) are captured by ACS and sent back to the application.
This back-and-forth continues until the application signals the end of the conversation, prompting ACS to terminate the call.
Once the call gets into the Automated Conversation loop, we need to take the reins and tell ACS what needs to happen next.
Why use Actors?
The actor model, as implemented by frameworks like Akka.NET or Orleans, offers an elegant approach to managing the complexity of automated call flows. Actors are independent units of logic with their own state, so they map naturally onto the Automated Conversation loop. A further advantage of this approach is that creating a new call flow becomes an exercise in creating a new type of actor.
Each call is represented by an instance of an actor, which means we can make use of the asynchronous, message-driven nature of these frameworks. For us, that means scalability and responsiveness: Akka.NET and Orleans can both handle a huge volume of concurrent actors, which makes them well suited to real-world applications.
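The following is a minimal Python sketch of the "one actor per call" idea. It uses asyncio as a stand-in for Akka.NET or Orleans and is not how either framework is implemented. Each hypothetical CallActor owns its private state and a mailbox, and it processes messages one at a time. Because of this, a thousand calls can run concurrently without any locks around the per-call state. The message names simply mirror the ones used later in this post.

```python
import asyncio


class CallActor:
    """Minimal actor: private state plus a mailbox processed one message at a time."""

    def __init__(self, call_id: str):
        self.call_id = call_id
        self.handled: list[str] = []           # private state, only touched by run()
        self.mailbox: asyncio.Queue = asyncio.Queue()

    def tell(self, message) -> None:
        """Fire-and-forget send: the sender never waits for the actor to process it."""
        self.mailbox.put_nowait(message)

    async def run(self) -> None:
        while True:
            message = await self.mailbox.get()
            if message is None:                # sentinel message: stop the actor
                return
            self.handled.append(message)       # safe without locks: one message at a time


async def main(n_calls: int = 1000) -> list[CallActor]:
    # One actor instance per phone call, all running concurrently.
    actors = [CallActor(f"call-{i}") for i in range(n_calls)]
    tasks = [asyncio.create_task(actor.run()) for actor in actors]
    for actor in actors:
        actor.tell("CallConnected")
        actor.tell("PlayFinished")
        actor.tell(None)
    await asyncio.gather(*tasks)
    return actors


actors = asyncio.run(main())
```

In the real frameworks, the runtime owns the mailboxes and schedules actors across threads (and, for Orleans, across silos). The sketch only shows the programming model that makes this possible.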
Getting started with ACS
Initiating a phone call
Starting a phone call with ACS is very easy.
This sample uses the Azure.Communication.CallAutomation NuGet package and assumes that you have already:
- Created an Azure Cognitive Services resource
- Created an Azure Communication Services resource (and linked it to your Cognitive Services resource)
- Bought a phone number
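The key detail when starting a call is the unique id that ACS hands back, because every later callback for that call carries it. Here is a small sketch of that bookkeeping in Python. FakeCallClient is a hypothetical stand-in for the Create Call Connection request, not the ACS SDK. CallRegistry is also hypothetical. The callback URL and phone number are placeholders, and the registry maps ids to phone numbers purely for illustration; in an actor-based design the value would be the actor handling that call.

```python
import uuid
from dataclasses import dataclass, field


class FakeCallClient:
    """Hypothetical stand-in for the ACS 'Create Call Connection' request (not the real SDK)."""

    def __init__(self):
        self.dialled: list[tuple[str, str]] = []

    def create_call(self, target: str, callback_url: str) -> str:
        # A real request would ask ACS to ring `target` and post events to `callback_url`.
        self.dialled.append((target, callback_url))
        return str(uuid.uuid4())                 # unique id for this call connection


@dataclass
class CallRegistry:
    """Hypothetical registry: remembers which call connection id belongs to which call."""
    client: FakeCallClient
    callback_url: str
    calls: dict[str, str] = field(default_factory=dict)   # call id -> phone number

    def start_call(self, phone_number: str) -> str:
        call_id = self.client.create_call(phone_number, self.callback_url)
        self.calls[call_id] = phone_number       # later callbacks are routed using this id
        return call_id


client = FakeCallClient()
registry = CallRegistry(client, "https://example.com/api/callbacks")
call_id = registry.start_call("+15555550100")
```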
Responding to callbacks
Responding to callbacks is the backbone of the Automated Conversation loop.
Note that this is not an exhaustive list of the events that might occur during a phone call.
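The essence of the callback endpoint is routing each incoming event to the right call. Here is a minimal Python sketch of that routing. It assumes the callback body is a JSON array of events, each with a type such as "Microsoft.Communication.CallConnected" and a data.callConnectionId field. Please check that shape against the ACS documentation for your SDK version. The route_callback function and the per-call inboxes are hypothetical; in an actor-based design, each inbox would be the mailbox of the actor handling that call.

```python
import json


def route_callback(body: str, inboxes: dict[str, list]) -> list[tuple[str, str]]:
    """Deliver each event in an ACS-style callback body to the inbox of its call."""
    delivered = []
    for event in json.loads(body):                      # assumed: a JSON array of events
        # "Microsoft.Communication.CallConnected" -> "CallConnected"
        event_name = event["type"].rsplit(".", 1)[-1]
        call_id = event["data"]["callConnectionId"]
        inbox = inboxes.get(call_id)
        if inbox is None:
            continue                                    # unknown call: ignore (or log it)
        inbox.append(event_name)
        delivered.append((call_id, event_name))
    return delivered


payload = json.dumps([
    {"type": "Microsoft.Communication.CallConnected", "data": {"callConnectionId": "call-1"}},
    {"type": "Microsoft.Communication.PlayCompleted", "data": {"callConnectionId": "call-1"}},
    {"type": "Microsoft.Communication.CallConnected", "data": {"callConnectionId": "unknown"}},
])
inboxes = {"call-1": []}
delivered = route_callback(payload, inboxes)
```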
Modelling a call flow with Akka.NET
The following Akka.NET actor demonstrates the basic structure for managing an automated call flow:
- The actor starts when it receives a StartCall message, which contains the phoneNumber to dial.
- The actor then initiates the call using Dial() and transitions to the WaitingForConnection state.
- In the WaitingForConnection state, it waits for a CallConnected message, which is sent to the actor via the ACS callback.
- When CallConnected is received, the actor plays a pre-defined message using PlayMessage() and transitions to the PlayingMessage state.
- When the actor receives the PlayFinished message, it hangs up the call and transitions to the Disconnected state.
- In the Disconnected state, the actor stops itself and performs any other clean-up required.
This example demonstrates the simplest possible call flow: call someone, play a message, hang up. However, with these fundamentals in place it becomes relatively straightforward to build something more complicated.
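To make the behaviour-switching idea concrete without the C# code, here is a minimal Python analogue of the flow described above. It is not Akka.NET's API. The class and helper names are hypothetical, and dial, play_message and hang_up are injected callables standing in for the real ACS calls. As with Become(), the actor holds a single "current behaviour" function and swaps it out on each transition. Messages that the current behaviour does not expect are simply ignored here, whereas a real framework would treat them as unhandled.

```python
class CallFlowActor:
    """Python analogue of an actor that switches behaviour, Become()-style."""

    def __init__(self, dial, play_message, hang_up):
        self._dial = dial
        self._play_message = play_message
        self._hang_up = hang_up
        self.state = "Idle"
        self.stopped = False
        self._behaviour = self._idle

    def tell(self, message: str, payload=None) -> None:
        """Deliver a message; the current behaviour decides how to handle it."""
        self._behaviour(message, payload)

    def _become(self, name: str, behaviour) -> None:
        self.state = name
        self._behaviour = behaviour              # swap the message handler

    def _idle(self, message, payload):
        if message == "StartCall":
            self._dial(payload)                  # payload is the phone number to dial
            self._become("WaitingForConnection", self._waiting_for_connection)

    def _waiting_for_connection(self, message, payload):
        if message == "CallConnected":
            self._play_message()
            self._become("PlayingMessage", self._playing_message)

    def _playing_message(self, message, payload):
        if message == "PlayFinished":
            self._hang_up()
            self._become("Disconnected", self._disconnected)
            self.stopped = True                  # stop the actor once disconnected

    def _disconnected(self, message, payload):
        pass                                     # stopped: ignore anything that arrives late


log = []
actor = CallFlowActor(
    dial=lambda number: log.append(("dial", number)),
    play_message=lambda: log.append("play"),
    hang_up=lambda: log.append("hangup"),
)
actor.tell("StartCall", "+15555550100")
actor.tell("PlayFinished")      # ignored: still waiting for the call to connect
actor.tell("CallConnected")
actor.tell("PlayFinished")
```

Each behaviour only mentions the messages that matter in that state. This is what makes the overall call flow easy to read.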
For a full Akka.NET example, including how to integrate Akka.NET into an ASP.NET Core API, take a look at my ACS-Caller GitHub repository.
Modelling a call flow with Microsoft Orleans
The following Orleans grain (Orleans' name for an actor) demonstrates the basic structure for managing an automated call flow:
For a full Orleans example, including how to integrate Orleans into an ASP.NET Core API, take a look at my ACS-Caller GitHub repository.
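For comparison, here is a minimal Python analogue of the grain style. It is not the Orleans API. The CallGrain and CallState names are hypothetical, and dial, play_message and hang_up are injected stand-ins for the real ACS calls. Each event is an async method rather than a message class. The state lives in an explicit _current_state field, and every method checks that field before acting. A plain field is safe in this style because Orleans grains are non-reentrant by default, meaning a grain processes one call at a time.

```python
import asyncio
from enum import Enum


class CallState(Enum):
    IDLE = "Idle"
    WAITING_FOR_CONNECTION = "WaitingForConnection"
    PLAYING_MESSAGE = "PlayingMessage"
    DISCONNECTED = "Disconnected"


class CallGrain:
    """Python analogue of a grain: one async method per event plus an explicit state field."""

    def __init__(self, dial, play_message, hang_up):
        self._dial = dial
        self._play_message = play_message
        self._hang_up = hang_up
        self._current_state = CallState.IDLE

    @property
    def current_state(self) -> CallState:
        return self._current_state

    async def start_call(self, phone_number: str) -> None:
        if self._current_state is not CallState.IDLE:
            return                               # each method guards on the current state
        self._dial(phone_number)
        self._current_state = CallState.WAITING_FOR_CONNECTION

    async def call_connected(self) -> None:
        if self._current_state is not CallState.WAITING_FOR_CONNECTION:
            return
        self._play_message()
        self._current_state = CallState.PLAYING_MESSAGE

    async def play_finished(self) -> None:
        if self._current_state is not CallState.PLAYING_MESSAGE:
            return
        self._hang_up()
        self._current_state = CallState.DISCONNECTED


async def demo():
    log = []
    grain = CallGrain(
        dial=lambda number: log.append(("dial", number)),
        play_message=lambda: log.append("play"),
        hang_up=lambda: log.append("hangup"),
    )
    await grain.start_call("+15555550100")
    await grain.play_finished()      # ignored: the message is not playing yet
    await grain.call_connected()
    await grain.play_finished()
    return grain, log


grain, log = asyncio.run(demo())
```

The flow is the same, but it is spread across guard clauses in each method rather than being expressed as a sequence of behaviours.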
Comparing Akka.NET and Orleans
While the Akka.NET actor and the Orleans grain examples achieve the same basic call flow (dial, play message, hang up), their implementations differ in ways that highlight some of the differences between Akka.NET and Orleans:
- State Management: The Akka.NET actor uses Become() to switch between different behaviour sets (WaitingForConnection, PlayingMessage, etc.). This explicitly changes the actor's message-handling logic. The Orleans grain, on the other hand, uses an explicit _currentState variable and conditional logic within each method to manage its state transitions.
- Message Passing vs. Method Calls: The Akka.NET actor communicates through messages (CallConnected, PlayFinished), which are represented as C# classes. The Orleans grain uses method calls (CallConnected(), PlayFinished()).
There are already plenty of articles that outline the differences between Akka.NET and Orleans, so I won't go into great detail here. Instead, I'm pointing out the differences you can see when comparing the two examples.
Personally, I like the way that Akka.NET uses Become() to express behaviour switching. To me, this makes the general flow of the phone call really clear in the code.
Wrap up
This was my first time taking Akka.NET and Orleans for a spin. I was impressed by both libraries and can see that each has its pros and cons. As for using this approach to control phone calls: given the event-driven nature of programmable voice, the actor model is well suited to this kind of problem.