Akka.NET and Microsoft Orleans, a good fit for Programmable Voice?

Ken Ross included in development architecture software design

2025-01-12 1879 words 9 minutes

Programmable Voice as a Service (offered by the likes of Twilio and various competitors) has been around for a long time. It allows you, as a developer, to make and receive phone calls and then control them programmatically via an API. Azure Communication Services (ACS) is a relatively recent arrival compared to Twilio and others.

As part of a project I’m working on at the moment, we need to make phone calls to a large cohort of people, ask them a series of questions and then process their responses. For this project we’re going to be using ACS.

I proposed that we use state machines to model the flow of the conversation. One of the things I like about state machines is that they are very expressive: at a glance you can see the expected flow of the conversation, i.e. the path through the state machine.

A long time ago I was part of a small team that built a state-machine-powered system for SMS-based workflows. It worked really well (and is still running to this day). One of the biggest differences between SMS and voice is the real-time nature of a phone call. If it takes a few seconds to process an inbound SMS and work out what the response should be, it doesn't matter too much; on a phone call, however, that delay is immediately noticeable to the person on the line. At scale, having a highly performant system is therefore critical.

This post, however, isn't about state machines (maybe I'll cover that another day); instead, it's about Akka.NET and Orleans.

Overview of an ACS phone call

The following diagram illustrates a typical call flow orchestrated by Azure Communication Services (ACS). In this example, the call is initiated by our application making a Create Call Connection request to ACS. However, ACS can also handle an inbound call initiated by a person dialing the ACS phone number.

When a call is created, we tell ACS which phone number to call, and in response ACS returns a unique ID for this call connection. ACS then rings the phone number. When the person answers, ACS notifies our application that the call is connected. The Automated Conversation loop represents the core interaction:

The app sends instructions to ACS (such as playing audio prompts), which ACS relays to the person.
The person’s responses (DTMF tones, speech, etc.) are captured by ACS and sent back to the application.

This back-and-forth continues until the application signals the end of the conversation, prompting ACS to terminate the call.
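The loop above is, in effect, a state machine. As a rough, framework-free sketch, the lifecycle can be written as a transition table keyed by (state, event), where any pair missing from the table is an unexpected event. The state and event names are hypothetical labels for the stages described above (only CallConnected, PlayCompleted, RecognizeCompleted and CallDisconnected correspond to real ACS event types), and strings are used instead of types to keep it short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical labels for the stages of the call flow described above.
// (state, event) -> next state; any pair missing from the table is unexpected.
var transitions = new Dictionary<(string State, string Event), string>
{
    [("Idle", "CreateCall")] = "Ringing",                          // app asks ACS to dial
    [("Ringing", "CallConnected")] = "InConversation",             // person answered
    [("InConversation", "PlayCompleted")] = "InConversation",      // prompt finished, loop continues
    [("InConversation", "RecognizeCompleted")] = "InConversation", // response captured, loop continues
    [("InConversation", "EndConversation")] = "HangingUp",         // app decides the conversation is over
    [("HangingUp", "CallDisconnected")] = "Ended",
};

// Returns the next state, or null if the event is not valid in the current state.
string? Next(string current, string evt) =>
    transitions.TryGetValue((current, evt), out var target) ? target : null;

var state = "Idle";
foreach (var evt in new[] { "CreateCall", "CallConnected", "PlayCompleted", "RecognizeCompleted", "EndConversation", "CallDisconnected" })
{
    var next = Next(state, evt) ?? throw new InvalidOperationException($"{evt} not expected in {state}");
    Console.WriteLine($"{state} --{evt}--> {next}");
    state = next;
}
```

A real flow would also need to accept CallDisconnected from any state, since the person can hang up at any time.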

Once the call gets into the Automated Conversation loop, we need to take the reins and tell ACS what needs to happen next.

Why use Actors?

Using the actor model, with frameworks like Akka.NET or Orleans, offers an elegant approach to managing the complexity of automated call flows. Actors are independent units of logic with their own private state, which makes them a natural fit for modelling the Automated Conversation loop. A further advantage of this approach is that creating a new call flow becomes an exercise in creating a new type of actor.

Each call is represented by its own actor instance, which means we can take advantage of the asynchronous, message-driven nature of these frameworks. For us, this translates into scalability and responsiveness: both Akka.NET and Orleans are designed to handle a huge number of concurrent actors, which makes them well suited to real-world applications like this one.
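To make the "one actor per call" idea concrete without tying it to either framework, here is a minimal sketch that uses System.Threading.Channels as a mailbox: messages for a call are queued and handled one at a time by a single loop, so the per-call state never needs locks. Akka.NET and Orleans provide this (plus supervision, distribution and lifecycle management) out of the box; the sketch only shows the core idea, and the StartCallActor helper, call IDs and message strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// A minimal "actor": one unbounded mailbox per call and one loop draining it.
// Only the loop touches `handled`, so no locking is required.
(ChannelWriter<string> Mailbox, Task<List<string>> Completion) StartCallActor(string callId)
{
    var channel = Channel.CreateUnbounded<string>(new UnboundedChannelOptions { SingleReader = true });

    async Task<List<string>> RunAsync()
    {
        var handled = new List<string>();
        await foreach (var message in channel.Reader.ReadAllAsync())
        {
            // Messages are processed strictly in arrival order, one at a time.
            handled.Add($"{callId}:{message}");
        }
        return handled;
    }

    return (channel.Writer, RunAsync());
}

// Two independent calls, each with its own actor and mailbox.
var callA = StartCallActor("call-A");
var callB = StartCallActor("call-B");

callA.Mailbox.TryWrite("CallConnected");
callB.Mailbox.TryWrite("CallConnected");
callA.Mailbox.TryWrite("PlayCompleted");

// Completing the writer ends the loop once the mailbox is drained (roughly "stopping" the actor).
callA.Mailbox.Complete();
callB.Mailbox.Complete();

var resultA = await callA.Completion;
var resultB = await callB.Completion;
Console.WriteLine(string.Join(", ", resultA));
Console.WriteLine(string.Join(", ", resultB));
```

The design point is that each call's state is only ever touched by the loop draining that call's mailbox; that single-threaded-per-actor guarantee is what keeps the per-call logic simple to write.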

Getting started with ACS

Initiating a phone call

Starting a phone call with ACS is straightforward:

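using Azure.Communication;                // PhoneNumberIdentifier
using Azure.Communication.CallAutomation; // CallAutomationClient, CallInvite, CreateCallOptions, ...

var acsPhoneNumber = "+441....";
var phoneNumber = "+447........"; // phone number must be in E.164 standard format
var cognitiveServicesEndpoint = "https://...";

PhoneNumberIdentifier caller = new PhoneNumberIdentifier(acsPhoneNumber);
PhoneNumberIdentifier target = new PhoneNumberIdentifier(phoneNumber);

CallInvite callInvite = new CallInvite(target, caller);
var createCallOptions = new CreateCallOptions(callInvite, _callConfiguration.CallbackUri) // ACS POSTs call events to this URI
{
    // Links the call to Cognitive Services (used for text-to-speech and speech recognition)
    CallIntelligenceOptions = new CallIntelligenceOptions() { CognitiveServicesEndpoint = new Uri(cognitiveServicesEndpoint) }
};

CreateCallResult createCallResult = _callAutomationClient.CreateCall(createCallOptions);

// The call connection ID identifies this call in every subsequent callback event
string callConnectionId = createCallResult.CallConnectionProperties.CallConnectionId;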

This sample uses the Azure.Communication.CallAutomation NuGet package and assumes that you have already:

Created an Azure Cognitive Services resource
Created an Azure Communication Services resource (and linked it to your Cognitive Services resource)
Bought a phone number
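The first snippet notes that phone numbers must be in E.164 format: a leading "+", a country code that does not start with 0, and at most 15 digits in total. If the cohort's numbers come from an external source such as a spreadsheet (an assumption), it may be worth validating them before creating calls. This is a minimal sketch: the Normalize helper and its list of formatting characters are hypothetical, and the regex only checks the shape of a number, not whether it is assigned or reachable.

```csharp
using System;
using System.Text.RegularExpressions;

// E.164 shape: '+', a first digit 1-9, then 1 to 14 more digits (15 digits max).
// [0-9] is used instead of \d because \d in .NET also matches non-ASCII digits.
var e164 = new Regex(@"^\+[1-9][0-9]{1,14}\z", RegexOptions.CultureInvariant);

bool IsE164(string number) => e164.IsMatch(number);

// Hypothetical helper: strip common formatting characters before validating.
string Normalize(string number) => Regex.Replace(number, @"[\s\-().]", "");

foreach (var candidate in new[] { "+44 7700 900123", "07700 900123", "+4477009001234567" })
{
    var normalized = Normalize(candidate);
    Console.WriteLine($"{candidate} -> {normalized}: {(IsE164(normalized) ? "valid" : "invalid")}");
}
```

The national-format number fails because it lacks the "+" and country code, and the last candidate fails because it has 16 digits.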

Responding to callbacks

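Responding to callbacks is the backbone of the Automated Conversation loop. ACS sends call events as CloudEvents in an HTTP POST to the callback URI supplied when the call was created, and our application decides what happens next based on the event type: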

using Azure.Communication.CallAutomation; // CallAutomationEventParser and the event types
using Azure.Messaging;                    // CloudEvent

app.MapPost("/api/callback", (CloudEvent[] cloudEvents) =>
{
    foreach (var cloudEvent in cloudEvents)
    {
        var evnt = CallAutomationEventParser.Parse(cloudEvent);

        // evnt.CallConnectionId identifies which call (and therefore which actor) this event belongs to

        switch (evnt)
        {
            case CallConnected callConnected:
                // Handle CallConnected event
                break;
            case CallDisconnected callDisconnected:
                // Handle CallDisconnected event
                break;
            case RecognizeCompleted recognizeCompleted:
                // Handle RecognizeCompleted event
                break;
            case PlayCompleted playCompleted:
                // Handle PlayCompleted event
                break;
            case RecognizeFailed failed:
                // Handle RecognizeFailed event
                break;
            default:
                // Handle other event types or unhandled events
                break;  // or throw an exception if unexpected
        }
    }
    return Results.Ok();
});

This is not an exhaustive list of the events that can occur during a phone call.
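Because every call gets its own actor, the callback endpoint's main job is to find the right actor for each event. Parsed ACS events carry a CallConnectionId, so one approach is a registry keyed by that ID. The sketch below is framework-agnostic and uses plain delegates in place of actors; the registry, RegisterCall/Route helpers, IDs and event strings are all hypothetical. In Akka.NET or Orleans the lookup would instead resolve an actor reference or a grain, which is left out here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical registry: call connection ID -> handler for that call's events.
// In a real system the handler would be an actor reference or a grain.
var registry = new ConcurrentDictionary<string, Action<string>>();
var log = new List<string>(); // not thread-safe; fine for this sequential sketch

void RegisterCall(string callConnectionId) =>
    registry[callConnectionId] = eventName => log.Add($"{callConnectionId} handled {eventName}");

// Roughly what the /api/callback endpoint would do for each parsed event.
bool Route(string callConnectionId, string eventName)
{
    if (registry.TryGetValue(callConnectionId, out var handler))
    {
        handler(eventName);
        if (eventName == "CallDisconnected")
            registry.TryRemove(callConnectionId, out _); // the call is over, forget it
        return true;
    }
    return false; // unknown or already-finished call: log and ignore
}

RegisterCall("conn-1");
RegisterCall("conn-2");
Route("conn-1", "CallConnected");
Route("conn-2", "CallConnected");
Route("conn-1", "CallDisconnected");
var late = Route("conn-1", "PlayCompleted"); // arrives after the call ended

Console.WriteLine(string.Join(Environment.NewLine, log));
Console.WriteLine($"late event routed: {late}");
```

Late or duplicate events for a finished call are a normal occurrence with webhooks, so the sketch treats them as "ignore" rather than as errors.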

Modelling a call flow with Akka.NET

The following Akka.NET actor demonstrates the basic structure for managing an automated call flow:

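using Akka.Actor;                         // ReceiveActor, Context, Self
using Azure.Communication.CallAutomation; // CallAutomationClient, CallConnection, TextSource, ...

public class SimpleCallActor : ReceiveActor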
{
    public class StartCall { public string PhoneNumber { get; set; } }
    public class CallConnected { }
    public class PlayFinished { }
    public class CallDisconnected { }

    private readonly CallAutomationClient _callAutomationClient;
    private CallConnection _callConnection;
    private readonly string _messageToPlay;


    public SimpleCallActor(CallAutomationClient callAutomationClient, string messageToPlay)
    {
        _callAutomationClient = callAutomationClient;
        _messageToPlay = messageToPlay;

        Receive<StartCall>(msg =>
        {
            Dial(msg.PhoneNumber);
            Become(WaitingForConnection);
        });
    }

    private void WaitingForConnection()
    {
        Receive<CallConnected>(_ =>
        {
            PlayMessage(_messageToPlay);
            Become(PlayingMessage);
        });

        Receive<CallDisconnected>(_ =>
        {
             // Handle unexpected disconnect
            Context.Stop(Self); // Stop the actor 
        });
    }

    private void PlayingMessage()
    {
        Receive<PlayFinished>(_ =>
        {
            Hangup();
            Become(Disconnected);
        });

        Receive<CallDisconnected>(_ =>
        {
             // Handle disconnect while playing. Might not need specific logic.
            Context.Stop(Self); // Stop the actor
        });

    }

    private void Disconnected()
    {
        // Optional: Log, cleanup, etc.
        Context.Stop(Self); // Stop the actor when disconnected
    }

    private void Hangup()
    {
        _callConnection.HangUp(true);
    }

    private void PlayMessage(string message)
    {
        var playSource = new TextSource(message);
        var options = new PlayToAllOptions(playSource);
        _callConnection.GetCallMedia().PlayToAll(options);
    }

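    private void Dial(string phoneNumber)
    {
        var createCallOptions = // ... (build CreateCallOptions for phoneNumber, as in "Initiating a phone call")
        CreateCallResult result = _callAutomationClient.CreateCall(createCallOptions);
        // Keep the connection so PlayMessage() and Hangup() have a call to act on
        _callConnection = result.CallConnection;
    }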
}

The actor starts when it receives a StartCall message, which contains the phone number to dial.
It initiates the call using Dial() and then transitions to the WaitingForConnection state.
In the WaitingForConnection state, it waits for a CallConnected message, which is sent to the actor when the corresponding ACS callback arrives.
When CallConnected is received, the actor plays a predefined message using PlayMessage() and transitions to the PlayingMessage state.
When the actor receives the PlayFinished message, it hangs up the call and transitions to the Disconnected state.
In the Disconnected state, the actor stops itself (and would perform any other cleanup required).
If CallDisconnected arrives while waiting for the connection or while playing the message (for example, because the person hung up), the actor stops immediately.
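The Become() mechanism in these steps amounts to swapping the function that handles the next message. The framework-free sketch below illustrates the same idea (it is not Akka.NET's implementation; the message strings and the local Become helper are hypothetical). The current behaviour lives in a delegate that is replaced on each transition, and a message the current behaviour doesn't handle is reported as unhandled, similar in spirit to how Akka.NET treats unhandled messages.

```csharp
using System;
using System.Collections.Generic;

var actions = new List<string>();     // side effects the "actor" would perform
Func<string, bool> behaviour = null!; // current message handler; returns false if unhandled

void Become(Func<string, bool> next) => behaviour = next;

bool Initial(string msg)
{
    if (msg != "StartCall") return false;
    actions.Add("Dial");
    Become(WaitingForConnection);
    return true;
}

bool WaitingForConnection(string msg)
{
    switch (msg)
    {
        case "CallConnected": actions.Add("PlayMessage"); Become(PlayingMessage); return true;
        case "CallDisconnected": actions.Add("Stop"); Become(Stopped); return true;
        default: return false;
    }
}

bool PlayingMessage(string msg)
{
    switch (msg)
    {
        case "PlayFinished": actions.Add("Hangup"); actions.Add("Stop"); Become(Stopped); return true;
        case "CallDisconnected": actions.Add("Stop"); Become(Stopped); return true;
        default: return false;
    }
}

bool Stopped(string _) => false; // a stopped actor handles nothing further

Become(Initial);
foreach (var msg in new[] { "PlayFinished", "StartCall", "CallConnected", "PlayFinished", "CallDisconnected" })
{
    var handled = behaviour(msg);
    Console.WriteLine($"{msg}: {(handled ? "handled" : "unhandled")}");
}
Console.WriteLine(string.Join(" -> ", actions));
```

The early PlayFinished and the trailing CallDisconnected are both unhandled, because no behaviour active at that moment accepts them; no explicit state checks are needed.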

This example demonstrates the simplest of call flows: call someone, play a message, hang up. However, with these fundamentals in place, it becomes relatively straightforward to build something more complicated.
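As a hedged sketch of the kind of flow this project needs (asking a series of questions), the code below asks each question in turn, re-asks once when recognition fails (for example on a timeout), and hangs up after the last answer. The questions, the retry limit and the simulated event strings are hypothetical; in a real actor each entry in commands would be a call to ACS, and the events would arrive through the callback endpoint.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical survey script and retry limit.
var questions = new[] { "Press 1 if you are available next week, 2 if not.", "Rate our service from 1 to 5." };
const int MaxAttempts = 2;

var answers = new List<string?>(); // null means no usable answer
var commands = new List<string>(); // instructions we would send to ACS
int index = 0, attempts = 0;

void Ask() { attempts++; commands.Add($"Recognize: {questions[index]}"); }

void NextQuestionOrHangUp(string? answer)
{
    answers.Add(answer);
    index++;
    attempts = 0;
    if (index < questions.Length) Ask();
    else commands.Add("HangUp");
}

void OnEvent(string evt, string? tones = null)
{
    switch (evt)
    {
        case "CallConnected": Ask(); break;
        case "RecognizeCompleted": NextQuestionOrHangUp(tones); break;
        case "RecognizeFailed":
            if (attempts < MaxAttempts) Ask();   // e.g. silence: ask again
            else NextQuestionOrHangUp(null);     // give up on this question
            break;
    }
}

OnEvent("CallConnected");
OnEvent("RecognizeFailed");          // no answer to question 1, so it is asked again
OnEvent("RecognizeCompleted", "1");  // answered question 1
OnEvent("RecognizeCompleted", "4");  // answered question 2

commands.ForEach(Console.WriteLine);
Console.WriteLine($"answers: {string.Join(", ", answers)}");
```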

For a full Akka.NET example, including how to integrate Akka.NET into an ASP.NET Core API, take a look at my ACS-Caller GitHub repository.

Modelling call flow with Microsoft Orleans

The following Orleans ~~actor~~ Grain demonstrates the basic structure for managing an automated call flow:

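using Azure.Communication.CallAutomation; // CallAutomationClient, CallConnection, TextSource, ...
using Microsoft.Extensions.Logging;       // ILogger
using Orleans;                            // Grain, IGrainWithGuidKey

public interface ISimpleCallGrain : IGrainWithGuidKey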
{
    Task StartCall(string phoneNumber);
    Task CallConnected();
    Task PlayFinished();
    Task CallDisconnected();
}

public class SimpleCallGrain : Grain, ISimpleCallGrain
{
    private readonly ILogger _logger;
    private readonly CallAutomationClient _callAutomationClient; // Inject this
    private CallConnection _callConnection;
    private readonly string _messageToPlay;  // Inject this or set in constructor
    private enum CallState { Initial, WaitingForConnection, PlayingMessage, Disconnected }
    private CallState _currentState = CallState.Initial;

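    // NOTE: the DI container can't resolve a plain string, so in practice messageToPlay would come from configuration or be passed to StartCall
    public SimpleCallGrain(ILogger<SimpleCallGrain> logger, CallAutomationClient callAutomationClient, string messageToPlay)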
    {
        _logger = logger;
        _callAutomationClient = callAutomationClient;
        _messageToPlay = messageToPlay;
    }

    public async Task StartCall(string phoneNumber)
    {
        if (_currentState != CallState.Initial)
        {
            _logger.LogWarning("StartCall called in invalid state: {State}", _currentState);
            return;
        }

        await Dial(phoneNumber);
        _currentState = CallState.WaitingForConnection;
    }

    public async Task CallConnected()
    {
        if (_currentState != CallState.WaitingForConnection)
        {
            _logger.LogWarning("CallConnected called in invalid state: {State}", _currentState);
            return; // Or throw...
        }

        _currentState = CallState.PlayingMessage;
        await PlayMessage(_messageToPlay);
    }

    public async Task PlayFinished()
    {
        if (_currentState != CallState.PlayingMessage)
        {
            _logger.LogWarning("PlayFinished called in invalid state: {State}", _currentState);
            return;
        }

        _currentState = CallState.Disconnected;
        await Hangup();
    }

    public Task CallDisconnected()
    {
        _logger.LogInformation("Call Disconnected in state: {State}", _currentState);
        _currentState = CallState.Disconnected; // Regardless of previous state, we're now disconnected

        return Task.CompletedTask;
    }

    private async Task Dial(string phoneNumber)
    {
        var createCallOptions = // ... (build CreateCallOptions for phoneNumber, as in "Initiating a phone call")
        CreateCallResult result = await _callAutomationClient.CreateCallAsync(createCallOptions);
        _callConnection = result.CallConnection; // needed later by PlayMessage() and Hangup()
    }

    private async Task PlayMessage(string message)
    {
        // Same ACS calls as the Akka.NET actor, but awaited so the grain doesn't fire and forget
        var playSource = new TextSource(message);
        var options = new PlayToAllOptions(playSource);
        await _callConnection.GetCallMedia().PlayToAllAsync(options);
    }

    private async Task Hangup()
    {
        await _callConnection.HangUpAsync(true); // true = hang up for everyone on the call
    }
}

For a full Orleans example, including how to integrate Orleans into an ASP.NET Core API, take a look at my ACS-Caller GitHub repository.

Comparing Akka.Net and Orleans

While both the Akka.NET actor and the Orleans grain achieve the same basic call flow (dial, play message, hang up), their implementations differ and do a good job of highlighting some of the differences between Akka.NET and Orleans:

State Management: The Akka.NET actor uses Become() to switch between different behaviour sets (WaitingForConnection, PlayingMessage, etc.). This explicitly changes the actor's message-handling logic, so a message that isn't valid in the current state is simply treated as unhandled. The Orleans grain, on the other hand, uses an explicit _currentState variable and conditional logic within each method to manage its state transitions.
Message Passing vs. Method Calls: The Akka.NET actor communicates through messages (CallConnected, PlayFinished), which are represented as C# classes. The Orleans grain uses method calls on a grain interface (CallConnected(), PlayFinished()). Under the hood, Orleans still delivers these calls to the grain as messages, but the programming model looks like ordinary asynchronous method calls.
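The explicit-state approach used by the grain repeats the same guard at the top of every method. One way to keep that manageable, sketched here in plain C# with hypothetical names (and strings in place of the CallState enum to keep it short), is a single helper that checks the expected state and performs the transition, so each method becomes one guarded call. This illustrates the trade-off only; it is not how Orleans itself manages grain state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical states mirroring the grain's CallState enum.
var state = "Initial";
var warnings = new List<string>();

// Central guard: move to `next` only if currently in `expected`; otherwise record a warning.
bool TryTransition(string operation, string expected, string next)
{
    if (state != expected)
    {
        warnings.Add($"{operation} called in invalid state: {state}");
        return false;
    }
    state = next;
    return true;
}

// Each "grain method" becomes a guard plus its side effect.
void StartCall() { if (TryTransition("StartCall", "Initial", "WaitingForConnection")) Console.WriteLine("dialling"); }
void CallConnected() { if (TryTransition("CallConnected", "WaitingForConnection", "PlayingMessage")) Console.WriteLine("playing"); }
void PlayFinished() { if (TryTransition("PlayFinished", "PlayingMessage", "Disconnected")) Console.WriteLine("hanging up"); }

StartCall();
PlayFinished();  // out of order: rejected, state unchanged
CallConnected();
PlayFinished();
```

Compared with Become(), the valid transitions are still spread across methods, but the rejection logic lives in one place.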

There are already plenty of articles that outline the differences between Akka.NET and Orleans, so I won't go into great detail here; instead, I'm pointing out some of the differences you can see when comparing the two examples.

Personally, I like the way Akka.NET uses Become() to express behaviour switching. To me, it makes the general flow of the phone call really clear in the code.

Wrap up

This was my first time taking Akka.NET and Orleans for a spin. I was impressed by both libraries, and I can see that each has its pros and cons. As for using this approach to control phone calls: given the event-driven nature of programmable voice, the actor model is well suited to this kind of problem.
