Voice Activating your Windows 10 games using Speech Synthesis, Voice Recognition and Cortana


This blog post is all about using the Windows 10 APIs for integrating Speech Synthesis, Voice Recognition and Cortana with your Unity 5.2 games.


There are a lot of different ways of doing this, but I decided to implement it this way to keep your focus on the important things that are happening. If you don’t want to dig into any of this, feel free to download the sample project and try it for yourself. If you wish to add this to your own game, you will need to understand it and follow the steps below. You will also most likely use this in a very customized way, which will be simple to do once you understand the basics.

We are implementing quite a few features here, so to help you get an overview, we are focusing on four components today.

1) The code that needs to be executed inside Unity. This code controls everything and lets you decide how to voice-activate your game world.

2) A project that needs to be added to the exported Windows 10 UWA solution, implementing the interaction between your Unity game and the Windows 10 APIs. Currently, it takes a few questions with associated answers, feeds them to the speech and voice APIs, sets up a listening session, and so on.

3) A project containing the logic that lets you integrate Cortana with your game. This has nothing to do with the in-game experience itself, but it enables Cortana to launch your game and lets you write custom logic using an App Service (a service that runs as a background task in your app).

4) The logic we need to add to the exported game itself to bind everything together.


For an in-depth session about Speech Synthesis, Voice Recognition and Cortana Integration, I recommend checking out this session from BUILD 2015:
https://channel9.msdn.com/events/Build/2015/3-716

 

Using the plugin in Unity


To use the plugin, you must add the VoiceBot script to a GameObject. You can of course modify how it interacts with your own game logic; this is just an example. The Windows10Interop class also needs to be in the project, as it contains the logic that communicates with the plugin itself.

Using the example VoiceBot component

The component is simple. It needs a reference to a panel containing the dialogue Text, as well as to the Text element itself. These are used to show or hide the questions you can ask, depending on how far away you are from the bot. The Text element itself renders the possible questions.
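To make the setup concrete, here is a minimal, hypothetical version of such a component. The field names, the question list, and the distance threshold are illustrative; the script shipped with the sample project may differ:

// A minimal, illustrative VoiceBot-style component. Field names, the
// questions, and the distance threshold are assumptions, not the
// sample project's exact code.
using UnityEngine;
using UnityEngine.UI;

public class VoiceBot : MonoBehaviour
{
    public GameObject dialoguePanel; // panel containing the dialogue Text
    public Text questionText;        // renders the questions you can ask
    public Transform player;         // used to measure the distance to the bot
    public float activationDistance = 5.0f;

    private readonly string[] questions =
        { "What is your name?", "How are you today?" };

    void Start()
    {
        questionText.text = string.Join("\n", questions);
        dialoguePanel.SetActive(false);
    }

    void Update()
    {
        bool inRange =
            Vector3.Distance(player.position, transform.position) < activationDistance;

        if (inRange && !dialoguePanel.activeSelf)
        {
            dialoguePanel.SetActive(true);
            Windows10Interop.RequestSpeech(questions); // start a listening session
        }
        else if (!inRange && dialoguePanel.activeSelf)
        {
            dialoguePanel.SetActive(false);
            Windows10Interop.StopSpeech();             // end the session
        }
    }
}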


The Windows10Interop class has two functions: one to request a listening session, and another to stop it.

The VoiceBot communicates with a plugin in the final exported project, so once you have your game running, you will need to export it and set up this integration.
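As a rough sketch, the bridge between the two sides might look something like this. The event names match what MainPage.xaml.cs subscribes to later in this post, while the exact signatures are assumptions:

// A rough sketch of the Windows10Interop bridge. The event names match
// what MainPage.xaml.cs subscribes to later; the Action<string[]> and
// Action signatures are assumptions.
using System;

public static class Windows10Interop
{
    // Raised when the game wants to start a listening session.
    public static event Action<string[]> SpeechRequested;

    // Raised when the game wants to stop listening.
    public static event Action StopSpeechRequested;

    public static void RequestSpeech(string[] questions)
    {
#if NETFX_CORE // only compiled in the exported Windows 10 build
        if (SpeechRequested != null)
            SpeechRequested(questions);
#endif
    }

    public static void StopSpeech()
    {
#if NETFX_CORE
        if (StopSpeechRequested != null)
            StopSpeechRequested();
#endif
    }
}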

 

Setting up the Windows 10 solution

This may look like a lot of steps, but I’m covering everything in detail, and it usually takes about 15 minutes at most.

We have two different components: one handles the in-game voice and speech, and the other integrates your game with Cortana (so you can talk to and interact with your app from Cortana at the Windows 10 OS level).

The first thing you need to do is add references to the BotWorldVoiceCommandService and VoiceSpeech plugin projects, either by referencing the built DLLs or by adding the projects to the solution. The latter is best, as you will probably need to customize the code based on the needs of your game. To do this, right-click the solution and add an existing project to it.


Navigate to the EXPORT folder (or wherever you downloaded the source) to find the project, and add it.


The next thing we need to do is add a reference to the project from the CortanaWorld project (our exported solution).


Navigate to Projects and it will show up automatically.


We need to do the same for the speech plugin project as well (add the project to the solution and add a reference to it).

Then we need to register the voice command service (BotWorldVoiceCommandService) as an App Service in the Package.appxmanifest. Double-click the manifest to open the settings, and click the Declarations tab.


Add an App Service, and enter its name and entry point. The name must match the Target attribute we will use in the VCD file later (BotWorldVoiceCommandService), and the entry point is the voice command service class in the project we just referenced.

This lets our app know where and how to find the App Service. It will run in the background of our app, aiding our interaction with Cortana.
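If you open the Package.appxmanifest as XML instead, the declaration ends up looking something like this. The Name must match the Target used in the VCD file later; the EntryPoint shown here is an assumption about the sample’s namespace:

<Extensions>
  <uap:Extension Category="windows.appService"
                 EntryPoint="BotWorldVoiceCommandService.BotWorldVoiceCommandService">
    <uap:AppService Name="BotWorldVoiceCommandService" />
  </uap:Extension>
</Extensions>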

 

Voice and Speech

Now we are ready to interact with the voice recognition and speech synthesis APIs of Windows 10. First, we need to add one more thing to our app: an invisible component that will play the generated speech audio.

Go to MainPage.xaml and open it in design view.

Add this line below the Grid:
<MediaElement x:Name="Media"></MediaElement>


Next, we need to connect the events from the Windows10Interop class in the Unity logic to the right functions in the plugin, and pass along the Media element we just added. This is basically how Unity and the Windows 10 plugin communicate.
This is done by adding the following three lines of code to the MainPage.xaml.cs file, in the OnNavigatedTo function:

Plugin.Windows10.VoiceSpeech.Media = Media;
Windows10Interop.SpeechRequested += Plugin.Windows10.VoiceSpeech.StartListening;
Windows10Interop.StopSpeechRequested += Plugin.Windows10.VoiceSpeech.StopListening;
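To give you an idea of what happens on the other side of those events, here is a much-simplified sketch of a StartListening/StopListening pair built on the Windows 10 speech APIs. The actual VoiceSpeech plugin in the sample does more (it takes questions with associated answers and manages the listening session), so treat the class below as an assumption-laden outline, not the plugin’s real code:

// A much-simplified, hypothetical VoiceSpeech-style class using the UWP
// SpeechRecognizer/SpeechSynthesizer APIs. The shipped plugin is more
// elaborate; this is orientation only.
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml.Controls;

namespace Plugin.Windows10
{
    public static class VoiceSpeech
    {
        // Set from MainPage.xaml.cs; plays the synthesized audio.
        public static MediaElement Media { get; set; }

        private static SpeechRecognizer recognizer;

        public static async void StartListening(string[] questions)
        {
            recognizer = new SpeechRecognizer();

            // Constrain recognition to the questions the bot understands.
            recognizer.Constraints.Add(
                new SpeechRecognitionListConstraint(questions, "questions"));
            await recognizer.CompileConstraintsAsync();

            var result = await recognizer.RecognizeAsync();
            if (result.Status == SpeechRecognitionResultStatus.Success)
                await Speak("You asked: " + result.Text);
        }

        public static void StopListening()
        {
            if (recognizer != null)
            {
                recognizer.Dispose();
                recognizer = null;
            }
        }

        private static async Task Speak(string text)
        {
            // Generate a voice stream and play it through the hidden
            // MediaElement. A real plugin may need to marshal this call
            // onto the UI thread via the Dispatcher.
            using (var synth = new SpeechSynthesizer())
            {
                var stream = await synth.SynthesizeTextToStreamAsync(text);
                Media.SetSource(stream, stream.ContentType);
                Media.Play();
            }
        }
    }
}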


The last thing we need to do is add the Microphone and Internet capabilities to our project. Open the Package.appxmanifest:


Click the Capabilities tab and check Microphone and Internet (Client).


This allows us to use these capabilities in the app.
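For reference, checking those boxes adds entries like these to the manifest XML:

<Capabilities>
  <Capability Name="internetClient" />
  <DeviceCapability Name="microphone" />
</Capabilities>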

 

Cortana

To let Cortana know about your app and how to interact with it, we need to add a command file that contains all the interactions we wish to implement. This is done in a VCD (Voice Command Definition) file, which is simply an XML file containing all the commands you want to integrate with.


You can add this by creating a new XML file in your project (named vcd.xml, matching the registration code below) and adding the following content:

<?xml version="1.0" encoding="utf-8"?>
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.2">
  <CommandSet xml:lang="en-us" Name="CommandSet_en-us">
    <AppName> Bot World </AppName>
    <Example> Bot World, I want to play </Example>

    <Command Name="checkScore">
      <Example> Bot World, Did anyone beat me? </Example>
      <ListenFor RequireAppName="BeforeOrAfterPhrase"> Did anyone beat me </ListenFor>
      <Feedback> Yes. </Feedback>
      <VoiceCommandService Target="BotWorldVoiceCommandService"></VoiceCommandService>
    </Command>

    <Command Name="startPlay">
      <Example> Bot World, I want to play </Example>
      <ListenFor RequireAppName="BeforeOrAfterPhrase"> I want to play </ListenFor>
      <Feedback> Get ready! </Feedback>
      <Navigate/>
    </Command>
  </CommandSet>
</VoiceCommands>

Next, you need to add the code that will execute when the app is launched with a voice command. This is done in the App.xaml.cs file, in the OnActivated function:

// Inside OnActivated, in the switch over args.Kind:
case ActivationKind.VoiceCommand:
    var commandArgs = args as VoiceCommandActivatedEventArgs;
    SpeechRecognitionResult speechRecognitionResult = commandArgs.Result;

    // The command name as defined in the VCD file.
    string voiceCommandName = speechRecognitionResult.RulePath[0];

    switch (voiceCommandName)
    {
        case "startPlay":
            {
                // Custom launch logic for "I want to play" goes here.
                break;
            }
        case "checkScore":
            // Read any semantic properties captured by the recognizer.
            if (speechRecognitionResult.SemanticInterpretation.Properties.ContainsKey("message"))
            {
                string message = speechRecognitionResult.SemanticInterpretation.Properties["message"][0];
            }
            break;
    }
    break;

This function checks how the application was activated. If it was activated by voice, it gets the voice command that launched the app and lets you write custom logic based on which command it was.

We also need to register the VCD file. Still in the App.xaml.cs file, add this code to the OnLaunched function. It simply takes all the commands and installs them into Cortana; they will be removed if you uninstall the app.

try
{
    // Load the VCD file from the app package (the name must match the file we added).
    var storageFile =
        await Windows.Storage.StorageFile
            .GetFileFromApplicationUriAsync(new Uri("ms-appx:///vcd.xml"));

    // Register all command definitions with Cortana.
    await Windows.ApplicationModel.VoiceCommands.VoiceCommandDefinitionManager
        .InstallCommandDefinitionsFromStorageFileAsync(storageFile);

    Debug.WriteLine("VCD installed"); // Debug lives in System.Diagnostics
}
catch
{
    Debug.WriteLine("VCD installation failed");
}


That should be all. Now you can run your game, ask the sample bot a question from the given list, and interact with it using Cortana.


 

Download source here:

https://1drv.ms/f/s!AnvjKuzpB3ArlsgVXTwpx52CD5CK-w
