Make Your iOS 7 App Speak

Have you ever wanted to add text-to-speech capability to an iOS application? Before iOS 7, this required a third-party library; with iOS 7, however, speech synthesis is built into the platform. What’s more, adding speech synthesis requires only a few lines of code.

The class that synthesizes speech from text is AVSpeechSynthesizer. It works with an AVSpeechUtterance instance, which encapsulates the text to synthesize. You simply pass an AVSpeechUtterance instance to the synthesizer’s SpeakUtterance method and the text is “spoken” by the iOS device.

The following example is all you need for text-to-speech on iOS 7:

    var speechSynthesizer = new AVSpeechSynthesizer ();
    var speechUtterance =
        new AVSpeechUtterance ("Shall we play a game?");
    speechSynthesizer.SpeakUtterance (speechUtterance);

The AVSpeechUtterance also includes several properties that allow you to control the audio output of the synthesized text. These include:

Rate – The speed at which the speech plays back, between MinimumSpeechRate and MaximumSpeechRate.
Voice – An AVSpeechSynthesisVoice instance used to speak the text.
Volume – The volume level of the audio used to speak the text, from 0.0 to 1.0.
PitchMultiplier – A value between 0.5 and 2.0 to control the pitch of the spoken text.

In particular, I found that the default rate is a bit too fast on an iPhone 5. Setting the rate to one quarter of the maximum, which is available through the AVSpeechUtterance.MaximumSpeechRate property (there’s also an AVSpeechUtterance.MinimumSpeechRate), produced a better-sounding result.
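
As a rough sketch of that adjustment, the helper below scales the maximum rate by a fraction and clamps the result to the allowed range. SpeechRate.Scaled is a hypothetical name, not part of AVFoundation; the minimum and maximum are passed in as plain floats so the arithmetic can be checked away from the device.

```csharp
using System;

static class SpeechRate
{
    // Hypothetical helper (not part of AVFoundation): returns `fraction`
    // of the maximum rate, clamped to the range [min, max].
    public static float Scaled (float min, float max, float fraction)
    {
        float rate = max * fraction;
        return Math.Max (min, Math.Min (max, rate));
    }
}
```

On the device this might be called as SpeechRate.Scaled (AVSpeechUtterance.MinimumSpeechRate, AVSpeechUtterance.MaximumSpeechRate, 0.25f), with the result assigned to the utterance’s Rate property.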

Even better, you can supply a variety of different voices to the synthesizer, ideally based upon the locale. There’s even a helper method, AVSpeechSynthesisVoice.GetSpeechVoices, that will return all the available voices.
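
One way to choose a voice from that list is sketched below, assuming you have collected the language codes of the available voices into a plain list of strings. VoicePicker.PickLanguage is a hypothetical helper, not an AVFoundation API, and the fallback order it uses (exact match, then same language, then a caller-supplied default) is only one reasonable choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class VoicePicker
{
    // Hypothetical helper: picks a language code from those available.
    // Tries an exact match first ("en-AU"), then any code sharing the
    // language prefix ("en-"), and finally returns `fallback`.
    public static string PickLanguage (IEnumerable<string> available,
                                       string preferred, string fallback)
    {
        var codes = available.ToList ();
        if (codes.Contains (preferred))
            return preferred;

        string prefix = preferred.Split ('-')[0] + "-";
        string sameLanguage = codes.FirstOrDefault (
            c => c.StartsWith (prefix, StringComparison.OrdinalIgnoreCase));
        return sameLanguage ?? fallback;
    }
}
```

On the device, the available codes could be read from the Language property of each voice returned by AVSpeechSynthesisVoice.GetSpeechVoices, the preferred code could come from AVSpeechSynthesisVoice.CurrentLanguageCode, and the chosen code could then be passed to AVSpeechSynthesisVoice.FromLanguage.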

I thought it would be fun to revisit the FindTheMonkey app from the previous iBeacon blog post and have it speak the status message as the user looks for the monkey.

Doing this is incredibly easy. Simply add a Speak method that creates the AVSpeechSynthesizer and AVSpeechUtterance instances and calls SpeakUtterance. To spruce it up a bit more, also add a couple of sliders to control the pitch multiplier and volume of the AVSpeechUtterance.

Generally, the voice should match the current device locale, but you can set it however you like. Let’s use Australian English here, for example:

    void Speak (string text)
    {
        var speechSynthesizer = new AVSpeechSynthesizer ();

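        var speechUtterance = new AVSpeechUtterance (text) {
            // A quarter of the maximum rate sounds more natural than the default
            Rate = AVSpeechUtterance.MaximumSpeechRate / 4,
            Voice = AVSpeechSynthesisVoice.FromLanguage ("en-AU"),
            // volume and pitch are fields updated by the sliders
            Volume = volume,
            PitchMultiplier = pitch
        };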

        speechSynthesizer.SpeakUtterance (speechUtterance);
    }

    void InitPitchAndVolume ()
    {
        volumeSlider.MinValue = 0;
        volumeSlider.MaxValue = 1.0f;
        volumeSlider.SetValue (volume, false);

        pitchSlider.MinValue = 0.5f;
        pitchSlider.MaxValue = 2.0f;
        pitchSlider.SetValue (pitch, false);

        volumeSlider.ValueChanged += (sender, e) => {
            volume = volumeSlider.Value;
        };

        pitchSlider.ValueChanged += (sender, e) => {
            pitch = pitchSlider.Value;
        };
    }

Then just call Speak when the proximity changes and voilà, FindTheMonkey now has text-to-speech capability!
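
Ranging updates tend to arrive repeatedly even when nothing has changed, so it may help to speak only when the status message actually differs from the last one. The class below is a minimal sketch of that idea; StatusAnnouncer is a hypothetical name, and the speaking function is passed in as a delegate so the logic does not depend on AVFoundation.

```csharp
using System;

class StatusAnnouncer
{
    readonly Action<string> speak;
    string lastMessage;

    public StatusAnnouncer (Action<string> speak)
    {
        this.speak = speak;
    }

    // Speaks the message only when it differs from the previous one.
    // Returns true if the message was spoken.
    public bool Announce (string message)
    {
        if (message == lastMessage)
            return false;

        lastMessage = message;
        speak (message);
        return true;
    }
}
```

In FindTheMonkey this could be created once as new StatusAnnouncer (Speak), with Announce called from the proximity handler in place of a direct call to Speak.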