Speech to Text Conversion Using the Speech Framework

Updated 26 October 2021



With iOS 10, Apple introduced the Speech framework, an API for speech recognition. It draws on the same speech recognition technology that powers Siri.

There are a handful of third-party speech recognition frameworks available today, but they tend to be either expensive or less accurate. In this tutorial, I will show you how to create a Siri-like speech-to-text app using the Speech framework.

Using Speech Framework

To use the Speech framework, you first have to import it and adopt the SFSpeechRecognizerDelegate protocol. So let’s import the framework and add the protocol to the AudioDetectionViewController class in AudioDetectionViewController.swift.
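At this stage the file might look roughly like the sketch below. It assumes AudioDetectionViewController is a plain UIViewController subclass; anything else your project already has in the class stays as it is.

```swift
import UIKit
import Speech

// Adopting SFSpeechRecognizerDelegate lets the view controller be notified
// when the availability of the speech recognizer changes.
class AudioDetectionViewController: UIViewController, SFSpeechRecognizerDelegate {

    override func viewDidLoad() {
        super.viewDidLoad()
    }
}
```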

User Permission and Authorization

Before using the Speech framework for speech recognition, you first have to ask for the user’s permission, because recognition doesn’t happen only locally on the iOS device but also on Apple’s servers.

The voice data is transmitted to Apple’s backend for processing. Therefore, it is mandatory to get the user’s authorization.

Let’s authorize the speech recognizer in the viewDidLoad method. The user must allow the app to use the audio input and speech recognition. First declare a speechRecognizer variable, then request authorization.
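A minimal sketch of this step follows. The "en-US" locale and the recordButton outlet are assumptions made for illustration; use whatever locale and controls your own project has.

```swift
import UIKit
import Speech

class AudioDetectionViewController: UIViewController, SFSpeechRecognizerDelegate {

    // The initializer returns nil if the locale is not supported.
    // "en-US" is an assumed locale for this sketch.
    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

    // Hypothetical outlet: a button that starts and stops recording.
    @IBOutlet weak var recordButton: UIButton!

    override func viewDidLoad() {
        super.viewDidLoad()

        // Keep recording disabled until the user has granted permission.
        recordButton.isEnabled = false
        speechRecognizer?.delegate = self

        SFSpeechRecognizer.requestAuthorization { authStatus in
            // The handler may be called on a background queue,
            // so hop to the main queue before touching the UI.
            OperationQueue.main.addOperation {
                switch authStatus {
                case .authorized:
                    self.recordButton.isEnabled = true
                case .denied, .restricted, .notDetermined:
                    self.recordButton.isEnabled = false
                @unknown default:
                    self.recordButton.isEnabled = false
                }
            }
        }
    }
}
```

Note that requestAuthorization covers speech recognition only. The system asks for microphone access separately, the first time the app starts capturing audio.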

Providing the Authorization Messages

Apple requires every authorization request to carry a custom message from the app. For speech recognition, we must request two authorizations: microphone usage and speech recognition.

You supply these custom messages through the Info.plist file.

Let’s open the Info.plist file of the project. First, right-click Info.plist, then choose Open As > Source Code. Finally, add the two usage-description keys before the closing </dict> tag.
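The two keys are NSMicrophoneUsageDescription and NSSpeechRecognitionUsageDescription. The message strings below are placeholders; write text that explains to your users why your app needs each permission.

```xml
<key>NSMicrophoneUsageDescription</key>
<string>The microphone is used to capture your voice for speech recognition.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>Speech recognition is used to convert your voice into text.</string>
```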

After you add these keys, the app shows the messages when it asks the user for microphone and speech recognition permission.

When switching to another screen

If the user switches to another screen without stopping the recording, the recording should be stopped before the current screen disappears.
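One way to handle this is to stop the audio engine in viewWillDisappear, as sketched below. The property names audioEngine, recognitionRequest, and recognitionTask and the helper stopRecordingIfNeeded() are assumptions for illustration; in the project they would live in the same AudioDetectionViewController as the rest of the code.

```swift
import UIKit
import Speech
import AVFoundation

class AudioDetectionViewController: UIViewController, SFSpeechRecognizerDelegate {

    // Assumed property names for this sketch.
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        stopRecordingIfNeeded()
    }

    // Hypothetical helper: tears down the recording if one is in progress.
    private func stopRecordingIfNeeded() {
        guard audioEngine.isRunning else { return }

        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)

        // Tell the recognizer no more audio is coming, then drop the task.
        recognitionRequest?.endAudio()
        recognitionTask?.cancel()
        recognitionRequest = nil
        recognitionTask = nil
    }
}
```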

Handling Speech Recognition

Now we have to create a new function called startRecording() that captures the speech and converts it into text.
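The following is a sketch of what startRecording() could look like. It is not necessarily the original implementation. It assumes an "en-US" recognizer and uses a hypothetical handleRecognizedText(_:) method to receive the transcription. The function is marked throws so the caller decides how to report audio session or engine errors.

```swift
import UIKit
import Speech
import AVFoundation

class AudioDetectionViewController: UIViewController, SFSpeechRecognizerDelegate {

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()

    func startRecording() throws {
        // Cancel any recognition task that is still running.
        recognitionTask?.cancel()
        recognitionTask = nil

        // Configure the shared audio session for recording.
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

        // The request receives audio buffers from the microphone.
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        recognitionRequest = request

        let inputNode = audioEngine.inputNode

        recognitionTask = speechRecognizer?.recognitionTask(with: request) { [weak self] result, error in
            guard let self = self else { return }
            var isFinal = false

            if let result = result {
                // Pass the best transcription so far to the app.
                self.handleRecognizedText(result.bestTranscription.formattedString)
                isFinal = result.isFinal
            }

            if error != nil || isFinal {
                // Recognition ended: stop capturing audio and clean up.
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
            }
        }

        // Feed microphone audio into the recognition request.
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
            request.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
    }

    // Hypothetical hook for the recognized text.
    func handleRecognizedText(_ text: String) {
        print(text)
    }
}
```

Because shouldReportPartialResults is true, the handler is called repeatedly while the user is speaking, each time with the transcription so far.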

Use of Recognized Text

You can do whatever you like with the recognized text in the method that receives it.
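As an illustration, the sketch below shows the text in a label and reacts to a spoken keyword. The method name handleRecognizedText(_:), the transcriptLabel outlet, and the "stop" keyword are all hypothetical; replace them with whatever your app needs.

```swift
import UIKit

class AudioDetectionViewController: UIViewController {

    // Hypothetical outlet that displays the transcription.
    @IBOutlet weak var transcriptLabel: UILabel!

    // Hypothetical method called with each new transcription.
    func handleRecognizedText(_ text: String) {
        // Show the latest transcription on screen.
        transcriptLabel.text = text

        // Example: treat the spoken word "stop" as a command.
        if text.lowercased().contains("stop") {
            print("Stop command detected")
        }
    }
}
```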

Thank you!!!
