
SwiftSpeech

Speech Recognition Made Simple

A few lines of code to do this!

Recognize your user's voice elegantly without having to figure out authorization and audio engines.

Features

SwiftSpeech is a wrapper for Apple's Speech framework with deep SwiftUI and Combine integration.

  • UI control + speech recognition functionality in just a few lines of code.
  • Customizable cancelling.
  • SwiftUI style reactive APIs and Combine support.
  • Highly customizable while keeping your code reusable, thanks to a composable structure.
  • Fully open low-level APIs.

Installation

SwiftSpeech is available through Swift Package Manager. To use it, add a package dependency with the following URL:

https://github.com/Cay-Zhang/SwiftSpeech.git

Getting Started

1. Authorization

Although SwiftSpeech takes care of the verbose authorization work for you, you still have to provide the usage descriptions and specify where you want the authorization process to happen before you can start using it.

Usage Descriptions in Info.plist

If you haven't already, add these two keys to your Info.plist: NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription.

These are the messages your users will see on their first use, in the alerts that ask them for permission to use speech recognition and to access the microphone.

Here's an example:

<key>NSSpeechRecognitionUsageDescription</key>
<string>This app uses speech recognition to convert your speech into text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone to record audio for speech recognition.</string>

One line of code to tackle authorization

In your SceneDelegate.swift, add .automaticEnvironmentForSpeechRecognition() after the initialization of your root view. Boom! That's it! One line of code!

func scene(_ scene: UIScene, willConnectTo session: UISceneSession, options connectionOptions: UIScene.ConnectionOptions) {
    let contentView = ContentView()
        .automaticEnvironmentForSpeechRecognition()  // Just add this line of code!

    if let windowScene = scene as? UIWindowScene {
        let window = UIWindow(windowScene: windowScene)
        window.rootViewController = UIHostingController(rootView: contentView)
        self.window = window
        window.makeKeyAndVisible()
    }
}

For more information, please refer to the documentation for automaticEnvironmentForSpeechRecognition() in Extensions.swift.
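
If your project uses the SwiftUI App life cycle instead of a SceneDelegate, the same modifier should work wherever you create your root view, since it is an ordinary View modifier. Here is a minimal sketch of that setup; it is an assumption based on the modifier being a View extension, and MyApp is a placeholder name:

import SwiftUI
import SwiftSpeech

@main
struct MyApp: App {  // placeholder name
    var body: some Scene {
        WindowGroup {
            ContentView()
                .automaticEnvironmentForSpeechRecognition()  // same one-liner as above
        }
    }
}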

2. Try some demos

You can now try out some lightweight demos bundled with the framework using Xcode 11's preview feature. In any of your previews, initialize one of the demo views:

static var previews: some View {

    // Two of the demo views below can take a `localeIdentifier: String` as an argument.
    // Example locale identifiers:
    // Simplified Chinese (China) = "zh_Hans_CN"
    // English (US) = "en_US"
    // Japanese (Japan) = "ja_JP"
    
    // Try one of these at a time and have fun!
    SwiftSpeech.Demos.Basic(localeIdentifier: yourLocaleString)
    SwiftSpeech.Demos.Colors()
    SwiftSpeech.Demos.List(localeIdentifier: yourLocaleString)
    
}

Open up the Canvas and resume the preview if needed. You should see what your demo looks like. Then, click the Preview on Device button to run the demo on your device. Hold the blue circular button to speak, and the recognition results will show up! 😉

Here are the "previews" of your previews:

[Demo recordings]

3. Build it yourself

Knowing what this framework can do, you can now start to learn about the concepts in SwiftSpeech.

Inspect the source code of SwiftSpeech.Demos.Basic. The only new thing here is this:

SwiftSpeech.RecordButton()                                        // 1. The View Component
    .swiftSpeechRecordOnHold(locale:animation:distanceToCancel:)  // 2. The Functional Component
    .onRecognize(update: $text)                                   // 3. SwiftSpeech Modifier(s)

There are three parts here (and luckily, you can customize every one of them!):

  1. The View Component: A View that is only responsible for UI.
  2. The Functional Component: A component that handles user interaction and provides the essential speech recognition functionality. In the built-in one here, the first two arguments let you specify a locale (language) for recognition and an animation used when the user interacts with the View Component. The third argument sets the distance the user has to swipe up in order to cancel the recording. The framework also provides another Functional Component: .swiftSpeechToggleRecordingOnTap(locale:animation:). Both are sketched with concrete arguments right after this list.
  3. SwiftSpeech Modifier(s): One or more components allowing you to receive and manipulate the recognition results. They can be stacked together to create powerful effects.
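
To make this concrete, here is the same snippet with the arguments filled in, plus the tap-to-toggle alternative. The argument values (the locale, the spring animation, and the 50-point cancel distance) are illustrative assumptions, not framework defaults, and $text is assumed to be a Binding<String> you own:

SwiftSpeech.RecordButton()                                         // 1. The View Component
    .swiftSpeechRecordOnHold(locale: Locale(identifier: "en_US"),  // 2. The Functional Component
                             animation: .spring(),
                             distanceToCancel: 50)
    .onRecognize(update: $text)                                    // 3. A SwiftSpeech Modifier

// The same View Component, driven by the other built-in Functional Component instead:
SwiftSpeech.RecordButton()
    .swiftSpeechToggleRecordingOnTap(locale: Locale(identifier: "en_US"),
                                     animation: .spring())
    .onRecognize(update: $text)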

For now, you can just use the built-in View Component and Functional Component. Let's explore some SwiftSpeech Modifiers first since every app handles its data differently:

Important: Chaining multiple (even identical) SwiftSpeech Modifiers together doesn't override any behavior. All of the modifiers' actions are executed in order: the one closest to the Functional Component executes first, and the farthest executes last.

// 1
// `SwiftSpeech.Demos.Basic` & `SwiftSpeech.Demos.Colors` use these modifiers.
// Inspect the source code of them if you want examples!
.onRecognize(textHandler: (String) -> Void)
.onRecognize(update: Binding<String>)
.printRecognizedText()

The first kind of modifier is the most straightforward and convenient: it does something whenever a new recognition result is yielded.
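
For example, the closure variant and the Binding variant can be chained on the same button; per the note above, the modifier closest to the Functional Component runs first. (The argument values below are illustrative, and $text is assumed to be a Binding<String> you own.)

SwiftSpeech.RecordButton()
    .swiftSpeechRecordOnHold(locale: .current, animation: .spring(), distanceToCancel: 50)
    .onRecognize { text in              // runs first: prints every partial result
        print("[SwiftSpeech]: \(text)")
    }
    .onRecognize(update: $text)         // runs second: mirrors the result into your state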

But frankly, this is more of a shortcut for playing/testing: many apps deal with a more complicated underlying data model, and a simple Binding or closure just isn't enough. That's when the second set comes to the rescue.

// 2
// `SwiftSpeech.Demos.List` uses these modifiers.
// Inspect the source code of it if you want examples!
.onStartRecording(appendAction: (SwiftSpeech.Session) -> Void)
.onStopRecording(appendAction: (SwiftSpeech.Session) -> Void)
.onCancelRecording(appendAction: (SwiftSpeech.Session) -> Void)

The second kind gives you full control over the whole lifespan of a SwiftSpeech.Session. It runs the provided closures after a recording is started/stopped/cancelled. Inside the closures, you have access to the corresponding SwiftSpeech.Session, which is discussed below.
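
Here is a sketch of a small view that uses these closures. The transcript and cancelBag properties are hypothetical state of your own, not part of the framework, and the Functional Component arguments are illustrative:

import SwiftUI
import Combine
import SwiftSpeech

struct SessionLifecycleDemo: View {
    @State private var transcript = ""
    @State private var cancelBag = Set<AnyCancellable>()

    var body: some View {
        VStack(spacing: 24) {
            Text(transcript)
            SwiftSpeech.RecordButton()
                .swiftSpeechRecordOnHold(locale: .current, animation: .spring(), distanceToCancel: 50)
                .onStartRecording { session in
                    // Subscribe to this session's partial results as soon as it starts.
                    session.stringPublisher?
                        .sink(receiveCompletion: { _ in },
                              receiveValue: { text in self.transcript = text })
                        .store(in: &self.cancelBag)
                }
                .onCancelRecording { _ in
                    // The user swiped up to cancel: discard whatever was recognized so far.
                    self.transcript = ""
                }
        }
    }
}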

// 3
// `SwiftSpeech.ViewModifiers.OnRecognize` uses these modifiers.
// Inspect the source code of it if you want examples!
.onStartRecording(sendSessionTo: Subject)
.onStopRecording(sendSessionTo: Subject)
.onCancelRecording(sendSessionTo: Subject)

The third kind might be useful if you prefer a reactive programming style. The only new argument here is a Combine.Subject (e.g. a CurrentValueSubject or PassthroughSubject); the modifier sends the corresponding SwiftSpeech.Session to the Subject after a recording is started/stopped/cancelled.
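
A sketch of that style, assuming a Subject whose Output is SwiftSpeech.Session and whose Failure is Never (the subject, the pipeline, and cancelBag below are your own code, not part of the framework):

let sessionSubject = PassthroughSubject<SwiftSpeech.Session, Never>()

// In the view hierarchy:
SwiftSpeech.RecordButton()
    .swiftSpeechRecordOnHold(locale: .current, animation: .spring(), distanceToCancel: 50)
    .onStartRecording(sendSessionTo: sessionSubject)

// Anywhere you hold a reference to the subject:
sessionSubject
    .sink { session in
        print("A new recording just started: \(session)")
    }
    .store(in: &cancelBag)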

SwiftSpeech.Session

Inside SwiftSpeech's Session Handler

If you are filling in a (Session) -> Void handler provided by the framework, use the publishers provided by the Session to receive updates on recognition results.

Currently, a Session has two publishers (you only need to subscribe to one of them): stringPublisher and resultPublisher.

stringPublisher directly emits the recognized speech text (by default, it emits partial results, so you may receive multiple values). You will receive a .finished completion event when the Session finishes processing the user's voice (i.e. sfSpeechRecognitionResult.isFinal == true) or when you explicitly call the cancelRecording() method.

You can subscribe to stringPublisher in the following way:

session.stringPublisher?
    .sink { text in
        print("[SwiftSpeech]: \(text)")
    }
    .store(in: &someCancelBag)

For resultPublisher, the subscribing process is similar, except that the element type is Result<SFSpeechRecognitionResult, Error>, which encapsulates either the entire partial result from the underlying SFSpeechRecognizer or the error it emitted during recognition.
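
Subscribing looks much the same; you just switch over the Result inside the sink. A sketch, assuming resultPublisher is optional like stringPublisher above:

session.resultPublisher?
    .sink { result in
        switch result {
        case .success(let recognitionResult):
            print("[SwiftSpeech]: \(recognitionResult.bestTranscription.formattedString)")
        case .failure(let error):
            print("[SwiftSpeech] Error: \(error)")
        }
    }
    .store(in: &someCancelBag)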

Independent Use

Here's an example of using a Session to recognize a user's voice and receive updates.

let session = SwiftSpeech.Session(locale: .current)
try session.startRecording()
session.stringPublisher?
    .sink { text in
        // do something with the text
    }
    .store(in: &cancelBag)

For more, please refer to the documentation of SwiftSpeech.Session.

Customized View Components

A View Component is a dedicated View for design. It does not react to user interaction directly; instead, it reacts to its environment, letting developers focus purely on view design while keeping the view composable. User interactions are handled by the Functional Component.

Inspect the source code of SwiftSpeech.RecordButton (again, it's not a Button since it doesn't respond to user interaction). You will notice that it doesn't own any state or apply any gestures. It only responds to the two environments below.

@Environment(\.swiftSpeechState) var state: SwiftSpeech.State
@Environment(\.isSpeechRecognitionAvailable) var isSpeechRecognitionAvailable: Bool

Both are pretty self-explanatory: the first represents the current recording state, and the second indicates whether speech recognition is currently available.

Here are more details of SwiftSpeech.State:

enum SwiftSpeech.State {
    /// Indicating there is no recording in progress.
    /// - Note: It's the default value for `@Environment(\.swiftSpeechState)`.
    case pending
    /// Indicating there is a recording in progress and the user does not intend to cancel it.
    case recording
    /// Indicating there is a recording in progress and the user intends to cancel it.
    case cancelling
}
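
For instance, a custom View Component can be as simple as a circle that changes color with the state. A sketch (the view name, colors, and icons below are made up for illustration; only the two environment values come from the framework):

import SwiftUI
import SwiftSpeech

struct MyRecordIndicator: View {
    @Environment(\.swiftSpeechState) var state: SwiftSpeech.State
    @Environment(\.isSpeechRecognitionAvailable) var isAvailable: Bool

    var body: some View {
        Circle()
            .fill(color)
            .frame(width: 60, height: 60)
            .overlay(
                Image(systemName: isAvailable ? "mic.fill" : "mic.slash.fill")
                    .foregroundColor(.white)
            )
    }

    private var color: Color {
        switch state {
        case .pending:    return .gray    // no recording in progress
        case .recording:  return .red     // recording, not cancelling
        case .cancelling: return .orange  // user is swiping up to cancel
        }
    }
}

You would then use it exactly like the built-in button, e.g. MyRecordIndicator().swiftSpeechRecordOnHold(...).onRecognize(update: $text).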

Combined with a Functional Component and some SwiftSpeech Modifiers, hopefully, you can build your own fancy record systems now!

Customized Functional Components

🚧 Documentation is still in the making... Give me a star to keep me motivated! For now, please refer to the source code of the demos provided by the framework for examples.

License

SwiftSpeech is available under the MIT license.
