SwiftUI: .accessibilityLabel powered by SSML
It all began with the question, "How can SwiftUI make VoiceOver speak multiple languages?" That sent me down a rabbit hole exploring the relationship between attributed strings and voice synthesis. The approach worked, with some caveats, and it answered the question. Or so I thought. Afterwards I couldn't stop thinking about a markup language I had once used while working at a voice-assistant startup, and I wondered whether it could be applied to accessibility in SwiftUI today.
SSML
SSML is a 20-year-old standard that allows control over aspects of speech such as pronunciation, volume, pitch, and pace across most synthesis-capable platforms.
Speech Synthesis Markup Language Specification (SSML 1.0), introduced in September 2004, is one of the standards enabling access to the Web using spoken interaction.
The complete specification is available from the W3C, and Google's Cloud Text-to-Speech documentation contains some useful examples:
Basic syntax
<speak>
my SSML content
</speak>
Adds emphasis to an announcement
<emphasis level="moderate">This is an important announcement</emphasis>
Read numbers as cardinals
<speak>
<say-as interpret-as="cardinal">12345</say-as>
</speak>
Simplified pronunciation of a difficult-to-read word
<sub alias="にっぽんばし">日本橋</sub>
AVSpeechUtterance
For our purposes, all we need to know is that SSML is XML at its core, and that an SSML string can be passed to the AVSpeechUtterance(ssmlRepresentation:) initializer.
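Setting SwiftUI aside for a moment, a minimal sketch of that initializer in isolation looks roughly like this (the SSML content is just the emphasis example from above):

import AVFoundation

let synthesizer = AVSpeechSynthesizer()

// The SSML initializer is failable: it returns nil if the XML cannot be parsed.
if let utterance = AVSpeechUtterance(ssmlRepresentation: """
    <speak>
        <emphasis level="moderate">This is an important announcement</emphasis>
    </speak>
    """) {
    synthesizer.speak(utterance)
}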
ViewModifier
SwiftUI's accessibility modifiers are simply ViewModifiers, something I hadn't considered before. Embracing that fact lets us overload accessibility labels, detect particular states, and run conditional logic. In practice, that means we can introduce a custom modifier that naively owns a speech synthesizer and, whenever the element gains accessibility focus, plays an utterance built from SSML:
import AVFoundation
import SwiftUI

extension View {
    public func accessibilityLabel(_ ssml: SSML) -> some View {
        modifier(AccessibilitySSMLLabel(ssml: ssml))
    }
}

struct AccessibilitySSMLLabel: ViewModifier {
    @AccessibilityFocusState var isFocused: Bool

    let ssml: SSML
    let synthesizer = AVSpeechSynthesizer()

    func body(content: Content) -> some View {
        content
            .accessibilityElement()
            .accessibilityFocused($isFocused)
            .onChange(of: isFocused) { _, newValue in
                // Speak when the element gains focus; stop when it loses focus.
                // init(ssmlRepresentation:) is failable and returns nil for malformed XML.
                if newValue,
                   let utterance = AVSpeechUtterance(ssmlRepresentation: ssml.rawValue) {
                    synthesizer.speak(utterance)
                } else {
                    synthesizer.stopSpeaking(at: .word)
                }
            }
    }
}

public struct SSML {
    public let rawValue: String

    public init(_ representation: String) {
        self.rawValue = representation
    }
}
That is it; this serves as a proof of concept. The only question left is what degree of abstraction is appropriate, and which modifier variants to offer. For example, should there be one for SSML, as shown below, and another with pitch, voice, and other properties? Or should we expose a single modifier that takes an AVSpeechUtterance directly? A sketch of that last option follows.
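Here is one possible shape for that variant, shown only to illustrate the trade-off; the AccessibilityUtteranceLabel type and the overload itself are hypothetical, not an existing API:

import AVFoundation
import SwiftUI

// Hypothetical variant: the caller configures pitch, rate, and voice on the
// utterance itself, and the modifier only decides when to speak it.
struct AccessibilityUtteranceLabel: ViewModifier {
    @AccessibilityFocusState var isFocused: Bool

    let utterance: AVSpeechUtterance
    let synthesizer = AVSpeechSynthesizer()

    func body(content: Content) -> some View {
        content
            .accessibilityElement()
            .accessibilityFocused($isFocused)
            .onChange(of: isFocused) { _, newValue in
                if newValue {
                    synthesizer.speak(utterance)
                } else {
                    synthesizer.stopSpeaking(at: .word)
                }
            }
    }
}

extension View {
    public func accessibilityLabel(_ utterance: AVSpeechUtterance) -> some View {
        modifier(AccessibilityUtteranceLabel(utterance: utterance))
    }
}

The call site would then set pitchMultiplier, rate, or voice on the utterance before passing it in, trading the expressiveness of SSML for a plain Swift API.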
Whichever path we take, this demonstrates how adaptable SwiftUI is, and how we can use it to build solutions that hide complexity (all of those SSML details) while keeping the call site as simple as:
Text("Hello, world!")
.accessibilityLabel(
SSML(
"""
<speak>
<prosody pitch="high">
<lang xml:lang="fr-FR">Bonjour!</lang>
</prosody>
After one second, I'm going to speak more slowly.
<break time="1s"/>
<prosody rate="x-slow">
Slow speech using <say-as interpret-as="verbatim">SSML</say-as>...
</prosody>
</speak>
"""
)
)