Voice AI company Speechify has launched a native Windows app that uses locally stored models to power dictation across applications and to read articles, documents, and PDFs aloud in its range of voices. The app runs on Copilot+ PCs with NPUs from AMD, Intel, and Qualcomm, as well as on other Windows 11 PCs with GPUs from Intel and AMD. It includes three on-device models: neural text-to-speech, real-time voice activity detection, and Whisper-powered transcription. Users can set the app to switch to cloud-based models, or change models while the app is in use. Speechify, which has more than 50 million users, offers seven speed presets for audio produced by its VITS Neural model when content is read aloud. For voice activity detection, the company uses the open-source Silero model.
Cliff Weitzman, the company’s CEO, said the Windows launch matters because reading and writing should be accessible to users whatever device they use and however they prefer to work. Speechify recently introduced Granola-like meeting transcription, which has so far been limited to browser-based meetings. The feature is expected to expand to the company’s native apps across platforms, so that meetings can be transcribed in any application or browser.
Source: TechCrunch