Refactored.io
Meta’s new speech model 🎙️


Plus: Singapore's RoboCops

June 20, 2023

Hey folks,

Welcome back after the long weekend! In what may seem like the dawn of a dystopian future, Singapore's Changi Airport now has "RoboCops" on patrol duty, with plans to station them city-wide. (link)

Singapore’s RoboCops

In today’s update:

📰 News: Meta’s new generative speech model.

🛠️ Tools: Chat with your PDFs, create an AI replica of yourself.

🧠 Learn: Create a chatbot on your private data in under 10 lines of code!

🎲 Misc: GPT-4 pitches are 3x more likely to secure funding!

Reading time is 4 minutes. Let’s dive in 🤿

Meta’s Speech Generation AI

Meta unveiled a new AI model, Voicebox, to generate speech from text and edit audio.

Text-To-Speech (TTS) models have been around for a while, but Voicebox is the first all-in-one generative AI model that handles tasks such as text-to-speech, cross-lingual style transfer, and audio editing in a single model.

Here’s what the AI can do:

Text-To-Speech: Not only can Voicebox generate audio from a text input, it can also match the style of a voice using only a 2-second audio sample!

Edit Audio: You can recreate a portion of speech that’s interrupted by noise or replace misspoken words in seconds. It works like an eraser for audio.

Easily remove noisy sections in an audio clip

Cross-Lingual Style Transfer: Voicebox supports 6 languages: English, French, Spanish, German, Polish, and Portuguese. Given an audio sample in one language and text in another, it reads the text aloud in the sample's voice. For example, you could provide an audio sample in English and text in German, and the AI would produce German speech that sounds like the English speaker!

Performance: Voicebox outperforms VALL-E, Microsoft's top English TTS model, while being up to 20x faster!

Meta’s Voicebox vs Microsoft’s VALL-E

Why it matters: Voicebox could help creators easily edit audio tracks. For instance, it could correct pronunciation mistakes and misspoken words or remove background noise in seconds.

Meta has not released the model to the public because of the potential for misuse. In the last couple of months, we’ve seen an uptick in scams in which callers impersonate a family member to extort money (link).

Now all it takes to replicate a voice is a 2-second audio sample, which anyone could grab from your voicemail!

New Tools
Leverage AI to grow

ChatDoc: A better way to search for information within a PDF document. A chatbot-style interface lets you run queries on the document, and the answers are supported by direct citations! This tool can extract information from text, tables, and even images!

SonicLink: This tool takes audience engagement to the next level. It creates an “AI Replica” of you, allowing you to provide your subscribers with a more personalized and engaging experience. It’s smart enough to replicate your chat style, voice, expertise, and personality! (link)

ScanBoy: A fast document scanner that creates high-quality PDFs with selectable text and AI-generated file names. Because files are not saved under default names, you can easily search for information later on. (link)

Learn

Build a chatbot on your private data in less than 10 lines of code using LangChain! (link)
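
If you're curious what "chatting with your private data" looks like under the hood, here's a minimal sketch of the general idea: pick the snippet of your data most relevant to a question, then hand it to a language model as context. To be clear, this is not the LangChain code from the linked tutorial. It uses only the Python standard library, swaps in simple word-count similarity for the embeddings a real setup would likely use, and stops at building the prompt rather than calling a model. The function names (tokenize, cosine, retrieve, build_prompt) are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, docs, k=1):
    """Return the k snippets whose word counts best match the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        docs,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, docs, k=1):
    """Assemble the text you would send to a language model of your choice."""
    context = "\n".join(retrieve(question, docs, k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Hypothetical "private data": a few made-up policy snippets.
    docs = [
        "Our refund policy allows returns within 30 days.",
        "Offices close on public holidays.",
        "Support is available by email on weekdays.",
    ]
    print(build_prompt("What is the refund policy?", docs))
```

Word-count matching is crude (common words like "the" can skew the ranking), so treat this as a way to see the moving parts rather than something to ship.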

Misc
From the corners of the Internet

AI is changing the way work is done in filmmaking, coding and professional services - Financial Times
ByteDance, the parent company of TikTok, ordered more than 100,000 top-of-the-line NVIDIA GPUs, amounting to $1 billion! - Tom’s Hardware
Fast-Food restaurants have ramped up investments in AI for drive-thrus to improve profit margins and boost sales - CNN
GPT-4-generated pitches are 3x more likely to secure funding and 2x more convincing than human ones! Maybe it’s time to use ChatGPT for your next pitch. - Clarify Capital
