We produce high-quality Branded and Legacy Voices for our clients. In the industry these are called “cloned” voices. This support document is for customers who want to understand our process for getting started with our professional-grade Branded AI Voices for movies, TV series, audiobooks, and more.
Process for Branded Voices (cloned voices, multiple languages) and
Legacy Voices (cloned voice, native language of the speaker):
Step 1: Secure the rights to produce either a Proof of Concept (POC) voice or a Production voice. We use a contract specific to each voice that grants AI voice modeling permission rights. Both the POC contract and the Production contract clearly state that all statements and Text-2-Speech output generated with the voice are the property of the client, organization, or foundation (or of the individual, if you specify that).
The POC contract also states that the voice will be used only to produce the POC project and not for any other purpose.
Step 2: Provide 30-60 minutes of studio-quality voice recordings with the expressiveness you want in the voice model. We capture much more than “tone” (inferior systems that use less than 1 minute of audio cannot do what we do).
We capture the style, pacing, breathing, and mannerisms of the speaker in our professional-grade voice models. If studio-quality audio is not available, we need 2 hours of content. Note also that reviewing and finessing a high-quality model has a cost, so the POC voice will not match the quality of the Production voice. We charge a setup fee per language for creating the voice, plus a per-minute fee for usage in production.
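As a rough planning aid, the audio requirement and fee structure from Step 2 can be sketched in Python. The function names are hypothetical and the numbers in the usage note are placeholders, not actual pricing; the only facts taken from this document are the 30-minute and 2-hour minimums and the "setup fee per language plus per-minute usage" structure.

```python
def required_source_minutes(studio_quality: bool) -> int:
    """Minimum source audio: 30 minutes if studio quality, otherwise 2 hours."""
    return 30 if studio_quality else 120


def estimate_cost(setup_fee, languages, per_minute_rate, production_minutes):
    """Setup fee charged once per language, plus a per-minute production charge.

    All four inputs are supplied by the caller; this sketch assumes a simple
    linear total and does not know any real rates.
    """
    return setup_fee * languages + per_minute_rate * production_minutes
```

For example, with placeholder values, `estimate_cost(setup_fee=1000, languages=2, per_minute_rate=5, production_minutes=90)` returns 2450 (two setup fees plus 90 minutes of usage).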
Step 3: Voice approval. Both POC and Production voices must be approved before they are used in the next steps of POC creation or Production. We will submit a voice sample with simple, short content (1-5 minutes) for review and approval.
Step 4 (optional): A single episode (paid POC) or a clip of production content is produced with the branded voice. For a Production voice, this serves as final approval for use in production; for a POC, it serves to get approval to move to final Production contracts, pricing, and voice creation.
Format: Any high-fidelity audio format (WAV, M4A, MP4, AAC, OGG, FLAC). There should be no audible compression artifacts in the voice. Bitrates above 96 kbps are preferred; we do most of our internal audio production at 320 kbps. Preferred sample rates are 22.05 kHz for mono, or 44.1 or 48 kHz for stereo.
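Before sending recordings, it can help to check them against the sample-rate guidance above. The following is a minimal sketch using Python's standard `wave` module. It handles uncompressed PCM WAV files only; the other listed formats would need a separate tool. The helper name `wav_format_warnings` is hypothetical, and treating the preferred rates as minimums is an assumption of this sketch rather than a stated rule.

```python
import wave

# Lowest preferred sample rate (Hz) per channel count, from the guidance above.
# Treating these as minimums is an assumption of this sketch.
MIN_PREFERRED_RATE = {1: 22050, 2: 44100}


def wav_format_warnings(path):
    """Return a list of warnings for one uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()

    minimum = MIN_PREFERRED_RATE.get(channels)
    if minimum is None:
        # Neither mono nor stereo, so the guidance does not cover it.
        return [f"unexpected channel count: {channels}"]
    if rate < minimum:
        return [f"sample rate {rate} Hz is below the preferred {minimum} Hz"]
    return []
```

An empty list means the file meets the sample-rate preference for its channel count. The check says nothing about audible compression or recording quality, which still need a listening review.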