
Tips for creating great voice samples for AuthorVoice Pro training

June 05, 2024•5 min read

Getting the highest quality AuthorVoice Pro available.

AuthorVoice Pro allows you to train a hyper-realistic model of your voice. This is achieved by training a dedicated model on a large set of voice data to produce a model that’s indistinguishable from the original voice.

Voice Creation

Before you upload your samples, there are a few important points to consider and steps to follow to ensure the best results.

AuthorVoice Pro captures every nuance of the samples you provide. It will accurately replicate all the characteristics of your voice, including any unwanted sounds present in the recording. If your samples have background noise, reverb, echo, or other disturbances, our system will incorporate these elements into the clone.

Here are some key tips to achieve optimal results:

Single Voice Only: Ensure only one voice is speaking in the audio. Multiple voices or excessive noise can confuse the AI, leading to a subpar clone.
Adequate Recording Time: While 15 minutes of audio is the minimum required, we recommend closer to 1 hour for the best quality and accuracy. Deliver the audio as files of 5-7 minutes each.
Consistent Speaking Style: Match the speaking style in your samples to the intended use of the cloned voice. For example, use an audiobook narration style for your audiobooks.
Quality of Recording: Avoid background noise, reverb, echo, music, or other unwanted sounds in your samples to prevent them from being replicated in the clone.

By following these guidelines, you can ensure a high-quality voice clone that accurately reflects your voice without any unwanted artefacts.

Examples of Good & Bad voice recording

Good delivery:

Bad Delivery:

Keep in mind that all of this depends on the output you want. The AI will try to clone everything in the audio, but for the AI to work optimally and predictably, we suggest following the guidelines.

Your Recording Equipment

For the best results, use high-quality recording equipment, as the AI will replicate every detail of the audio. High-quality input ensures high-quality output. While any microphone can work, we recommend using an XLR mic connected to a dedicated audio interface. Budget-friendly options include the Audio Technica AT2020 or the Rode NT1, paired with a Focusrite interface or a similar device. One of our authors has had great results with the Blue Yeti USB Microphone.


Rode NT1

Audio Technica AT2020

Blue Yeti USB Microphone

Focusrite Vocaster One

Recording tips

Use a Pop Filter: A pop filter minimizes plosives (the bursts of air on sounds like “p” and “b”) when recording.

Microphone Distance: Position yourself at the right distance from the microphone - approximately two fists away from the mic is recommended, but it also depends on what type of recording you want.

Noise-Free Recording: Ensure that the audio input doesn’t have any interference, like background music or noise. The AI cloning works best with clean, uncluttered audio.

Room Acoustics: Preferably, record in an acoustically-treated room. This reduces unwanted echoes and background noises, leading to clearer audio input for the AI. You can make something temporary using a thick duvet or quilt to dampen the recording space.

Audio Pre-processing: Consider editing your audio beforehand if you’re aiming for a specific sound you want the AI to output. For instance, if you want a polished podcast-like output, pre-process your audio to match that quality. Likewise, edit out long pauses and frequent “uhm”s and “ahm”s between words, as the AI will mimic those as well.

Volume Control: Maintain a consistent volume that’s loud enough to be clear but not so loud that it causes distortion. The goal is to achieve a balanced and steady audio level. The ideal would be between -23dB and -18dB RMS with a true peak of -3dB.
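
As a rough way to check a recording against these levels, the sketch below computes the RMS level and the highest sample peak in dB relative to full scale. It assumes the audio has already been decoded to floating-point samples between -1.0 and 1.0. The function names are illustrative and are not part of AuthorVoice Pro. Note that it measures sample peak, which only approximates true peak (true-peak metering oversamples the signal first).

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)


def peak_dbfs(samples):
    """Highest sample peak in dB relative to full scale (not true peak)."""
    if not samples:
        raise ValueError("samples must not be empty")
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak)


def check_levels(samples, rms_range=(-23.0, -18.0), peak_ceiling=-3.0):
    """Compare a recording with the RMS range and peak ceiling given above."""
    rms = rms_dbfs(samples)
    peak = peak_dbfs(samples)
    return {
        "rms_db": rms,
        "peak_db": peak,
        "rms_ok": rms_range[0] <= rms <= rms_range[1],
        "peak_ok": peak <= peak_ceiling,
    }


# Stand-in for real audio: one second of a 440 Hz test tone at 48 kHz.
tone = [0.15 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(48000)]
report = check_levels(tone)
print(report)
```

Speech levels vary far more than a test tone, so treat the result as a quick sanity check before uploading and not as a replacement for a proper loudness meter.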

Sufficient Audio Length: Provide at least 5-7 minutes of high-quality audio per file that follows the above guidelines. For best results, aim for closer to 15 minutes of audio. The more quality data you can feed into the AI, the better the voice clone will be. The number of samples is irrelevant; the total runtime is what matters. However, if you plan to upload multiple hours of audio, it is better to split it into multiple 5-7-minute samples, which are easier to upload.
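
One possible way to plan that split is sketched below. It assumes you know the recording's total length in seconds, and it cuts the recording into the fewest equal-length segments that stay at or under 7 minutes. `plan_segments` is a hypothetical helper, not an AuthorVoice Pro tool. For some shorter totals (between 7 and 10 minutes, for instance) no equal split lands inside the 5-7 minute window, so the segments come out shorter than 5 minutes.

```python
import math


def plan_segments(total_seconds, max_minutes=7):
    """Return (start, end) times in seconds for equal-length segments.

    Uses the smallest number of segments that keeps each one at or
    under max_minutes.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    max_len = max_minutes * 60
    count = math.ceil(total_seconds / max_len)
    length = total_seconds / count
    return [(i * length, (i + 1) * length) for i in range(count)]


# Example: a one-hour recording.
segments = plan_segments(3600)
print(len(segments), segments[0])
```

With these settings, a one-hour recording comes out as nine segments of 400 seconds (6 minutes 40 seconds) each. In practice you would move each cut point to the nearest pause so that no sentence is split across two files.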

Once you’ve uploaded your samples, we will guide you through each step.

Voice & Audio Quality Test: We'll test two 5-minute audio files of you reading from your book. Once these are approved, we can start training your voice (AuthorVoice Pro) to narrate your book.

Pro Voice Training: This means that your voice recording has been verified by our audio technician and your MyVoice Clone is in training. (3-5 Days)

Review: We'll create a 3-minute sample of your MyVoice Clone narrating your audiobook.
Once you approve this version, we'll get your MyVoice Clone to narrate your entire audiobook for you.

Scripts

What you read is not as important as how you read it. The AI will try to mimic everything it hears in a voice: the tonal quality, the accent, the inflection, and many other intricate details. It will replicate how you pronounce certain words, vowels, and consonants, but not the actual words themselves. So, it is better to choose a text or script that conveys the emotion you want to capture, and read in a tone of voice you want to use.

An important note: We always prefer you to read your own book’s content. Because you’re talking about something you know well and believe every word of, the reading becomes uniquely yours.

Authors Audiobooks

Craig Murley

Craig is a brand consultant, creative director, writer & artist who has won 33 creative awards for ads, experimental films, photography & music.
