Previously, we transcribed an Indian English video using a Speech-To-Text Artificial Intelligence (AI) built for Indian English. Today we take the same idea forward and translate the video into Hindi.
To recap, we used a sample clip from Rajya Sabha TV that was a) transcribed using a speech-to-text AI and b) cleaned up by a human. We will now clean it up a little more and then translate the content into Hindi. Finally, we will use a Text-To-Speech AI to speak the Hindi content aloud.
We will start from the video we produced at the end of the previous post, which is embedded below.
First though, a bit of theory; just like school!
The Look-Ahead and Forward Algorithms — The Theory
One of the challenges in working with translation AI is understanding how it uses look-ahead and forward algorithms. Most translation AIs use a look-forward algorithm to translate text content.
From our trusty source of all knowledge (Wikipedia), look-ahead in backtracking is defined as follows: “In backtracking algorithms, look-ahead is the generic term for a sub-procedure that attempts to foresee the effects of choosing a branching variable to evaluate one of its values. The two main aims of look-ahead are to choose a variable to evaluate next and the order of values to assign to it.”
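To make the quoted definition concrete, here is a minimal Python sketch of look-ahead (forward checking) inside a backtracking search. The map-colouring problem, region names and colour list are invented for illustration and are not taken from any translation system. The sketch shows both aims from the quote: choosing the next variable (here, the one with the fewest remaining values) and ordering its values (least-constraining first).

```python
from typing import Dict, Optional, Set

# Hypothetical map-colouring problem: adjacent regions must get different colours.
NEIGHBOURS: Dict[str, Set[str]] = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
COLOURS = ["red", "green", "blue"]

Domains = Dict[str, Set[str]]


def forward_check(var: str, value: str, domains: Domains,
                  assignment: Dict[str, str]) -> Optional[Domains]:
    """Look ahead: remove `value` from unassigned neighbours' domains.
    Returns None if a neighbour would be left with no legal colour."""
    pruned = {v: set(d) for v, d in domains.items()}
    pruned[var] = {value}
    for n in NEIGHBOURS[var]:
        if n not in assignment:
            pruned[n].discard(value)
            if not pruned[n]:
                return None  # dead end foreseen before branching into it
    return pruned


def solve(assignment: Dict[str, str], domains: Domains) -> Optional[Dict[str, str]]:
    if len(assignment) == len(NEIGHBOURS):
        return assignment
    # Aim 1: choose the next variable (fewest remaining values first).
    unassigned = [v for v in NEIGHBOURS if v not in assignment]
    var = min(unassigned, key=lambda v: len(domains[v]))

    # Aim 2: order its values (least-constraining value first).
    def conflicts(value: str) -> int:
        return sum(value in domains[n] for n in NEIGHBOURS[var] if n not in assignment)

    for value in sorted(domains[var], key=lambda val: (conflicts(val), val)):
        pruned = forward_check(var, value, domains, assignment)
        if pruned is not None:
            result = solve({**assignment, var: value}, pruned)
            if result is not None:
                return result
    return None  # every value failed: backtrack


if __name__ == "__main__":
    print(solve({}, {v: set(COLOURS) for v in NEIGHBOURS}))
```

Forward checking prunes a value as soon as it would leave a neighbour with no options, so the search avoids branches it can already see will fail.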
Also, and slightly more technically, the forward algorithm is defined as follows: “The forward algorithm, in the context of a hidden Markov model (HMM), is used to calculate a ‘belief state’: the probability of a state at a certain time, given the history of evidence. The process is also known as filtering. The forward algorithm is closely related to, but distinct from, the Viterbi algorithm.”
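The forward algorithm can also be sketched in a few lines of Python. This toy example uses the textbook "umbrella" HMM, where the weather is hidden and whether someone carries an umbrella is observed. The states and probabilities are illustrative assumptions, not parameters of any translation AI. Each step predicts today's state from yesterday's belief, weights that prediction by the new evidence and normalises the result. This is the "filtering" mentioned in the quote.

```python
from typing import Dict, List

STATES = ["rain", "dry"]
PRIOR = {"rain": 0.5, "dry": 0.5}  # belief before any evidence (assumed)
TRANSITION = {  # P(state today | state yesterday), assumed values
    "rain": {"rain": 0.7, "dry": 0.3},
    "dry": {"rain": 0.3, "dry": 0.7},
}
EMISSION = {  # P(observation | state), assumed values
    "rain": {"umbrella": 0.9, "no_umbrella": 0.1},
    "dry": {"umbrella": 0.2, "no_umbrella": 0.8},
}


def forward(observations: List[str]) -> List[Dict[str, float]]:
    """Return the filtered belief state P(state_t | evidence_1..t) for each t."""
    belief = dict(PRIOR)
    history = []
    for obs in observations:
        # Predict: push yesterday's belief through the transition model.
        predicted = {s: sum(belief[p] * TRANSITION[p][s] for p in STATES) for s in STATES}
        # Update: weight each state by how well it explains today's evidence.
        weighted = {s: predicted[s] * EMISSION[s][obs] for s in STATES}
        total = sum(weighted.values())
        belief = {s: w / total for s, w in weighted.items()}  # normalise
        history.append(belief)
    return history


if __name__ == "__main__":
    for t, b in enumerate(forward(["umbrella", "umbrella"]), start=1):
        print(t, round(b["rain"], 3))
```

With these assumed numbers, two umbrella sightings raise the belief that it is raining from 0.5 to about 0.818, then to about 0.883. Each step uses only the previous belief, never the whole history.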
The Look-Ahead and Forward Algorithms — What do I Need to Know?
Okay, so what does the above mean? In practice, the translation AI looks for a full stop. Once it finds one, it translates the sentence that ends there.
No, really: what? That’s right. You need to add grammar, primarily full stops and commas, so that the text-to-text translation AI works properly.
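Here is a minimal sketch of why full stops matter. It assumes a translator that works one sentence at a time. The `translate_sentence` callable is a hypothetical placeholder, not the app's API; the demo passes `str.upper` in its place. The sample sentences are made up.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split after '.', '?' or '!' when followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def translate_transcript(text: str, translate_sentence: Callable[[str], str]) -> List[str]:
    """Send each sentence to the (hypothetical) translator separately."""
    return [translate_sentence(s) for s in split_sentences(text)]


if __name__ == "__main__":
    raw = "the house met today members discussed the budget the session ended at noon"
    cleaned = "The house met today. Members discussed the budget. The session ended at noon."
    print(len(split_sentences(raw)))      # 1: one long segment
    print(len(split_sentences(cleaned)))  # 3: three short segments
    print(translate_transcript(cleaned, str.upper))  # stand-in for a real translator
```

Without full stops, the whole passage becomes a single segment. A naive splitter like this one will also break on abbreviations such as "Dr.", which is one more reason to check the transcript by hand.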
To do this, check your transcript for grammatical correctness. In practice, this means:
- Add lots of full stops everywhere! Humans are very good at understanding each other, so we don’t need to be told where one sentence ends and the next begins. AIs, not so much.
- If you watch the video embedded below, you will see that the sentences have all been converted into shorter sentences in the third person. An AI translates shorter sentences much better, but take care not to lose meaning.
- A human subject-matter expert translator is expected to post-edit the output after the AI has done its first pass. However, simplifying the source text means you can translate into many languages quickly and reduce the amount of post-editing. Use your judgement!
- The grammar was changed as shown below. If you watch the video below, you will notice that the open captions do not match what is being said. This is intentional: the step is taken primarily in preparation for translation.
- After the process was complete, I added an Auto-Overlay to show the full outcome. Note that the wording is a little different, the tense has changed to past tense and the sentences are shorter.
- Very cool. Now we can begin the translation process. Nicely done!
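Part of the checklist above can be automated. The sketch below flags sentences that may be too long to translate well, so that a human can shorten them. The 15-word threshold is an arbitrary assumption. The function only reports sentences and does not rewrite them, because shortening without losing meaning is still a human job.

```python
import re
from typing import List, Tuple

MAX_WORDS = 15  # assumed threshold; tune for your content and target language


def split_sentences(text: str) -> List[str]:
    """Split after '.', '?' or '!' when followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def flag_long_sentences(text: str, max_words: int = MAX_WORDS) -> List[Tuple[int, int, str]]:
    """Return (sentence number, word count, sentence) for sentences that may need shortening."""
    flagged = []
    for number, sentence in enumerate(split_sentences(text), start=1):
        words = len(sentence.split())
        if words > max_words:
            flagged.append((number, words, sentence))
    return flagged


if __name__ == "__main__":
    transcript = ("The session began on time. The member said that the committee, which had met "
                  "several times over the previous months and consulted widely, would present "
                  "its findings once the budget debate was over.")
    for number, words, sentence in flag_long_sentences(transcript):
        print(f"Sentence {number} has {words} words: consider splitting it.")
```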
Translating the Content to Hindi
- So the first thing we need to do is translate the content. Click Action -> Translate and translate this video into Hindi.
- It’ll take a few seconds, and then you will see the translated file in your Root pane. It is highlighted in yellow in the image below.
- Open it, and in the Captions tab you will see the translation. In the image below, note that the text colour has been set to black to make post-editing easier.
- When you do your own video, please make sure it is post-edited! Remember, use the AI for the heavy lifting, but use human subject-matter experts for the best results.
- Below, we use the Auto-Overlay so you can see the output. We are using black text with a yellow highlight at 80% transparency. The small spinner indicates that the app is applying the Auto-Overlay to the video asset; this is what creates the open captions in the video.
- This is a direct AI translation, so some errors are expected. Also note that the timings are a little off. In the background, our code tries to find a best fit for the sentences, but manual edits are often required. Please make these as part of the post-editing process.
- Well done! Your video is now translated. But you thought it was going to speak Hindi?
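The app's actual timing code is not described here, so the following is only one plausible heuristic and should be read as an assumption. It spreads a caption's time span across its translated sentences in proportion to their length in characters. It also shows why timings drift and why manual adjustment is still needed. The Hindi sample sentences are made up for illustration.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]


def distribute_timings(sentences: List[str], start: float, end: float) -> List[Cue]:
    """Assumed heuristic: give each sentence a share of [start, end]
    proportional to its character count."""
    total_chars = sum(len(s) for s in sentences)
    if total_chars == 0 or end <= start:
        return []
    cues: List[Cue] = []
    cursor = start
    for sentence in sentences:
        duration = (end - start) * len(sentence) / total_chars
        cues.append((round(cursor, 3), round(cursor + duration, 3), sentence))
        cursor += duration
    return cues


if __name__ == "__main__":
    for cue in distribute_timings(["सदन की बैठक शुरू हुई।", "बजट पर चर्चा हुई।"], 12.0, 18.0):
        print(cue)
```

Character counts are only a crude proxy for speaking time, especially in Devanagari, where vowel signs are separate code points. That crudeness is why a best fit like this still needs a human pass.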
AI Dubbing In Hindi — Now The Magic Starts!
This is the simple version. We can get much more sophisticated, including voices that sound like younger or older people, men or women, and changes to the volume and speed of the speech.
It is also possible to break the conversation into separate blocks of speech that properly match the English content as spoken.
- Copy the captions from the Video component into the Audio component.
- Now we will use a Text-To-Speech AI to ‘speak out’ the content in Hindi. Use the Action -> Transcribe to access the Text-To-Speech AI as shown below. This will give us the AI dubbing.
AI dubbed synthetic voice
Next Steps — How To Implement Your Own AI Dubbing
You have many options once your AI dubbing is complete. Several clients simply prefer to download the asset and work on it in their preferred audio/video editor. However, much of the same functionality is also available in the app.
- Mute the original audio soundtrack using the ‘minus’ button as shown below.
- We generally recommend slowing down the AI dubbing. Because the voice is synthetic, it has limited scope for tone, whereas people expect changes in tone when talking to each other. To compensate for the flatter delivery, slow the speech to about 90%. This helps listeners understand the words.
- When you insert the AI speech into the video as an Auto-Overlay, use the + button to add the AI dubbing *.mp3 file.
- Above, the highlights show a) the plus button at the bottom for adding more audio overlays, b) the original soundtrack muted and the new soundtrack added, c) the new soundtrack's speed reduced to 80%, and d) start and end times, which can be managed per conversation block.
- The final outcome is below. It has several errors and needs more work. To hear one, skip forward to 1:57: the sound stops. This happens because, to keep the workflow simple to explain, we used a single mp3 Voice Overlay.
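The speed setting comes down to simple arithmetic: a clip played at speed s takes duration / s seconds. Slowing to 90% stretches 9 seconds of speech to 10 seconds, and slowing to 80% stretches it to 11.25 seconds. The sketch below uses hypothetical function names, not part of the app, to check whether slowed speech still fits within a conversation block's start and end times.

```python
def slowed_duration(seconds: float, speed: float) -> float:
    """Playback length after changing speed (1.0 = normal, 0.9 = 90%)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return seconds / speed


def fits_block(speech_seconds: float, speed: float, block_start: float, block_end: float) -> bool:
    """True if the slowed speech, starting at block_start, ends by block_end."""
    return block_start + slowed_duration(speech_seconds, speed) <= block_end


if __name__ == "__main__":
    for speed in (1.0, 0.9, 0.8):
        length = slowed_duration(9.0, speed)
        print(f"speed {speed:.0%}: {length:.2f} s, fits 0-10.5 s block: {fits_block(9.0, speed, 0.0, 10.5)}")
```

This is why a single long overlay tends to run out of sync. Shorter per-block clips let you trade speed against the time available in each block.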
How To Think About This
- Generally, you would break the transcript into multiple conversation blocks. This helps space out the AI speech nicely.
- When a man and a woman are talking, select an AI voice of the matching gender for each speaker.
- If two men or two women are speaking in a conversation, use a different AI voice for each speaker. This helps your viewers follow what is happening.
- Remember, the key win is SEO. When deploying the asset to a social media channel such as LinkedIn or YouTube, make sure to translate the relevant metadata. This helps search engines index your content, making it discoverable in other languages.
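The advice on blocks and voices above could be sketched as follows. The `Caption` records, speaker labels, the 1.5-second gap threshold and voice names such as `hi-male-1` are all hypothetical; real voice identifiers depend on whichever Text-To-Speech service you use. Consecutive captions from the same speaker are merged into one block, and each distinct speaker gets their own voice from the pool for their gender.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Caption:  # hypothetical caption record
    start: float
    end: float
    speaker: str
    text: str


@dataclass
class Block:
    speaker: str
    start: float
    end: float
    texts: List[str] = field(default_factory=list)


def build_blocks(captions: List[Caption], max_gap: float = 1.5) -> List[Block]:
    """Merge consecutive captions by the same speaker unless there is a long pause."""
    blocks: List[Block] = []
    for c in sorted(captions, key=lambda c: c.start):
        last = blocks[-1] if blocks else None
        if last is not None and last.speaker == c.speaker and c.start - last.end <= max_gap:
            last.end = c.end
            last.texts.append(c.text)
        else:
            blocks.append(Block(c.speaker, c.start, c.end, [c.text]))
    return blocks


def assign_voices(blocks: List[Block], speaker_gender: Dict[str, str],
                  voice_pool: Dict[str, List[str]]) -> Dict[str, str]:
    """Give each distinct speaker their own voice matching their gender."""
    assigned: Dict[str, str] = {}
    used = {gender: 0 for gender in voice_pool}
    for b in blocks:
        if b.speaker in assigned:
            continue
        gender = speaker_gender[b.speaker]
        options = voice_pool[gender]
        # Cycle through the pool; voices repeat only if the pool runs out.
        assigned[b.speaker] = options[used[gender] % len(options)]
        used[gender] += 1
    return assigned


if __name__ == "__main__":
    captions = [
        Caption(0.0, 4.0, "Chair", "First line."),
        Caption(4.5, 8.0, "Chair", "Second line."),
        Caption(9.0, 12.0, "Member A", "A reply."),
        Caption(12.2, 15.0, "Member B", "Another reply."),
    ]
    blocks = build_blocks(captions)
    voices = assign_voices(blocks, {"Chair": "male", "Member A": "male", "Member B": "female"},
                           {"male": ["hi-male-1", "hi-male-2"], "female": ["hi-female-1"]})
    for b in blocks:
        print(b.speaker, b.start, b.end, voices[b.speaker])
```

Here the two men get different voices, which is the point of the third tip. Each block can then be dubbed and placed as its own overlay with its own start time.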