For those of you who might not have tried it yet, Video Indexer is a cloud application and platform built on media AI technologies to make it easier to extract insights from video and audio files. As a starting point for extracting the textual part of the insights, the solution creates a transcript based on the speech in the file; this process is referred to as Speech-to-text. Today, Video Indexer's Speech-to-text supports ten languages: English, Spanish, French, German, Italian, Chinese (Simplified), Portuguese (Brazilian), Japanese, Arabic, and Russian.
However, if the content you need is not in one of the above languages, fear not! Video Indexer partners with other transcription service providers to extend its speech-to-text capabilities to many more languages. One of those partnerships is with Zoom Media, which has extended Speech-to-text support to Dutch, Danish, Norwegian, and Swedish.
A great example of Video Indexer and Zoom Media working together is the Dutch public broadcaster AVROTROS, which uses Video Indexer to analyze videos and allow editors to search through them. Finus Tromp, Head of Interactive Media at AVROTROS, shared: “We use Microsoft Video Indexer on a daily basis to supply our videos with relevant metadata. The gathered metadata gives us the opportunity to build new applications and enhance the user experience for our products.”
Below is an example of what a Dutch video transcript looks like in the Video Indexer service:
Speech-to-text extensions in Video Indexer
To index a file in one of the extended languages, you first use the partner's solution to generate a VTT file (the resulting transcription file) in the required language, and then send the VTT file along with the original video or audio file to Video Indexer to complete the indexing flow.
Once the file is indexed, the full set of insights, including the full transcript, is available for consumption both through the Video Indexer API and in the Video Indexer service. Video Indexer also includes the ability to translate that transcript into dozens of other languages.
We are excited to introduce an open-source release of an integration between Video Indexer and Zoom Media, now available on GitHub.
(Special thanks to Victor Pikula, Cloud Solution Architect at Microsoft, for designing and building this integration.)
The solution is built using Azure Blob Storage, Azure Logic Apps, and the new v2 Video Indexer REST API. Let’s take a deeper look at how the integration is built.
Below is a high-level diagram of the flow:
Given a video or audio file, the file is first dropped into Blob Storage. The Logic App watches for additions to the blob container and, when a new file arrives, sends it to both Video Indexer and Zoom Media. Zoom Media then generates a VTT file in the required language and passes it back to Video Indexer to complete the indexing flow. More information about the integration, including how to obtain the Video Indexer and Zoom Media keys and how to control the VTT language used, is available in the GitHub repository.
Building the integration with Logic Apps makes it easy to customize and maintain, since you can debug and configure the flow quickly, without writing any code.
So, as you can see, with Video Indexer’s powerful media AI technologies, the ability to integrate speech-to-text capabilities in any language, and the ease of use of Azure services such as Logic Apps, everyone around the world can get more out of their videos!
Have questions or feedback? We would love to hear from you!
Use our UserVoice to help us prioritize features, or email VISupport@Microsoft.com with any questions.