Microsoft has introduced a new text-to-speech feature with vision capabilities that enables users to create talking avatar videos from text input and to build real-time interactive bots trained using human images.
Called Azure AI Speech text to speech avatar and available in public preview, it allows customers to create synthetic videos of a 2D photorealistic avatar speaking.
«The Neural text to speech Avatar models are trained by deep neural networks based on the human video recording samples, and the voice of the avatar is provided by text to speech voice model,» the company said during the 'Microsoft Ignite' event late on Wednesday.
With the text to speech avatar, users can create more engaging digital interactions, using it to build conversational agents, virtual assistants, chatbots, and more.
The text-to-speech avatar is designed with the intention of protecting the rights of individuals and society, fostering transparent human-computer interaction, and counteracting the proliferation of harmful deepfakes and misleading content.
«For this reason, custom avatar is a Limited Access feature available by registration only, and only for certain use cases. To access and use the feature in your business applications, register your use case here and apply for the access,» said the company.
The company is offering two separate text to speech avatar features at this time: prebuilt text to speech avatar and custom text to