GPT-4 competitor, Gemini. Google admitted that its video, "Hands-on with Gemini: Interacting with multimodal AI," was edited to speed up the outputs (a fact declared in the video description), but according to a Bloomberg report, there was also no real voice interaction between the human and the AI.
Rather than having Gemini respond to or predict a drawing or a change in the objects on the table in real time, the demo was made by "using still images from the footage and prompting via text." The lack of a disclaimer about how the inputs were actually provided risks misleading the audience about Gemini's capabilities. "Really happy to see the interest around our "Hands-on with Gemini" video. In our developer blog yesterday, we broke down how Gemini was used to create it," said Oriol Vinyals, VP of Research & Deep Learning Lead at Google DeepMind and Gemini co-lead, in a post on X.
“We gave Gemini sequences of different modalities — image and text in this case — and had it respond by predicting what might come next. Devs can try similar things when access to Pro opens on 12/13.”

Read more on livemint.com