Published: November 21, 2024
After numerous incredible submissions to the Gemini API Developer Competition, we've selected the winner for best web application: ViddyScribe.
ViddyScribe exemplifies how Gemini can help make videos more accessible on YouTube, and potentially beyond, by generating audio descriptions of any video that are tailored to people who are visually impaired.
Features and Gemini capabilities
ViddyScribe built its application with a user-first design. While a number of solutions already exist to generate transcripts and audio descriptions, ViddyScribe focused on output that combines quick results with a pleasant user experience for a specific audience: people with visual impairments.
Manual annotation of videos to offer additional details for this audience takes too much time and is often neglected. ViddyScribe used Gemini to help create a custom solution that scales beyond adding some arbitrary frame descriptions to a text file.
ViddyScribe used prompt engineering to get the best results, refining the language and style of the prompt for Gemini 1.5 Pro. The prompt used chain-of-thought prompting to request:
- Purpose and context of the video.
- Tailored audio descriptions using video-specific analysis and guidelines.
- Reformatted timestamps and descriptions for a predictable and consistent format.
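To make the three requests above more concrete, here is a minimal sketch of how a step-by-step prompt could be assembled and how loosely formatted timestamp lines could be normalized afterward. This is not ViddyScribe's actual prompt or code: the step wording, the `MM:SS - description` format, and the function names `build_prompt`, `normalize_line`, and `normalize_descriptions` are all assumptions made for illustration, and the sketch deliberately stops short of calling any model API.

```python
import re

# Hypothetical step wording; the actual ViddyScribe prompt is not shown in this post.
STEPS = [
    "Describe the purpose and context of the video.",
    "Using that context, write audio descriptions tailored to viewers who are visually impaired.",
    "Reformat every entry as 'MM:SS - description', one per line.",
]


def build_prompt(steps=STEPS):
    """Join the steps into a single numbered, step-by-step prompt."""
    numbered = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "Work through these steps in order:\n" + "\n".join(numbered)


# Accepts loose forms such as "0:05 - text", "[01:15] text", or "2:30: text".
_LINE = re.compile(r"^\[?(\d{1,3}):(\d{2})\]?\s*[-:]?\s*(.+)$")


def normalize_line(line):
    """Return (seconds, 'MM:SS - description'), or None if the line has no timestamp."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    minutes, seconds, text = match.groups()
    total = int(minutes) * 60 + int(seconds)
    return total, f"{total // 60:02d}:{total % 60:02d} - {text.strip()}"


def normalize_descriptions(raw):
    """Keep only timestamped lines, sorted by time, in one consistent format."""
    parsed = [p for p in map(normalize_line, raw.splitlines()) if p is not None]
    return [formatted for _, formatted in sorted(parsed, key=lambda p: p[0])]
```

In this sketch, the string returned by `build_prompt()` would be sent to the model along with the video, and `normalize_descriptions()` would be applied to the text that comes back. Normalizing in code as well as in the prompt is one way to keep the format predictable even when the model's response drifts slightly.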
Why we chose ViddyScribe
We chose ViddyScribe because it was an elegant solution to a real user problem.
While the developers found other applications on the market that provide audio descriptions, they felt the needs of people who are deaf and visually impaired were not fully understood. They worked with real people who have these disabilities to determine exactly what they needed in an audio description application.
The experiences of people with disabilities can vary greatly, and sometimes they may have competing needs. Audio descriptions can also make videos accessible to people who are neurodivergent and to others who prefer to read a transcript rather than watch a video.
We're excited to see how developers continue to enhance ViddyScribe, expanding the audience and capabilities in the future.
Keep building with built-in AI APIs
ViddyScribe was just one of the many amazing applications you built with Gemini.
We're developing built-in AI: web platform APIs and browser features designed to integrate AI models, including large language models (LLMs), directly into the browser. This includes Gemini Nano, the most efficient version of the Gemini family of LLMs, designed to run locally on most modern desktop and laptop computers.
Discover the available APIs to start building powerful websites, web applications, and Chrome Extensions.
Share what you build with us at @ChromiumDev or share with Chrome for Developers on LinkedIn.