A lot has changed in Web AI over the last year. In case you missed it, we gave a talk at I/O 2024 about the new models, tools, and APIs for your next web app.
Web AI is a set of technologies and techniques for using machine learning (ML) models client-side, in a web browser, running on a device's CPU or GPU. You can build with JavaScript and other web technologies, such as WebAssembly and WebGPU. This is unlike server-side AI, or "Cloud AI," where the model executes on a server and is accessed with an API.
In this talk, we shared:
- How to run our new large language models (LLMs) in the browser, and the impact of running models client-side
- A look into the future of Visual Blocks, so you can prototype faster
- How web developers can use JavaScript in Chrome to work with Web AI, at scale
LLMs in the browser
Gemma Web is a new open model from Google that can run in the browser on a user's device, built from the same research and technology we used to create Gemini.
Bringing an LLM on-device offers significant potential for cost savings compared with running inference on a cloud server, along with enhanced user privacy and reduced latency. Generative AI in the browser is still in its early stages, but as hardware continues to improve (with more CPU and GPU memory), we expect more models to become available.
Businesses can reimagine what's possible on a web page, especially for task-specific use cases, where the weights of smaller LLMs (2 to 8 billion parameters) can be tuned to run on consumer hardware.
Gemma 2B is available to download from Kaggle Models, and comes in a format that is compatible with our Web LLM inference API. Other supported architectures include Microsoft Phi-2, Falcon RW 1B, and Stable LM 3B, which you can convert to a compatible format with our converter library.
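Here's a minimal sketch of running Gemma 2B in the page with the MediaPipe LLM Inference API. The CDN URL and model path are placeholders; point `modelAssetPath` at wherever you host the downloaded Gemma weights.

```js
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Load the WebAssembly runtime files (URL is a placeholder; any host works).
const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
);

// Point modelAssetPath at your hosted copy of the Gemma 2B weights.
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: '/models/gemma-2b-it-gpu-int4.bin' },
  maxTokens: 512,
});

const answer = await llm.generateResponse('Why run an LLM in the browser?');
console.log(answer);
```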
Build faster prototypes with Visual Blocks
We're collaborating with Hugging Face, who have created 16 brand-new custom nodes for Visual Blocks. This brings Transformers.js and the wider Hugging Face ecosystem to Visual Blocks.
Eight of these new nodes run entirely client side, with Web AI, including:
- Image segmentation
- Translation
- Token classification
- Object detection
- Text classification
- Background removal
- Depth estimation
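These client-side nodes are powered by Transformers.js, and you can call the same tasks directly in your own page. Here's a minimal sketch using text classification with the library's default model; the output shape shown in the comment is illustrative:

```js
import { pipeline } from '@xenova/transformers';

// Downloads a default text-classification model on first use,
// then runs inference entirely in the browser.
const classify = await pipeline('text-classification');

const result = await classify('Web AI keeps user data on the device.');
console.log(result); // e.g. [{ label: 'POSITIVE', score: 0.99 }]
```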
Additionally, there are seven server-side ML tasks from Hugging Face that allow you to run thousands of models with APIs in Visual Blocks. Check out the Hugging Face Visual Blocks collection.
Use JavaScript for Web AI at scale with Chrome
In the previous examples, such as with Gemma, the model is downloaded and run within the web page itself. Chrome is working on built-in, on-device AI, where you could access models with standardized, task-specific JavaScript APIs.
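These APIs are experimental and only available behind flags while the design evolves, so treat the following as an exploratory sketch of an early API shape, not a stable contract:

```js
// Exploratory sketch only: the built-in AI surface is experimental and has
// changed between Chrome releases; these names are placeholders.
if (window.ai) {
  const session = await window.ai.createTextSession();
  const reply = await session.prompt('Summarize this article in one sentence.');
  console.log(reply);
}
```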
And that's not all. Chrome has also updated WebGPU with support for 16-bit floating-point values.
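In WebGPU this surfaces as the `shader-f16` feature, which you can detect on the adapter and request when setting up a device; a minimal sketch:

```js
// Request a device with 16-bit float support in WGSL shaders, if available.
const adapter = await navigator.gpu.requestAdapter();

const requiredFeatures = [];
if (adapter.features.has('shader-f16')) {
  requiredFeatures.push('shader-f16');
}

const device = await adapter.requestDevice({ requiredFeatures });
// Shaders can then opt in with `enable f16;` and use f16 types.
```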
WebAssembly has a new proposal, Memory64, to support 64-bit memory indexes, which would allow you to load larger AI models than before.
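Support is still rolling out, so feature-detect before relying on it. One approach, mirroring libraries like wasm-feature-detect, is to ask the engine to validate a tiny module that declares a 64-bit memory; the byte sequence below encodes exactly that:

```js
// Validates a minimal module whose memory section sets the memory64
// limits flag (0x04). Returns true only on engines that support Memory64.
function supportsMemory64() {
  return WebAssembly.validate(new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, // "\0asm" magic
    0x01, 0x00, 0x00, 0x00, // version 1
    0x05, 0x03, 0x01,       // memory section, 3 bytes, 1 entry
    0x04, 0x00,             // limits flag 0x04 (64-bit), min 0 pages
  ]));
}

console.log('Memory64 supported:', supportsMemory64());
```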
Start testing Web AI models with headless Chrome
You can now test client-side AI (or any application that needs WebGL or WebGPU support) using Headless Chrome, while making use of server-side GPUs for acceleration, such as an NVIDIA T4 or P100 (see the launch sketch after this list). Learn more:
- Run it in Google Colab
- Read a testing deep dive
- And check out the example code on GitHub
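As a starting point, here's a minimal Puppeteer launch sketch. The exact flag set varies by Chrome version and GPU driver, so treat these flags as a working assumption to adapt from the deep dive above:

```js
import puppeteer from 'puppeteer';

// Launch Headless Chrome with WebGPU enabled via Vulkan on a server GPU.
// Flag set is a working assumption; adjust per Chrome version and drivers.
const browser = await puppeteer.launch({
  headless: 'new',
  args: [
    '--no-sandbox',
    '--use-angle=vulkan',
    '--enable-features=Vulkan',
    '--disable-vulkan-surface',
    '--enable-unsafe-webgpu',
  ],
});

const page = await browser.newPage();
await page.goto('https://example.com/my-webgpu-test'); // placeholder URL
console.log('WebGPU available:', await page.evaluate(() => Boolean(navigator.gpu)));
await browser.close();
```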
Remember, when you share what you create, add #WebAI so the wider community can see your work. Share your findings and suggestions on X, LinkedIn, or the social platform you prefer.