Published: May 14, 2026
With the Prompt API in Chrome, you can interact with an LLM through a high-level browser API exposed on window.LanguageModel. However, browser support is still limited, and implementing the API in a browser is a complex process.
| Browser | Supported OS | Unsupported OS | Position |
|---|---|---|---|
| Chrome | Windows, macOS, Linux, ChromeOS (Chromebook Plus) | Android, iOS | ✅ Supported |
| Edge | Windows, macOS | Android, iOS | ✅ Supported |
| Safari | — | — | 📋 Position decided |
| Firefox | — | — | 📋 Position decided |
At the same time, developers in the early preview program have shared their enthusiasm for the Prompt API, so its limited availability poses a compatibility challenge for the foreseeable future.
Solution
This is why we are releasing an experimental, spec-compliant Prompt API polyfill (see the source code on GitHub) that accurately implements the Prompt API on top of configurable cloud backend providers, as well as a local backend provider in the form of Transformers.js.
Use the polyfill
To use the polyfill, do the following:
Download the polyfill from npm:
npm install prompt-api-polyfill
Choose whether you want to use a cloud backend provider or a local backend provider:
- Cloud backend provider: User data is sent to the cloud for remote processing, but you don't have to wait for a local model to be available. You are responsible for any incurred cost according to the pricing information of your cloud provider.
- Local backend provider: User data stays in the browser and is processed locally, but you need to download a model, which, unlike with the real Prompt API, can't be shared across different origins. There's no cost involved with local processing.
Cloud backend
Choose one of the supported cloud backends and get an API key (and any additional credentials) for your backend provider.
Once you have your API key, enter the details in your .env.json configuration file. If you don't specify a modelName, the polyfill uses each backend's default model; if you do, you can select any of the backend's supported models.
{
  "apiKey": "y0ur-Api-k3Y",
  "modelName": "model-name"
}
Local backend
If you decide to go with the local backend provider based on Transformers.js, you only need a dummy API key. You can, however, configure which device Transformers.js should use: choose "webgpu" for maximum performance or "wasm" for maximum compatibility. You can optionally change the other default settings, too, for example, by choosing another model from the Hugging Face catalog of compatible models. For some models, you can select from different quantizations with the dtype parameter.
{
  "apiKey": "dummy",
  "device": "webgpu",
  "dtype": "q4f16",
  "modelName": "onnx-community/gemma-3-1b-it-ONNX-GQA"
}
Configure your polyfill
With the configuration file in place, you can now start using the polyfill in your app.
- Import the configuration file and assign it to an aptly named global variable, where $BACKEND is your chosen backend: window.$BACKEND_CONFIG.
- Use a dynamic import to only load the polyfill when the underlying browser doesn't support it.
- Call the Prompt API functions.
import config from './.env.json' with { type: 'json' };
// Set $BACKEND_CONFIG to select a backend
window.$BACKEND_CONFIG = config;
if (!('LanguageModel' in window)) {
  await import('prompt-api-polyfill');
}

const session = await LanguageModel.create({
  expectedInputs: [{ type: 'text', languages: ['en'] }],
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});
await session.prompt('Tell me a joke!');
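If you'd rather display the response as it arrives, the Prompt API's promptStreaming() method returns a ReadableStream of text chunks that you can iterate over:
// Stream the response instead of awaiting the full result.
const stream = session.promptStreaming('Write me a short poem.');
for await (const chunk of stream) {
  console.log(chunk);
}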
The polyfill supports structured output (except with the Transformers.js backend), handles multimodal input (except with the OpenAI backend, which doesn't support audio and image together, only separately), and is tested against the complete Web Platform Tests suite for LanguageModel.
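For structured output on the backends that support it, you pass a JSON Schema to the Prompt API's responseConstraint option; the response is then a JSON string that conforms to the schema. For example:
const schema = {
  type: 'object',
  properties: {
    rating: { type: 'number', minimum: 1, maximum: 5 },
  },
  required: ['rating'],
};
// The response conforms to the schema and can be parsed as JSON.
const result = await session.prompt('Rate this joke: …', {
  responseConstraint: schema,
});
console.log(JSON.parse(result));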
For more background and detailed usage information as well as the source code, see the README file in the GitHub repo.
Differences from the browser Prompt API
If the polyfill is backed by cloud models, some of the benefits of running client-side don't apply anymore. Namely, you can no longer guarantee the local processing of sensitive data, though the privacy policies of your backend provider still apply. Your app can also no longer use AI when the user is offline. To find out if you're on- or offline, you can listen for the corresponding events.
window.addEventListener("offline", (e) => {
console.log("offline");
});
window.addEventListener("online", (e) => {
console.log("online");
});
If the AI inference runs against a model in the cloud, there is no local model to download. The polyfill fakes the downloadprogress events, so to your app it appears as if the built-in model were already downloaded: there are two events, one with a loaded value of 0 and one with a loaded value of 1, which is what the spec requires.
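If your app observes download progress through the monitor callback of LanguageModel.create(), these two synthetic events are exactly what it receives:
const session = await LanguageModel.create({
  monitor(m) {
    m.addEventListener('downloadprogress', (e) => {
      // With a cloud backend, this fires twice: e.loaded === 0, then 1.
      console.log(`Downloaded ${e.loaded * 100}%`);
    });
  },
});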
With cloud-based inference, unlike with on-device inference, there's a potential cost involved in calling your backend provider's APIs. Check the pricing information, such as the pricing for the Gemini API. If you know the cost per token, you can use the Prompt API's contextUsage information to calculate the cost.
const COST_PER_TOKEN = 123;
const COST_LIMIT = 456;
let costSoFar = 0;
const session = await LanguageModel.create(options);
// …
if (costSoFar < COST_LIMIT) {
  await session.prompt('Tell me a joke.');
  costSoFar = session.contextUsage * COST_PER_TOKEN;
} else {
  // Show premium AI plan promo.
}
When you call a cloud API directly from a mobile or web app (for example, the APIs that provide access to generative AI models), the API key is vulnerable to abuse by unauthorized clients. To help protect these APIs, if you use the Firebase AI Logic Hybrid SDK, use Firebase App Check to verify that all incoming API calls come from your actual app. With some cloud providers, like Google, you can also enforce strict origin checks to make sure only allowed websites can use the API.
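As a rough sketch of what the App Check setup looks like in a web app, assuming the modular Firebase SDK with reCAPTCHA v3 as the attestation provider (the project config and site key are placeholders):
import { initializeApp } from 'firebase/app';
import { initializeAppCheck, ReCaptchaV3Provider } from 'firebase/app-check';

const app = initializeApp({ /* Your Firebase project config. */ });
initializeAppCheck(app, {
  // Placeholder site key; register yours in the Firebase console.
  provider: new ReCaptchaV3Provider('your-recaptcha-v3-site-key'),
  isTokenAutoRefreshEnabled: true,
});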
The limits of the backend provider apply rather than those of the Prompt API, for example, regarding the session's contextWindow. These limits are typically much higher than on-device, so you can process larger amounts of data in the cloud. While you should be aware of the difference, in practice you're unlikely to run into problems here.
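If you do want to keep an eye on it, you can compute the remaining context from the contextUsage and contextWindow values mentioned earlier; a minimal sketch:
// How much of the (typically much larger) cloud context is left.
const remaining = session.contextWindow - session.contextUsage;
console.log(`${remaining} tokens of context remaining`);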
Create your own backend
To add your own backend provider, follow these steps:
Extend the base backend class
Create a new file in the backends/ directory, for example, backends/custom-backend.js. You need to extend the PolyfillBackend class and implement the core methods that satisfy the expected interface.
import PolyfillBackend from './base.js';
import { DEFAULT_MODELS } from './defaults.js';
export default class CustomBackend extends PolyfillBackend {
  constructor(config) {
    // config typically comes from a window global (e.g., window.CUSTOM_CONFIG)
    super(config.modelName || DEFAULT_MODELS.custom);
  }

  // Check if the backend is configured (e.g., an API key is present), if given
  // combinations of modelName and options are supported, or, for local models,
  // if the model is available.
  static availability(options) {
    return window.CUSTOM_CONFIG?.apiKey ? 'available' : 'unavailable';
  }

  // Initialize the underlying SDK or API client. With local models, use
  // monitorTarget to report model download progress to the polyfill.
  createSession(options, sessionParams, monitorTarget) {
    // Return the initialized session or client instance
  }

  // Non-streaming prompt execution
  async generateContent(contents) {
    // contents: Array of { role: 'user'|'model', parts: [{ text: string }] }
    // Return: { text: string, usage: number }
  }

  // Streaming prompt execution
  async generateContentStream(contents) {
    // Return: AsyncIterable yielding chunks
  }

  // Token counting for quota/usage tracking
  async countTokens(contents) {
    // Return: total token count (number)
  }
}
Register your backend
The polyfill uses a "First-Match Priority" strategy based on global configuration. You need to register your backend in the
prompt-api-polyfill.js file by adding it to the static #backends array:
// prompt-api-polyfill.js
static #backends = [
  // ... existing backends
  {
    config: 'CUSTOM_CONFIG', // The global object to look for on `window`
    path: './backends/custom-backend.js',
  },
];
Set a default model
Define the fallback model identity in backends/defaults.js. This is used when a user initializes a session without specifying a specific modelName.
// backends/defaults.js
export const DEFAULT_MODELS = {
  // ...
  custom: 'custom-model-pro-v1',
};
Enable local development and testing
The project uses a discovery script (scripts/list-backends.js) to generate test matrices. To include your new backend in the test runner, create a .env-[name].json file (for example, .env-custom.json) in the root directory:
{
  "apiKey": "your-api-key-here",
  "modelName": "custom-model-pro-v1"
}
Verify with Web Platform Tests (WPT)
The final step is ensuring compliance. Because the polyfill is spec-driven, any new backend should pass the official (or tentative) Web Platform Tests:
npm run test:wpt
This verification step ensures that your backend handles things like AbortSignal, system prompts, and history formatting exactly as the Prompt API specification expects.
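For example, a compliant backend must propagate cancellation: aborting the signal passed to prompt() has to reject the pending promise with an AbortError.
const controller = new AbortController();
const promise = session.prompt('Tell me a very long story.', {
  signal: controller.signal,
});
// Rejects the pending prompt() promise with an AbortError.
controller.abort();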
Conclusion
The polyfill helps you use the Prompt API on all platforms and devices. By coding against the Prompt API's well-defined interface, you make yourself more independent of specific cloud providers and stay as close to the platform as possible.
On capable devices that support the Prompt API, the polyfill isn't even loaded, so you spare your users from downloading code they won't execute. If you have feedback or run into a bug, open an Issue on GitHub. Happy prompting!