Language detection with built-in AI

Published: September 24, 2024, Last updated: December 10, 2024

Before translating text from one language to another, you must first determine what language is used in the given text. Previously, translation required uploading the text to a cloud service, performing the translation on the server, then downloading the results.

The Language Detector API runs inference on-device, so the text never has to leave the user's device, which improves your privacy story. While it's possible to ship your own library for language detection, users would have to download additional resources.

Availability

Sign up for the origin trial

To start using the Language Detector API, follow these steps:

  1. Acknowledge Google's Generative AI Prohibited Uses Policy.
  2. Go to the Language Detector API origin trial.
  3. Click Register and fill out the form.
    • In the Web origin field, provide your origin or extension ID, chrome-extension://YOUR_EXTENSION_ID.
  4. To submit, click Register.
  5. Copy the token provided, and add it to every web page on your origin, or to every file in your extension, where you want the trial to be enabled.
  6. Start using the Language Detector API.

Learn more about how to get started with origin trials.
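
One common way to provide the token on a web page is an origin-trial meta tag, which can also be injected from script. The following is a minimal sketch of that approach. The helper name addOriginTrialToken and the placeholder token are hypothetical and used only for illustration. The document object is passed in as a parameter so the helper does not depend on a global.

```javascript
// Hypothetical helper: injects an origin trial token as a <meta> tag.
// `doc` is expected to be a DOM Document; `token` is the string copied
// from the origin trial registration page.
function addOriginTrialToken(doc, token) {
  const meta = doc.createElement('meta');
  meta.httpEquiv = 'origin-trial';
  meta.content = token;
  doc.head.append(meta);
  return meta;
}

// Usage in a page (placeholder token, replace it with your own):
// addOriginTrialToken(document, 'YOUR_ORIGIN_TRIAL_TOKEN');
```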

Add support to localhost

To access the Language Detector API on localhost during the origin trial, you must update Chrome to the latest version. Then, follow these steps:

  1. Go to chrome://flags/#optimization-guide-on-device-model.
  2. Select Enabled BypassPerfRequirement. This skips the performance and VRAM checks that may otherwise prevent Gemini Nano from downloading on your device.
  3. Go to chrome://flags/#language-detection-api.
  4. Select Enabled.
  5. Click Relaunch or restart Chrome.

Example use cases

The Language Detector API is primarily useful in the following scenarios:

  • Determine the language of input text, so it can be translated.
  • Determine the language of input text, so the correct model can be loaded for language-specific tasks, such as toxicity detection.
  • Determine the language of input text, so it can be labeled correctly, for example, in online social networking sites.
  • Determine the language of input text, so an app's interface can be adjusted accordingly. For example, a Belgian site could show only the interface relevant to users who speak French.

Use the Language Detector API

The Language Detector API is part of the larger family of the Translator API. First, run feature detection to see if the browser supports the Language Detector API.

if ('ai' in self && 'languageDetector' in self.ai) {
  // The Language Detector API is available.
}

Model download

Language detection depends on a model that is fine-tuned for the specific task of detecting languages. While the API is built into the browser, the model is downloaded on demand the first time a site tries to use the API. In Chrome, this model is very small compared with other models. In fact, it might already be present, because Chrome browser features also use it.

To see if the model is ready to use, call the asynchronous self.ai.languageDetector.capabilities() function and inspect the available field. There are three possible responses:

  • 'no': The current browser supports the Language Detector API, but it can't be used at the moment, for example, because there isn't enough free disk space to download the model.
  • 'readily': The current browser supports the Language Detector API, and it can be used right away.
  • 'after-download': The current browser supports the Language Detector API, but it needs to download the model first.

To trigger the download and instantiate the language detector, call the asynchronous self.ai.languageDetector.create() function. If the response to capabilities() was 'after-download', it's best practice to listen for download progress, so you can inform the user in case the download takes time.

To see if a given language can be detected, call the languageAvailable() function.

const languageDetectorCapabilities = await self.ai.languageDetector.capabilities();
languageDetectorCapabilities.languageAvailable('es');
// 'readily'
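
To check several languages at once, the result of languageAvailable() can be collected per language tag. The following is a minimal sketch. The helper findDetectableLanguages is hypothetical, and it assumes that languageAvailable() returns 'no' for a language that can't be detected, mirroring the values of the available field described above.

```javascript
// Hypothetical helper: keeps only the language tags that the detector
// reports as usable, either right away or after a download.
// Assumption: languageAvailable() returns 'no' for unsupported languages.
function findDetectableLanguages(capabilities, languageTags) {
  return languageTags.filter(
    (tag) => capabilities.languageAvailable(tag) !== 'no'
  );
}

// Usage, inside an async context:
// const capabilities = await self.ai.languageDetector.capabilities();
// const usable = findDetectableLanguages(capabilities, ['es', 'de', 'fr']);
```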

The following example demonstrates how to initialize the language detector.

const languageDetectorCapabilities = await self.ai.languageDetector.capabilities();
const canDetect = languageDetectorCapabilities.available;
let detector;
if (canDetect === 'no') {
  // The language detector isn't usable.
  return;
}
if (canDetect === 'readily') {
  // The language detector can immediately be used.
  detector = await self.ai.languageDetector.create();
} else {
  // The language detector can be used after model download.
  detector = await self.ai.languageDetector.create({
    monitor(m) {
      m.addEventListener('downloadprogress', (e) => {
        console.log(`Downloaded ${e.loaded} of ${e.total} bytes.`);
      });
    },
  });
  await detector.ready;
}
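
To keep the user informed during a longer download, the byte counts from the downloadprogress event can be turned into a readable message. The following is a minimal sketch. The helper formatDownloadProgress and the statusElement in the usage comment are hypothetical, and the helper assumes the event exposes loaded and total as numbers, as in the example above.

```javascript
// Hypothetical helper: turns download byte counts into a status message.
// Falls back to a plain byte count when the total size is unknown.
function formatDownloadProgress(loaded, total) {
  if (!Number.isFinite(total) || total <= 0) {
    return `Downloaded ${loaded} bytes.`;
  }
  const percent = Math.min(100, Math.round((loaded / total) * 100));
  return `Downloading language detection model: ${percent}%`;
}

// Usage inside the monitor callback (statusElement is a placeholder):
// m.addEventListener('downloadprogress', (e) => {
//   statusElement.textContent = formatDownloadProgress(e.loaded, e.total);
// });
```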

Run the language detector

The Language Detector API uses a ranking model to determine which language is most likely used in a given piece of text. Ranking is a type of machine learning, where the objective is to order a list of items. In this case, the Language Detector API ranks languages from highest to lowest probability.

The detect() function returns a list of {detectedLanguage, confidence} objects, ranked from most likely to least likely. You can use only the first result, which is the likeliest answer, or iterate over the ranked candidates and inspect each level of confidence. The confidence level is expressed as a value between 0.0 (lowest confidence) and 1.0 (highest confidence).

const someUserText = 'Hallo und herzlich willkommen!';
const results = await detector.detect(someUserText);
for (const result of results) {
  // Show the full list of potential languages with their likelihood, ranked
  // from most likely to least likely. In practice, one would pick the top
  // language(s) that cross a high enough threshold.
  console.log(result.detectedLanguage, result.confidence);
}
// (Output truncated):
// de 0.9993835687637329
// en 0.00038279531872831285
// nl 0.00010798392031574622
// ...
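
Picking the language or languages that cross a confidence threshold could look like the following minimal sketch. The helper pickLikelyLanguages is hypothetical, and the default threshold of 0.5 is an arbitrary assumption that you would tune for your own use case.

```javascript
// Hypothetical helper: returns the language codes whose confidence meets
// the threshold, ordered from most likely to least likely.
// Assumption: `results` is a list of {detectedLanguage, confidence} objects.
function pickLikelyLanguages(results, threshold = 0.5) {
  return results
    .filter((result) => result.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence)
    .map((result) => result.detectedLanguage);
}

// Usage, inside an async context:
// const results = await detector.detect(someUserText);
// const [topLanguage] = pickLikelyLanguages(results);
// if (topLanguage === undefined) {
//   // No candidate was confident enough. Ask the user instead.
// }
```

An empty result means that no candidate was confident enough, which is a useful signal for falling back to asking the user.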

Demo

Preview the Language Detector API in our demo. Enter text written in different languages in the textarea.

Standardization effort

The Language Detector API was moved to the W3C Web Incubator Community Group after the corresponding proposal received enough support. The API is part of a larger Translation API proposal.

The Chrome team requested feedback from the W3C Technical Architecture Group and asked Mozilla and WebKit for their standards positions.

Share your feedback

If you have feedback on Chrome's implementation, file a Chromium bug. Share your feedback on the API shape of the Language Detector API by commenting on an existing Issue or opening a new one in the Translation API GitHub repository.