Language detection in Chrome with built-in AI

Published: September 24, 2024

Before translating text from one language to another, you must first determine what language is used in the given text. Previously, translation required uploading the text to a cloud service, performing the translation on the server, then downloading the results.

The Language Detector API runs inference on-device, so the text being analyzed doesn't need to be sent to a server, which can improve your privacy story. While it's possible to ship a dedicated library that does this, users would have to download additional resources.

Example use cases

The Language Detector API is primarily useful in the following scenarios:

  • Determine the language of input text, so it can be translated.
  • Determine the language of input text, so the correct model can be loaded for language-specific tasks, such as toxicity detection.
  • Determine the language of input text, so it can be labeled correctly, for example, in online social networking sites.
  • Determine the language of input text, so an app's interface can be adjusted accordingly. For example, a Belgian site could show only the interface relevant to users who speak French.
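
The interface-adjustment scenario can be sketched as a small helper. This is an illustration only: `chooseInterfaceLanguage`, its parameters, and the 0.7 default threshold are hypothetical and not part of the API. It assumes detection results arrive as a ranked list of {detectedLanguage, confidence} objects, as described later in this article.

```javascript
// Hypothetical helper (not part of the Language Detector API).
// Assumes `results` is ranked from most to least likely.
function chooseInterfaceLanguage(results, supported, fallback, minConfidence = 0.7) {
  // Take the first ranked candidate that is both confident enough
  // and one of the interface languages the site offers.
  const match = results.find(
    (r) => r.confidence >= minConfidence && supported.includes(r.detectedLanguage)
  );
  return match ? match.detectedLanguage : fallback;
}

// Example: a Belgian site that offers Dutch, French, and German interfaces.
const uiLanguage = chooseInterfaceLanguage(
  [
    { detectedLanguage: 'fr', confidence: 0.92 },
    { detectedLanguage: 'nl', confidence: 0.05 },
  ],
  ['nl', 'fr', 'de'],
  'nl'
);
console.log(uiLanguage);
```

Falling back to a default keeps the interface stable when the detected language isn't offered or the confidence is too low to act on.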

Use the Language Detector API

The Language Detector API is part of the larger family of the Translator API. First, run feature detection to see if the browser supports the Language Detector API.

if ('translation' in self && 'canDetect' in self.translation) {
  // The Language Detector API is available.
}

Model download

Language detection depends on a model that is fine-tuned for the specific task of detecting languages. While the API is built into the browser, the model is downloaded on demand the first time a site tries to use the API. In Chrome, this model is very small compared with other models. In fact, it might already be present, given that this model is also used by Chrome browser features.

To see if the model is ready to use, call the asynchronous translation.canDetect() function. There are three possible responses:

  • 'no': The current browser supports the Language Detector API, but it can't be used at the moment, for example, because there isn't enough free disk space to download the model.
  • 'readily': The current browser supports the Language Detector API, and it can be used right away.
  • 'after-download': The current browser supports the Language Detector API, but it needs to download the model first.
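
One way to act on these three responses is to map each one to what the app should do next. The following is a minimal sketch; `describeAvailability` and the shape of the object it returns are hypothetical and not part of the API.

```javascript
// Hypothetical helper (not part of the Language Detector API): maps a
// canDetect() response to flags and a message the app can act on.
function describeAvailability(state) {
  switch (state) {
    case 'readily':
      return { usable: true, needsDownload: false, message: 'Language detection is ready.' };
    case 'after-download':
      return { usable: true, needsDownload: true, message: 'Language detection needs a model download first.' };
    case 'no':
      return { usable: false, needsDownload: false, message: "Language detection can't be used right now." };
    default:
      // Treat any unrecognized value as unavailable rather than guessing.
      return { usable: false, needsDownload: false, message: `Unexpected availability state: ${state}` };
  }
}

console.log(describeAvailability('after-download').message);
```

Handling unrecognized values explicitly is a defensive choice, since the API is still going through standardization and its responses may change.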

To trigger the download and instantiate the language detector, call the asynchronous translation.createDetector() function. If the response to canDetect() was 'after-download', it's best practice to listen for download progress, so you can inform the user in case the download takes time.

The following example demonstrates how to initialize the language detector.

// This snippet uses `await` and `return`, so it's assumed to run inside an
// async function.
const canDetect = await translation.canDetect();
let detector;
if (canDetect === 'no') {
  // The language detector isn't usable.
  return;
}
if (canDetect === 'readily') {
  // The language detector can immediately be used.
  detector = await translation.createDetector();
} else {
  // The language detector can be used after model download.
  detector = await translation.createDetector();
  // Log download progress so the user can be informed while waiting.
  detector.addEventListener('downloadprogress', (e) => {
    console.log(e.loaded, e.total);
  });
  await detector.ready;
}
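
To inform the user rather than only log to the console, the loaded and total values from the downloadprogress event can be turned into a status message. This is a minimal sketch; `formatDownloadProgress` and its wording are hypothetical, and it assumes e.loaded and e.total are numbers in the same unit.

```javascript
// Hypothetical helper (not part of the Language Detector API): formats the
// `loaded` and `total` values of a downloadprogress event as a status message.
function formatDownloadProgress(loaded, total) {
  // Without a usable total, a percentage can't be computed.
  if (!Number.isFinite(loaded) || !Number.isFinite(total) || total <= 0) {
    return 'Downloading language model...';
  }
  // Clamp to the 0-100 range in case the reported values are inconsistent.
  const percent = Math.min(100, Math.max(0, Math.round((loaded / total) * 100)));
  return `Downloading language model: ${percent}%`;
}

console.log(formatDownloadProgress(50, 200));
```

Inside the event listener, the result of formatDownloadProgress(e.loaded, e.total) could then be written to a status element on the page.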

Run the language detector

The Language Detector API uses a ranking model to determine which language is most likely used in a given piece of text. Ranking is a type of machine learning where the objective is to order a list of items. In this case, the Language Detector API ranks languages from highest to lowest probability.

The detect() function returns a list of {detectedLanguage, confidence} objects, ranked from most to least likely. You can either take the first result as the likeliest answer, or iterate over the ranked candidates along with their confidence levels. The confidence level is expressed as a value between 0.0 (lowest confidence) and 1.0 (highest confidence).

const someUserText = 'Hallo und herzlich willkommen!';
const results = await detector.detect(someUserText);
for (const result of results) {
  // Show the full list of potential languages with their likelihood, ranked
  // from most likely to least likely. In practice, one would pick the top
  // language(s) that cross a high enough threshold.
  console.log(result.detectedLanguage, result.confidence);
}
// (Output truncated):
// de 0.9993835687637329
// en 0.00038279531872831285
// nl 0.00010798392031574622
// ...
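
Picking the languages that cross a threshold, as the comment in the example suggests, can be sketched as follows. The `pickLikelyLanguages` name and the 0.5 default threshold are assumptions made for illustration; a suitable threshold depends on your use case.

```javascript
// Hypothetical helper (not part of the Language Detector API): keeps only
// the detected languages whose confidence meets a threshold.
function pickLikelyLanguages(results, threshold = 0.5) {
  return results
    .filter((r) => r.confidence >= threshold)
    // Sort defensively, in case the input isn't already ranked.
    // filter() returns a new array, so the input isn't mutated.
    .sort((a, b) => b.confidence - a.confidence)
    .map((r) => r.detectedLanguage);
}

// Sample data taken from the output shown above.
const sampleResults = [
  { detectedLanguage: 'de', confidence: 0.9993835687637329 },
  { detectedLanguage: 'en', confidence: 0.00038279531872831285 },
  { detectedLanguage: 'nl', confidence: 0.00010798392031574622 },
];
console.log(pickLikelyLanguages(sampleResults));
```

An empty return value means no language was confident enough, which an app might treat as "language unknown" instead of guessing.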

Demo

Preview the Language Detector API in our demo. Enter text written in different languages in the textarea.

Standardization effort

The Language Detector API was moved to the W3C Web Incubator Community Group after the corresponding proposal received enough support. The API is part of a larger Translation API proposal.

The Chrome team requested feedback from the W3C Technical Architecture Group and asked Mozilla and WebKit for their standards positions.

Share your feedback

If you have feedback on Chrome's implementation, file a Chromium bug. Share your feedback on the API shape of the Language Detector API by commenting on an existing Issue or opening a new one in the Translation API GitHub repository.