Encourage useful product reviews with client-side web AI

Maud Nalpas

Kenji Baheux

Alexandra Klepper

Published: May 16, 2024

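Positive and negative reviews can both inform a buyer's purchasing decision.

According to external research, 82% of online shoppers actively seek out negative reviews before making a purchase. Negative reviews are useful for customers and for businesses alike: they can help reduce return rates and help makers improve their products.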

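Here are a few ways you can improve review quality:

- Check each review for toxicity before it's submitted. We can encourage users to remove offensive language and other unhelpful remarks, so that their review helps other users make a better purchase decision.
  - Negative: This bag sucks, and I hate it.
  - Negative with useful feedback: The zippers are very stiff and the material feels cheap. I returned this bag.
- Auto-generate a rating based on the language used in the review.
- Determine whether the review is negative or positive.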

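Screenshot: an example review with a sentiment and a star rating. In this example, the reviewer's comment is given a positive sentiment and a five-star rating.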

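Ultimately, the user should have the final word on the product rating.

The following codelab offers client-side solutions that run on-device, in the browser. No AI development knowledge, servers, or API keys are required.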

Prerequisites

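Server-side AI, with solutions such as the Gemini API or the OpenAI API, is a robust option for many applications, but this guide focuses on client-side web AI. With client-side AI, inference runs in the browser, which removes server round trips and improves the experience for web users.

In this codelab, we use a mix of techniques to show you what's in the client-side AI toolbox.

We use the following libraries and models: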

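- TensorFlow.js for toxicity analysis. TensorFlow.js is an open source machine learning library for both inference and training on the web.
- Transformers.js for sentiment analysis. Transformers.js is a web AI library from Hugging Face.
- Gemma 2B for star ratings. Gemma is a family of lightweight, open models built from the same research and technology used to create the Gemini models. To run Gemma in the browser, we use it with MediaPipe's experimental LLM Inference API.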

UX and safety considerations

To ensure the best user experience and safety, keep the following in mind:

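- Allow the user to edit the rating. Ultimately, the user should have the final word on the product rating.
- Make it clear to the user that the rating and the review feedback are generated automatically.
- Allow users to post a review that was classified as toxic, but run a second check on the server. This avoids a frustrating experience when a non-toxic review is mistakenly classified as toxic (a false positive). The server-side check also covers cases where a malicious user manages to bypass the client-side check.
- A client-side toxicity check is helpful, but it can be bypassed. Make sure you also run a check on the server.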
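
The considerations above can be sketched as a small decision helper. This is only an illustration: `planSubmission` and its field names are hypothetical and not part of any library, and the server-side check itself is out of scope here.

```javascript
// Hypothetical helper: decides what the review form does with the
// client-side toxicity result. All names here are illustrative.
function planSubmission(reviewText, clientFlaggedToxic) {
  return {
    // Never block on the client: the check can be wrong (false positive).
    allowSubmit: true,
    // Nudge the user to rephrase when the client-side model flags the text.
    showToxicityWarning: clientFlaggedToxic,
    // Payload for the server, which runs its own check regardless of the
    // client-side result, because that result can be bypassed or forged.
    payload: { review: reviewText, clientFlaggedToxic },
  };
}

const plan = planSubmission('This bag sucks, and I hate it.', true);
console.log(plan.allowSubmit, plan.showToxicityWarning); // true true
```

The client-side flag is sent along only as a hint; the server should treat it as untrusted input.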

Analyze toxicity with TensorFlow.js

It's quick to start analyzing the toxicity of a user review with TensorFlow.js.

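1. Install and import the TensorFlow.js library and the toxicity model.
2. Set a minimum prediction confidence. The default is 0.85; in our example, we set it to 0.9.
3. Load the model asynchronously.
4. Classify the review asynchronously. Our code identifies predictions that exceed the 0.9 threshold for any category.

This model can categorize toxicity as identity attack, insult, obscenity, and more.

For example: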

import * as toxicity from '@tensorflow-models/toxicity';

// Minimum prediction confidence allowed
const TOXICITY_COMMENT_THRESHOLD = 0.9;

const toxicityModel = await toxicity.load(TOXICITY_COMMENT_THRESHOLD);
const toxicityPredictions = await toxicityModel.classify([review]);
// `toxicityPredictions` has one entry per toxicity label; `match` is true
// when that label's probability exceeds the threshold
const isToxic = toxicityPredictions.some(
    (prediction) => prediction.results[0].match
);
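
To tell the user why a review was flagged, it can help to list which categories matched. The following is a minimal sketch that assumes the output shape described in the toxicity model's README (one object per label, each with a `results` array holding `probabilities` and `match`). The helper name `getMatchedLabels` and the mock values are made up for illustration.

```javascript
// Returns the labels whose first result matched, that is, whose toxicity
// probability exceeded the threshold passed to `toxicity.load()`.
function getMatchedLabels(predictions) {
  return predictions
    .filter((prediction) => prediction.results[0].match === true)
    .map((prediction) => prediction.label);
}

// Mock data in the assumed output shape (values are invented).
const mockPredictions = [
  { label: 'identity_attack', results: [{ probabilities: [0.98, 0.02], match: false }] },
  { label: 'insult', results: [{ probabilities: [0.05, 0.95], match: true }] },
  // `match` is null when neither probability exceeds the threshold.
  { label: 'obscene', results: [{ probabilities: [0.4, 0.6], match: null }] },
];

console.log(getMatchedLabels(mockPredictions)); // [ 'insult' ]
```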

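Determine sentiment with Transformers.js

1. Install and import the Transformers.js library.
2. Set up the sentiment analysis task with a dedicated pipeline. The first time a pipeline is used, the model is downloaded and cached. From then on, sentiment analysis should be much faster.
   Note: Unless you specify a model, Transformers.js uses the default model for the given pipeline.
3. Classify the review asynchronously. Use a custom threshold to set the level of confidence you consider usable for your application.

For example: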

import { pipeline } from '@xenova/transformers';

const SENTIMENT_THRESHOLD = 0.9;
// Create a pipeline (don't block rendering on this function)
const transformersjsClassifierSentiment = await pipeline(
  'sentiment-analysis'
);

// When the user finishes typing
const sentimentResult = await transformersjsClassifierSentiment(review);
const { label, score } = sentimentResult[0];
if (score > SENTIMENT_THRESHOLD) {
  // The sentiment is `label`
} else {
  // Classification is not conclusive
}
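
The branching above can be wrapped in a small helper that maps a classifier result to one of three outcomes. This sketch assumes the default model returns the labels `POSITIVE` and `NEGATIVE`, which may not hold if you specify a different model. The helper name `interpretSentiment` is hypothetical.

```javascript
// Maps one classifier result ({ label, score }) to 'positive', 'negative',
// or 'inconclusive'. The label names are an assumption about the default model.
function interpretSentiment(result, threshold = 0.9) {
  const { label, score } = result;
  if (score <= threshold) return 'inconclusive';
  const normalized = String(label).toUpperCase();
  if (normalized === 'POSITIVE') return 'positive';
  if (normalized === 'NEGATIVE') return 'negative';
  // Unknown label: don't guess.
  return 'inconclusive';
}

console.log(interpretSentiment({ label: 'POSITIVE', score: 0.99 })); // positive
console.log(interpretSentiment({ label: 'NEGATIVE', score: 0.6 })); // inconclusive
```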

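Suggest a star rating with Gemma and MediaPipe

With the LLM Inference API, you can run large language models (LLMs) completely in the browser.

This new capability is particularly transformative considering the memory and compute demands of LLMs, which are over a hundred times larger than those of client-side models. Optimizations across the stack make this possible, including new operations, quantization, caching, and weight sharing. Source: "Large Language Models On-Device with MediaPipe and TensorFlow Lite".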

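1. Install and import the MediaPipe LLM Inference API.
2. Download a model. Here, we use Gemma 2B, downloaded from Kaggle. Gemma 2B is the smallest of Google's open-weight models.
3. Point the code to the right model files with the FilesetResolver. This is important because generative AI models may have a specific directory structure for their assets.
4. Load and configure the model with MediaPipe's LLM interface. Prepare the model for use: specify the model location, the preferred length of responses, and the preferred level of creativity (with the temperature).
5. Give the model a prompt (see the example prompt).
6. Wait for the model's response.
7. Parse the rating: extract the star rating from the model's response.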

import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

const mediaPipeGenAi = await FilesetResolver.forGenAiTasks();
const llmInference = await LlmInference.createFromOptions(mediaPipeGenAi, {
    baseOptions: {
        modelAssetPath: '/gemma-2b-it-gpu-int4.bin',
    },
    maxTokens: 1000,
    topK: 40,
    temperature: 0.5,
    randomSeed: 101,
});

const prompt = …
const output = await llmInference.generateResponse(prompt);

// Take the first digit in the reply; `match` returns null if there is none
const ratingMatch = output.match(/\d/);
const rating = ratingMatch ? parseInt(ratingMatch[0], 10) : null;
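
Taking the first digit can pick up an unrelated number if the model writes an analysis before the rating. As a hedged alternative, the sketch below first looks for a "Rating (integer): N" line, which is the format used in the example prompt, and only then falls back to a standalone digit from 1 to 5. The helper name `parseRating` is hypothetical.

```javascript
// Extracts a 1-5 rating from the model's reply, or returns null.
function parseRating(output) {
  if (typeof output !== 'string') return null;
  // Prefer an explicit "Rating (integer): N" line.
  const labeled = output.match(/Rating\s*(?:\(integer\))?\s*:\s*([1-5])\b/i);
  if (labeled) return parseInt(labeled[1], 10);
  // Otherwise fall back to the first standalone digit between 1 and 5.
  const standalone = output.match(/\b([1-5])\b/);
  return standalone ? parseInt(standalone[1], 10) : null;
}

console.log(parseRating('Analysis: mentions 2 problems.\nRating (integer): 1')); // 1
console.log(parseRating('No idea')); // null
```

When the result is `null`, the UI can leave the star rating empty and let the user choose.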

Example prompt

const prompt = `Analyze a product review, and then based on your analysis give me the
corresponding rating (integer). The rating should be an integer between 1 and 5.
1 is the worst rating, and 5 is the best rating. A strongly dissatisfied review
that only mentions issues should have a rating of 1 (worst). A strongly
satisfied review that only mentions positives and upsides should have a rating
of 5 (best). Be opinionated. Use the full range of possible ratings (1 to 5). \n\n
  \n\n
  Here are some examples of reviews and their corresponding analyses and ratings:
  \n\n
  Review: 'Stylish and functional. Not sure how it'll handle rugged outdoor use,
  but it's perfect for urban exploring.'
  Analysis: The reviewer appreciates the product's style and basic
  functionality. They express some uncertainty about its ruggedness but overall
  find it suitable for their intended use, resulting in a positive, but not
  top-tier rating.
  Rating (integer): 4
  \n\n
  Review: 'It's a solid backpack at a decent price. Does the job, but nothing
  particularly amazing about it.'
  Analysis: This reflects an average opinion. The backpack is functional and
  fulfills its essential purpose. However, the reviewer finds it unremarkable
  and lacking any standout features deserving of higher praise.
  Rating (integer): 3
  \n\n
  Review: 'The waist belt broke on my first trip! Customer service was
  unresponsive too. Would not recommend.'
  Analysis: A serious product defect and poor customer service experience
  naturally warrants the lowest possible rating. The reviewer is extremely
  unsatisfied with both the product and the company.
  Rating (integer): 1
  \n\n
  Review: 'Love how many pockets and compartments it has. Keeps everything
  organized on long trips. Durable too!'
  Analysis: The enthusiastic review highlights specific features the user loves
  (organization and durability), indicating great satisfaction with the product.
  This justifies the highest rating.
  Rating (integer): 5
  \n\n
  Review: 'The straps are a bit flimsy, and they started digging into my
  shoulders under heavy loads.'
  Analysis: While not a totally negative review, a significant comfort issue
  leads the reviewer to rate the product poorly. The straps are a key component
  of a backpack, and their failure to perform well under load is a major flaw.
  Rating (integer): 1
  \n\n
  Now, here is the review you need to assess:
  \n
  Review: "${review}" \n`;
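
Because the review is interpolated into a double-quoted field of the prompt, it may be worth normalizing the text first. The following is a sketch under stated assumptions: `prepareReviewForPrompt` and `MAX_REVIEW_CHARS` are hypothetical, and the 500-character cap is an arbitrary illustrative value. A cap of this kind can matter because, per MediaPipe's documentation, `maxTokens` covers input and output tokens combined, and the example prompt is already long.

```javascript
// Arbitrary illustrative limit, not a MediaPipe setting.
const MAX_REVIEW_CHARS = 500;

// Hypothetical helper: normalizes user text before it goes into the prompt.
function prepareReviewForPrompt(rawReview) {
  return rawReview
    .replace(/\s+/g, ' ') // collapse newlines and repeated whitespace
    .replace(/"/g, "'") // avoid closing the double-quoted Review field early
    .trim()
    .slice(0, MAX_REVIEW_CHARS);
}

console.log(prepareReviewForPrompt('  Great  "bag"\n\nLove it  '));
// Great 'bag' Love it
```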

Takeaways

No AI/ML expertise is required. Designing a prompt takes iteration, but the rest of the code is standard web development.

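Client-side models are fairly accurate. If you run the snippets from this document, you'll observe that both the toxicity and the sentiment analysis give accurate results. For the few reference reviews we tested, the Gemma ratings mostly matched the Gemini model ratings. More testing is required to validate that accuracy.

That said, designing the prompt for Gemma 2B takes work. Because Gemma 2B is a small LLM, it needs a detailed prompt to produce satisfying results, notably more detailed than what the Gemini API requires.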

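Inference can be fast. If you run the snippets from this document, you should observe that inference is fast on many devices, potentially faster than a server round trip. That said, inference speed can vary greatly, so thorough benchmarking on target devices is needed. We expect in-browser inference to keep getting faster with WebGPU, WebAssembly, and library updates. For example, Transformers.js adds WebGPU support in v3, which can speed up on-device inference significantly.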

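Download sizes can be very large. Inference in the browser is fast, but loading AI models can be a challenge. To run AI in the browser, you typically need both a library and a model, which add to the download size of your web app.

While the TensorFlow.js toxicity model (a classic natural language processing model) is only a few kilobytes, models such as Transformers.js's default sentiment analysis model reach 60 MB. Large language models like Gemma can be as large as 1.3 GB. This far exceeds the median web page size of 2.2 MB, which is itself well above what is recommended for best performance. Client-side generative AI is viable in specific scenarios.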
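
To put these sizes in perspective, the arithmetic can be worked out directly from the figures quoted above. This sketch assumes 1 GB = 1000 MB; the helper name `timesMedianPage` is illustrative.

```javascript
// Figures quoted in this article, in megabytes.
const MEDIAN_PAGE_MB = 2.2;
const modelSizesMb = {
  'Transformers.js default sentiment model': 60,
  'Gemma 2B': 1300, // 1.3 GB, assuming 1 GB = 1000 MB
};

// How many median-sized web pages a download of this size corresponds to.
function timesMedianPage(sizeMb) {
  return Math.round(sizeMb / MEDIAN_PAGE_MB);
}

for (const [name, sizeMb] of Object.entries(modelSizesMb)) {
  console.log(`${name}: ~${timesMedianPage(sizeMb)}x the median page`);
}
// Transformers.js default sentiment model: ~27x the median page
// Gemma 2B: ~591x the median page
```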

The field of generative AI on the web is evolving rapidly. Smaller, web-optimized models are expected to emerge in the future.

Next steps

Chrome is experimenting with another way to run generative AI in the browser. To test it, sign up for the early preview program.