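Published: January 21, 2025
A streamed LLM response consists of data emitted incrementally and continuously. The streamed data looks different on the server than on the client.
From the server
To get an idea of what a streamed response looks like, I used the command-line tool curl to ask Gemini to tell me a long joke. Consider the following call to the Gemini API. If you try it, be sure to replace {GOOGLE_API_KEY} in the URL with your Gemini API key.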
$ curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:streamGenerateContent?alt=sse&key={GOOGLE_API_KEY}" \
-H 'Content-Type: application/json' \
--no-buffer \
-d '{ "contents":[{"parts":[{"text": "Tell me a long T-rex joke, please."}]}]}'
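This request logs the following (truncated) output in event stream format. Each line begins with data: followed by the message payload. The concrete format isn't important; what matters are the chunks of text.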
data: {"candidates":[{"content": {"parts": [{"text": "A T-Rex"}],"role": "model"},
"finishReason": "STOP","index": 0,"safetyRatings": [{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE"},{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE"},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE"},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE"}]}],
"usageMetadata": {"promptTokenCount": 11,"candidatesTokenCount": 4,"totalTokenCount": 15}}
data: {"candidates": [{"content": {"parts": [{ "text": " walks into a bar and orders a drink. As he sits there, he notices a" }], "role": "model"},
"finishReason": "STOP","index": 0,"safetyRatings": [{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT","probability": "NEGLIGIBLE"},{"category": "HARM_CATEGORY_HATE_SPEECH","probability": "NEGLIGIBLE"},{"category": "HARM_CATEGORY_HARASSMENT","probability": "NEGLIGIBLE"},{"category": "HARM_CATEGORY_DANGEROUS_CONTENT","probability": "NEGLIGIBLE"}]}],
"usageMetadata": {"promptTokenCount": 11,"candidatesTokenCount": 21,"totalTokenCount": 32}}
The first payload is JSON. Take a closer look at candidates[0].content.parts[0].text:
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "A T-Rex"
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 11,
    "candidatesTokenCount": 4,
    "totalTokenCount": 15
  }
}
That first text entry is the beginning of Gemini's response. When you extract more text entries, the response is newline-delimited.
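To make the extraction step concrete, here is a minimal sketch of pulling the text entries out of a raw event-stream body. The helper name extractTextChunks and the shortened sample payloads are hypothetical, and the sketch assumes each event's JSON sits on a single line that starts with data:.

```javascript
// Sketch: extract candidates[0].content.parts[0].text from each "data:" line.
// Assumption: every event carries its full JSON payload on one line.
function extractTextChunks(sseBody) {
  const chunks = [];
  for (const line of sseBody.split(/\r?\n/)) {
    // Skip blank separator lines and anything that is not a data line.
    if (!line.startsWith('data:')) continue;
    const payload = JSON.parse(line.slice('data:'.length).trim());
    const text = payload.candidates?.[0]?.content?.parts?.[0]?.text;
    if (typeof text === 'string') chunks.push(text);
  }
  return chunks;
}

// Hypothetical, shortened sample shaped like the output shown earlier.
const sampleBody = [
  'data: {"candidates":[{"content":{"parts":[{"text":"A T-Rex"}],"role":"model"}}]}',
  '',
  'data: {"candidates":[{"content":{"parts":[{"text":" walks into a bar"}],"role":"model"}}]}',
  '',
].join('\n');

const textChunks = extractTextChunks(sampleBody);
console.log(textChunks.join('')); // prints "A T-Rex walks into a bar"
```

A real client would read the response body incrementally rather than as one string, but the per-line parsing stays the same.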
The following snippet shows several text entries, which together make up the model's final response.
"A T-Rex"
" was walking through the prehistoric jungle when he came across a group of Triceratops. "
"\n\n\"Hey, Triceratops!\" the T-Rex roared. \"What are"
" you guys doing?\"\n\nThe Triceratops, a bit nervous, mumbled, \"Just... just hanging out, you know? Relaxing.\"\n\n\"Well, you"
" guys look pretty relaxed,\" the T-Rex said, eyeing them with a sly grin. \"Maybe you could give me a hand with something.\"\n\n\"A hand?\""
...
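But what happens if, instead of asking for T-rex jokes, you ask the model for something slightly more complex? For example, ask Gemini to come up with a JavaScript function that determines whether a number is even or odd. The text: chunks look slightly different.
The output now contains Markdown, starting with a JavaScript code block. The following sample went through the same preprocessing as before, so only the text entries are shown.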
"```javascript\nfunction"
" isEven(number) {\n // Check if the number is an integer.\n"
" if (Number.isInteger(number)) {\n // Use the modulo operator"
" (%) to check if the remainder after dividing by 2 is 0.\n return number % 2 === 0; \n } else {\n "
"// Return false if the number is not an integer.\n return false;\n }\n}\n\n// Example usage:\nconsole.log(isEven("
"4)); // Output: true\nconsole.log(isEven(7)); // Output: false\nconsole.log(isEven(3.5)); // Output: false\n```\n\n**Explanation:**\n\n1. **`isEven("
"number)` function:**\n - Takes a single argument `number` representing the number to be checked.\n - Checks if the `number` is an integer using `Number.isInteger()`.\n - If it's an"
...
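To complicate matters, some markup begins in one chunk and ends in another, and some markup is nested. In the preceding example, the function heading is split across two chunks: **`isEven( and number)` function:**. Combined, the output is **`isEven(number)` function:**. This means that if you want to output formatted Markdown, you can't just process each chunk individually with a Markdown parser.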
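The split-markup problem can be illustrated with a small sketch. The helper hasUnclosedInlineCode is hypothetical and deliberately naive (it only counts backticks and ignores fenced code blocks); it is not a Markdown parser, only a way to show that each chunk is incomplete on its own while the accumulated buffer is well-formed.

```javascript
// Sketch: each chunk alone has dangling markup, the accumulated text does not.
// Naive assumption: an odd number of backticks means unclosed inline code.
function hasUnclosedInlineCode(markdown) {
  const backticks = (markdown.match(/`/g) || []).length;
  return backticks % 2 !== 0;
}

// The two chunks from the example above that split the heading.
const chunks = ['**`isEven(', 'number)` function:**'];

let buffer = '';
for (const chunk of chunks) {
  buffer += chunk; // always append to the full response so far
  console.log(JSON.stringify(chunk), 'unclosed on its own:', hasUnclosedInlineCode(chunk));
}

console.log(buffer); // prints **`isEven(number)` function:**
console.log('buffer unclosed:', hasUnclosedInlineCode(buffer)); // prints "buffer unclosed: false"
```

One possible design that follows from this is to keep the full accumulated text and hand that to the Markdown parser, rather than parsing chunk by chunk.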
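From the client
If you run a model like Gemma on the client with a framework like MediaPipe LLM Inference, streaming data arrives through a callback function.
For example:
llmInference.generateResponse(
  inputPrompt,
  (chunk, done) => {
    console.log(chunk);
  });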
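As a rough sketch of what such a callback might do beyond logging, the following accumulates the chunks into the full response. The names createChunkCollector and fakeGenerateResponse are hypothetical; the fake function only stands in for a streaming model so the sketch runs on its own, and it assumes each chunk is a new fragment rather than the full text so far.

```javascript
// Sketch: build a (chunk, done) callback that accumulates the response.
function createChunkCollector(onUpdate) {
  let fullText = '';
  return (chunk, done) => {
    fullText += chunk; // assumption: chunk is a new fragment, not cumulative text
    onUpdate(fullText, done);
  };
}

// Hypothetical stand-in for a streaming model; not a real library call.
function fakeGenerateResponse(prompt, callback) {
  const parts = ['A T-Rex', ' walks into', ' a bar.'];
  parts.forEach((part, i) => callback(part, i === parts.length - 1));
}

let latest = '';
fakeGenerateResponse(
  'Tell me a joke',
  createChunkCollector((text, done) => {
    latest = text;
    if (done) console.log(text); // prints "A T-Rex walks into a bar."
  }));
```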
With the Prompt API, you get streaming data as chunks by iterating over a ReadableStream.
const languageModel = await self.ai.languageModel.create();
const stream = languageModel.promptStreaming(inputPrompt);
for await (const chunk of stream) {
  console.log(chunk);
}
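The same accumulation idea applies to a stream. The sketch below uses a hand-built ReadableStream as a stand-in, so it runs without a model; collectStream and makeFakeStream are hypothetical names, and the code assumes a runtime where ReadableStream supports async iteration and where each chunk is a new fragment. If the stream you consume yields the full text so far instead, assign rather than append.

```javascript
// Sketch: read every chunk from a ReadableStream and return the joined text.
async function collectStream(stream, onChunk = () => {}) {
  let fullText = '';
  for await (const chunk of stream) {
    fullText += chunk; // assumption: chunks are fragments, not cumulative
    onChunk(chunk, fullText);
  }
  return fullText;
}

// Hypothetical stand-in for a model stream, built from fixed string chunks.
function makeFakeStream(parts) {
  return new ReadableStream({
    start(controller) {
      for (const part of parts) controller.enqueue(part);
      controller.close();
    },
  });
}

collectStream(makeFakeStream(['function', ' isEven(number) {}']))
  .then((text) => console.log(text)); // prints "function isEven(number) {}"
```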
Next steps
Are you wondering how to render streamed data performantly and securely? Read our best practices to render LLM responses.