@stephen-lord/other2 1.0.8 → 1.0.10
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/docs/manus/CN-/346/211/222/345/256/214/345/205/250/347/275/221/346/234/200/345/274/272-AI-/345/233/242/351/230/237/347/232/204-Context-Engineering-/346/224/273/347/225/245/346/210/221/344/273/254/346/200/273/347/273/223/345/207/272/344/272/206/350/277/231-5-/345/244/247/346/226/271/346/263/225-/346/231/272/346/272/220/347/244/276/345/214/272.md +2464 -0
- package/dist/docs/manus/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus.md +212 -0
- package/dist/docs/manus/Context-Engineering-for-AI-Agents-Part-2.md +96 -0
- package/dist/docs/manus/Industry.md +94 -0
- package/dist/docs/manus/Observability-for-Manus-15-Agents-Logs-Retries-and-Error-Budgets.md +346 -0
- package/dist/docs/manus/OpenManus-Technical-Analysis-Architecture-and-Implementation-of-an-Open-Source-A.md +324 -0
- package/dist/docs/manus/README.md +85 -0
- package/dist/docs/manus/Tech-Constrained-Decoding-Agent-Reliability.md +81 -0
- package/dist/docs/manus/Tech-How-to-build-function-calling-and-JSON-mode.md +43 -0
- package/dist/docs/manus/Tech-Understanding-Logit-Bias-in-LLMs-Medium.md +1354 -0
- package/dist/docs/manus/The-Performance-Reality-KV-Cache-as-the-North-Star.md +155 -0
- package/dist/docs/manus/Why-Context-Engineering.md +125 -0
- package/dist/docs/manus/article_1_raw.md +1 -0
- package/dist/docs/manus/split_articles.py +52 -0
- package/dist/docs/manus//346/235/245/350/207/252-Manus-/347/232/204/344/270/200/346/211/213/345/210/206/344/272/253/345/246/202/344/275/225/346/236/204/345/273/272-AI-Agent-/347/232/204/344/270/212/344/270/213/346/226/207/345/267/245/347/250/213-/346/231/272/346/272/220/347/244/276/345/214/272.md +2180 -0
- package/dist/ui-ux-pro-max/SKILL.md +386 -0
- package/dist/ui-ux-pro-max/data/charts.csv +26 -0
- package/dist/ui-ux-pro-max/data/colors.csv +97 -0
- package/dist/ui-ux-pro-max/data/icons.csv +101 -0
- package/dist/ui-ux-pro-max/data/landing.csv +31 -0
- package/dist/ui-ux-pro-max/data/products.csv +97 -0
- package/dist/ui-ux-pro-max/data/prompts.csv +24 -0
- package/dist/ui-ux-pro-max/data/react-performance.csv +45 -0
- package/dist/ui-ux-pro-max/data/stacks/flutter.csv +53 -0
- package/dist/ui-ux-pro-max/data/stacks/html-tailwind.csv +56 -0
- package/dist/ui-ux-pro-max/data/stacks/jetpack-compose.csv +53 -0
- package/dist/ui-ux-pro-max/data/stacks/nextjs.csv +53 -0
- package/dist/ui-ux-pro-max/data/stacks/nuxt-ui.csv +51 -0
- package/dist/ui-ux-pro-max/data/stacks/nuxtjs.csv +59 -0
- package/dist/ui-ux-pro-max/data/stacks/react-native.csv +52 -0
- package/dist/ui-ux-pro-max/data/stacks/react.csv +54 -0
- package/dist/ui-ux-pro-max/data/stacks/shadcn.csv +61 -0
- package/dist/ui-ux-pro-max/data/stacks/svelte.csv +54 -0
- package/dist/ui-ux-pro-max/data/stacks/swiftui.csv +51 -0
- package/dist/ui-ux-pro-max/data/stacks/vue.csv +50 -0
- package/dist/ui-ux-pro-max/data/styles.csv +59 -0
- package/dist/ui-ux-pro-max/data/typography.csv +58 -0
- package/dist/ui-ux-pro-max/data/ui-reasoning.csv +101 -0
- package/dist/ui-ux-pro-max/data/ux-guidelines.csv +100 -0
- package/dist/ui-ux-pro-max/data/web-interface.csv +31 -0
- package/dist/ui-ux-pro-max/scripts/__pycache__/core.cpython-310.pyc +0 -0
- package/dist/ui-ux-pro-max/scripts/__pycache__/core.cpython-312.pyc +0 -0
- package/dist/ui-ux-pro-max/scripts/__pycache__/design_system.cpython-312.pyc +0 -0
- package/dist/ui-ux-pro-max/scripts/core.py +258 -0
- package/dist/ui-ux-pro-max/scripts/design_system.py +1066 -0
- package/dist/ui-ux-pro-max/scripts/search.py +106 -0
- package/package.json +6 -6
@@ -0,0 +1,1354 @@
# Understanding Logit Bias in LLMs | Medium

URL: https://medium.com/@serhatcck/token-level-control-in-openai-models-a-developers-guide-to-logit-bias-6fcc04a8a41f

Author: Serhat ÇİÇEK

---
# Token-Level Control in OpenAI Models: A Developer’s Guide to Logit Bias

Serhat ÇİÇEK · 6 min read · Dec 11, 2025

Logit bias is one of the least discussed yet most influential parameters in modern LLM development. By directly manipulating token-level probabilities, it allows developers to shape model outputs with a precision that traditional prompting cannot achieve. Whether you need deterministic text generation, safer content boundaries, or fine-tuned behavioral constraints without model retraining, logit bias provides a powerful mechanism for controlling how an LLM thinks and responds. Understanding this feature is essential for anyone building reliable, predictable, and production-grade AI systems.
## What Is Bias in Large Language Models?

Bias in Large Language Models refers to systematic tendencies in how a model generates text. Instead of producing purely neutral or balanced outputs, an LLM may favor certain tokens, ideas, or associations because of the statistical patterns it learned during training. This means the model’s responses can shift — not due to correctness, but due to patterns embedded in the data or architecture.

A useful way to understand this is to compare LLM bias with human cognitive bias. For example, if a developer spends years writing code mostly in Python, they naturally develop a positive bias toward that language. When starting a new project, choosing Python becomes more likely — not necessarily because it is the best option, but because past experience shaped their preference.

LLMs behave in a similar way: if the training data heavily features certain topics, styles, or associations, the model becomes more inclined to reproduce them.

In practical terms, bias is simply a distortion in token probabilities. Some tokens become more likely, others less likely, shaping the model’s tone, content, and reasoning. For AI developers, understanding this is crucial: bias affects predictability, alignment, safety, and overall output quality.

### Common Sources of Bias in LLMs

- Instruction-Tuning Bias — Reinforcement from human feedback shaping preferred behaviors.
- Decoding-Time Bias — Sampling techniques and parameters (e.g., temperature, logit bias) that shift token probabilities.
- Objective & Loss Function Bias — Optimization that favors certain patterns over others.
- Representational Bias — Embeddings forming unequal relationships between concepts.
- Training Data Imbalance — Overrepresented topics, sentiments, or cultural viewpoints.

## Bias and Tokens in OpenAI Models

OpenAI’s bias controls — such as logit_bias — operate entirely at the token level. Tokens are the smallest units a model uses to understand and generate text. Instead of reading characters or whole words, LLMs break text into tokens using a tokenizer. This means that any bias applied to a model directly alters the probability distribution of individual tokens, not entire sentences.
For example:

- A simple sentence like “Hello world!” could be encoded into [“Hello”, “ world”, “!”].
- The word “JavaScript” may be split into multiple tokens depending on the tokenizer.
- The word “Python” might be a single token.

Because bias works at this granular level, developers must understand how tokenization affects outcomes. Applying a positive logit bias to the token representing “Python” increases the chance of the model using that word in its response. Conversely, applying a strong negative bias to a token like “no” can drastically reduce its appearance, even if doing so makes the response grammatically awkward.
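A quick way to see these splits for yourself is OpenAI’s tiktoken library. The following minimal sketch (assuming the cl100k_base encoding used by GPT-3.5/GPT-4 models) prints the token IDs and pieces for the examples above:

```
import tiktoken

# cl100k_base is the encoding used by GPT-3.5 and GPT-4 models
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello world!", "JavaScript", "Python", " Python"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {ids} {pieces}")
```

Note that “Python” with and without a leading space may produce different IDs, which matters when building a bias map.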
### What Do +100 and −100 Mean in Logit Bias?

In OpenAI models, logit bias values such as +100 and −100 represent extremely strong adjustments to a token’s probability. A +100 bias forces the model to strongly prefer a specific token — effectively pushing its probability close to 100% whenever it’s a valid next token. Conversely, a −100 bias nearly eliminates a token from the generation process, reducing its probability to almost zero. These extreme values behave like “hard constraints” for token selection. For example, applying +100 to the token for “Python” almost guarantees the model will mention it, while assigning −100 to the token “Java” will cause the model to avoid it entirely — even when it would normally be a reasonable choice. Although smaller values can create more nuanced nudges, ±100 is commonly used when developers need deterministic control over model output.
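As a concrete sketch of the Python/Java example above (illustrative only: token IDs must be computed against the exact model’s tokenizer, and multi-token words need every sub-token biased):

```
import tiktoken
from openai import OpenAI

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

bias = {}
for tok in enc.encode(" Python"):
    bias[tok] = 100   # push the model hard toward "Python"
for tok in enc.encode(" Java"):
    bias[tok] = -100  # effectively ban "Java"
    # Caution: banning a sub-token can also suppress other words that share it.

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Which programming language should I learn first?"}],
    logit_bias=bias,
)
print(resp.choices[0].message.content)
```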
## PoC

To experiment with token-level biasing in real time, you can download and run the LLM-Bias project on GitHub. Simply clone the repository from https://github.com/Serhatcck/LLM-Bias, install the dependencies, and start the application. The interface allows you to enter a prompt, choose a logit bias value, and test how different token biases affect the model’s output.

### −100 Logit Bias

We start with a simple baseline prompt:

“Explain quantum mechanics with two sentences.”

Without any bias applied, the model generates a neutral, standard explanation based on its learned distribution of scientific terminology.

Now we repeat the exact same prompt, but this time we introduce a negative logit bias for the token(s) representing “physics” (for example “physics” or its token ID).

Assigning a strong negative bias such as −80 to −100 almost completely suppresses the appearance of that word.

After running both prompts, the difference becomes clear:

- The negatively biased output avoids using the word “physics” entirely, even when it logically fits the explanation. The model instead substitutes more generic or indirect descriptions to compensate.
- The unbiased output may reference physics naturally, since quantum mechanics is part of the field.

## Insights From Our Logit Bias Experiments

While testing both positive and negative logit bias across different prompts, we observed several practical behaviors that developers should be aware of:

### Negative Logit Bias Observations

- Negative bias values weaker than −80 (i.e., closer to zero) tend to have little effect. The model may still generate the unwanted token, especially in contexts where it is highly probable.
- To reliably suppress a word, you must include all of its variations in the logit bias map: lowercase/uppercase versions, plural forms, and common morphological variants (see the sketch below).
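A small helper along these lines (a sketch; the variant list is illustrative, not exhaustive) collects the IDs for the common surface forms of a word before applying the ban:

```
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def ban_word(word: str, bias: dict[int, int]) -> dict[int, int]:
    # Cover leading-space, capitalized, and plural surface forms
    variants = [word, " " + word, word.capitalize(), " " + word.capitalize(),
                word + "s", " " + word + "s"]
    for v in variants:
        for tok in enc.encode(v):
            bias[tok] = -100
    return bias

bias = ban_word("physics", {})
```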
### Positive Logit Bias Observations

- Positive bias values of +80 or lower typically do not create strong enough pressure. The model fails to consistently insert the target word unless the word already aligns with the prompt.
- When a high positive value (e.g., +100) is applied to a token unrelated to the prompt, the model often takes significantly longer to produce a response. This happens because the LLM struggles to justify the forced token within the context.

## Summary

Negative logit bias is far more practical for production environments, especially when filtering unwanted words, profanity, or brand names. However, developers must carefully select and tokenize all relevant variations of the terms they want to block. Positive logit bias, while useful for controlled generation, often introduces latency and instability when forcing unrelated tokens into the output.
## References

### OpenAI Documentation

- GPT Tokenization Guide: https://platform.openai.com/docs/guides/text-generation/tokenization
- OpenAI API Reference, Logit Bias: https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias

### LLM Tokenization & Probability

- Tokenizers by HuggingFace (general tokenizer behavior): https://huggingface.co/docs/tokenizers/index
- OpenAI Tiktoken Library (tokenizer implementation): https://github.com/openai/tiktoken

### Research Papers on LLM Bias

- Assessing Social and Linguistic Bias in Language Models: https://arxiv.org/abs/2005.14050
- Bias in Language Models: A Comprehensive Review: https://arxiv.org/abs/1906.07337
- A Survey on Bias in Large Language Models: https://arxiv.org/abs/2312.01708
Tags: LLM, NLP, AI Agent, Bias in AI, LLM Tokens

## Written by Serhat ÇİÇEK

Security Researcher & AI Researcher & Software Engineer
# Logit Bias - LLM Parameter Guide - Vellum

URL: https://vellum.ai/llm-parameters/logit-bias

# What is Logit Bias

The logit bias parameter lets you control whether the model is more or less likely to generate a specific word.

# How does it work behind the scenes

The model is always deciding which word (or token) to pick next. All these tokens have their own IDs, and using logit bias we can forbid the model from using some of these IDs.

But how can we actually find these IDs?

The simplest way is to use OpenAI’s tokenizer tool. Just type in your words, toggle the “Text-Token ID” option at the bottom, and you’ll get the IDs for your words. In some cases you’ll get more than one token for a single word.

It’s important to note here that different models may produce different tokens for the same input, so you should always check with the model provider to learn about their tokenization process.

There are a couple of important things to note here:

- You can use OpenAI’s tokenizer tool to find the tokens for GPT-3.5 and GPT-4 models, but there is still no data for GPT-4o and GPT-4o mini.
- One word can map to two or more tokens.
- Characters before a word (e.g. a space or underscore) can produce different tokens for the same word.
- Capitalized and uncapitalized versions of the same word might result in different tokens.

# How to set this parameter correctly

In the API, this parameter accepts a JSON object that maps token IDs to bias values. A bias value can range from −100 to 100. The parameter takes token IDs, not text, so you’d use the tokenizer mentioned above to get the token IDs for the words you want to “bias”.
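For illustration, a request body might look like the following sketch (the token IDs here are placeholders, not the IDs of any particular word):

```
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Name a programming language."}],
    # Maps token IDs to bias values in [-100, 100]; the IDs below are placeholders.
    "logit_bias": {"1234": -100, "5678": 10},
}
```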
# How to experiment with Logit Bias

The closer the value is to −100, the more likely that token will be blocked from being generated. The closer it is to 100, the more the model is encouraged to use that token.

To test this parameter, try adjusting the values gradually and analyze the impact. Using small values like 1 or −1 won’t make much difference, but values like 5 or −5 can have a much stronger effect.

# When to use Logit Bias

Use logit bias when you know specifically which words you want to ban or encourage the model to use.

### Example 1: Ban offensive words

One example where you’d want to ban some words (tokens) from appearing in the results is for moderation purposes.

Suppose you’re building a guardrail that will capture offensive content in your chatbot. Now, you may want to ban words like “stupid”. The word “stupid” tokenizes to two IDs [267, 16263], and the same word with a space before, “ stupid”, tokenizes to another ID [18754]. To ban them from appearing in the results we can add the logit bias like so:
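The code for this snippet did not survive the export; reconstructed from the token IDs quoted above, the bias map would be:

```
logit_bias = {
    267: -100,    # first token of "stupid"
    16263: -100,  # second token of "stupid"
    18754: -100,  # " stupid" (with a leading space)
}
```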
### Example 2: Encourage neutral answers in a chatbot

If you’re using a customer support chatbot, you’ll likely want it to maintain a calm, neutral tone. To help with that, you can encourage the model to use more neutral words like “understand,” “assist,” and “resolve”. To make the model output “understand,” you need to map it to two token IDs and add a bias of 5:
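The accompanying snippet is also missing from the export; a sketch with hypothetical token IDs (the real IDs come from the tokenizer) would be:

```
logit_bias = {
    8154: 5,  # hypothetical first token ID of " understand"
    2752: 5,  # hypothetical second token ID of " understand"
}
```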
# Constrained Decoding and Structured Output for Agent Reliability - Engineering Notes

URL: https://notes.muthu.co/2025/11/constrained-decoding-and-structured-output-for-agent-reliability/

Engineering Notes: Thoughts and Ideas on AI by Muthukrishnan

When building production AI agents, one of the most persistent problems is unpredictable output formats. An agent needs to call a tool with precise JSON parameters, but the LLM wraps the output in markdown code blocks, adds explanatory text, or hallucinates invalid field names. This breaks the entire agent pipeline.

Constrained decoding solves this by restricting what tokens an LLM can generate, ensuring outputs always conform to specified formats like JSON schemas, regular expressions, or context-free grammars. It’s the difference between hoping your agent produces valid JSON and guaranteeing it.
## Concept Introduction

During text generation, an LLM samples tokens from a probability distribution at each step. Constrained decoding modifies this process by masking invalid tokens, setting their probability to zero before sampling. The result: tool calls always have correct parameter names and types, database queries never contain invalid SQL syntax, API requests match OpenAPI specifications, and decision outputs are always parseable by downstream systems.

```
Standard Decoding:
P(next_token | context) → Sample from all vocabulary

Constrained Decoding:
P(next_token | context, grammar) → Sample only from valid tokens
```

The constraint can be:

- A JSON schema (only generate valid JSON matching the schema)
- A regular expression (output must match the regex)
- A context-free grammar (follow specific syntax rules)
- A finite-state machine (transition through defined states)

Modern implementations use:

- Token masking at inference time
- Incremental parsing to track valid next tokens
- Beam search with grammar-aware scoring
- Logit bias to steer generation probabilistically
## Historical & Theoretical Context

The concept emerged from multiple research threads:

1. Semantic Parsing (1990s–2000s): Early NLP systems used grammar-based parsers to convert natural language to formal representations (SQL, logic). These were rigid but guaranteed valid output.

2. Constrained Generation in NLG (2010s): Neural text generation models began incorporating hard constraints:

- Hokamp & Liu (2017): Grid Beam Search for lexically constrained generation
- Forcing specific phrases to appear in translations or summaries

3. Structured Prediction (2015–2020): Seq2seq models for code generation, semantic parsing, and structured data extraction needed format guarantees. Early solutions used post-processing and re-ranking.

4. LLM Function Calling Era (2020–present): As LLMs became agents with tool use, reliable structured output became critical:

- OpenAI Function Calling (2023): Proprietary constrained decoding for JSON tool calls
- Guidance (2023): Microsoft’s grammar-based generation library
- Outlines (2023): Fast regex and JSON schema constraints using FSMs
- LM Format Enforcer (2023): Token masking for various formats

### Theoretical Foundation

Constrained decoding connects to:

- Formal Language Theory: Using automata to define valid sequences
- Parsing Theory: Incremental parsing to determine next valid tokens
- Probabilistic Inference: Conditioning probability distributions on constraints
- Program Synthesis: Generating code that compiles/type-checks
## Algorithms & Math

### Core Algorithm: FSM-Guided Token Masking

The most efficient modern approach uses finite-state machines:

Pseudocode:

```
def constrained_decode(prompt, schema, max_tokens):
    # Convert schema to FSM
    fsm = schema_to_fsm(schema)
    state = fsm.initial_state
    tokens = []

    for _ in range(max_tokens):
        # Get next-token logits from the LLM
        logits = llm.forward(prompt + tokens)

        # Mask invalid tokens based on current FSM state
        valid_tokens = fsm.get_valid_tokens(state)
        masked_logits = mask_logits(logits, valid_tokens)

        # Sample next token
        next_token = sample(masked_logits)
        tokens.append(next_token)

        # Update FSM state
        state = fsm.transition(state, next_token)

        # Stop if we reached an accept state
        if fsm.is_terminal(state):
            break

    return tokens
```
Mathematical Formulation:

Let $\mathcal{G}$ be a grammar defining valid outputs, and $\mathcal{L}(\mathcal{G})$ the language it accepts.

Standard decoding samples:

$$ t_i \sim \text{softmax}(\mathbf{z}_i) $$

Constrained decoding samples:

$$ t_i \sim \text{softmax}(\mathbf{z}_i + \mathbf{m}_i) $$

where the mask $\mathbf{m}_i$ is:

$$ m_i^{(j)} = \begin{cases} 0 & \text{if } t_1 \cdots t_{i-1}\, j \text{ is a prefix of some string in } \mathcal{L}(\mathcal{G}) \\ -\infty & \text{otherwise} \end{cases} $$
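To make the mask concrete, here is a tiny self-contained sketch over a toy four-token vocabulary, showing that adding $-\infty$ to a logit drives that token’s probability to exactly zero:

```
import math

def masked_softmax(logits, valid):
    # m_j = 0 for grammar-valid tokens, -inf otherwise
    masked = [z if ok else float("-inf") for z, ok in zip(logits, valid)]
    exps = [math.exp(z) for z in masked]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Only tokens 1 and 3 are valid in the current grammar state
print(masked_softmax([2.0, 1.0, 0.5, -1.0], [False, True, False, True]))
# Invalid tokens get probability exactly 0.0
```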
### JSON Schema to FSM Conversion

Converting a JSON schema to an FSM involves:

1. Tokenize the schema structure: `{`, `"field"`, `:`, `[`, numbers, strings, etc.
2. Build states for each schema element: object start, field names, value types
3. Define transitions: valid next tokens from each state
4. Handle recursion: for nested objects/arrays

Example:

```
{
  "type": "object",
  "properties": {
    "action": {"type": "string", "enum": ["search", "calculate"]},
    "value": {"type": "number"}
  },
  "required": ["action", "value"]
}
```

FSM states:

```
START → "{" → "action" → ":" → ("search"|"calculate") → "," → "value" → ":" → NUMBER → "}" → END
```

At each state, only specific tokens are valid (e.g., after `"action":`, only `"search"` or `"calculate"`).
## Design Patterns & Architectures

### Schema-First Agent Design

Define schemas before implementing agents:

```
from pydantic import BaseModel

class SearchTool(BaseModel):
    query: str
    max_results: int = 10
    filters: dict[str, str] = {}

class CalculateTool(BaseModel):
    expression: str
    precision: int = 2

# Agent now MUST output one of these
```

### Layered Validation

Combine multiple constraint layers:

1. Token-level: Constrained decoding ensures valid syntax
2. Type-level: Schema validation checks types
3. Semantic-level: Business logic validates values

```
# Layer 1: Constrained decoding produces valid JSON
output = constrained_generate(prompt, json_schema)

# Layer 2: Validate against Pydantic model
tool_call = SearchTool.parse_raw(output)

# Layer 3: Business logic
if tool_call.max_results > 100:
    raise ValueError("max_results too high")
```
### Progressive Refinement

For complex outputs, chain constrained generations:

```
graph LR
    A[User Query] --> B[Generate Tool Choice<br/>Constrained: tool names only]
    B --> C[Generate Parameters<br/>Constrained: specific schema]
    C --> D[Execute Tool]
    D --> E[Generate Response<br/>Constrained: response format]
```

This reduces error accumulation compared to generating everything at once.
### Integration with Agent Architectures

Planner-Executor-Memory Loop:

```
class ConstrainedAgent:
    def plan(self, goal: str) -> Plan:
        # Constrained to Plan schema
        return constrained_generate(
            f"Create plan for: {goal}",
            schema=Plan.schema()
        )

    def execute(self, step: PlanStep) -> ActionResult:
        # Constrained to ActionResult schema
        return constrained_generate(
            f"Execute: {step.description}",
            schema=ActionResult.schema()
        )
```

ReAct Loop:

```
def react_step(observation: str) -> ThoughtActionObservation:
    # Force exactly: Thought: <text>\nAction: <json>\n
    return constrained_generate(
        prompt=observation,
        grammar=react_grammar  # CFG for the ReAct format
    )
```
## Practical Application

### Small Coding Example: Weather Agent with Outlines

```
from outlines import models, generate
from pydantic import BaseModel

# Define tool schema
class WeatherQuery(BaseModel):
    location: str
    unit: str  # "celsius" or "fahrenheit"

# Load model
model = models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Create constrained generator
generator = generate.json(model, WeatherQuery)

# Generate - GUARANTEED to be a valid WeatherQuery
prompt = """User: What's the weather in Tokyo?
Assistant: I'll check the weather. Tool call:
"""

result = generator(prompt)
# result = WeatherQuery(location="Tokyo", unit="celsius")

print(f"Location: {result.location}")
print(f"Unit: {result.unit}")
```
### Integration with LangGraph

```
from langgraph.graph import StateGraph, END
from outlines import models, generate
from pydantic import BaseModel

class AgentState(BaseModel):
    messages: list[str]
    next_action: str | None = None

class ActionSchema(BaseModel):
    action: str  # "search" | "calculate" | "finish"
    parameters: dict

# Constrained action generator
model = models.transformers("meta-llama/Llama-3-8B-Instruct")
action_generator = generate.json(model, ActionSchema)

def decide_action(state: AgentState) -> AgentState:
    prompt = format_prompt(state.messages)
    action = action_generator(prompt)  # Always a valid ActionSchema
    state.next_action = action.action
    return state

def execute_action(state: AgentState) -> AgentState:
    # Execute the action
    result = execute(state.next_action)
    state.messages.append(result)
    return state

# Build graph
workflow = StateGraph(AgentState)
workflow.add_node("decide", decide_action)
workflow.add_node("execute", execute_action)
workflow.add_conditional_edges("decide",
    lambda s: END if s.next_action == "finish" else "execute")
workflow.set_entry_point("decide")

app = workflow.compile()
```
### OpenAI Structured Outputs

OpenAI provides native support:

```
from openai import OpenAI
from pydantic import BaseModel

class ResearchResult(BaseModel):
    summary: str
    key_findings: list[str]
    confidence_score: float

client = OpenAI()

# The SDK's parse() helper accepts a Pydantic class as response_format
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "Analyze this research paper: ..."}
    ],
    response_format=ResearchResult  # Structured output mode
)

result = completion.choices[0].message.parsed
# Guaranteed to be a valid ResearchResult
```
## Latest Developments & Research

### Recent Breakthroughs (2023-2025)

1. Efficient FSM Construction

“Faster Constrained Decoding for Open-Domain Generation” (2024)

- Converts JSON schemas to minimal FSMs in milliseconds
- Reduces overhead to <5% latency increase
- Open-sourced in the Outlines 0.1+ library

2. Grammar-Based Prompting

“Guidance: A Faster, More Efficient Programming Paradigm for Constrained Generation” (Microsoft, 2023)

- Interleaves constraints with generation
- Allows complex patterns like “generate code that compiles”
- Used in Microsoft Copilot

3. Type-Aware Constrained Decoding

“TypeT5: Seq2seq Type Inference using Static Analysis” (2024)

- Combines static type checking with constrained decoding
- Ensures generated code is type-safe
- 99.7% type correctness on HumanEval

4. Semantic Constraints

“NeuroLogic A*esque Decoding” (2024)

- Combines probabilistic constraints (logit bias) with hard constraints
- Balances fluency and correctness
- Used for constrained dialogue generation

### Benchmarks

JSON Schema Compliance (SchemaBench, 2024):

- Prompt engineering: 82% valid
- Post-processing: 89% valid
- Constrained decoding: 99.8% valid

Tool Calling Reliability (FunctionHub, 2024):

- Standard generation: 76% executable calls
- OpenAI function calling: 94% executable
- Outlines JSON mode: 99.2% executable

### Open Problems

1. Multi-modal constraints: Extending to image/audio generation
2. Soft constraints: Probabilistic preferences vs. hard rules
3. Constraint learning: Inferring schemas from examples
4. Distributed decoding: Constraints across multi-agent systems
5. Constraint debugging: Tools to visualize why constraints fail

### Ongoing Research

- Adaptive masking: Learning which tokens to mask based on context
- Constraint synthesis: Automatically generating schemas from documentation
- Probabilistic grammars: Weighted FSMs for soft guidance
- Cross-lingual constraints: Applying constraints to multilingual models
## Cross-Disciplinary Insight

### Connections to Compiler Theory

Constrained decoding is essentially parsing in reverse:

- Parser: Valid string → Abstract syntax tree
- Constrained decoder: AST (schema) → Valid string

Modern compilers use LR parsers that incrementally determine valid next tokens, which is exactly what constrained decoders do. The FSM used is analogous to a parse table in compiler design.

### Links to Control Theory

The constraint can be seen as a controller in a feedback loop:

```
graph LR
    A[LLM<br/>Plant] -->|Token probabilities| B[Constraint<br/>Controller]
    B -->|Masked probabilities| C[Sampler]
    C -->|Next token| A
    B -.->|Desired output format| B
```

This mirrors model predictive control, where future states are constrained to a safe/desired region.

### Cognitive Science Parallel

Human language production involves monitoring: we catch ourselves mid-sentence if we are about to say something incorrect. Constrained decoding is an artificial form of this executive control, filtering invalid “thoughts” before they’re expressed.
## Daily Challenge: Build a Constrained SQL Generator

Goal: Create an AI agent that generates valid SQL queries using constrained decoding.

Requirements:

1. Define a simple SQL grammar (SELECT, FROM, WHERE with basic conditions)
2. Implement constrained decoding to ensure syntactic validity
3. Test on natural language queries

Starter Code:

```
from outlines import models, generate

# Define SQL grammar (simplified, Lark syntax)
sql_grammar = r"""
start: select_stmt
select_stmt: "SELECT" columns "FROM" table where_clause?
columns: COLUMN ("," COLUMN)*
table: WORD
where_clause: "WHERE" condition
condition: COLUMN OPERATOR VALUE

COLUMN: /[a-z_]+/
OPERATOR: "=" | ">" | "<" | "!="
VALUE: /"[^"]*"/ | /[0-9]+/
WORD: /[a-z_]+/

%import common.WS
%ignore WS
"""

model = models.transformers("your-model")
sql_generator = generate.cfg(model, sql_grammar)

# Test queries
queries = [
    "Find all users with age greater than 30",
    "Get product names where price is less than 100",
]

for query in queries:
    prompt = f"Natural language: {query}\nSQL: "
    sql = sql_generator(prompt)
    print(f"Query: {query}")
    print(f"SQL: {sql}\n")
```

Extension Challenges:

1. Add support for JOIN operations
2. Validate against actual table schemas
3. Measure how often unconstrained models produce invalid SQL
4. Compare generation time with/without constraints

Time estimate: 20-30 minutes
## References & Further Reading

### Key Papers

- “Guidance: A Faster Programming Paradigm for Constrained LLM Generation”, Microsoft Research, 2023. https://github.com/microsoft/guidance
- “Outlines: Fast and Flexible Structured Generation”, Normal Computing, 2023. Code: https://github.com/outlines-dev/outlines. Paper: https://arxiv.org/abs/2307.09702
- “Grammar-Constrained Decoding for Structured NLP Tasks”, Shin et al., EMNLP 2021. https://arxiv.org/abs/2106.08462
- “Constrained Decoding for Neural NLG from Compositional Representations”, Balakrishnan et al., ACL 2019. https://arxiv.org/abs/1906.07220
- “A Guided Constrained Decoding for Faithful Text Generation”, Lu et al., NeurIPS 2022. https://arxiv.org/abs/2210.05097

### Tools & Libraries

- Outlines (Python): https://github.com/outlines-dev/outlines
- Guidance (Python): https://github.com/microsoft/guidance
- LM Format Enforcer (Python): https://github.com/noamgat/lm-format-enforcer
- LMQL (query language): https://lmql.ai/
- OpenAI Structured Outputs: https://platform.openai.com/docs/guides/structured-outputs

### Blog Posts & Tutorials

- “Structured Generation with Outlines”: https://outlines-dev.github.io/outlines/welcome/
- “How OpenAI’s Structured Outputs Work”: https://cookbook.openai.com/examples/structured_outputs_intro
- “Building Reliable Agents with Constrained Decoding”, LangChain blog, 2024: https://blog.langchain.dev/constrained-decoding/

### Frameworks Supporting Constrained Outputs

- LangGraph: via custom parsers + retry logic
- CrewAI: via Pydantic models + validation
- AutoGen: via response format specifications
- LlamaIndex: via output parsers + constrained generation

### Advanced Topics

- Incremental Parsing: Earley parsers for CFGs
- Efficient FSM Minimization: Hopcroft’s algorithm
- Probabilistic Context-Free Grammars: Soft constraints
- Constraint Propagation: SAT solvers for complex constraints

---

Next Steps:

1. Complete the daily challenge to internalize the concepts
2. Experiment with Outlines or Guidance on your own prompts
3. Profile the latency impact of constraints on your use case
4. Design schemas for your agent’s tool calls
5. Read the Outlines paper for implementation details

Constrained decoding transforms AI agents from unpredictable text generators into reliable system components.

---
# How to build function calling and JSON mode for open-source and fine-tuned LLMs

URL: https://baseten.co/blog/how-to-build-function-calling-and-json-mode-for-open-source-and-fine-tuned-llms

Use a state machine to generate token masks for logit biasing to enable function calling and structured output at the model server level.

### Authors

Bryce Dubayah and Philip Kiely

### Last updated

May 16, 2025
Today, we announced support for function calling and structured output for LLMs deployed with our TensorRT-LLM Engine Builder. This adds support at the model server level for two key features:

Function calling: also known as “tool use,” this feature lets you pass a set of defined tools to an LLM as part of the request body. Based on the prompt, the model selects and returns the most appropriate function/tool from the provided options.

Structured output: an evolution of “JSON mode,” this feature enforces an output schema defined as part of the LLM input. The LLM output is guaranteed to adhere to the provided schema, with full Pydantic support.

To introduce these features, we built new capabilities into our customized version of NVIDIA’s Triton inference server. This engineering deep dive explains how the implementation works under the hood: defining schemas and tools, building a state machine, and using logit biasing to force valid output.

And the best part? Thanks to pre-computed token masks, there’s minimal latency impact from using either feature after the first call with a given schema is completed. You can expect the same tokens per second when generating JSON as when generating ordinary text.

If you’re looking to get started quickly with these new features, check out our launch announcement and docs for function calling and structured output. For implementation details, keep reading!
## How structured output is generated

To understand how it’s possible to guarantee structured output, we need to dive into the details of how a token is generated during LLM inference. If you’re familiar with LLM inference, you’ll know that a new token is generated on each forward pass through the model. During that forward pass:

1. A vector of logits is output from the final layer of the LLM’s neural network.
2. A normalization function like softmax is applied to turn the logits into probabilities.
3. Using these probabilities, a token is selected. Depending on settings like `top_p`, `top_k`, `beam_width`, and `temperature`, this may not always be the highest-probability token.

Structured output uses logit biasing in the first step to guarantee valid tokens are generated.

### Logit biasing ensures token validity

The length of the logit vector output in the first step is equal to the number of tokens in the model’s vocabulary. For example, Llama 3 LLMs have a vocabulary of ~128,000 tokens. Thus, the logit vector will have about 128K values. Each logit in the vector is a score representing how much the LLM thinks that the given token from the vocabulary could be the next token in the output sequence.

For structured output, we only want to generate valid tokens. For example, an array in JSON must have both an opening and closing bracket: `[1, 2, 3]`. If we have already generated `[1, 2, 3` then the valid options are:

- A comma, a space, and another value such as four: `, 4`.
- A closing bracket to end the array: `]`.

From the model’s vocabulary, most of the possible tokens will not be valid at certain points when generating structured output. Logit biasing guarantees valid output structure by identifying every invalid token and setting its score to negative infinity, ensuring that the invalid tokens cannot be generated.
This discussion of logit biasing raises a natural question: how do we know where we are in the output schema and which tokens are valid?

### State machine provides token requirements

The model server running beneath the inference process is responsible for tracking output format using a state machine. This model server is a modified version of NVIDIA Triton with extra capabilities that we call “Briton” (Baseten + Triton = Briton).

Using Outlines, an industry-standard library that also powers vLLM, the Briton model server takes the schema provided for the model output, transforms it into a regular expression, then generates a state machine from that regex. We chose Outlines for its robust feature set and reliability.

However, Outlines is written in Python, while TensorRT-LLM and Triton run in C++ for speed and efficiency. To handle this, we first generate the state machine in Python, then serialize it to Protocol Buffers and load it into the model server.

Once loaded into the model server, the state machine makes the logit biasing process incredibly efficient. The state machine is cached in memory, and an appropriate token mask – a list of 1s and 0s corresponding to valid and invalid tokens – is created for each node of the state machine for logit biasing. This means that these calculations aren’t made during inference time; rather, existing masks are applied based on which state is active.

With no token mask calculations happening during token generation, this approach to logit biasing has a negligible effect on model performance, so you’ll get the same high tokens per second that you’re used to from TensorRT-LLM while also ensuring that every token is valid for the provided output schema.
## How to use function calling

Function calling works by providing LLMs with a structured description of a set of tools. Based on the prompt, the model selects the most appropriate tool or tools for the task described. Functions can be anything: API calls, ORM access, SQL queries, or just a script.

[Image: A function written to be passed to an LLM — note the descriptive docstring.]
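The image itself is not included in this export; a representative sketch of such a function (a hypothetical weather lookup, not Baseten’s actual example) looks like:

```
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Get the current weather for a city.

    Args:
        city: Name of the city, e.g. "Tokyo".
        unit: Temperature unit, either "celsius" or "fahrenheit".
    """
    # The LLM never runs this body; it only sees the signature and docstring.
    ...
```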
It’s essential to understand that function calling does not give the LLM the capability to execute code. Instead, function calling asks the LLM to choose the most appropriate function from the list of available tools. The actual function execution needs to happen in the same environment that made the LLM call.

Our function calling implementation follows the OpenAI API spec for compatibility, but applies to any model served with TensorRT-LLM via the Engine Builder that has built-in function calling capabilities (e.g. Llama 3.1 Instruct, but not Llama 3). Using the same logit biasing process that creates structured output, Briton (the modified Triton inference server) guarantees schematically correct tool responses.

[Image: Example payload with function calling via the "tools" key.]
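The payload screenshot is not reproduced here; under the OpenAI spec, the `tools` key looks roughly like this sketch (the model name and tool definition are illustrative):

```
payload = {
    "model": "llama-3.1-8b-instruct",  # illustrative deployment name
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }],
}
```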
Function calling is critical for building agentic workflows and other advanced Compound AI systems. To use function calling for yourself, check out our function calling example in the documentation.

## How to use structured output

The more general structured output feature forces LLMs to return output that adheres to a Pydantic schema. Structured output is valid JSON, but goes beyond JSON mode with support for required and optional fields, multiple data types, and additional validations like maximum length.

To start, define your output schema as a Pydantic model.

[Image: Pydantic model for a "Person" object. The schema can be passed to an LLM to structure output.]
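The pictured model is not reproduced; a plausible sketch of such a "Person" schema (field names assumed for illustration) is:

```
from pydantic import BaseModel, Field

class Person(BaseModel):
    first_name: str             # required
    last_name: str              # required
    age: int = Field(ge=0)      # validated: non-negative
    email: str | None = None    # optional
```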
Then, when you add the schema to the LLM call, the model server will build the schema into a state machine and use it for token masking as described above. The LLM inference arguments match the OpenAI API spec for structured output to ensure maximum compatibility.

[Image: Example LLM request payload with a response schema.]
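Again reconstructing the pictured payload as a sketch, the schema rides along with the request in an OpenAI-spec `response_format` field:

```
payload = {
    "messages": [{"role": "user", "content": "Extract: Jane Doe, 34, jane@example.com"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "Person", "schema": Person.model_json_schema()},
    },
}
```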
Structured output is useful for a wide range of Compound AI applications, as the guaranteed schema adherence means you can integrate LLMs into larger systems without worrying about type errors. To try structured output for your application, start with our structured output example in the documentation.

## What to build with function calling and structured output

While the implementation behind these new features is interesting, what’s even more exciting is the use cases they enable.

Function calling unlocks a wide range of agentic use cases for open source LLMs. With function calling, you can give agents access to a set of tools to accomplish tasks. As we saw above, the LLM is only able to select the best tool, not actually execute the API call or run the function, so that’s where multi-step AI systems are needed.

These multi-step, often multi-model systems are commonly known as Compound AI. When building multi-stage Compound AI systems, structured output is critical. With structured output, each component of the system can communicate in valid JSON, preventing errors and avoiding parsing overhead.

As you build with function calling and structured output, remember that the model server changes don’t enhance quality, they only enforce format. Clear prompting and techniques like few-shot prompting still have their place for getting quality output within the enforced structure.

Get started building:

- First, deploy Llama 3.1 8B with the TensorRT-LLM Engine Builder
- Then try function calling with an accurate LLM math demo
- And get JSON-mode output in a document parsing demo
# Custom Logits Processors - vLLM

URL: https://docs.vllm.ai/en/latest/features/custom_logitsprocs/

# 404 - Not found
# GUEST POST - Crafting Unique AI Personas: Harnessing the Power of Logit Bias in Large Language Models | Microsoft Agent Framework

URL: https://devblogs.microsoft.com/semantic-kernel/guest-post-crafting-unique-ai-personas-harnessing-the-power-of-logit-bias-in-large-language-models/
Author: Anthony Puppo, Software Engineer

Large Language Models (LLMs) have revolutionized our interaction with software. However, there’s a catch – their responses can be monotonous and impersonal. This is where ‘personas’ come in. They add a human touch to LLMs, transforming generic outputs into customized responses that resonate with users. This is particularly handy in applications like customer service bots and virtual assistants. But how do we create these personas without hefty costs or time investments? The good news is, we can tweak a set of common parameters in most LLMs to influence their output, and that’s what we’ll explore today.

The examples in this blog post utilize C#, Semantic Kernel, SharpToken and the OpenAI API. If you’d like to follow along and experiment yourself, first create a new console project:

```default
dotnet new console --framework net7.0
```

Then install dependencies using NuGet:

```default
dotnet add package Microsoft.SemanticKernel --prerelease
dotnet add package SharpToken
```

Additionally, I’ve written a small demo application that utilizes some of the techniques discussed in this blog. It is open-source and available on GitHub.

### Introduction To How LLMs Work

In simplified terms, LLMs function a bit like a predictive text engine. They read a sentence as a sequence of ‘tokens’, each a word or part of a word. The tokens are then transformed into a numerical format that the model can process.

```csharp
using SharpToken;

// Encode a sentence into token ids, then decode each id back to its text piece.
var encoding = GptEncoding.GetEncodingForModel("gpt-3.5-turbo");
var rawTokens = encoding.Encode("Wonderful day we're having!");
var textTokens = rawTokens.Select((x) => $"\"{encoding.Decode(new List<int> { x })}\"").ToList();

Console.WriteLine($"Raw tokens: {string.Join(", ", rawTokens)}");
Console.WriteLine($"Tokenized text: {string.Join(", ", textTokens)}");

// Output:
// Raw tokens: 62372, 1285, 1938, 584, 2351, 3515, 0
// Tokenized text: "Wonder", "ful", " day", " we", "'re", " having", "!"
```

Based on its prior training, the model predicts what should come next in the sequence. For example, after “I don’t like”, the model might suggest “apples”. This prediction can be thought of as the model’s first draft — it’s good, but can we make it better?

### LLM Parameters

Consumers of LLMs have the ability to adjust certain parameters. This can yield creative, varied, and engaging results.

##### Temperature

This parameter adjusts the entropy of our model’s output. A high temperature makes the model’s output diverse and creative, while a lower temperature results in more focused and predictable responses.
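
To make this concrete, here is a minimal sketch of my own (not code from the original post) showing how temperature rescales logits before sampling: the logits are divided by T ahead of the softmax, so T below 1 sharpens the distribution and T above 1 flattens it.

```csharp
using System;
using System.Linq;

// Demo: temperature-scaled sampling over a toy 3-token vocabulary.
double[] logits = { 2.0, 1.0, 0.1 };
var rng = new Random(42);
Console.WriteLine(SampleWithTemperature(logits, temperature: 0.7, rng));

static int SampleWithTemperature(double[] logits, double temperature, Random rng)
{
    // probabilities = softmax(logits / T): low T sharpens, high T flattens.
    var scaled = logits.Select(z => z / temperature).ToArray();
    var max = scaled.Max();                                  // subtract max for numerical stability
    var exps = scaled.Select(z => Math.Exp(z - max)).ToArray();
    var r = rng.NextDouble() * exps.Sum();
    for (var i = 0; i < exps.Length; i++)
    {
        r -= exps[i];
        if (r <= 0) return i;                                // token i sampled with probability exps[i] / sum
    }
    return exps.Length - 1;
}
```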

##### Top P

Top P (also known as nucleus sampling) guides the selection of the next token based on cumulative probabilities. This is a more nuanced way of controlling randomness that can often lead to more diverse outputs. For example, if the model is predicting the next word in “The cat climbed up the ___”, and the options tree, roof, and wall add up to around 90% probability, a Top P of 90% restricts the model to select among those three options.
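
As an illustration (again my own sketch, not the post’s code), nucleus sampling keeps the smallest prefix of the probability-sorted vocabulary whose cumulative mass reaches Top P, and samples only from that set:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Demo: with topP = 0.9, only the smallest high-probability set is kept.
double[] probabilities = { 0.5, 0.3, 0.1, 0.06, 0.04 };
Console.WriteLine(string.Join(", ", NucleusSet(probabilities, topP: 0.9))); // 0, 1, 2

static List<int> NucleusSet(double[] probabilities, double topP)
{
    var kept = new List<int>();
    var cumulative = 0.0;
    // Walk tokens from most to least probable, keeping them until the
    // cumulative probability mass reaches topP.
    foreach (var (p, i) in probabilities.Select((p, i) => (p, i)).OrderByDescending(x => x.p))
    {
        kept.Add(i);
        cumulative += p;
        if (cumulative >= topP) break;
    }
    return kept;
}
```

A Top P of 1 keeps the whole vocabulary and therefore disables the filter, which is why the example later in this post leaves it at 1.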

##### Frequency and Presence Penalties

Frequency penalty discourages the overuse of specific tokens, and presence penalty penalizes tokens previously used in the output, irrespective of frequency. These mechanisms can be instrumental in quelling repetition and promoting diversity.
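
OpenAI’s API documentation describes both penalties as additive adjustments to the logits; the helper below is a sketch of that formula (my own code, not from the post):

```csharp
using System;

// adjusted[j] = logits[j] - counts[j] * frequencyPenalty - (counts[j] > 0 ? presencePenalty : 0)
// where counts[j] is how often token j already appeared in the generated text.
double[] logits = { 3.0, 2.5, 1.0 };
int[] counts = { 2, 0, 1 }; // token 0 appeared twice, token 2 once, token 1 never
var adjusted = ApplyPenalties(logits, counts, frequencyPenalty: 0.5, presencePenalty: 0.5);
Console.WriteLine(string.Join(", ", adjusted)); // 1.5, 2.5, 0

static double[] ApplyPenalties(double[] logits, int[] counts, double frequencyPenalty, double presencePenalty)
{
    var adjusted = new double[logits.Length];
    for (var j = 0; j < logits.Length; j++)
    {
        adjusted[j] = logits[j]
                      - counts[j] * frequencyPenalty             // grows with each repetition
                      - (counts[j] > 0 ? presencePenalty : 0.0); // flat hit once a token has appeared
    }
    return adjusted;
}
```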

##### Logit Bias

Logit bias directly manipulates the logits (the raw, unnormalized scores predicted by the model) for specific tokens before they are passed through the softmax function to form a probability distribution. By adjusting the logit bias, one can promote or demote particular tokens. For instance, if we want the model to avoid a certain token, we can assign it a negative logit bias, making it less likely to be chosen. Likewise, assigning a positive bias will have the model favor the token.
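
Mechanically, logit bias is just a per-token additive offset applied before the softmax; a minimal sketch of my own:

```csharp
using System;
using System.Collections.Generic;

// Demo: demote token 7 and promote token 3 before the softmax.
double[] logits = { 0.2, 1.1, 0.5, 0.9, 0.0, 0.3, 0.8, 2.0 };
var bias = new Dictionary<int, double> { [7] = -10, [3] = 5 };
Console.WriteLine(string.Join(", ", ApplyLogitBias(logits, bias)));

static double[] ApplyLogitBias(double[] logits, IReadOnlyDictionary<int, double> bias)
{
    var adjusted = (double[])logits.Clone();
    foreach (var (tokenId, offset) in bias)
    {
        // Positive offsets promote a token; strongly negative ones effectively ban it.
        adjusted[tokenId] += offset;
    }
    return adjusted;
}
```

In OpenAI’s API, logit_bias values are clamped to the range -100 to 100, where the extremes act as an outright ban or a near-exclusive selection of the token.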

### Persona Generation Using Logit Bias

For a practical demonstration, let’s consider a scenario where we desire our model to generate shorter sentences. To achieve this, we can manipulate the bias for common punctuation such as “.”, “!”, and “?”.

First, set up the kernel so we can interact with the model:

```csharp
var kernel = new KernelBuilder()
    .WithOpenAIChatCompletionService("gpt-3.5-turbo", "<your-openai-api-key>")
    .Build();
```

Then call it with our custom settings:

```csharp
var result = await kernel.InvokeSemanticFunctionAsync(
    "Describe a rainbow.",
    requestSettings: new OpenAIRequestSettings()
    {
        Temperature = 0,
        TopP = 1,
        FrequencyPenalty = 0,
        PresencePenalty = 0,
        // Encode ".", "!", and "?" to token ids and give each a strong positive
        // bias, nudging the model to close out sentences sooner.
        TokenSelectionBiases = new[] { ".", "!", "?" }
            .SelectMany((x) => encoding.Encode(x))
            .ToDictionary((x) => x, (x) => 10)
    });

Console.WriteLine(result);
```

And we get the following:

A rainbow is a beautiful and natural phenomenon. It appears as a circular arc of colors in the sky. It is formed when sunlight is refracted, or bent, as it passes through raindrops. The sunlight is then reflected inside the raindrop and refracted again. This process causes the light to separate into its component colors. The colors of a rainbow, from top to bottom, are red, orange, yellow, green, blue, indigo, and violet. The colors are vibrant and distinct. The rainbow usually appears after rain showers when the sun is still shining. It can also be seen near waterfalls or fountains. The sight of a rainbow is often associated with joy, hope, and wonder. It is a mesmerizing display of nature’s beauty.

Conversely, if we make the bias negative:

```csharp
// Surrounding code omitted for brevity...
TokenSelectionBiases = new[] { ".", "!", "?" }
    .SelectMany((x) => encoding.Encode(x))
    .ToDictionary((x) => x, (x) => -10)
```

We then get something like this:

A rainbow is a beautiful and natural phenomenon that occurs when sunlight is refracted, or bent, by water droplets in the air, creating a spectrum of colors in the sky. Typically, a rainbow appears as a semi-circular arc of vibrant colors, with red being the outermost color and violet being the innermost color, although sometimes a full circle can be seen in certain conditions. The colors of a rainbow, in order, are red, orange, yellow, green, blue, indigo, and violet, often remembered by the acronym ROYGBIV. Each color of the rainbow is distinct and blends seamlessly into the next, creating a stunning display of hues that can be seen against a backdrop of dark clouds or a clear blue sky. Rainbows are often seen after rain showers when the sun emerges from behind the clouds, casting its rays onto the raindrops in the air, causing them to act as tiny prisms that refract the sunlight and create the colorful spectrum. The sight of a rainbow is often associated with feelings of joy, wonder, and hope, as it is a symbol of beauty and harmony in nature. Rainbows are not physical objects that can be touched or approached, but rather optical illusions that appear to be located at a specific distance from the observer, making them seem elusive and magical. Overall, a rainbow is a breathtaking and ephemeral display of colors that captivates the imagination and reminds us of the wonders of the natural world around us.

The first response is more concise and straightforward, providing a clear and simple explanation of a rainbow. It uses a more casual and conversational tone, making it easier to understand for a general audience.

The second is more detailed and comprehensive, providing a more scientific explanation of a rainbow. It uses a more formal and academic tone, making it suitable for a more knowledgeable audience or someone seeking a deeper understanding. The language is more descriptive and the sentences are longer, contributing to a more elaborate and thorough explanation.

The effects of tweaking logit_bias are evident in the given examples, and these modifications show how we can mold the model’s responses to be more in line with a specific persona. By amplifying or diminishing this bias, we can guide the model to generate responses that are concise or verbose, casual or formal, simple or detailed, depending on the desired personality. However, the key lies in balance. Overdoing it might result in an overbearing or inconsistent persona, while underdoing it might make the persona feel generic.

### Next Steps

So, what are some of the options for putting the topics discussed here into practice?

1. Experiment with Logit Bias: Get hands-on experience with this feature. Start with simple tweaks to the bias values and observe how the output changes. As you gain familiarity, attempt to create a more complex persona by adjusting the bias for a wider range of tokens.
2. Dive into Stylometry: Learn more about stylometry. This field of study can provide insights into how writing styles can be quantified and analyzed, which can be helpful in creating more nuanced personas.
3. Implement Part-of-Speech Tagging: Incorporate part-of-speech tagging. It can be useful in understanding the grammatical structure of the sentences generated by the model (or text from a pre-existing persona you are attempting to emulate). This understanding can help you tune the logit bias more effectively.
4. Randomize Character Creation: Create a corpus of words relevant to the desired persona. Use this corpus to randomly assign attributes to the model’s persona. This can add an element of unpredictability to the model, making it more engaging.
5. Explore Token Frequencies and TF-IDF: Rather than merely looking at the plain text, consider tokenizing the text. This approach can be combined with Term Frequency-Inverse Document Frequency (TF-IDF) to assess the frequency of model tokens. This insight can guide the adjustment of logit bias values more appropriately, since models operate at the token level (see the sketch after this list).
6. Combine Parameters: Don’t limit yourself to logit bias. Try combining it with other parameters like temperature and top_p for more nuanced control over the output. Remember, the aim is to create a persona that is consistent, engaging, and believable.
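
As a starting point for item 5, here is a small, hypothetical sketch of my own (the corpus, threshold, and bias value are all illustrative choices, not recommendations from this post) that counts token frequencies in a persona corpus with SharpToken and turns recurring tokens into positive bias values:

```csharp
using System;
using System.Linq;
using SharpToken;

// Hypothetical persona corpus for a pirate-flavored assistant.
var corpus = new[]
{
    "Aye, the sea calls to us, matey!",
    "Hoist the sails, matey, for the tide waits for no one!"
};

var encoding = GptEncoding.GetEncodingForModel("gpt-3.5-turbo");

// Count how often each token id appears across the corpus, keep the recurring
// (characteristic) tokens, and give each a modest positive bias.
var biases = corpus
    .SelectMany(text => encoding.Encode(text))
    .GroupBy(token => token)
    .Where(group => group.Count() >= 2)
    .ToDictionary(group => group.Key, _ => 3);

Console.WriteLine(string.Join(", ", biases.Select(kv => $"{kv.Key}: {kv.Value}")));
```

A natural refinement is to weight each token by TF-IDF against a neutral reference corpus, so that common words like “the” are not boosted alongside the genuinely persona-specific tokens.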

### Closing Thoughts

Crafting unique personas can be a tricky pursuit. Logit bias, however, offers a promising starting point. It’s a tool to help you steer your model’s outputs towards a more personalized touch. Yet, it’s important to note, it’s but one piece of the puzzle. While other parameters might not singularly make a huge impact in persona development, their combined use could unlock more possibilities. The journey to mastering persona creation in LLMs is an intriguing one, and hopefully, this has given you a useful compass to navigate it.