Dynamic Context (Injecting Request-Scoped Context)
Background
In Chat Shell, the system prompt is the most cache-friendly part for LLM prompt caching / prefix caching. If we mix request-scoped, frequently changing content into the system prompt (e.g., knowledge base metadata lists), the cache hit rate drops significantly, increasing token cost and latency.
To improve cache hit rate, we split "dynamic metadata" out of the system prompt and inject it into the message list via a unified dynamic_context mechanism.
Goals
- Keep the system prompt fully static whenever possible so it can be cached.
- Inject all request-scoped metadata as a separate human/user message.
- Make the mechanism extensible: internal deployments can append weibo_context or other dynamic blocks in the same place.
Message Structure
Before:
- System: static instructions + dynamic kb_meta_list
- Human (history)
- Human (current) + datetime suffix
After:
- System: static instructions (cacheable)
- Human (history)
- Human (dynamic_context): dynamic kb_meta_prompt (new)
- Human (current) + datetime suffix
Injection order (pseudo-code):
messages = []
if system_prompt:
    messages.append({"role": "system", "content": system_prompt})
messages.extend(history)
if dynamic_context:
    messages.append({"role": "user", "content": dynamic_context})
messages.append(current_user_message_with_datetime_suffix)
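Concretely, the injection order above can be sketched as a small helper. This is an illustrative sketch, not the actual Chat Shell implementation; the function name and the sample data are assumptions, and message shapes follow the OpenAI chat format.

```python
def build_messages(system_prompt, history, dynamic_context, current_message):
    """Assemble the message list with dynamic_context injected late."""
    messages = []
    if system_prompt:
        # Static-only system prompt: byte-identical across requests,
        # so prefix caching can reuse it.
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history)
    if dynamic_context:
        # Request-scoped metadata lands here, after history and before the
        # current turn, so it never invalidates the cached prefix.
        messages.append({"role": "user", "content": dynamic_context})
    messages.append(current_message)
    return messages

msgs = build_messages(
    system_prompt="You are a helpful assistant.",
    history=[
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ],
    dynamic_context="KB: sales-faq (id=kb_1)",
    current_message={"role": "user", "content": "What changed? [2024-01-01]"},
)
# The dynamic context ends up as the second-to-last message.
```

Note that an empty or missing dynamic_context simply skips the extra message, which is what makes the change backward compatible.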
Sources and Aggregation
Current: kb_meta_prompt
- Backend builds kb_meta_prompt from historical contexts (KB name/ID/summary/topics, etc.).
- Backend writes it into the unified protocol ExecutionRequest as kb_meta_prompt.
- Chat Shell injects it into messages as dynamic_context.
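As a rough sketch of the backend side, kb_meta_prompt could be rendered from the bound KB metadata like this. The field names (name, id, summary, topics) follow the fields listed above, but the exact schema and wording are assumptions, not the real ExecutionRequest shape.

```python
def build_kb_meta_prompt(kb_list):
    """Render one text block per bound KB, joined by blank lines."""
    blocks = []
    for kb in kb_list:
        lines = [f"KB: {kb['name']} (id={kb['id']})"]
        if kb.get("summary"):
            lines.append(f"Summary: {kb['summary']}")
        if kb.get("topics"):
            lines.append("Topics: " + ", ".join(kb["topics"]))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

prompt = build_kb_meta_prompt([
    {"id": "kb_1", "name": "sales-faq",
     "summary": "Sales FAQ entries",
     "topics": ["pricing", "refunds"]},
])
```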
Restricted mode: safe kb_meta_prompt
When KB access runs under Restricted Analyst, dynamic context is still preserved, but the injected metadata should be a safe metadata block rather than directly reusable KB content.
Why dynamic context still exists in restricted mode:
- the main model still needs to know which KBs are currently bound
- minimal information such as KB name/id still helps tool calls remain stable
- removing KB context entirely makes knowledge tool usage less reliable
The current restricted kb_meta_prompt keeps only the minimum routing context needed for search:
- KB name
- KB ID
- constrained routing hint
- constrained routing keywords
It should not include:
- raw source passages
- definitions that can be restated directly
- exact targets, KPI values, or document structure
These routing hints exist only to help the main model draft better
knowledge_base_search queries. They must not be surfaced as final answer
content.
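A restricted-mode builder can make the allow-list explicit by construction: it emits only the four routing fields above and ignores everything else, so summaries, passages, and KPI values cannot leak even if they are present in the metadata. This is an illustrative sketch with assumed field names, not the actual backend code.

```python
def build_restricted_kb_meta_prompt(kb_list):
    """Keep only minimal routing context: name, id, hint, keywords."""
    blocks = []
    for kb in kb_list:
        # Only these fields are ever read; summaries, raw passages,
        # targets, and KPI values are dropped by construction.
        lines = [f"KB: {kb['name']} (id={kb['id']})"]
        if kb.get("routing_hint"):
            lines.append(f"Routing hint: {kb['routing_hint']}")
        if kb.get("routing_keywords"):
            lines.append("Routing keywords: " + ", ".join(kb["routing_keywords"]))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

safe = build_restricted_kb_meta_prompt([
    {"id": "kb_1", "name": "sales-faq",
     "summary": "MUST NOT LEAK",  # present in metadata, ignored on purpose
     "routing_hint": "search for policy names, not final answers",
     "routing_keywords": ["refund policy", "pricing tier"]},
])
```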
Future: weibo_context
Internal deployments can extend the same injection point to include:
- user identity / permission context (e.g. weibo_context)
Suggested approach:
- Build dynamic blocks independently, then join with \n\n.
- Avoid putting any request-scoped data into system prompt templates.
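The aggregation step is small but worth pinning down: build each block independently, drop empty ones, and join with blank lines. A minimal sketch, with illustrative block names:

```python
def aggregate_dynamic_context(*blocks):
    """Join non-empty dynamic blocks with blank lines; empty blocks vanish."""
    return "\n\n".join(b for b in blocks if b)

kb_block = "KB: sales-faq (id=kb_1)"
weibo_block = ""  # e.g. a weibo_context block; empty for most deployments
dynamic_context = aggregate_dynamic_context(kb_block, weibo_block)
```

Filtering empty blocks means an all-empty result stays "", so no extra message gets injected and the pre-change behavior is preserved.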
Responsibilities
- shared/prompts/knowledge_base.py:
  - Provides fully static KB prompt templates (no {kb_meta_list} placeholder).
- Backend:
  - Generates kb_meta_prompt and stores it in ExecutionRequest.kb_meta_prompt.
  - Transports it to Chat Shell via OpenAIRequestConverter metadata.
- Chat Shell:
  - Injects dynamic_context as a human message.
  - Must not build the KB meta prompt locally (avoids a reverse dependency and keeps HTTP mode consistent).
Restricted Retrieval Flow
In restricted mode, KB safety no longer depends mainly on a final-answer validator. The control point has been moved into knowledge_base_search.
The current flow is:
- Backend builds a safe kb_meta_prompt
- Chat Shell injects it as dynamic_context
- The main model decides whether to call knowledge_base_search
- In restricted mode, the KB tool retrieves search results or all-chunks
- A secondary model converts the raw chunks into a safe summary
- The main model only sees the safe summary, not the protected raw content
This keeps two important properties:
- the main model can still use KB content for diagnosis and recommendations
- the answerability and redaction decision stays inside the KB tool
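The shape of that in-tool control point can be sketched as follows. All function names, the result shape, and the answer/refuse verdict fields are assumptions for illustration; the stub collaborators stand in for real retrieval and the secondary model.

```python
def restricted_kb_search(query, retrieve, summarize_safely):
    """Gate raw KB content behind a secondary-model safe summary."""
    chunks = retrieve(query)                    # raw, protected content
    decision = summarize_safely(query, chunks)  # secondary model call
    if decision["verdict"] == "answer":
        # The main model sees only the redacted summary, never the chunks.
        return {"status": "answer", "summary": decision["summary"]}
    # Refusal (answerability decision) also stays inside the tool.
    return {"status": "refuse", "reason": decision["reason"]}

# Stub collaborators, for illustration only:
result = restricted_kb_search(
    "refund policy",
    retrieve=lambda q: ["raw chunk 1", "raw chunk 2"],
    summarize_safely=lambda q, chunks: {
        "verdict": "answer",
        "summary": "Refunds are handled per the published policy.",
    },
)
```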
Compatibility
If dynamic_context is an empty string or None, behavior is identical to pre-change behavior: no extra message is inserted.
Debugging and Logs
When debugging dynamic context or restricted KB behavior, focus on the logs below.
1. LLM request and response logs
With CHAT_SHELL_LOG_LLM_REQUESTS=1, the system now logs both LLM_REQUEST and LLM_RESPONSE.
These logs help you verify:
- whether dynamic_context is really present in the message list
- whether the restricted secondary model was invoked
- what the model actually returned
2. Restricted safe-summary logs
Restricted KB flow now adds business-level logs such as:
- Starting safe summary
- Safe summary completed
These are useful for checking:
- how many chunks were actually sent to the secondary model
- whether the decision was answer or refuse
- the machine-readable reason
- a short preview of the safe summary
3. Persistence logs
If the KB tool also persists its result, continue checking:
- Persist HTTP request
- Persist HTTP response
4. Suggested debugging order
- Confirm dynamic_context is present in the request
- Confirm knowledge_base_search was triggered
- Confirm restricted safe summary started
- Inspect LLM_RESPONSE and Safe summary completed to see whether the result was answer or refuse