Dynamic Context (Injecting Request-Scoped Context)

Background

In Chat Shell, the system prompt is the part of a request that benefits most from LLM prompt caching (prefix caching), because it forms the stable prefix shared across requests. Mixing request-scoped, frequently changing content into the system prompt (e.g., knowledge base metadata lists) changes that prefix from one request to the next, which lowers the cache hit rate and increases token cost and latency.

To improve cache hit rate, we split “dynamic metadata” out of the system prompt and inject it into the message list via a unified dynamic_context mechanism.

Goals

Keep the system prompt fully static whenever possible so it can be cached.
Inject all request-scoped metadata as a separate human/user message.
Make the mechanism extensible: internal deployments can append weibo_context or other dynamic blocks in the same place.
Keep tool-routing rules and policy inside static prompt templates rather than dynamic context.

Message Structure

Before:

System: static instructions + dynamic kb_meta_list
Human (history)
Human (current) + datetime suffix

After:

System: static instructions (cacheable)
Human (history)
Human (dynamic_context): dynamic kb_meta_prompt (new)
Human (current) + datetime suffix

Injection order (pseudo-code):

messages = []
if system_prompt:
    messages.append({"role": "system", "content": system_prompt})
messages.extend(history)
if dynamic_context:
    messages.append({"role": "user", "content": dynamic_context})
messages.append(current_user_message_with_datetime_suffix)
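
A runnable version of the pseudo-code above, as a minimal sketch. The function name build_messages and the plain-dict message shape are assumptions for illustration; the message types Chat Shell actually uses may differ.

```python
from typing import Optional


def build_messages(
    system_prompt: Optional[str],
    history: list[dict],
    dynamic_context: Optional[str],
    current_user_message: dict,
) -> list[dict]:
    """Assemble the message list in the injection order shown above (hypothetical helper)."""
    messages: list[dict] = []
    if system_prompt:
        # Static instructions go first so the prefix stays cacheable.
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history)
    if dynamic_context:
        # None or "" is skipped, so no extra message is inserted.
        messages.append({"role": "user", "content": dynamic_context})
    # The current user message (already carrying its datetime suffix) goes last.
    messages.append(current_user_message)
    return messages
```

Because the dynamic block is placed after the history, the system prompt and earlier turns stay byte-identical between requests even when the KB metadata changes.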

Sources and Aggregation

Current: kb_meta_prompt

Backend builds kb_meta_prompt from historical contexts (KB name/ID/summary/topics, etc.).
Backend writes it into the unified protocol ExecutionRequest as kb_meta_prompt.
Chat Shell injects it into messages as dynamic_context.
kb_meta_prompt should carry request-scoped facts only. It should not duplicate KB workflow rules, tool policy, or response policy that already belongs in static prompt templates.

Restricted mode: safe kb_meta_prompt

When KB access runs under Restricted Analyst, dynamic context is still preserved, but the injected metadata should be a safe metadata block rather than directly reusable KB content.

Why dynamic context still exists in restricted mode:

the main model still needs to know which KBs are currently bound
minimal information such as KB name / id still helps tool calls remain stable
removing KB context entirely makes knowledge tool usage less reliable

The current restricted kb_meta_prompt keeps only the minimum routing context needed for search:

KB name
KB ID
constrained routing hint
constrained routing keywords

It should not include:

raw source passages
definitions that can be restated directly
exact targets, KPI values, or document structure

These routing hints exist only to help the main model draft better knowledge_base_search queries. They must not be surfaced as final answer content.
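
The allow-list above could be enforced by building the restricted block from a narrow data type that has no field for raw content. This is only a sketch: the KBRoutingMeta class, its field names, and the rendered text layout are assumptions, not the format Backend actually emits.

```python
from dataclasses import dataclass, field


@dataclass
class KBRoutingMeta:
    """Hypothetical container holding only the fields allowed in restricted mode."""
    name: str
    kb_id: str
    routing_hint: str = ""
    routing_keywords: list[str] = field(default_factory=list)


def build_safe_kb_meta_prompt(kbs: list[KBRoutingMeta]) -> str:
    """Render minimal routing context; returns "" when no KB is bound."""
    lines = []
    for kb in kbs:
        lines.append(f"- KB name: {kb.name} (KB ID: {kb.kb_id})")
        if kb.routing_hint:
            lines.append(f"  Routing hint: {kb.routing_hint}")
        if kb.routing_keywords:
            lines.append(f"  Routing keywords: {', '.join(kb.routing_keywords)}")
    return "\n".join(lines)
```

Since the type carries no passages, definitions, or KPI values, those cannot leak into the prompt through this path by accident.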

Future: weibo_context

Internal deployments can extend the same injection point to include:

user identity / permission context (e.g. weibo_context)

Suggested approach:

Build dynamic blocks independently, then join with \n\n.
Avoid putting any request-scoped data into system prompt templates.
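
One way to aggregate independently built blocks, as a minimal sketch. The helper name join_dynamic_blocks is hypothetical; only the "\n\n" separator comes from the suggestion above.

```python
from typing import Optional


def join_dynamic_blocks(*blocks: Optional[str]) -> str:
    """Join non-empty dynamic blocks with a blank line between them (hypothetical helper)."""
    cleaned = [b.strip() for b in blocks if b and b.strip()]
    return "\n\n".join(cleaned)
```

When every block is missing or blank, the result is an empty string, so no dynamic_context message would be injected.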

Controlled Exception: Web UI Runtime Guidance

Tasks started from the Wegent Web UI append a small <wegent_runtime_guidance> system prompt block while Backend builds the ExecutionRequest. This block is not KB metadata or business data context. It is interaction policy: it tells the model that the request came from the Web UI, describes whether execution is on a local device, cloud sandbox, or managed runtime, and asks the model to tell users to use the task page's "View the task files" entry when previewing or downloading generated files.

Keep this exception tightly scoped:

Apply it only to tasks started from the Web UI, not API or other entry points.
Describe only runtime and user interaction behavior; do not include KB content, business data, or large request context.
Keep the block short, idempotent, and guarded by the <wegent_runtime_guidance> marker.
If the new content is factual request context, prefer dynamic_context instead of extending the system prompt.
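
The "idempotent and guarded by the marker" rule could look roughly like the sketch below. The function name, the "web_ui" source label, and the closing </wegent_runtime_guidance> tag are assumptions for illustration; only the <wegent_runtime_guidance> marker itself comes from this document.

```python
GUIDANCE_OPEN = "<wegent_runtime_guidance>"
GUIDANCE_CLOSE = "</wegent_runtime_guidance>"  # assumed closing tag


def append_runtime_guidance(system_prompt: str, guidance: str, source: str) -> str:
    """Append the guidance block at most once, and only for Web UI tasks (sketch)."""
    if source != "web_ui":  # assumed label for the Web UI entry point
        return system_prompt
    if GUIDANCE_OPEN in system_prompt:
        # Marker already present: return unchanged so repeated calls are no-ops.
        return system_prompt
    block = f"{GUIDANCE_OPEN}\n{guidance.strip()}\n{GUIDANCE_CLOSE}"
    return f"{system_prompt.rstrip()}\n\n{block}"
```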

Responsibilities

shared/prompts/knowledge_base.py:
- Provides fully static KB prompt templates (no {kb_meta_list} placeholder).
Backend:
- Generates kb_meta_prompt and stores it in ExecutionRequest.kb_meta_prompt.
- Transports it to Chat Shell via OpenAIRequestConverter metadata.
- Appends controlled runtime guidance in TaskRequestBuilder for Web UI tasks.
Chat Shell:
- Injects dynamic_context as a human message.
- Must not build KB meta prompt locally (avoids reverse dependency and keeps HTTP mode consistent).

Restricted Retrieval Flow

In restricted mode, KB safety no longer depends mainly on a final-answer validator. The control point has been moved into knowledge_base_search.

The current flow is:

Backend builds a safe kb_meta_prompt
Chat Shell injects it as dynamic_context
The main model decides whether to call knowledge_base_search
In restricted mode, the KB tool retrieves either search results or all chunks
A secondary model converts the raw chunks into a safe summary
The main model only sees the safe summary, not the protected raw content

This keeps two important properties:

the main model can still use KB content for diagnosis and recommendations
the answerability and redaction decision stays inside the KB tool
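
The flow above can be sketched as a tool wrapper that keeps raw chunks local and returns only the secondary model's output. Everything named here (SafeSummary, restricted_kb_search, the retrieve and summarize callables, and the "no_results" reason code) is hypothetical; the real knowledge_base_search implementation is not shown in this document.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SafeSummary:
    decision: str  # "answer" or "refuse"
    reason: str    # machine-readable reason code
    summary: str   # text the main model is allowed to see


def restricted_kb_search(
    query: str,
    retrieve: Callable[[str], list[str]],
    summarize: Callable[[str, list[str]], SafeSummary],
) -> dict:
    """Return only the safe summary; raw chunks are never part of the result."""
    chunks = retrieve(query)
    if not chunks:
        # Placeholder reason code, used when there is nothing to summarize.
        return {"decision": "refuse", "reason": "no_results", "summary": ""}
    result = summarize(query, chunks)  # stands in for the secondary model call
    if result.decision != "answer":
        # On refusal, drop any summary text so nothing protected is forwarded.
        return {"decision": "refuse", "reason": result.reason, "summary": ""}
    return {"decision": "answer", "reason": result.reason, "summary": result.summary}
```

Passing retrieve and summarize in as callables keeps the answer-or-refuse decision inside the tool boundary while letting each piece be stubbed in tests.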

Compatibility

If dynamic_context is an empty string or None, behavior is identical to pre-change behavior: no extra message is inserted.

Debugging And Logs

When debugging dynamic context or restricted KB behavior, focus on the logs below.

1. LLM request and response logs

With CHAT_SHELL_LOG_LLM_REQUESTS=1, the system now logs both LLM_REQUEST and LLM_RESPONSE.

These logs help you verify:

whether dynamic_context is really present in the message list
whether the restricted secondary model was invoked
what the model actually returned

2. Restricted safe-summary logs

The restricted KB flow also emits business-level logs such as:

Starting safe summary
Safe summary completed

These are useful for checking:

how many chunks were actually sent to the secondary model
whether the decision was answer or refuse
the machine-readable reason
a short preview of the safe summary

3. Persistence logs

If the KB tool also persists its result, continue checking:

Persist HTTP request
Persist HTTP response

4. Suggested debugging order

Confirm dynamic_context is present in the request
Confirm knowledge_base_search was triggered
Confirm restricted safe summary started
Inspect LLM_RESPONSE and Safe summary completed to see whether the result was answer or refuse
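
The debugging order above can be approximated with a small log scan. This is a coarse sketch: it assumes the log is available as text lines and only checks whether each marker string appears anywhere, so it cannot tell whether dynamic_context is actually inside an LLM_REQUEST entry or whether a knowledge_base_search mention is a real tool call. The checkpoint labels are made up for readability.

```python
# (label, marker substring) pairs, in the suggested debugging order.
CHECKPOINTS = [
    ("LLM request logged", "LLM_REQUEST"),
    ("knowledge_base_search mentioned", "knowledge_base_search"),
    ("safe summary started", "Starting safe summary"),
    ("safe summary completed", "Safe summary completed"),
    ("LLM response logged", "LLM_RESPONSE"),
]


def first_missing_checkpoint(log_lines):
    """Return the label of the first checkpoint whose marker never appears, or None."""
    for label, marker in CHECKPOINTS:
        if not any(marker in line for line in log_lines):
            return label
    return None
```

The returned label points at the stage to inspect first; the matching log entries still need to be read by hand to confirm the decision and reason.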
