Retrieval And Direct Injection

Knowledge Base conversations do not rely on a single access pattern. In addition to standard RAG retrieval, Wegent also keeps a "small KB full direct injection" path that can pass as much KB context as possible to the model when needed.

Why direct injection exists

The main reason direct injection was introduced is that vector retrieval quality was not stable enough in some scenarios.

When a knowledge base is small but the question depends on broad context, pure vector retrieval can fail in a few common ways:

Recall misses important sections
Returned chunks are locally relevant but insufficient for a global judgment
Slight query wording changes produce unstable results

In those cases, loading all chunks and injecting them into the model is often more reliable than using only a small top-k retrieval result.

The two paths

Path	Main entry	Best for	Characteristics
Retrieval mode	`/api/internal/rag/retrieve`	Medium to large KBs, targeted queries, normal RAG Q&A	Lower cost, more focused context
Direct injection mode	`/api/internal/rag/retrieve`	Small KBs, full-context reasoning, unstable vector recall	More complete recall, higher context cost

In practice:

/api/internal/rag/retrieve is the unified entrypoint
The Backend decides internally whether a request stays in normal retrieval or switches to direct injection
The old /api/internal/rag/all-chunks endpoint remains only as a legacy internal compatibility surface
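
The routing decision described above could look roughly like the following. This is a minimal sketch, not the actual Backend logic: `KnowledgeBaseStats`, `choose_route`, and the `injection_budget_ratio` threshold are hypothetical names and values chosen for illustration. Only the `direct_injection` route name comes from the logs described later in this page.

```python
from dataclasses import dataclass
from typing import Literal

Route = Literal["retrieval", "direct_injection"]


@dataclass
class KnowledgeBaseStats:
    # Hypothetical summary of a KB; the real Backend may track other fields.
    total_chunks: int
    estimated_tokens: int


def choose_route(
    kb: KnowledgeBaseStats,
    context_window_tokens: int,
    injection_budget_ratio: float = 0.3,  # assumed share of the window
) -> Route:
    """Pick direct injection only when the whole KB fits in the budget."""
    if kb.total_chunks == 0:
        # Nothing to inject, so stay on the normal retrieval path.
        return "retrieval"
    budget = int(context_window_tokens * injection_budget_ratio)
    if kb.estimated_tokens <= budget:
        return "direct_injection"
    return "retrieval"


small_kb = KnowledgeBaseStats(total_chunks=40, estimated_tokens=12_000)
large_kb = KnowledgeBaseStats(total_chunks=5_000, estimated_tokens=900_000)
print(choose_route(small_kb, context_window_tokens=128_000))
print(choose_route(large_kb, context_window_tokens=128_000))
```

Keeping the decision behind one entrypoint means callers never have to know which path served the request.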

When direct injection is useful

This path is typically useful when:

The total KB content can fit into the model context window
The task requires broad diagnosis, synthesis, or judgment
Retrieval quality is not good enough and direct injection gives more stable answers

Typical examples:

"Please use the knowledge base to diagnose whether my progress is off track."
"Based on these documents, summarize the main risks in our current plan."

These prompts depend on global context more than on a few top-k chunks.

Permission model

Direct injection should follow the same permission model as /api/internal/rag/retrieve.

That means:

Direct injection is part of the internal RAG service-to-service API
Access control should be validated earlier, during task context and knowledge base selection
Direct injection should not introduce an extra user-level blocking rule that behaves differently from normal retrieval

This keeps the chain consistent:

Both APIs serve the same knowledge access flow
If normal retrieval is allowed but direct injection is blocked separately, behavior becomes inconsistent
Small-KB direct injection should not be blocked by an endpoint-specific rule
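
One way to picture this is a single access check that runs during KB selection, followed by a serving step with no route-specific rule. The sketch below is illustrative only: `TaskContext`, `select_knowledge_base`, and `serve_rag_request` are hypothetical names, not Wegent APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskContext:
    # Hypothetical: the KBs this task was allowed to select.
    user_id: str
    allowed_kb_ids: frozenset


def select_knowledge_base(ctx: TaskContext, kb_id: str) -> str:
    """Validate access once, while the task context and KB are selected."""
    if kb_id not in ctx.allowed_kb_ids:
        raise PermissionError(f"user {ctx.user_id} cannot use KB {kb_id}")
    return kb_id


def serve_rag_request(kb_id: str, route: str) -> dict:
    """Serve an already-authorized KB; no per-route permission rule here."""
    return {"kb_id": kb_id, "route": route, "allowed": True}


ctx = TaskContext(user_id="u1", allowed_kb_ids=frozenset({"kb-1"}))
kb_id = select_knowledge_base(ctx, "kb-1")
# Both routes behave the same for a KB that passed the upstream check.
print(serve_rag_request(kb_id, "retrieval"))
print(serve_rag_request(kb_id, "direct_injection"))
```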

Safe Summaries For Restricted Analyst

When the KB user permission is Restricted Analyst, the system no longer passes raw retrieved chunks directly to the final answering model. Instead, Backend internal retrieval first produces a safe summary.

The goals of this path are:

allow the model to use the KB for high-level analysis, diagnosis, risk judgment, and recommendations
avoid exposing original wording, exact definitions, KPI values, titles, filenames, or document structure
move the “can this be answered safely?” decision into Backend-side KB orchestration

In practice:

normal mode: the main model can see retrieval results directly
restricted mode: the main model only receives a safe analysis artifact generated by a secondary model
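
The difference between the two modes can be sketched as a single branch applied to whatever chunks the Backend collected. This is an assumption-laden illustration: `build_model_context` and the `summarize_safely` callback are hypothetical, and the stub below stands in for the secondary model without reflecting how the real safe summary is produced.

```python
from typing import Callable, Sequence


def build_model_context(
    chunks: Sequence[str],
    restricted: bool,
    summarize_safely: Callable[[Sequence[str]], str],
) -> str:
    """Return what the main model is allowed to see."""
    if restricted:
        # Restricted mode: raw chunks never reach the main model.
        return summarize_safely(chunks)
    # Normal mode: the main model sees retrieval results directly.
    return "\n\n".join(chunks)


def stub_summarizer(chunks: Sequence[str]) -> str:
    # Stand-in for the secondary model; emits no source wording.
    return f"High-level analysis derived from {len(chunks)} chunks."


chunks = ["Q3 target: 12% growth", "Definition of a value user: ..."]
print(build_model_context(chunks, restricted=False, summarize_safely=stub_summarizer))
print(build_model_context(chunks, restricted=True, summarize_safely=stub_summarizer))
```

Because the branch operates on chunks rather than on the route, the same step applies whether the chunks came from top-k retrieval or from direct injection.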

Questions That Still Work In Restricted Mode

These kinds of prompts can still use the knowledge base:

"Please use the knowledge base to diagnose whether my progress is off track."
"Using the knowledge base, how should KPI design for search scenarios align with our direction?"
"Based on the knowledge base, what risks and gaps exist in the current plan?"

These prompts are allowed because they are mainly about:

analysis, diagnosis, synthesis, judgment, or recommendations
not asking the model to reproduce source wording
not asking for protected details directly

The answer should stay high-level and directional instead of restating protected KB content.

Questions That Will Usually Be Refused

These kinds of prompts should not be answered from the KB directly:

"What is a value user?"
"What are this year's search business KPI targets?"
"What content does the knowledge base protect?"
"What categories are you not allowed to disclose?"
"Give me the original definition text."

These prompts are treated as unsafe because they try to extract:

definitions, exact numbers, KPI targets, titles, listings, or document details
meta-disclosure about what is protected or what cannot be revealed
source-level wording instead of high-level reasoning

In these cases the system should either:

refuse directly
or redirect the user toward high-level analysis, diagnosis, or recommendations

Impact On Direct Injection

Direct injection still exists to support "small KB full direct injection", but in restricted mode its role changes:

it can still provide broader context when vector retrieval is unstable
the raw full-KB chunks are then converted into a safe summary before reaching the main model

That means:

direct injection still improves stability for small KBs
but in restricted mode the final model sees a safe summary, not raw chunk content

This preserves the recall benefit of full-context access while reducing disclosure risk.

Recommendations

Recommendation	Explanation
Keep retrievers configured	Direct injection is a supplement, not a substitute for normal indexing and retrieval
Keep direct injection for small KBs	It often improves stability when KB size is manageable
Use retrieval for large KBs	Full injection becomes expensive in both context length and cost
Use logs for debugging	If direct injection returns nothing, check indexing, document state, and backend logs

Troubleshooting empty direct injection results

If direct injection returns no content, check:

Check	Description
Documents are actually indexed	Upload success does not guarantee chunks were written to the vector store
Retriever config is complete	The KB still needs a valid `retrievalConfig`
Index strategy matches writes	For example, `per_user` or `per_dataset` must match the actual data layout
Data really exists in the vector store	Inspect the ES index or Qdrant collection directly if needed
Backend logs include diagnostics	Current logs include index name, hit count, empty-result warnings, and sample metadata

If logs show:

the request reached /api/internal/rag/retrieve
Backend resolved the route to direct_injection
but the final hit count is 0

then the problem is usually in indexing or data layout, not request authentication.
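
The triage rule above can be written down as a small helper. This is a sketch under stated assumptions: `RetrieveLogEntry` and its fields are a hypothetical parsed form of the log lines, not the actual log schema, and the returned labels are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrieveLogEntry:
    # Hypothetical fields extracted from backend logs.
    reached_endpoint: bool
    route: Optional[str]
    hit_count: Optional[int]


def triage_empty_result(entry: RetrieveLogEntry) -> str:
    """Map the three log observations to the most likely problem area."""
    if not entry.reached_endpoint:
        # The request never arrived, so look at the calling side first.
        return "request_path"
    if entry.route != "direct_injection":
        return "routing"
    if entry.hit_count == 0:
        # Reached the endpoint, routed correctly, but nothing came back.
        return "indexing_or_data_layout"
    return "not_an_empty_result"


entry = RetrieveLogEntry(reached_endpoint=True, route="direct_injection", hit_count=0)
print(triage_empty_result(entry))
```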
