Retrieval And Direct Injection
Knowledge Base conversations do not rely on a single access pattern. In addition to standard RAG retrieval, Wegent also keeps a "small KB full direct injection" path that can load all chunks from a knowledge base and pass them to the model when needed.
Why direct injection exists
The main reason all-chunks was introduced is that vector retrieval quality was not stable enough in some scenarios.
When a knowledge base is small but the question depends on broad context, pure vector retrieval can fail in a few common ways:
- Recall misses important sections
- Returned chunks are locally relevant but insufficient for a global judgment
- Slight query wording changes produce unstable results
In those cases, loading all chunks and injecting them into the model is often more reliable than using only a small top-k retrieval result.
The two paths
| Path | Main API | Best for | Characteristics |
|---|---|---|---|
| Retrieval mode | /api/internal/rag/retrieve | Medium to large KBs, targeted queries, normal RAG Q&A | Lower cost, more focused context |
| Direct injection mode | /api/internal/rag/all-chunks | Small KBs, full-context reasoning, unstable vector recall | More complete recall, higher context cost |
In practice:
- retrieve is the normal search path
- all-chunks is a complementary path for small knowledge bases, not a replacement for retrievers
When all-chunks is useful
This path is typically useful when:
- The total KB content can fit into the model context window
- The task requires broad diagnosis, synthesis, or judgment
- Retrieval quality is not good enough and direct injection gives more stable answers
Typical examples:
- "Please use the knowledge base to diagnose whether my progress is off track."
- "Based on these documents, summarize the main risks in our current plan."
These prompts depend on global context more than on a few top-k chunks.
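The selection criteria above can be sketched as a small routing function. This is only an illustration: `KnowledgeBaseStats`, `choose_rag_path`, the `needs_global_context` flag, and the `budget_ratio` default are hypothetical names and values, not part of Wegent's actual code. Only the two endpoint paths come from this page.

```python
from dataclasses import dataclass

RETRIEVE_PATH = "/api/internal/rag/retrieve"
ALL_CHUNKS_PATH = "/api/internal/rag/all-chunks"


@dataclass
class KnowledgeBaseStats:
    # Hypothetical summary of a KB; the real data model may differ.
    estimated_tokens: int


def choose_rag_path(
    kb: KnowledgeBaseStats,
    context_window_tokens: int,
    needs_global_context: bool,
    budget_ratio: float = 0.5,
) -> str:
    """Return the internal RAG path to call for this request."""
    # Reserve part of the window for the prompt, history, and the answer.
    # The 0.5 default is an arbitrary assumption, not a documented value.
    budget = int(context_window_tokens * budget_ratio)
    fits_in_context = kb.estimated_tokens <= budget

    # Direct injection only when the whole KB fits AND the task needs
    # broad context; otherwise fall back to normal retrieval.
    if fits_in_context and needs_global_context:
        return ALL_CHUNKS_PATH
    return RETRIEVE_PATH
```

With these assumptions, an 8,000-token KB and a 32,000-token window would route a diagnosis-style question to all-chunks, while the same question against a 200,000-token KB would stay on retrieve.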
Permission model
all-chunks should follow the same permission model as /api/internal/rag/retrieve.
That means:
- It is part of the internal RAG service-to-service API
- Access control should be validated earlier, during task context and knowledge base selection
- all-chunks should not introduce an extra user-level blocking rule that behaves differently from retrieve
This keeps the chain consistent:
- Both APIs serve the same knowledge access flow
- If retrieve is allowed but all-chunks is blocked separately, behavior becomes inconsistent
- Small-KB direct injection should not be blocked by an endpoint-specific rule
Safe Summaries For Restricted Analyst
When the KB user permission is Restricted Analyst, the system no longer passes raw retrieved chunks directly to the final answering model. Instead, knowledge_base_search first produces a safe summary internally.
The goals of this path are:
- allow the model to use the KB for high-level analysis, diagnosis, risk judgment, and recommendations
- avoid exposing original wording, exact definitions, KPI values, titles, filenames, or document structure
- move the "can this be answered safely?" decision into the KB tool itself
In practice:
- normal mode: the main model can see retrieval results directly
- restricted mode: the main model only receives a safe analysis artifact generated by a secondary model
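The difference between the two modes can be sketched as a single branch inside the KB tool. This is a minimal illustration under assumptions: `build_tool_result`, the result dictionary shape, and the `summarize_safely` callback (standing in for the secondary model) are hypothetical and do not describe the real knowledge_base_search implementation.

```python
from typing import Callable, Dict, List

RESTRICTED_ANALYST = "Restricted Analyst"


def build_tool_result(
    chunks: List[str],
    kb_permission: str,
    summarize_safely: Callable[[List[str]], str],
) -> Dict[str, object]:
    """Decide what the main answering model is allowed to see."""
    if kb_permission == RESTRICTED_ANALYST:
        # Restricted mode: raw chunks stay inside the tool. Only the
        # artifact produced by the secondary model is returned.
        return {"mode": "restricted", "safe_summary": summarize_safely(chunks)}

    # Normal mode: the main model sees the retrieval results directly.
    return {"mode": "normal", "chunks": list(chunks)}
```

Keeping the branch inside the tool means the main model never receives raw chunk text in restricted mode, regardless of how the chunks were obtained.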
Questions That Still Work In Restricted Mode
These kinds of prompts can still use the knowledge base:
- "Please use the knowledge base to diagnose whether my progress is off track."
- "Using the knowledge base, how should KPI design for search scenarios align with our direction?"
- "Based on the knowledge base, what risks and gaps exist in the current plan?"
These prompts are allowed because they are mainly about:
- analysis, diagnosis, synthesis, judgment, or recommendations
- not asking the model to reproduce source wording
- not asking for protected details directly
The answer should stay high-level and directional instead of restating protected KB content.
Questions That Will Usually Be Refused
These kinds of prompts should not be answered from the KB directly:
- "What is a value user?"
- "What are this year's search business KPI targets?"
- "What content does the knowledge base protect?"
- "What categories are you not allowed to disclose?"
- "Give me the original definition text."
These prompts are treated as unsafe because they try to extract:
- definitions, exact numbers, KPI targets, titles, listings, or document details
- meta-disclosure about what is protected or what cannot be revealed
- source-level wording instead of high-level reasoning
In these cases the system should either:
- refuse directly
- or redirect the user toward high-level analysis, diagnosis, or recommendations
Impact On all-chunks
all-chunks still exists to support "small KB full direct injection", but in restricted mode its role changes:
- it can still provide broader context when vector retrieval is unstable
- the raw full-KB chunks are then converted into a safe summary before reaching the main model
That means:
- all-chunks still improves stability for small KBs
- but in restricted mode the final model sees a safe summary, not raw chunk content
This preserves the recall benefit of full-context access while reducing disclosure risk.
Recommendations
| Recommendation | Explanation |
|---|---|
| Keep retrievers configured | all-chunks is a supplement, not a substitute for normal indexing and retrieval |
| Keep direct injection for small KBs | It often improves stability when KB size is manageable |
| Use retrieval for large KBs | Full injection becomes expensive in context and cost |
| Use logs for debugging | If all-chunks returns empty, check indexing, document state, and backend logs |
Troubleshooting empty all-chunks results
If all-chunks returns no content, check:
| Check | Description |
|---|---|
| Documents are actually indexed | Upload success does not guarantee chunks were written to the vector store |
| Retriever config is complete | The KB still needs a valid retrievalConfig |
| Index strategy matches writes | For example, per_user or per_dataset must match the actual data layout |
| Data really exists in the vector store | Inspect the ES index or Qdrant collection directly if needed |
| Backend logs include diagnostics | Current logs include index name, hit count, empty-result warnings, and sample metadata |
If logs show:
- the request reached /api/internal/rag/all-chunks
- get_all_chunks start was logged
- but the final hit count is 0
then the problem is usually in indexing or data layout, not request authentication.
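The log-based reasoning above can be written down as a small triage helper. This is a sketch only: `AllChunksLogObservation` and `diagnose_empty_all_chunks` are hypothetical names, the three fields are filled in by hand from reading the backend logs, and the first two branches are inferred rather than stated on this page. Only the last case (request reached, start logged, zero hits) is described here.

```python
from dataclasses import dataclass


@dataclass
class AllChunksLogObservation:
    # Filled in manually after reading the backend logs (hypothetical shape).
    request_reached_endpoint: bool
    start_logged: bool
    hit_count: int


def diagnose_empty_all_chunks(obs: AllChunksLogObservation) -> str:
    """Map log observations to the most likely area to investigate."""
    if not obs.request_reached_endpoint:
        # Assumption: nothing in the logs means the call never arrived.
        return "check routing and service-to-service access"
    if not obs.start_logged:
        # Assumption: the request arrived but the handler did not start.
        return "check the handler and request parameters"
    if obs.hit_count == 0:
        # The documented case: the request was fine, the store is empty.
        return "check indexing and data layout"
    return "chunks were returned; the problem is downstream"
```

For the documented case, `diagnose_empty_all_chunks(AllChunksLogObservation(True, True, 0))` points at indexing and data layout, which matches the checklist table above.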