Retrieval And Direct Injection
Knowledge Base conversations do not rely on a single access pattern. In addition to standard RAG retrieval, Wegent also keeps a "small KB full direct injection" path that can pass as much KB context as possible to the model when needed.
Why direct injection exists
The main reason direct injection was introduced is that vector retrieval quality was not stable enough in some scenarios.
For smaller knowledge bases where the total content is limited but the question depends on broad context, pure vector retrieval can fail in a few common ways:
- Recall misses important sections
- Returned chunks are locally relevant but insufficient for a global judgment
- Slight query wording changes produce unstable results
In those cases, loading all chunks and injecting them into the model is often more reliable than using only a small top-k retrieval result.
The two paths
| Path | Main entry | Best for | Characteristics |
|---|---|---|---|
| Retrieval mode | /api/internal/rag/retrieve | Medium to large KBs, targeted queries, normal RAG Q&A | Lower cost, more focused context |
| Direct injection mode | /api/internal/rag/retrieve | Small KBs, full-context reasoning, unstable vector recall | More complete recall, higher context cost |
In practice:
- /api/internal/rag/retrieve is the unified entrypoint
- Backend decides internally whether the request should stay in normal retrieval or switch to direct injection
- the old /api/internal/rag/all-chunks endpoint remains only as a legacy internal compatibility surface
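The internal routing decision can be pictured with a minimal sketch. This is an illustration under stated assumptions, not Wegent's actual implementation: `KnowledgeBaseStats`, `choose_route`, the token-budget heuristic, the `budget_ratio` parameter, and the `"retrieval"` route label are all hypothetical. Only the idea of switching between normal retrieval and direct injection comes from this page.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeBaseStats:
    """Hypothetical summary of a KB's size; not a Wegent type."""
    total_tokens: int  # estimated tokens across all chunks in the KB


def choose_route(
    stats: KnowledgeBaseStats,
    context_window: int,
    budget_ratio: float = 0.5,
) -> str:
    """Pick "direct_injection" when the whole KB fits the budget, else "retrieval".

    budget_ratio is an assumed knob: the share of the context window
    reserved for KB content, leaving room for the prompt and the answer.
    """
    budget = int(context_window * budget_ratio)
    if 0 < stats.total_tokens <= budget:
        return "direct_injection"
    # Empty or oversized KBs stay on the normal retrieval path.
    return "retrieval"


small_kb = KnowledgeBaseStats(total_tokens=4_000)
large_kb = KnowledgeBaseStats(total_tokens=400_000)
print(choose_route(small_kb, context_window=32_000))
print(choose_route(large_kb, context_window=32_000))
```

A size-based rule like this keeps the caller unaware of which path was taken, which matches the single-entrypoint design described above.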
When direct injection is useful
This path is typically useful when:
- The total KB content can fit into the model context window
- The task requires broad diagnosis, synthesis, or judgment
- Retrieval quality is not good enough and direct injection gives more stable answers
Typical examples:
- "Please use the knowledge base to diagnose whether my progress is off track."
- "Based on these documents, summarize the main risks in our current plan."
These prompts depend on global context more than a few top-k chunks.
Permission model
Direct injection should follow the same permission model as /api/internal/rag/retrieve.
That means:
- It is part of the internal RAG service-to-service API
- Access control should be validated earlier, during task context and knowledge base selection
- direct injection should not introduce an extra user-level blocking rule that behaves differently from normal retrieval
This keeps the chain consistent:
- Both APIs serve the same knowledge access flow
- If normal retrieval is allowed but direct injection is blocked separately, behavior becomes inconsistent
- Small-KB direct injection should not be blocked by an endpoint-specific rule
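One way to picture this consistency is a sketch in which access is checked once, when the knowledge base is selected, and the internal fetch applies no route-specific user rule. Everything here is hypothetical and for illustration only: `select_knowledge_base`, `fetch_context`, the in-memory `acl` and `store` dictionaries, and the `top_k` slice that stands in for real vector retrieval.

```python
from typing import Dict, List, Set


def select_knowledge_base(user_id: str, kb_id: str, acl: Dict[str, Set[str]]) -> str:
    """Validate access once, while the KB is attached to the task context."""
    if user_id not in acl.get(kb_id, set()):
        raise PermissionError(f"user {user_id!r} cannot use knowledge base {kb_id!r}")
    return kb_id


def fetch_context(
    kb_id: str,
    route: str,
    store: Dict[str, List[str]],
    top_k: int = 2,
) -> List[str]:
    """Internal call with no user-level rule, so both routes behave alike.

    The top_k slice is only a stand-in for real vector retrieval.
    """
    chunks = store.get(kb_id, [])
    if route == "direct_injection":
        return list(chunks)
    return chunks[:top_k]


acl = {"kb1": {"alice"}}
store = {"kb1": ["chunk a", "chunk b", "chunk c"]}

kb = select_knowledge_base("alice", "kb1", acl)
print(fetch_context(kb, "retrieval", store))
print(fetch_context(kb, "direct_injection", store))
```

Because the only permission gate sits upstream, a user who can run normal retrieval can never be blocked on direct injection alone.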
Safe Summaries For Restricted Analyst
When the KB user permission is Restricted Analyst, the system no longer passes raw retrieved chunks directly to the final answering model. Instead, Backend internal retrieval first produces a safe summary.
The goals of this path are:
- allow the model to use the KB for high-level analysis, diagnosis, risk judgment, and recommendations
- avoid exposing original wording, exact definitions, KPI values, titles, filenames, or document structure
- move the "can this be answered safely?" decision into Backend-side KB orchestration
In practice:
- normal mode: the main model can see retrieval results directly
- restricted mode: the main model only receives a safe analysis artifact generated by a secondary model
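The difference between the two modes can be sketched as a single gate in front of the main model. This is a minimal illustration, not the actual Backend code: `build_model_context`, the `restricted` flag, and `fake_summarizer` (a placeholder for the secondary-model call) are assumed names.

```python
from typing import Callable, List


def build_model_context(
    chunks: List[str],
    restricted: bool,
    summarize_safely: Callable[[List[str]], str],
) -> List[str]:
    """Return what the main model is allowed to see.

    summarize_safely stands in for the secondary-model call; its name
    and signature are assumptions made for this sketch.
    """
    if not restricted:
        # Normal mode: the main model sees retrieval results directly.
        return list(chunks)
    # Restricted mode: only the safe analysis artifact is passed on.
    return [summarize_safely(chunks)]


def fake_summarizer(chunks: List[str]) -> str:
    # Placeholder for a model call that would produce high-level analysis.
    return f"High-level summary derived from {len(chunks)} chunk(s)."


raw_chunks = ["KPI target: 42", "Definition of value user"]
print(build_model_context(raw_chunks, restricted=False, summarize_safely=fake_summarizer))
print(build_model_context(raw_chunks, restricted=True, summarize_safely=fake_summarizer))
```

The same gate applies whether the chunks came from top-k retrieval or from loading the full KB, so the raw text stays on the Backend side in restricted mode.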
Questions That Still Work In Restricted Mode
These kinds of prompts can still use the knowledge base:
- "Please use the knowledge base to diagnose whether my progress is off track."
- "Using the knowledge base, how should KPI design for search scenarios align with our direction?"
- "Based on the knowledge base, what risks and gaps exist in the current plan?"
These prompts are allowed because they are mainly about:
- analysis, diagnosis, synthesis, judgment, or recommendations
- not asking the model to reproduce source wording
- not asking for protected details directly
The answer should stay high-level and directional instead of restating protected KB content.
Questions That Will Usually Be Refused
These kinds of prompts should not be answered from the KB directly:
- "What is a value user?"
- "What are this year's search business KPI targets?"
- "What content does the knowledge base protect?"
- "What categories are you not allowed to disclose?"
- "Give me the original definition text."
These prompts are treated as unsafe because they try to extract:
- definitions, exact numbers, KPI targets, titles, listings, or document details
- meta-disclosure about what is protected or what cannot be revealed
- source-level wording instead of high-level reasoning
In these cases the system should either:
- refuse directly
- or redirect the user toward high-level analysis, diagnosis, or recommendations
Impact On Direct Injection
Direct injection still exists to support "small KB full direct injection", but in restricted mode its role changes:
- it can still provide broader context when vector retrieval is unstable
- the raw full-KB chunks are then converted into a safe summary before reaching the main model
That means:
- direct injection still improves stability for small KBs
- but in restricted mode the final model sees a safe summary, not raw chunk content
This preserves the recall benefit of full-context access while reducing disclosure risk.
Recommendations
| Recommendation | Explanation |
|---|---|
| Keep retrievers configured | Direct injection is a supplement, not a substitute for normal indexing and retrieval |
| Keep direct injection for small KBs | It often improves stability when KB size is manageable |
| Use retrieval for large KBs | Full injection consumes a large share of the context window and raises token cost |
| Use logs for debugging | If direct injection returns empty, check indexing, document state, and backend logs |
Troubleshooting empty direct injection results
If direct injection returns no content, check:
| Check | Description |
|---|---|
| Documents are actually indexed | Upload success does not guarantee chunks were written to the vector store |
| Retriever config is complete | The KB still needs a valid retrievalConfig |
| Index strategy matches writes | For example, per_user or per_dataset must match the actual data layout |
| Data really exists in the vector store | Inspect the ES index or Qdrant collection directly if needed |
| Backend logs include diagnostics | Current logs include index name, hit count, empty-result warnings, and sample metadata |
If logs show:
- the request reached /api/internal/rag/retrieve
- Backend resolved the route to direct_injection
- but the final hit count is 0
then the problem is usually in indexing or data layout, not request authentication.
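The log-reading rule above can be written as a small triage helper. This is a sketch under assumptions: `diagnose_empty_result` and its three parameters are hypothetical names for facts you would read out of the backend logs by hand, and the messages for the first two branches are suggestions that go beyond what this page states.

```python
from typing import Optional


def diagnose_empty_result(
    reached_endpoint: bool,
    route: Optional[str],
    hit_count: int,
) -> str:
    """Map three facts read from the logs to a likely problem area."""
    if not reached_endpoint:
        # Assumption: nothing in the logs means the call never arrived.
        return "request never reached the retrieve endpoint; check the caller"
    if route != "direct_injection":
        # Assumption: a different route means direct injection was not chosen.
        return "request was not routed to direct injection; check routing inputs"
    if hit_count == 0:
        # The case described above: routed correctly, but nothing was found.
        return "check indexing and data layout"
    return "direct injection returned content"


print(diagnose_empty_result(True, "direct_injection", 0))
```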