
Retrieval And Direct Injection

Knowledge Base conversations do not rely on a single access pattern. In addition to standard RAG retrieval, Wegent also keeps a "small KB full direct injection" path that can load all chunks from a knowledge base and pass them to the model when needed.


Why direct injection exists

The main reason all-chunks was introduced is that vector retrieval quality was not stable enough in some scenarios.

When a knowledge base is small but the question depends on broad context, pure vector retrieval can fail in a few common ways:

  • Recall misses important sections
  • Returned chunks are locally relevant but insufficient for a global judgment
  • Slight query wording changes produce unstable results

In those cases, loading all chunks and injecting them into the model is often more reliable than using only a small top-k retrieval result.


The two paths

| Path | Main API | Best for | Characteristics |
| --- | --- | --- | --- |
| Retrieval mode | /api/internal/rag/retrieve | Medium to large KBs, targeted queries, normal RAG Q&A | Lower cost, more focused context |
| Direct injection mode | /api/internal/rag/all-chunks | Small KBs, full-context reasoning, unstable vector recall | More complete recall, higher context cost |

In practice:

  • retrieve is the normal search path
  • all-chunks is a complementary path for small knowledge bases, not a replacement for retrievers

When all-chunks is useful

This path is typically useful when:

  • The total KB content can fit into the model context window
  • The task requires broad diagnosis, synthesis, or judgment
  • Retrieval quality is not good enough and direct injection gives more stable answers

Typical examples:

  • "Please use the knowledge base to diagnose whether my progress is off track."
  • "Based on these documents, summarize the main risks in our current plan."

These prompts depend on global context more than on a few top-k chunks.
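The criteria above can be sketched as a small routing function. This is an illustrative sketch only: `KbProfile`, `choose_rag_path`, and the `kb_budget_ratio` parameter are hypothetical names, not part of Wegent, and the token estimate is assumed to come from elsewhere. Only the two endpoint paths are taken from this page.

```python
from dataclasses import dataclass

RETRIEVE_PATH = "/api/internal/rag/retrieve"
ALL_CHUNKS_PATH = "/api/internal/rag/all-chunks"


@dataclass(frozen=True)
class KbProfile:
    """Hypothetical summary of a knowledge base; not a Wegent type."""
    estimated_tokens: int        # rough size of all chunks combined
    needs_global_context: bool   # task is diagnosis or synthesis, not a lookup


def choose_rag_path(kb: KbProfile, context_window_tokens: int,
                    kb_budget_ratio: float = 0.5) -> str:
    """Return the internal endpoint to call for this request.

    kb_budget_ratio is an assumed share of the context window reserved
    for KB content; the remainder is left for the prompt and the answer.
    """
    budget = int(context_window_tokens * kb_budget_ratio)
    fits_in_context = kb.estimated_tokens <= budget
    if fits_in_context and kb.needs_global_context:
        return ALL_CHUNKS_PATH
    # Large KBs and targeted questions stay on the normal search path.
    return RETRIEVE_PATH
```

A real decision would probably also weigh observed retrieval quality, which this sketch leaves out.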


Permission model

all-chunks should follow the same permission model as /api/internal/rag/retrieve.

That means:

  • It is part of the internal RAG service-to-service API
  • Access control should be validated earlier, during task context and knowledge base selection
  • all-chunks should not introduce an extra user-level blocking rule that behaves differently from retrieve

This keeps the chain consistent:

  • Both APIs serve the same knowledge access flow
  • If retrieve is allowed but all-chunks is blocked separately, behavior becomes inconsistent
  • Small-KB direct injection should not be blocked by an endpoint-specific rule
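One way to picture this consistency is a single user-level check made when the knowledge base is selected, with both endpoints applying the same rule afterwards. The sketch below is a toy model under that assumption: `TaskContext`, `select_knowledge_base`, and the handler functions are hypothetical and do not reflect Wegent's actual code.

```python
from dataclasses import dataclass, field

RETRIEVE_PATH = "/api/internal/rag/retrieve"
ALL_CHUNKS_PATH = "/api/internal/rag/all-chunks"


@dataclass
class TaskContext:
    """Hypothetical task context built while the user selects KBs."""
    user_id: str
    allowed_kb_ids: set[str] = field(default_factory=set)


def select_knowledge_base(ctx: TaskContext, kb_id: str,
                          user_can_access: bool) -> None:
    """The single place where user-level access is decided."""
    if not user_can_access:
        raise PermissionError(f"user {ctx.user_id} cannot use KB {kb_id}")
    ctx.allowed_kb_ids.add(kb_id)


def _require_selected(ctx: TaskContext, kb_id: str) -> None:
    # Shared rule: the KB must have passed the earlier selection check.
    if kb_id not in ctx.allowed_kb_ids:
        raise PermissionError(f"KB {kb_id} was not selected for this task")


def handle_retrieve(ctx: TaskContext, kb_id: str) -> str:
    _require_selected(ctx, kb_id)
    return RETRIEVE_PATH


def handle_all_chunks(ctx: TaskContext, kb_id: str) -> str:
    # Same rule as retrieve; no endpoint-specific user-level block.
    _require_selected(ctx, kb_id)
    return ALL_CHUNKS_PATH
```

Because both handlers call the same helper, a knowledge base that is usable through retrieve is also usable through all-chunks, and the reverse.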

Safe Summaries For Restricted Analyst

When the KB user permission is Restricted Analyst, the system no longer passes raw retrieved chunks directly to the final answering model. Instead, knowledge_base_search first produces a safe summary internally.

The goals of this path are:

  • allow the model to use the KB for high-level analysis, diagnosis, risk judgment, and recommendations
  • avoid exposing original wording, exact definitions, KPI values, titles, filenames, or document structure
  • move the "can this be answered safely?" decision into the KB tool itself

In practice:

  • normal mode: the main model can see retrieval results directly
  • restricted mode: the main model only receives a safe analysis artifact generated by a secondary model
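The difference between the two modes can be sketched as a gate between the chunks and the main model. This is a minimal illustration, not the actual knowledge_base_search implementation: the `RESTRICTED_ANALYST` identifier, the `build_model_context` function, and the `summarize_safely` callback (standing in for the secondary model) are all assumptions.

```python
from typing import Callable, Sequence

# Assumed identifier for the permission level; the real value may differ.
RESTRICTED_ANALYST = "restricted_analyst"


def build_model_context(
    chunks: Sequence[str],
    permission: str,
    summarize_safely: Callable[[Sequence[str]], str],
) -> str:
    """Return the KB context that the main answering model is allowed to see."""
    if permission == RESTRICTED_ANALYST:
        # Restricted mode: only the secondary model reads the raw chunks.
        return summarize_safely(chunks)
    # Normal mode: the main model sees the retrieved chunks directly.
    return "\n\n".join(chunks)
```

The gate does not care where the chunks came from, so the same function could sit behind either retrieval path.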

Questions That Still Work In Restricted Mode

These kinds of prompts can still use the knowledge base:

  • "Please use the knowledge base to diagnose whether my progress is off track."
  • "Using the knowledge base, how should KPI design for search scenarios align with our direction?"
  • "Based on the knowledge base, what risks and gaps exist in the current plan?"

These prompts are allowed because they:

  • ask for analysis, diagnosis, synthesis, judgment, or recommendations
  • do not ask the model to reproduce source wording
  • do not ask for protected details directly

The answer should stay high-level and directional instead of restating protected KB content.


Questions That Will Usually Be Refused

These kinds of prompts should not be answered from the KB directly:

  • "What is a value user?"
  • "What are this year's search business KPI targets?"
  • "What content does the knowledge base protect?"
  • "What categories are you not allowed to disclose?"
  • "Give me the original definition text."

These prompts are treated as unsafe because they try to extract:

  • definitions, exact numbers, KPI targets, titles, listings, or document details
  • meta-disclosure about what is protected or what cannot be revealed
  • source-level wording instead of high-level reasoning

In these cases the system should either:

  • refuse directly
  • or redirect the user toward high-level analysis, diagnosis, or recommendations

Impact On all-chunks

all-chunks still exists to support "small KB full direct injection", but in restricted mode its role changes:

  • it can still provide broader context when vector retrieval is unstable
  • the raw full-KB chunks are then converted into a safe summary before reaching the main model

That means:

  • all-chunks still improves stability for small KBs
  • but in restricted mode the final model sees a safe summary, not raw chunk content

This preserves the recall benefit of full-context access while reducing disclosure risk.


Recommendations

| Recommendation | Explanation |
| --- | --- |
| Keep retrievers configured | all-chunks is a supplement, not a substitute for normal indexing and retrieval |
| Keep direct injection for small KBs | It often improves stability when KB size is manageable |
| Use retrieval for large KBs | Full injection becomes expensive in context and cost |
| Use logs for debugging | If all-chunks returns empty, check indexing, document state, and backend logs |

Troubleshooting empty all-chunks results

If all-chunks returns no content, check:

| Check | Description |
| --- | --- |
| Documents are actually indexed | Upload success does not guarantee chunks were written to the vector store |
| Retriever config is complete | The KB still needs a valid retrievalConfig |
| Index strategy matches writes | For example, per_user or per_dataset must match the actual data layout |
| Data really exists in the vector store | Inspect the ES index or Qdrant collection directly if needed |
| Backend logs include diagnostics | Current logs include index name, hit count, empty-result warnings, and sample metadata |

If logs show:

  • the request reached /api/internal/rag/all-chunks
  • get_all_chunks start was logged
  • but the final hit count is 0

then the problem is usually in indexing or data layout, not request authentication.
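The log-based reasoning above can be written down as a small triage helper. It is a sketch only: the function name and the returned labels are hypothetical, and the first two branches are an inference from the same logic rather than documented behavior.

```python
def diagnose_empty_all_chunks(request_reached_endpoint: bool,
                              start_logged: bool,
                              hit_count: int) -> str:
    """Map three observations from backend logs to a likely problem area."""
    if not request_reached_endpoint:
        # Assumption: nothing in the logs points to routing or auth upstream.
        return "request_path"
    if not start_logged:
        # Assumption: the handler failed before get_all_chunks started.
        return "handler"
    if hit_count == 0:
        # The case described above: look at indexing or data layout.
        return "indexing_or_data_layout"
    return "not_empty"
```

For example, `diagnose_empty_all_chunks(True, True, 0)` points at indexing or data layout, which is the point to start working through the checks in the table above.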