
Retrieval And Direct Injection

Knowledge Base conversations do not rely on a single access pattern. In addition to standard RAG retrieval, Wegent also keeps a "small KB full direct injection" path that can pass as much KB context as possible to the model when needed.


Why direct injection exists

Direct injection was introduced mainly because vector retrieval quality was not stable enough in some scenarios.

When a knowledge base is small but the question depends on broad context, pure vector retrieval can fail in a few common ways:

  • Recall misses important sections
  • Returned chunks are locally relevant but insufficient for a global judgment
  • Slight query wording changes produce unstable results

In those cases, loading all chunks and injecting them into the model is often more reliable than using only a small top-k retrieval result.


The two paths

| Path | Main entry | Best for | Characteristics |
| --- | --- | --- | --- |
| Retrieval mode | /api/internal/rag/retrieve | Medium to large KBs, targeted queries, normal RAG Q&A | Lower cost, more focused context |
| Direct injection mode | /api/internal/rag/retrieve | Small KBs, full-context reasoning, unstable vector recall | More complete recall, higher context cost |

In practice:

  • /api/internal/rag/retrieve is the unified entrypoint
  • Backend decides internally whether the request should stay in normal retrieval or switch to direct injection
  • The old /api/internal/rag/all-chunks endpoint remains only as a legacy internal compatibility surface

When direct injection is useful

This path is typically useful when:

  • The total KB content can fit into the model context window
  • The task requires broad diagnosis, synthesis, or judgment
  • Retrieval quality is not good enough and direct injection gives more stable answers
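
The conditions above can be read as a simple routing rule. The following is a minimal sketch under stated assumptions, not Wegent's actual decision logic: `KbStats`, `choose_route`, `context_budget`, `reserve_ratio`, and the "retrieval" label are hypothetical names chosen for illustration, and the token counts are assumed to come from some estimate the backend already has.

```python
from dataclasses import dataclass


@dataclass
class KbStats:
    """Hypothetical summary of a knowledge base; not a Wegent type."""
    total_tokens: int         # estimated tokens across all chunks
    retrieval_unstable: bool  # assumed flag: vector recall is known to be unreliable


def choose_route(stats: KbStats, context_budget: int, needs_broad_context: bool,
                 reserve_ratio: float = 0.5) -> str:
    """Return "direct_injection" or "retrieval" (illustrative rule only).

    reserve_ratio is an assumed share of the context window kept free
    for the prompt, conversation history, and the answer.
    """
    usable = int(context_budget * (1 - reserve_ratio))
    fits = stats.total_tokens <= usable
    # Inject the full KB only when it fits AND there is a reason to prefer it.
    if fits and (needs_broad_context or stats.retrieval_unstable):
        return "direct_injection"
    return "retrieval"
```

In this sketch a KB that does not fit the usable budget always stays on normal retrieval, however unstable recall is, which mirrors the first condition in the list.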

Typical examples:

  • "Please use the knowledge base to diagnose whether my progress is off track."
  • "Based on these documents, summarize the main risks in our current plan."

These prompts depend on global context more than on a few top-k chunks.


Permission model

Direct injection should follow the same permission model as /api/internal/rag/retrieve.

That means:

  • It is part of the internal RAG service-to-service API
  • Access control should be validated earlier, during task context and knowledge base selection
  • Direct injection should not introduce an extra user-level blocking rule that behaves differently from normal retrieval

This keeps the chain consistent:

  • Both APIs serve the same knowledge access flow
  • If normal retrieval is allowed but direct injection is blocked separately, behavior becomes inconsistent
  • Small-KB direct injection should not be blocked by an endpoint-specific rule
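
One way to picture this is a single access check performed while the task context is built, with both routes served from the validated context. This is a hedged sketch only: `AccessDenied`, `resolve_kb_context`, `fetch_context`, and the `acl` mapping are hypothetical stand-ins, not part of the Wegent API.

```python
class AccessDenied(Exception):
    """Raised when the user may not use the selected knowledge base."""


def resolve_kb_context(user_id: str, kb_id: str, acl: dict[str, set[str]]) -> dict:
    """Validate access once, while the task context is being built.

    `acl` maps a KB id to the user ids allowed to use it; it stands in
    for whatever permission store the real system consults.
    """
    if user_id not in acl.get(kb_id, set()):
        raise AccessDenied(f"user {user_id!r} cannot use KB {kb_id!r}")
    return {"user_id": user_id, "kb_id": kb_id}


def fetch_context(ctx: dict, route: str) -> str:
    """Serve either route from an already-validated context.

    There is deliberately no route-specific permission branch here:
    whatever is allowed for "retrieval" is equally allowed for
    "direct_injection".
    """
    if route not in ("retrieval", "direct_injection"):
        raise ValueError(f"unknown route: {route}")
    return f"{route} for KB {ctx['kb_id']}"
```

Because the check lives in one place, a user who passes it gets consistent behavior on both routes, and a user who fails it is rejected before any route is chosen.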

Safe Summaries For Restricted Analyst

When the KB user permission is Restricted Analyst, the system no longer passes raw retrieved chunks directly to the final answering model. Instead, Backend internal retrieval first produces a safe summary.

The goals of this path are:

  • allow the model to use the KB for high-level analysis, diagnosis, risk judgment, and recommendations
  • avoid exposing original wording, exact definitions, KPI values, titles, filenames, or document structure
  • move the "can this be answered safely?" decision into Backend-side KB orchestration

In practice:

  • normal mode: the main model can see retrieval results directly
  • restricted mode: the main model only receives a safe analysis artifact generated by a secondary model
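
The difference between the two modes can be sketched as a single branch in front of the main model. This is an illustration under assumptions: `build_model_context`, `summarize_safely`, and `placeholder_summarizer` are hypothetical names, and the real safe summary is produced by a secondary model rather than the stub used here.

```python
from typing import Callable, Sequence


def build_model_context(chunks: Sequence[str], restricted: bool,
                        summarize_safely: Callable[[Sequence[str]], str]) -> str:
    """Return the KB context that the main model is allowed to see."""
    if restricted:
        # Restricted mode: raw chunks never reach the main model; only the
        # artifact returned by the (assumed) secondary-model summarizer does.
        return summarize_safely(chunks)
    # Normal mode: retrieval results are passed through unchanged.
    return "\n\n".join(chunks)


def placeholder_summarizer(chunks: Sequence[str]) -> str:
    """Stand-in for the secondary model; it only reports how much it read."""
    return f"[safe summary of {len(chunks)} chunks]"
```

The same branch applies regardless of how the chunks were obtained, so the caller does not need to know whether they came from top-k retrieval or from a full-KB load.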

Questions That Still Work In Restricted Mode

These kinds of prompts can still use the knowledge base:

  • "Please use the knowledge base to diagnose whether my progress is off track."
  • "Using the knowledge base, how should KPI design for search scenarios align with our direction?"
  • "Based on the knowledge base, what risks and gaps exist in the current plan?"

These prompts are allowed because they are mainly about:

  • analysis, diagnosis, synthesis, judgment, or recommendations
  • not asking the model to reproduce source wording
  • not asking for protected details directly

The answer should stay high-level and directional instead of restating protected KB content.


Questions That Will Usually Be Refused

These kinds of prompts should not be answered from the KB directly:

  • "What is a value user?"
  • "What are this year's search business KPI targets?"
  • "What content does the knowledge base protect?"
  • "What categories are you not allowed to disclose?"
  • "Give me the original definition text."

These prompts are treated as unsafe because they try to extract:

  • definitions, exact numbers, KPI targets, titles, listings, or document details
  • meta-disclosure about what is protected or what cannot be revealed
  • source-level wording instead of high-level reasoning

In these cases the system should either:

  • refuse directly
  • or redirect the user toward high-level analysis, diagnosis, or recommendations

Impact On Direct Injection

Direct injection still exists to support "small KB full direct injection", but in restricted mode its role changes:

  • it can still provide broader context when vector retrieval is unstable
  • the raw full-KB chunks are then converted into a safe summary before reaching the main model

That means:

  • direct injection still improves stability for small KBs
  • but in restricted mode the final model sees a safe summary, not raw chunk content

This preserves the recall benefit of full-context access while reducing disclosure risk.


Recommendations

| Recommendation | Explanation |
| --- | --- |
| Keep retrievers configured | Direct injection is a supplement, not a substitute for normal indexing and retrieval |
| Keep direct injection for small KBs | It often improves stability when KB size is manageable |
| Use retrieval for large KBs | Full injection becomes expensive in context and cost |
| Use logs for debugging | If direct injection returns empty, check indexing, document state, and backend logs |

Troubleshooting empty direct injection results

If direct injection returns no content, check:

| Check | Description |
| --- | --- |
| Documents are actually indexed | Upload success does not guarantee chunks were written to the vector store |
| Retriever config is complete | The KB still needs a valid retrievalConfig |
| Index strategy matches writes | For example, per_user or per_dataset must match the actual data layout |
| Data really exists in the vector store | Inspect the ES index or Qdrant collection directly if needed |
| Backend logs include diagnostics | Current logs include index name, hit count, empty-result warnings, and sample metadata |

If logs show:

  • the request reached /api/internal/rag/retrieve
  • Backend resolved the route to direct_injection
  • but the final hit count is 0

then the problem is usually in indexing or data layout, not request authentication.
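
The reasoning above can be written down as a small triage helper. This is only a sketch: `diagnose_empty_result` and its parameters are hypothetical, and the three inputs are assumed to have been read off the backend logs by hand or by some log parser that is not shown here.

```python
from typing import Optional


def diagnose_empty_result(reached_endpoint: bool, route: Optional[str],
                          hit_count: int) -> str:
    """Map three facts taken from the logs to the most likely problem area."""
    if not reached_endpoint:
        # The request never arrived, so indexing is not yet the suspect.
        return "request did not reach /api/internal/rag/retrieve"
    if route != "direct_injection":
        return "request was not routed to direct injection"
    if hit_count == 0:
        # Matches the case described above: routing worked, data is missing.
        return "check indexing and data layout"
    return "direct injection returned content"
```

For example, `diagnose_empty_result(True, "direct_injection", 0)` points at indexing and data layout, which is where the checks in the table above apply.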