MCP Tool Refactoring Guide

This document describes the refactored architecture for Knowledge MCP tools, helping developers quickly understand the current implementation state and future development direction.


📋 Refactoring Overview

Background

The original MCP tools called KnowledgeService directly, which caused the following issues:

  1. Business logic duplicated between REST API and MCP tools
  2. Auto-configuration logic (retriever, embedding model) was inconsistent
  3. MCP tools required manual parameter schema definition
  4. Different async mechanisms (BackgroundTasks vs Celery) caused code duplication

Goals

  1. Unified Business Layer: Introduce KnowledgeOrchestrator as a unified business orchestration layer
  2. Decorator-based Auto-registration: Use @mcp_tool decorator to auto-generate MCP schema
  3. Auto-selection: Implement automatic selection of retriever, embedding, and summary model at the Orchestrator layer
  4. Unified Async Mechanism: Use Celery for all async indexing tasks

🏗️ Architecture Diagram

Post-refactoring Architecture

┌─────────────────────────────────────────────────────────────┐
│                         Entry Layer                         │
├──────────────────────────────┬──────────────────────────────┤
│  REST API (FastAPI)          │  MCP Tools (Standalone)      │
│  app/api/endpoints/          │  app/mcp_server/tools/       │
│  knowledge.py                │  knowledge.py                │
│                              │  @mcp_tool decorator         │
└──────────────┬───────────────┴───────────────┬──────────────┘
               │                               │
               │         Unified Calls         │
               ▼                               ▼
┌─────────────────────────────────────────────────────────────┐
│                     Orchestrator Layer                      │
├─────────────────────────────────────────────────────────────┤
│  KnowledgeOrchestrator                                      │
│  app/services/knowledge/orchestrator.py                     │
│                                                             │
│  Responsibilities:                                          │
│  - Auto-select retriever/embedding/summary model            │
│  - Orchestrate complete business workflows                  │
│  - Schedule async tasks via Celery                          │
└──────────────┬───────────────────────────────┬──────────────┘
               │                               │
               ▼                               ▼
┌────────────────────────┐   ┌────────────────────────────────┐
│     Service Layer      │   │        Async Task Layer        │
├────────────────────────┤   ├────────────────────────────────┤
│  KnowledgeService      │   │  Celery Tasks                  │
│  knowledge_service.py  │   │  tasks/knowledge_tasks.py      │
│                        │   │                                │
│  - Database CRUD       │   │  - index_document_task         │
│  - Basic validation    │   │  - generate_document_summary   │
└────────────────────────┘   └─────────────────┬──────────────┘
                                               │
                                               ▼
                             ┌────────────────────────────────┐
                             │     Shared Indexing Module     │
                             ├────────────────────────────────┤
                             │ services/knowledge/indexing.py │
                             │                                │
                             │  - run_document_indexing()     │
                             │  - KnowledgeBaseIndexInfo      │
                             │  - RAGIndexingParams           │
                             └────────────────────────────────┘

📁 Core Files

| File | Purpose | Status |
|------|---------|--------|
| backend/app/mcp_server/tools/decorator.py | @mcp_tool decorator implementation | ✅ Complete |
| backend/app/mcp_server/tools/knowledge.py | Knowledge MCP tool definitions | ✅ Complete |
| backend/app/services/knowledge/orchestrator.py | Business orchestration layer | ✅ Complete |
| backend/app/services/knowledge/indexing.py | Shared RAG indexing logic | ✅ Complete |
| backend/app/tasks/knowledge_tasks.py | Celery tasks for async operations | ✅ Complete |
| backend/app/api/endpoints/knowledge.py | REST API endpoints | ✅ Uses Celery |

🔧 @mcp_tool Decorator

Usage

from typing import Any, Dict, Optional

from app.mcp_server.tools.decorator import mcp_tool

# TaskTokenInfo is the project's token payload type; import it from the
# module where it is defined (the path is not shown in this guide).


@mcp_tool(
    name="create_knowledge_base",
    description="Create a new knowledge base",
    server="knowledge",
    param_descriptions={
        "name": "Knowledge base name",
        "description": "Optional description",
    },
)
def create_knowledge_base(
    token_info: TaskTokenInfo,  # Auto-injected, excluded from MCP schema
    name: str,
    description: Optional[str] = None,
) -> Dict[str, Any]:
    ...

Features

  • token_info parameter automatically excluded from MCP schema
  • Support custom parameter descriptions
  • Auto-infer parameter types and default values
  • Build MCP tools dict via build_mcp_tools_dict()

📊 API Migration Status

All APIs Migrated ✅

| API | REST API | MCP Tool | Notes |
|-----|----------|----------|-------|
| List Knowledge Bases | | | list_knowledge_bases |
| List Documents | | | list_documents |
| Create Knowledge Base | | | create_knowledge_base |
| Get Knowledge Base | | | get_knowledge_base |
| Update Knowledge Base | | | update_knowledge_base |
| Create Document | | | create_document - Both use Celery |
| Update Document Content | | | update_document_content - Both use Celery |
| Reindex Document | | | reindex_document - Uses Orchestrator |
| Create Web Document | | | create_web_document - Uses Orchestrator |
| Refresh Web Document | | | refresh_web_document - Uses Orchestrator |
| Delete Document | | | MCP tool not implemented |

Unified Async Mechanism

Both REST API and MCP tools now use Celery for async task scheduling:

# Both REST API and MCP tools use the same approach
from app.tasks.knowledge_tasks import index_document_task

index_document_task.delay(
    knowledge_base_id=str(kb_id),
    attachment_id=attachment_id,
    retriever_name=retriever_name,
    retriever_namespace=retriever_namespace,
    embedding_model_name=embedding_model_name,
    embedding_model_namespace=embedding_model_namespace,
    user_id=user_id,
    user_name=user_name,
    document_id=document_id,
    splitter_config_dict=splitter_config,
    trigger_summary=True,
)
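
Because both entry points pass the same keyword arguments, one way to keep them in sync is a single scheduling helper. The sketch below is an assumption, not project code: `IndexRequest`, `StubTask`, and `schedule_indexing` are hypothetical names, the field types are guesses, and `StubTask` only stands in for a Celery task so the example runs without a broker.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class IndexRequest:
    """Hypothetical container mirroring the keyword arguments shown above."""

    knowledge_base_id: str
    attachment_id: int  # field types here are assumptions
    retriever_name: str
    retriever_namespace: str
    embedding_model_name: str
    embedding_model_namespace: str
    user_id: int
    user_name: str
    document_id: int
    splitter_config_dict: Optional[Dict[str, Any]] = None
    trigger_summary: bool = True


class StubTask:
    """Stand-in for a Celery task; it only records what delay() receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def delay(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


def schedule_indexing(task: Any, request: IndexRequest) -> None:
    """Single scheduling path that REST handlers and MCP tools could share."""
    task.delay(**asdict(request))
```

With the real task, the call would be `schedule_indexing(index_document_task, request)`; passing the task in as an argument also makes the helper easy to test with a stub.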

🎯 Summary Model Auto-selection Logic

When summary_enabled=True but summary_model_ref is not specified, the Orchestrator selects a summary model automatically:

Selection Priority

  1. Task Model Resolution: Task → Team → Bot → Model
  2. First Available LLM: Via model_aggregation_service.list_available_models()
  3. Fallback: If no model available, automatically set summary_enabled to False
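
The priority order above can be sketched as a small fallback chain. This is an illustration under assumptions, not the Orchestrator's code: `select_summary_model` is a hypothetical name, and the two callables stand in for the Task → Team → Bot → Model resolution and for model_aggregation_service.list_available_models(), whose real signatures are not shown in this guide.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

ModelRef = Dict[str, Any]


def select_summary_model(
    summary_enabled: bool,
    summary_model_ref: Optional[ModelRef],
    resolve_task_model: Callable[[], Optional[ModelRef]],
    list_available_models: Callable[[], List[ModelRef]],
) -> Tuple[bool, Optional[ModelRef]]:
    """Return the effective (summary_enabled, summary_model_ref) pair."""
    # Auto-selection only applies when summaries are on and no model was given.
    if not summary_enabled or summary_model_ref is not None:
        return summary_enabled, summary_model_ref

    # 1. Task model resolution (Task -> Team -> Bot -> Model).
    task_model = resolve_task_model()
    if task_model is not None:
        return True, task_model

    # 2. First available LLM.
    available = list_available_models()
    if available:
        return True, available[0]

    # 3. Fallback: no model available, so disable summaries.
    return False, None
```

Returning the possibly changed `summary_enabled` flag together with the model reference keeps the fallback in step 3 visible to the caller instead of hiding it as a side effect.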

Model Type Field

summary_model_ref must include a type field to distinguish model source:

summary_model_ref = {
    "name": "model-name",
    "namespace": "default",
    "type": "public"  # or "user" or "group"
}

| Type | Description |
|------|-------------|
| public | System public model (user_id=0) |
| user | User private model (namespace=default) |
| group | Group shared model (namespace=group_name) |
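
A caller could check the type field before passing a reference on. The following is a minimal sketch assuming only the three type values listed above; `validate_summary_model_ref` and `ALLOWED_MODEL_TYPES` are hypothetical names, and the project may validate this elsewhere or differently.

```python
from typing import Any, Dict

# The three model sources described in the table above.
ALLOWED_MODEL_TYPES = {"public", "user", "group"}


def validate_summary_model_ref(ref: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError if the reference has no valid 'type' field."""
    model_type = ref.get("type")
    if model_type is None:
        raise ValueError("summary_model_ref must include a 'type' field")
    if model_type not in ALLOWED_MODEL_TYPES:
        raise ValueError(
            f"unknown model type {model_type!r}; "
            f"expected one of {sorted(ALLOWED_MODEL_TYPES)}"
        )
    return ref
```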

🧪 Test Coverage

| Test File | Coverage |
|-----------|----------|
| backend/tests/mcp_server/test_tools_decorator.py | @mcp_tool decorator tests |
| backend/tests/services/knowledge/test_orchestrator.py | Orchestrator business logic tests |

📝 Future Development Suggestions

Potential Enhancements

  1. Implement delete_document MCP tool: Complete functionality documented in SKILL.md
  2. Add reindex_document MCP tool: Expose reindex capability to AI agents (Orchestrator method ready)
  3. Add create_web_document MCP tool: Expose web scraping capability to AI agents (Orchestrator method ready)
  4. Batch operations: Consider adding batch create/update/delete for efficiency

Design Principles

  • All business logic centralized in KnowledgeOrchestrator
  • REST API and MCP Tools only handle parameter parsing and response formatting
  • Async task scheduling unified through Celery
  • Shared indexing logic in services/knowledge/indexing.py to avoid circular imports
  • Web scraping logic also centralized in Orchestrator (create_web_document, refresh_web_document)
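
The first two principles can be illustrated with a toy example in which both entry points delegate to one orchestrator and differ only in how they parse input and shape the response. Everything here is hypothetical (`OrchestratorSketch`, `rest_create_kb`, `mcp_create_kb`, the response shapes, and the in-memory storage); it shows the layering only, not the real KnowledgeOrchestrator API.

```python
from typing import Any, Dict, List


class OrchestratorSketch:
    """Hypothetical stand-in for KnowledgeOrchestrator; owns the business rule."""

    def __init__(self) -> None:
        self._bases: List[Dict[str, Any]] = []

    def create_knowledge_base(self, user_id: int, name: str) -> Dict[str, Any]:
        # The validation lives here once, not in each entry point.
        if not name.strip():
            raise ValueError("name must not be empty")
        kb = {"id": len(self._bases) + 1, "owner": user_id, "name": name.strip()}
        self._bases.append(kb)
        return kb


def rest_create_kb(
    orch: OrchestratorSketch, user_id: int, body: Dict[str, Any]
) -> Dict[str, Any]:
    """REST-style entry point: parse the request body, format the response."""
    kb = orch.create_knowledge_base(user_id, body["name"])
    return {"status": 201, "data": kb}


def mcp_create_kb(
    orch: OrchestratorSketch, token_info: Dict[str, Any], name: str
) -> Dict[str, Any]:
    """MCP-style entry point: read the user from token_info, format the result."""
    kb = orch.create_knowledge_base(token_info["user_id"], name)
    return {"success": True, "knowledge_base": kb}
```

Because the empty-name check sits in the orchestrator, both entry points reject bad input in the same way without duplicating the rule.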