深入剖析 Hermes Agent：一個 AI 作業系統的底層邏輯與實作

前言：為什麼要深入了解 Hermes Agent？

在 AI Agent 爆發的時代，我們見過太多「聊天機器人」包裝成「Agent」的產品。但真正能跨平台運作、具備持久記憶、支援多模型切換、還能自我排程執行任務的 AI Agent 框架，少之又少。

Hermes Agent 就是這樣一個開源框架——它不只是一個 ChatGPT wrapper，而是一個完整的 AI 作業系統。本文將從原始碼層級，深入剖析 Hermes Agent 的底層邏輯與實作方式。

一、整體架構：五層設計

Hermes Agent 是一個純 Python 專案，核心程式碼量超過 50,000 行。其架構可以分為五個層級：

┌─────────────────────────────────────────┐
│           Gateway Layer                  │
│   (Telegram / Discord / CLI / ACP)       │
├─────────────────────────────────────────┤
│           AIAgent Core                   │
│   (run_agent.py — 9,910 行)              │
├─────────────────────────────────────────┤
│         Tool Orchestration               │
│   (model_tools.py + tools/registry.py)   │
├─────────────────────────────────────────┤
│          Tool Implementations            │
│   (tools/*.py — 40+ 工具模組)            │
├─────────────────────────────────────────┤
│       Infrastructure Layer               │
│   (Memory / Sessions / Cron / Plugins)   │
└─────────────────────────────────────────┘

核心的依賴鏈是這樣的：

tools/registry.py → 無依賴，所有 tool 檔案 import 它來註冊自己
tools/*.py → 各自 import registry.register()，在模組載入時自動註冊
model_tools.py → import tools/registry + 觸發所有 tool 模組的 discovery
run_agent.py → 核心 AIAgent 類別，消費 model_tools 提供的 API
cli.py / gateway/run.py → 分別是 CLI 與多平台 gateway 的入口

二、訊息處理流程：從輸入到回應的完整 Pipeline

當使用者在 Telegram 發送一則訊息，到 Agent 回覆，中間經歷了什麼？這是 Hermes 最核心的流程：

Step 1：Gateway 接收

Gateway 層的 GatewayRunner（7,905 行）負責接收來自各平台的訊息。以 Telegram 為例，TelegramAdapter 使用 python-telegram-bot 庫進行 long polling，收到訊息後觸發 _handle_message()。

Step 2：Session 管理與使用者驗證

每個聊天會話都有一個 SessionSource，記錄平台、聊天 ID、使用者 ID 等元資料：

@dataclass
class SessionSource:
    platform: Platform      # telegram, discord, cli...
    chat_id: str
    chat_name: Optional[str]
    chat_type: str          # "dm", "group", "channel"
    user_id: Optional[str]
    user_name: Optional[str]
    thread_id: Optional[str]

Step 3：核心對話迴圈

AIAgent 的核心是一個 tool-use loop，位於 run_agent.py：

while api_call_count < self.max_iterations:
    # 1. 檢查中斷請求（使用者發新訊息可中斷）
    if self._interrupt_requested:
        break
    
    # 2. 組裝 API messages
    #    system prompt + ephemeral context + 對話歷史
    api_messages = build_messages(messages)
    
    # 3. 套用 Anthropic prompt caching（如適用）
    if self._use_prompt_caching:
        api_messages = apply_anthropic_cache_control(api_messages)
    
    # 4. 呼叫 LLM API
    response = client.chat.completions.create(**api_kwargs)
    
    # 5. 若有 tool_calls → 逐一執行 → 結果加入 messages → 繼續迴圈
    # 6. 若無 tool_calls → 返回最終回應

關鍵設計：Agent 會持續呼叫 LLM，直到模型不再要求使用工具為止。每一輪 tool call 的結果都會被加回對話歷史，讓模型看到執行結果後決定下一步。

Step 4：多 API 模式支援

Hermes 支援三種 API 呼叫模式：

OpenAI Chat Completions（預設）：相容任何 OpenAI-compatible endpoint
Anthropic Messages API：直連 Anthropic，支援 prompt caching
Codex Responses API：OpenAI o-series 模型的專用模式

三、Tool 系統：Import-Time 自動註冊

Hermes 的工具系統是其最精巧的設計之一。它採用 Registry 模式，讓每個工具在 Python import 時自動註冊自己。

ToolRegistry 單例

class ToolEntry:
    __slots__ = ("name", "toolset", "schema", "handler",
                 "check_fn", "requires_env", "is_async",
                 "description", "emoji", "max_result_size_chars")

class ToolRegistry:
    def register(self, name, toolset, schema, handler, 
                 check_fn=None, requires_env=None, is_async=False):
        self._tools[name] = ToolEntry(...)
    
    def dispatch(self, name, args, **kwargs) -> str:
        entry = self._tools.get(name)
        return entry.handler(args, **kwargs)
    
    def get_definitions(self, tool_names) -> List[dict]:
        # 只返回 check_fn() 通過的工具
        # 例如：browser 工具只在有瀏覽器環境時才可用

Discovery 流程

在 model_tools.py 中，Tool Discovery 分三步：

def _discover_tools():
    modules = [
        "tools.web_tools",      # web_search, web_extract
        "tools.terminal_tool",  # terminal, process
        "tools.file_tools",     # read_file, write_file, patch
        "tools.vision_tools",   # vision_analyze
        "tools.delegate_tool",  # delegate_task
        "tools.memory_tool",    # memory, session_search
        # ... 共 20+ 模組
    ]
    for mod_name in modules:
        importlib.import_module(mod_name)  # 觸發 register()

_discover_tools()       # Step 1: 內建工具
discover_mcp_tools()    # Step 2: MCP 伺服器工具
discover_plugins()      # Step 3: 插件工具

Toolset 分組

工具被分組成 Toolset，可以按需啟用：

TOOLSETS = {
    "web":      {"tools": ["web_search", "web_extract"]},
    "terminal": {"tools": ["terminal", "process"]},
    "file":     {"tools": ["read_file", "write_file", "patch", "search_files"]},
    "browser":  {"tools": ["browser_navigate", "browser_snapshot", ...]},
    # 支援 includes 巢狀引用其他 toolset
}

這讓子代理可以只攜帶需要的工具集（例如研究任務只給 web + file），避免 token 浪費和安全風險。

四、記憶系統：三層架構

Hermes 的記憶系統分為三層，從長期到短期：

Layer 1：持久化記憶（Persistent Memory）

以檔案形式存放在 ~/.hermes/memories/：

MEMORY.md：Agent 的個人筆記（環境事實、工具怪癖、學到的教訓）
USER.md：使用者相關資訊（名字、偏好、溝通風格）

關鍵設計——凍結快照模式：

class MemoryStore:
    def load_from_disk(self):
        # 載入 → 去重 → 凍結 snapshot
        self._system_prompt_snapshot = {
            "memory": self._render_block("memory", self.memory_entries),
            "user": self._render_block("user", self.user_entries),
        }
        # system prompt 的 memory 在 session 開始時凍結
        # mid-session 的寫入只影響檔案，不改 system prompt
        # 這保護了 Anthropic 的 prefix cache

還有嚴格的安全防護——每筆寫入都會掃描 prompt injection 模式：

_MEMORY_THREAT_PATTERNS = [
    (r'ignore\s+(previous|all)\s+instructions', "prompt_injection"),
    (r'you\s+are\s+now\s+', "role_hijack"),
    (r'curl\s+[^\n]*\$\{?\w*(KEY|TOKEN|SECRET)', "exfil_curl"),
]

Layer 2：Session 記憶（SQLite + FTS5）

所有對話歷史存在 SQLite 資料庫中，使用 FTS5 全文搜索索引：

class SessionDB:
    # WAL mode，支援併發讀寫
    # sessions 表：id, source, model, system_prompt, token 統計
    # messages 表：role, content, tool_calls, reasoning
    # messages_fts 虛擬表：FTS5 全文索引

這讓 Agent 可以用 session_search 工具跨 session 搜尋過往對話——「我們上次討論的那個 Docker 問題怎麼解決的？」

Layer 3：Context 壓縮

當對話變長時，ContextCompressor 會自動壓縮：

修剪舊的 tool results（不需 LLM）
保護 head messages（system prompt + 第一輪對話）
保護 tail messages（最近 ~20K tokens）
用輔助 LLM 摘要中間部分
後續壓縮：迭代更新先前摘要

五、多平台 Gateway：Adapter 模式

Hermes 支援超過 20 個通訊平台，採用 Adapter 模式 實作：

class BasePlatformAdapter(ABC):
    """所有平台的統一介面"""
    async def connect(self) -> bool: ...
    async def disconnect(self): ...
    async def send(self, chat_id, text, **kwargs): ...
    async def send_voice(self, chat_id, audio_path): ...
    async def send_image_file(self, chat_id, image_path): ...
    def set_message_handler(self, handler): ...

目前已實作的 Adapter 包括：

通訊平台：Telegram、Discord、Slack、WhatsApp、Signal、Matrix、Mattermost
中國平台：微信（WeChat）、釘釘、飛書、企業微信
其他：SMS、Email、Home Assistant、Webhook、BlueBubbles（iMessage）
開發者：CLI、API Server、ACP（IDE 整合）

每個 Adapter 只需實作 BasePlatformAdapter 的介面，就能無縫接入 Hermes 的工具生態和記憶系統。

六、Sub-Agent 系統：受控的委派

Hermes 的 delegate_task 工具讓主 Agent 可以派生子代理（Sub-Agent）來執行獨立任務：

DELEGATE_BLOCKED_TOOLS = frozenset([
    "delegate_task",   # 禁止遞迴委派
    "clarify",         # 禁止用戶互動
    "memory",          # 禁止寫入共享 MEMORY.md
    "send_message",    # 禁止跨平台副作用
    "execute_code",    # 子代理應逐步推理
])

MAX_CONCURRENT_CHILDREN = 3  # 最多 3 個並行子代理
MAX_DEPTH = 2                # 最深 2 層巢狀

子代理的設計非常講究：

工具隔離：與父代理的工具集取交集，再移除被封鎖的工具
獨立 Terminal Session：每個子代理有自己的 task_id，Terminal 工具在各自的工作目錄中執行
專注的 System Prompt：只包含任務目標和 context，不帶父代理的完整人格
深度限制：防止子代理無限遞迴委派

批次任務使用 ThreadPoolExecutor 並行執行，最多 3 個同時運行。

七、ACP：Agent Communication Protocol

ACP 是 Hermes 的另一個亮點——它實作了 Agent Communication Protocol，讓 Hermes 可以：

被 IDE 整合（VS Code、Zed、JetBrains）
作為 ACP 主機，呼叫其他 ACP 代理（如 Claude Code、Gemini CLI）

class HermesACPAgent(acp.Agent):
    # 支援的操作：
    # - initialize(): 回報 agent 能力
    # - new_session() / load_session(): session 管理
    # - prompt(): 核心對話方法 → ThreadPoolExecutor 中執行 AIAgent
    # - fork_session(): 分叉 session
    # - list_sessions(): 列出所有 session
    # - command(): 執行斜線命令
    
    # MCP 橋接：IDE 傳入的 MCP 伺服器自動整合到 Hermes 工具表面
    async def _register_session_mcp_servers(self, state, mcp_servers):
        ...

這意味著你可以在 Hermes 中執行 delegate_task(acp_command="claude")，讓 Claude Code 作為子代理來執行程式碼任務——跨框架的 Agent 協作。

八、Cron 排程系統

Hermes 內建了完整的排程系統：

# 排程格式支援多種語法：
parse_schedule("30m")              # 30 分鐘後執行一次
parse_schedule("every 2h")         # 每 2 小時重複
parse_schedule("0 9 * * *")        # Cron 表達式
parse_schedule("2026-02-03T14:00") # 一次性時間戳

排程器每 60 秒 tick 一次，使用檔案鎖防止併發。每個到期任務會建立一個獨立的 AIAgent 來執行，結果自動遞送到指定平台：

"local"：僅存本地檔案
"origin"：回覆到建立任務的原始聊天
"telegram" / "discord"：指定平台的 Home Channel
"telegram:chat_id"：指定特定聊天

九、安全機制：多層防護

作為一個有權限存取使用者資料的 Agent，安全是重中之重：

危險命令審批

DANGEROUS_PATTERNS = [
    (r'\brm\s+-[^\s]*r', "recursive delete"),
    (r'\bchmod\s+.*777', "world-writable permissions"),
    (r'\bDROP\s+(TABLE|DATABASE)', "SQL DROP"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh', "pipe remote to shell"),
    (r'\bgit\s+push\b.*--force', "git force push"),
    # ... 30+ 模式
]

CLI 模式下互動式提示，Gateway 模式支援異步審批（/approve, /deny），還有 Smart Approval 讓輔助 LLM 自動評估低風險命令。

Terminal 沙箱

支援 7 種執行環境：本地執行、Docker 容器隔離、SSH 遠端、Modal 雲端沙箱、Daytona 開發環境等，可依安全需求選擇隔離等級。

Prompt Injection 防護

不只是記憶寫入，連 context file（如 AGENTS.md）都會掃描注入模式，偵測不可見 Unicode 字元等攻擊手法。

十、Profile 系統與多模型切換

Profile 系統讓你可以在同一台機器上運行多個獨立的 Agent 身份：

~/.hermes/
├── config.yaml          # 預設 profile
├── memories/
├── sessions/
└── profiles/
    ├── work/            # 工作用 profile
    │   ├── config.yaml  # 不同的模型、API key
    │   ├── memories/    # 獨立的記憶
    │   └── cron/        # 獨立的排程
    └── research/        # 研究用 profile
        └── ...

每個 profile 完全隔離：config、API keys、記憶、session、cron、skills 都是獨立的。

多模型切換支援兩種方式：

config.yaml：設定預設模型和 fallback 鏈
/model 命令：Gateway 中即時切換，每個 session 可獨立覆寫

Failover 機制會自動偵測錯誤類型（rate limit、auth failure、context overflow），觸發 fallback 到備選模型，或啟動 context 壓縮。

結語：不只是 Agent，是 AI 作業系統

深入 Hermes Agent 的原始碼後，我最大的感受是：這不是一個聊天機器人框架，而是一個 AI 作業系統。

它有自己的：

行程管理（Sub-Agent 委派、並行執行、深度限制）
檔案系統（持久記憶、Session 資料庫、Skill 目錄）
排程系統（Cron Job、Heartbeat）
驅動程式（20+ 平台 Adapter）
安全模型（命令審批、注入防護、沙箱隔離）
IPC 協議（ACP、MCP）

核心設計哲學是 Registry 模式 + Adapter 模式 + Plugin 系統 的鬆耦合架構。工具在 import 時自動註冊，平台透過統一介面接入，插件透過 hook 機制擴展——每一層都可以獨立替換和擴展。

如果你正在尋找一個真正可用、可擴展、可自訂的 AI Agent 框架，Hermes Agent 值得一看。它的原始碼可能是目前市面上最完整的 Agent 實作參考。

深入剖析 Hermes Agent：一個 AI 作業系統的底層邏輯與實作

前言：為什麼要深入了解 Hermes Agent？

一、整體架構：五層設計

二、訊息處理流程：從輸入到回應的完整 Pipeline

Step 1：Gateway 接收

Step 2：Session 管理與使用者驗證

Step 3：核心對話迴圈

Step 4：多 API 模式支援

三、Tool 系統：Import-Time 自動註冊

ToolRegistry 單例

Discovery 流程

Toolset 分組

四、記憶系統：三層架構

Layer 1：持久化記憶（Persistent Memory）

Layer 2：Session 記憶（SQLite + FTS5）

Layer 3：Context 壓縮

五、多平台 Gateway：Adapter 模式

六、Sub-Agent 系統：受控的委派

七、ACP：Agent Communication Protocol

八、Cron 排程系統

九、安全機制：多層防護

危險命令審批

Terminal 沙箱

Prompt Injection 防護

十、Profile 系統與多模型切換

結語：不只是 Agent，是 AI 作業系統

電子報

最近文章

主題導覽

精選閱讀

深入剖析 Hermes Agent：一個 AI 作業系統的底層邏輯與實作

馬斯克的太空 AI 資料中心夢想：物理學上的三大「窒礙難行」死穴 (可行性深度分析)

AI 泡沫論的真相 (2026版)：為何好用卻不賺錢？從 Klarna 逆轉到 7000 億豪賭的深度解析