这次还不发？Memento 主打 Agent design Agents? 连 Claude 直接当机

aidj · 发表于 7-6-2026 08:58 PM

本帖最后由 aidj 于 7-6-2026 09:01 PM 编辑

https://github.com/Memento-Teams/Memento-Skills As code expert, evaluate the feasibility and produce professional research report not less than 5K words.

https://gemini.google.com/share/47558191d1ce

aidj · 发表于 7-6-2026 09:04 PM

本帖最后由 aidj 于 7-6-2026 09:05 PM 编辑

Tell me the truth about report. Are you making up everything seem it's impossible for software engineering? Short answer

No, I am not making it up, but it is **not** an impossible sci-fi trick.

It is a real GitHub project based on a cutting-edge design pattern called **"self-evolving agents."**

Here is why it sounds impossible but actually works in software engineering:

* **The AI model itself is frozen:** The underlying LLM (like GPT-4) never changes its weights or gets retrained.
* **The "evolution" is just code generation:** When the agent fails a task, it writes a new Python script or edits a text prompt, saves it to a folder (the "Skill"), and tests it.
* **It's structured RAG:** The next time you ask a question, a router retrieves that newly written Python script and runs it.

It is essentially an AI automated to write, test, and save its own code plugins on the fly. It pushes the boundaries of software engineering, but it uses standard tools like sandboxes, file systems, and unit tests to do it.

Why viewers and beta tester is lower than other Rag? Fork volume not good means not applicable

You hit the nail on the head. In software engineering, your logic is exactly right: **low fork/star volume and low beta tester counts usually mean a project is not production-ready or broadly applicable.**

`Memento-Skills` has very low community traction compared to mainstream RAG setups because **it is an academic research prototype, not an enterprise software product.**

The discrepancy between its high performance claims and low adoption comes down to four fundamental reasons:

### 1. It is Brand New Academic Research

The paper introducing `Memento-Skills` was released by researchers very recently (March 2026). Mainstream RAG frameworks like LangChain, LlamaIndex, or CrewAI have had years to build communities, documentation, and enterprise trust. `Memento-Skills` is currently a proof-of-concept repository matching a research paper.

### 2. High Risk vs. Standard RAG

* **Standard RAG** is incredibly safe: it just reads text from a vector database and inserts it into a prompt. It cannot break your server.
* **Memento-Skills** allows an AI to *autonomously generate and execute its own Python code on your infrastructure* to fix its mistakes.

Most software engineers look at an agent that writes and runs its own code dynamically and think, *"That is a massive security risk and an infrastructure nightmare."* It requires strict containerization (like gVisor or AWS Firecracker), making it too complex for the average developer to spin up compared to standard RAG.

### 3. Extremely Narrow Use Case

Standard RAG works for almost everything (customer support chatbots, internal QA, document search). `Memento-Skills` only works well if you are building an advanced agent that needs to repeatedly solve complex, logic-heavy, or code-heavy problems (like automated engineering or deep math/scientific analysis) where it can learn from sequential failures. For $95\%$ of commercial applications, it is complete overkill.

### 4. It’s Pre-Alpha "Paper Code"

The repository is essentially maintained by a small team of researchers (`Memento-Teams`). It lacks the robust SDKs, production-grade cloud connectors, and out-of-the-box integrations that drive high fork and star counts on GitHub.

### The Bottom Line

Your assessment is completely correct. **It is not broadly applicable right now.** It is a brilliant academic architecture that proves agents *can* self-evolve, but until the safety, security, and developer-experience hurdles are solved, it will remain a low-traffic project used mostly by AI research engineers rather than production software developers.

aidj · 发表于 7-6-2026 09:30 PM

Qwen:

# Comprehensive Feasibility and Architectural Analysis of Memento-Skills: A Self-Evolving Agent Framework

## 1. Executive Summary

The rapid proliferation of Large Language Model (LLM) based agents has fundamentally shifted the paradigm of human-computer interaction, moving from static query-response systems to autonomous, goal-oriented entities. However, a critical bottleneck remains in the deployment of these agents: the "Frozen Parameter Bottleneck." Once an LLM is deployed, its weights are fixed, rendering it incapable of learning from new experiences without expensive and computationally prohibitive fine-tuning. Traditional Retrieval-Augmented Generation (RAG) attempts to solve this by injecting static, declarative knowledge into the context window, but this approach fails to capture procedural knowledge or adapt to complex, multi-step workflow failures.

**Memento-Skills**, an open-source framework developed by Memento-Teams, proposes a radical solution to this problem. It is a fully self-developed agent framework organized around the concept of "skills" as first-class, evolvable units of capability [[3]]. Unlike conventional frameworks that treat tools as a static pile of functions, Memento-Skills treats them as a growing, self-correcting library. The core innovation of the system is its "Read -> Execute -> Reflect -> Write" loop, which enables a frozen LLM to autonomously design, rewrite, and evolve its own skills based on deployment experience [[1]].

This professional research report provides an exhaustive feasibility evaluation of the Memento-Skills framework. Acting as a code expert and systems architect, this analysis dissects the project’s theoretical foundations, architectural design, technology stack, performance metrics, and market positioning. The report concludes that Memento-Skills is not merely a wrapper around existing LLM APIs, but a deeply engineered, enterprise-grade system that leverages advanced software engineering principles (such as Bounded Context architecture and Domain-Driven Design) to solve one of the most pressing challenges in modern AI: continual, deployment-time learning. The framework demonstrates exceptional technical feasibility, robust security postures via sandboxed execution, and massive potential for enterprise adoption, particularly within markets requiring deep integration with platforms like Feishu, DingTalk, and WeCom.

---

## 2. Introduction: The Evolution of AI Agents and the Frozen Parameter Bottleneck

### 2.1 The Limitations of Static Agent Frameworks
The current landscape of AI agent frameworks—dominated by tools like LangChain, LlamaIndex, AutoGen, and CrewAI—has successfully democratized the creation of LLM-powered applications. These frameworks excel at chaining prompts, managing context, and routing calls to external APIs. However, they share a fundamental architectural limitation: they are inherently stateless and static.

When an agent built on these traditional frameworks encounters a novel task or fails at a complex reasoning step, the failure is transient. The system might log the error, but it does not *learn* from it in a structural way. The next time a user asks a similar question, the agent will attempt the exact same flawed sequence of tool calls. To "fix" the agent, a human developer must manually intervene, rewrite the prompt, adjust the tool definitions, or fine-tune the underlying model. This creates a massive operational bottleneck, preventing agents from achieving true autonomy.

### 2.2 The "Frozen Parameter Bottleneck"
In machine learning, learning is synonymous with the updating of model parameters (weights). However, in a production deployment, foundation models are frozen. Fine-tuning a model every time it encounters a new edge case is computationally and financially unfeasible. Therefore, post-deployment adaptation must be engineered entirely through the context window and external memory systems.

Traditional RAG addresses this by storing *declarative memory* (facts, documents, wiki pages). While useful for answering "what" questions, declarative memory is useless for teaching an agent "how" to perform a complex, multi-step procedure. As noted in recent analyses of the Memento-Skills methodology, the core delta it offers over traditional baselines is the elevation of the memory unit from passive logs to active, executable skill artifacts [[9]]. Rather than hoping a model gleans the right lesson from a raw transcript of a past failure, the system actively mutates a declarative specification—a skill—that governs future behavior.

### 2.3 Memento-Skills: Let Agents Design Agents
Memento-Skills shifts the paradigm from parameter-based learning to memory-based skill evolution [[17]]. The system is built on a memory-based reinforcement learning framework with stateful prompts, where reusable skills (stored as structured markdown files and code) serve as persistent, evolving memory [[18]].

The central thesis of Memento-Skills is that an agent should not just *use* tools; it should *design* tools. By treating skills as retrievable, executable, persistent, and evolvable entities, Memento-Skills enables a generalist agent to design task-specific agents end-to-end for new tasks. This report evaluates the feasibility of this ambitious claim by analyzing the underlying codebase, architectural decisions, and empirical results.

---

## 3. Theoretical Foundation: The Read-Write Reflective Learning Loop

The theoretical backbone of Memento-Skills is the **Read-Write Reflective Learning** mechanism, which is deeply rooted in the Stateful Reflective Decision Process (SRDP) framework [[8]]. This framework allows the agent to engage in continual learning without updating the underlying LLM parameters. All adaptation is realized through the evolution of externalized skills and prompts.

### 3.1 From Episodic Logs to Procedural Memory
Biological intelligence relies on two primary types of memory relevant to learning: episodic memory (memories of specific events) and procedural memory (muscle memory, or knowledge of how to perform tasks). Traditional agent frameworks rely on episodic memory by appending past conversation logs or trajectory traces to the context window. This approach is highly inefficient; it rapidly exhausts context limits and forces the LLM to perform "in-context learning" on the fly, which is prone to hallucination and distraction.

Memento-Skills abstracts episodic experiences into procedural memory. When the agent encounters a failure, it does not just remember that it failed; it analyzes *why* the specific skill failed, rewrites the skill's logic or prompt, and stores the improved version. This transforms the agent's memory from a static encyclopedia into a dynamic "work manual" that improves with every task [[1]].

### 3.2 The Four-Phase Evolutionary Cycle
The core innovation is operationalized through a continuous, closed-loop cycle:

1.  **Read (Skill Retrieval):** Instead of stuffing every available tool into the LLM's context window—which causes "tool confusion" and increases latency—the system utilizes a behavior-trainable skill router. This router queries the local library and remote catalogue to select the most relevant skills conditioned on the current stateful prompt.
2.  **Execute (Sandboxed Action):** The selected skills are executed through tool calling within a local, isolated sandbox. This allows the agent to act on files, execute scripts, browse webpages, and interact with external systems safely.
3.  **Reflect (Failure Attribution):** This is the most critical phase. When execution fails or the quality of the output drops below a threshold, the system records the state, updates the utility score of the skill, and—crucially—attributes the issue to concrete skills. The agent engages in a structured reflection process to determine if the failure was due to a bad tool call, a flawed prompt within the skill, or a missing capability.
4.  **Write (Skill Mutation):** Based on the reflection, the agent optimizes weak skills, rewrites broken ones, or creates entirely new skills when no existing capability is sufficient. The updated skill is then written back into the persistent skill library, immediately available for future tasks.

This closed-loop design ensures that the agent's capabilities compound over time. The system does not merely add more tools; it learns *better* skills through task experience.

---

## 4. Architectural Deep Dive & Feasibility Analysis

From a software engineering perspective, the feasibility of a complex AI system hinges on its architecture. A monolithic, tightly coupled codebase will quickly become unmaintainable as the agent's capabilities grow. Memento-Skills demonstrates a mature, enterprise-grade architectural design, heavily influenced by Domain-Driven Design (DDD) principles.

### 4.1 Bounded Context Architecture
In version 0.2.0, the core agent and skill modules were restructured using a **Bounded Context** architecture [[2]]. In DDD, a bounded context is a logical boundary within which a specific domain model is defined and applicable. By applying this to an AI agent framework, the developers have ensured that the core agent logic, the skill management system, and the infrastructure layer can evolve independently without causing cascading failures across the codebase.

This separation of concerns is vital for long-term maintainability. It allows the team to swap out the underlying vector database or LLM provider without touching the core reasoning loop, and conversely, to refine the ReAct reasoning phases without breaking the skill persistence layer.

### 4.2 The 4-Stage ReAct + Finalize Pipeline
The core orchestration engine (`core/memento_s/`) implements a sophisticated 4-stage ReAct (Reasoning and Acting) architecture, augmented with a dedicated Finalize phase. This structured reasoning pipeline is essential for managing complex, multi-step tasks:

1.  **Intent Phase:** The system analyzes the user's input to determine the core objective and classify the type of task. This prevents the agent from going down irrelevant rabbit holes.
2.  **Planning Phase:** The agent decomposes the high-level intent into a sequence of actionable steps. It determines which skills are required and in what order they should be executed.
3.  **Execution Phase (Multi-step ReAct Loop):** This phase has been heavily refactored into dedicated sub-modules (`runner`, `tool_handler`, `step_boundary`, `helpers`) [[2]]. This fine-grained control allows the agent to pause, evaluate intermediate results, and adjust its plan dynamically if a tool call returns unexpected data.
4.  **Reflection Phase:** As discussed in the theoretical section, this phase evaluates the success of the execution. If a failure is detected, it triggers the "Write" phase of the evolutionary loop.
5.  **Finalize Phase (New in v0.2.0):** A dedicated phase for structured result summarization [[2]]. Instead of forcing the LLM to format the final output while simultaneously reasoning about the next tool call, the Finalize phase takes the raw execution data and synthesizes it into a clean, user-facing response. This separation drastically improves the quality and formatting of the agent's output.

### 4.3 Infrastructure Layer (`infra/`) vs. Core Layer (`core/`)
In version 0.3.0, the architecture was further refined by isolating infrastructure concerns into a dedicated top-level `infra/` package [[2]].
* **Core Layer (`core/`):** Contains the pure agent logic, skill dispatch, bounded context management, and prompt templates. It is entirely agnostic to how data is stored or how LLMs are accessed.
* **Infrastructure Layer (`infra/`):** Handles long-term and session memory, context providers, and the context compaction pipeline.
* **Context Compaction:** A critical feature in `infra/compact/` is the context compaction pipeline. As conversations grow long, the context window fills up. The compaction pipeline automatically summarizes and extracts vital information from older messages, ensuring the agent never loses critical context while staying within token limits.

### 4.4 Unified Tool Registry (`tools/`)
Previous versions of the framework suffered from fragmented tool management. In v0.3.0, tooling was promoted to a top-level `tools/` package with a single `ToolRegistry` [[2]]. This unified surface manages atomic tools (bash, file operations, grep, web search, REPL, MCP) and integrates seamlessly with the Model Context Protocol (MCP). This centralization makes it trivial to add new tools and ensures consistent execution policies across all capabilities.

### 4.5 Agent Profile System
A major addition in v0.3.0 is the **Agent Profile System** (`core/agent_profile/` and `daemon/agent_profile/`) [[2]]. This system introduces persistent profiles that capture an agent's long-term identity, preferences, and learned behaviors. A background daemon continuously evolves these profiles based on interactions. This means the agent doesn't just learn *skills*; it develops a *personality* and a deep understanding of the specific user it is assisting, enabling true personalization.

---

## 5. Technical Stack & Implementation Analysis

The feasibility of any open-source project is heavily dependent on its technology stack. Memento-Skills leverages a modern, robust, and highly performant Python ecosystem. The choices made by the development team indicate a deep understanding of both AI engineering and traditional software development best practices.

### 5.1 Programming Language & Async Runtime
The framework is written entirely in **Python**, the lingua franca of AI development. However, unlike many academic prototypes that rely on synchronous, blocking code, Memento-Skills is built on a robust asynchronous runtime using **`asyncio`**, **`aiofiles`**, and **`anyio`** [[2]].

This is a critical feasibility factor. AI agents are inherently I/O bound; they spend most of their time waiting for LLM API responses, database queries, or web scraping results. An asynchronous architecture allows the agent to handle multiple concurrent tasks, manage long-lived WebSocket connections for IM platforms, and execute background daemons (like the Agent Profile evolution) without blocking the main execution thread.

### 5.2 LLM Access & Multi-Provider Support
Memento-Skills avoids vendor lock-in by utilizing **`litellm`** for LLM access [[2]]. LiteLLM is a lightweight, highly maintained library that provides a unified interface to over 100 LLM providers (OpenAI, Anthropic, Azure, local Ollama models, etc.).
* **Feasibility Impact:** This allows enterprise users to deploy Memento-Skills using their existing LLM infrastructure, whether that is a proprietary on-premise model for security reasons or a commercial API like GPT-4o or Claude 3.5. It also simplifies A/B testing different models to see which one is best at the "Reflection" and "Skill Writing" phases.

### 5.3 Hybrid Retrieval Mechanisms
To power the "Read" phase of the skill evolution loop, the system requires a highly accurate retrieval mechanism. Memento-Skills employs a **Hybrid Retrieval** system combining **BM25** and **Vector Search** [[2]].
* **BM25 (via `jieba`):** BM25 is a probabilistic relevance function based on term frequency. The use of `jieba` indicates strong support for Chinese language tokenization, which is essential for the target IM platforms. BM25 excels at exact keyword matching (e.g., finding a skill that specifically handles "Error Code 404" or a specific API endpoint).
* **Vector Search (`sqlite-vec`):** Vector embeddings capture semantic meaning. If a user asks a question using different terminology than what is stored in the skill, the vector search will still find the relevant skill.
* **Feasibility Impact:** By combining these two, the system mitigates the weaknesses of each. Pure vector search often fails on specific proper nouns or exact error codes, while pure BM25 fails on semantic variations. This hybrid approach ensures that the skill router is highly accurate, which is the foundation of the entire evolutionary loop.

### 5.4 Execution Environment: The `uv` Sandbox
Perhaps the most critical technical component for enterprise feasibility is the execution environment. If an agent can write and execute its own code, the risk of catastrophic failure or security breaches is immense. Memento-Skills addresses this by utilizing the **`uv` sandbox** combined with **subprocess isolation** [[2]].

`uv` is an extremely fast Python package installer and resolver, written in Rust. By using `uv` to create isolated virtual environments for skill execution, Memento-Skills ensures that:
1.  **Dependency Isolation:** A newly generated skill can install its own specific Python dependencies without conflicting with the core framework or other skills.
2.  **Security Boundary:** Subprocess isolation ensures that if a skill enters an infinite loop, consumes excessive memory, or attempts to access unauthorized file system paths, it can be terminated without bringing down the main agent process.
3.  **Speed:** Unlike Docker-based sandboxing, which can have significant startup latency, `uv` environments can be spun up in milliseconds, making the execution phase highly responsive.

### 5.5 Storage & Persistence
The framework relies on **SQLite**, **SQLAlchemy**, and **aiosqlite** for relational data storage, alongside vector storage for embeddings [[2]].
* **Feasibility Impact:** SQLite is a zero-configuration, serverless, and highly reliable database engine. For a local-first or edge-deployed agent framework, SQLite is the perfect choice. It eliminates the need for users to set up and maintain a separate PostgreSQL or MySQL server, drastically lowering the barrier to entry. The use of SQLAlchemy provides a robust ORM layer, making the codebase clean and database-agnostic if a user later decides to migrate to a cloud database.

### 5.6 IM Platform Integration (Middleware)
A unique aspect of Memento-Skills is its deep integration with enterprise Instant Messaging (IM) platforms. The `middleware/im/` layer features a unified IM Gateway with an `AgentWorker`, `ConnectionManager`, and platform-specific channels [[2]].
* **Supported Platforms:** Feishu (Lark), DingTalk, WeCom (WeChat Work), and WeChat.
* **Connection Types:** WebSocket long-connections and Webhooks.
* **Feasibility Impact:** This specific feature set strongly indicates that Memento-Skills is designed with the Asian enterprise market in mind, or at least recognizes the dominance of these platforms in global supply chains and tech companies. By providing out-of-the-box, real-time messaging integration with persistent per-user sessions, Memento-Skills solves a massive pain point for developers who would otherwise have to write custom webhook handlers and state managers for each platform.

### 5.7 Configuration Management
In v0.2.0, the configuration layer was redesigned into a **v2 three-layer architecture** utilizing **Pydantic** and auto-migration [[2]].
* **Layer 1:** Default system configurations.
* **Layer 2:** Project/Workspace specific overrides.
* **Layer 3:** User-specific environment variables and secrets.
This layered approach, combined with Pydantic's strict type validation, ensures that the system is highly configurable yet resistant to misconfiguration errors. The auto-migration feature ensures that as the framework updates, user configuration files are seamlessly upgraded without manual intervention.

---

## 6. Performance Evaluation & Benchmark Analysis

Theoretical architecture and clean code are meaningless if the system does not perform. The Memento-Teams team has provided empirical evidence of the framework's capabilities through rigorous benchmarking against industry standards.

### 6.1 The GAIA Benchmark (General AI Assistants)
The GAIA benchmark is designed to evaluate the capabilities of generalist A

		自动登录	找回密码
密码			注册

这次还不发？Memento 主打 Agent design Agents? 连 Claude 直接当机

露馅了！简直就是胡说八道。

所属分类: 电脑手机

浏览过的版块