tool·free·longmemeval

LongMemEval benchmark

An academic standard for measuring memory accuracy. It tests temporal queries, multi-hop reasoning, knowledge updates, and facts mentioned only in passing within long conversations.

Answers are judged by GPT-4o under the official protocol, and results appear on a public leaderboard. Top scores in 2026: Mastra OM 94.87%, Hindsight 91.4%, StudioMeyer Memory 90%, Zep 63–71%, Mem0 49%. Mastra and Hindsight are research-grade systems with no production SaaS offering. Among production-grade systems, the top three are StudioMeyer, Zep, and Mem0, in that order. Vendors who do not publish their score have either not run the benchmark or did not like the result.
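A minimal Python sketch of how a headline percentage like those above can be derived, assuming the judge model returns a yes/no verdict for each question and the score is the share of questions judged correct. The `JudgedAnswer` record, its field names, the category labels, and the `score` helper are hypothetical illustrations, not the official LongMemEval harness. No GPT-4o call is made here; the verdicts are assumed to already exist.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JudgedAnswer:
    """One question after the judge has graded the system's answer.

    Hypothetical record: field names are not taken from the official harness.
    """
    question_id: str
    category: str  # hypothetical labels, e.g. "temporal", "multi-hop", "knowledge-update"
    verdict: str   # raw judge output, assumed to be "yes" (correct) or "no"


def is_correct(verdict: str) -> bool:
    # Assumption: the judge is prompted to answer yes/no; anything else counts as wrong.
    return verdict.strip().lower().startswith("yes")


def score(answers: list[JudgedAnswer]) -> tuple[float, dict[str, float]]:
    """Return overall accuracy and per-category accuracy, both as percentages."""
    if not answers:
        raise ValueError("no judged answers to score")
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for a in answers:
        totals[a.category] += 1
        if is_correct(a.verdict):
            hits[a.category] += 1
    # Overall accuracy is averaged over questions, not over categories.
    overall = 100.0 * sum(hits.values()) / len(answers)
    per_category = {c: 100.0 * hits[c] / totals[c] for c in totals}
    return overall, per_category


if __name__ == "__main__":
    demo = [
        JudgedAnswer("q1", "temporal", "yes"),
        JudgedAnswer("q2", "temporal", "no"),
        JudgedAnswer("q3", "knowledge-update", "Yes."),
        JudgedAnswer("q4", "multi-hop", "no"),
    ]
    overall, per_cat = score(demo)
    print(f"overall: {overall:.1f}%")  # 2 of 4 correct -> 50.0%
    for cat, acc in sorted(per_cat.items()):
        print(f"{cat}: {acc:.1f}%")
```

Because the overall figure is averaged over questions, categories with more questions carry more weight. A macro average over categories would weight each category equally and can produce a noticeably different headline number from the same verdicts.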

Sources

Relationships

Outgoing
memory·benchmark·evaluation