Replies: 5 comments
---
Metadata for extracted memories is essential for production use. Here's a comprehensive approach:

**Metadata Schema Design**

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from enum import Enum


class MemorySource(Enum):
    USER_EXPLICIT = "user_explicit"  # User directly stated
    USER_IMPLICIT = "user_implicit"  # Inferred from behavior
    AGENT_DERIVED = "agent_derived"  # Agent concluded
    EXTERNAL_API = "external_api"    # From an external source


class ConfidenceLevel(Enum):
    HIGH = "high"      # Directly stated
    MEDIUM = "medium"  # Reasonably inferred
    LOW = "low"        # Speculative


@dataclass
class MemoryMetadata:
    # Provenance
    source: MemorySource
    source_message_id: Optional[str] = None
    extraction_model: str = "gpt-4"

    # Confidence
    confidence: ConfidenceLevel = ConfidenceLevel.MEDIUM
    confidence_score: float = 0.8

    # Temporal
    created_at: datetime = field(default_factory=datetime.utcnow)
    valid_from: Optional[datetime] = None
    valid_until: Optional[datetime] = None  # For time-bound facts
    last_accessed: Optional[datetime] = None
    access_count: int = 0

    # Categorization
    category: str = "general"  # preference, fact, event, relationship
    tags: List[str] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)

    # Versioning
    version: int = 1
    supersedes: Optional[str] = None      # ID of the memory this updates
    superseded_by: Optional[str] = None
```

**Usage in Retrieval**

```python
def smart_retrieve(query: str, user_id: str) -> List[Memory]:
    memories = mem0.search(query, user_id=user_id)

    # Filter and rank by metadata
    confidence_boost = {"high": 1.2, "medium": 1.0, "low": 0.8}
    scored = []
    for mem in memories:
        score = mem.score  # Base vector similarity

        # Boost recent memories
        age_days = (datetime.utcnow() - mem.metadata.created_at).days
        recency_boost = 1.0 / (1 + age_days * 0.1)

        # Penalize expired memories
        if mem.metadata.valid_until and mem.metadata.valid_until < datetime.utcnow():
            score *= 0.3

        final_score = score * recency_boost * confidence_boost[mem.metadata.confidence.value]
        scored.append((mem, final_score))

    # Sort by final score and return the memories themselves
    scored.sort(key=lambda x: x[1], reverse=True)
    return [mem for mem, _ in scored]
```

**Key Metadata to Track**
More on memory patterns: https://github.com/KeepALifeUS/autonomous-agents
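One practical detail the schema above glosses over: enums and datetimes aren't JSON-serializable, so the dataclass needs flattening before it can go into a vector-store payload. A minimal sketch (the trimmed-down classes and the `to_payload` helper are illustrative, not mem0 API):

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class MemorySource(Enum):
    USER_EXPLICIT = "user_explicit"


class ConfidenceLevel(Enum):
    HIGH = "high"


@dataclass
class MemoryMetadata:
    source: MemorySource
    confidence: ConfidenceLevel = ConfidenceLevel.HIGH
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    tags: list = field(default_factory=list)


def to_payload(meta: MemoryMetadata) -> dict[str, Any]:
    """Flatten metadata into JSON-safe primitives for a vector-store payload."""
    def convert(v):
        if isinstance(v, Enum):
            return v.value          # store the enum's string value
        if isinstance(v, datetime):
            return v.isoformat()    # store datetimes as ISO-8601 strings
        return v
    return {k: convert(v) for k, v in asdict(meta).items()}


payload = to_payload(MemoryMetadata(source=MemorySource.USER_EXPLICIT))
# payload["source"] == "user_explicit"; payload["created_at"] is an ISO string
```

The same helper works for any field you add later, as long as new value types get a branch in `convert`.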
---
Big +1 on metadata support for memories! We've hit several use cases that needed this.

Implementation suggestion: we've implemented similar patterns at RevolutionAI for enterprise memory systems. The metadata layer is what makes memories actionable rather than just stored.
---
Metadata for extracted memories is essential for production use cases! Here is what has worked for us at RevolutionAI (https://revolutionai.io):

**Core metadata fields:**

```json
{
  "source": "conversation|document|tool_output",
  "timestamp": "ISO datetime",
  "confidence": "0.0-1.0",
  "user_id": "for multi-tenant",
  "session_id": "conversation context",
  "extraction_model": "gpt-4|llama3|etc",
  "tags": ["preference", "fact", "instruction"]
}
```

**Why each matters:**

Pro tip: add a `verified` boolean field. Letting users confirm or deny memories builds trust and improves quality over time.

What metadata fields are you finding most useful for your use case?
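To make the `verified` flag concrete, here is a minimal sketch of gating retrieval on it together with a confidence threshold (the memory-dict shape and the `filter_trusted` helper are illustrative, not mem0 API):

```python
def filter_trusted(memories, min_confidence=0.8, require_verified=False):
    """Keep memories suitable for high-stakes use: above a confidence
    threshold and, optionally, explicitly user-verified."""
    out = []
    for m in memories:
        meta = m.get("metadata", {})
        if meta.get("confidence", 0.0) < min_confidence:
            continue  # too speculative
        if require_verified and not meta.get("verified", False):
            continue  # user never confirmed this memory
        out.append(m)
    return out


mems = [
    {"text": "prefers dark mode", "metadata": {"confidence": 0.9, "verified": True}},
    {"text": "maybe vegetarian",  "metadata": {"confidence": 0.5, "verified": False}},
]
filter_trusted(mems, require_verified=True)  # keeps only the first memory
```

For routine retrieval you'd leave `require_verified=False` and reserve the strict mode for decisions where a wrong memory is costly.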
---
Metadata for extracted memories is crucial! At RevolutionAI (https://revolutionai.io) we heavily use metadata for filtering and retrieval.

**Our metadata schema:**

```python
class MemoryMetadata(BaseModel):
    source: str                  # conversation, document, observation
    confidence: float            # 0-1 extraction confidence
    timestamp: datetime
    context_id: str              # conversation/session ID
    tags: list[str]              # auto-extracted topics
    user_id: str                 # who this memory belongs to
    expires_at: datetime | None  # TTL for temporary info
```

**Use cases:**

```python
# Recent memories only
memories = mem0.search(
    query="meeting notes",
    filters={"timestamp": {"$gte": last_week}}
)

# Only use verified document memories, not conversation
memories = mem0.search(query, filters={"source": "document"})

# High-confidence memories only for critical decisions
memories = mem0.search(query, filters={"confidence": {"$gte": 0.8}})
```

Would love to see first-class metadata support in the extraction pipeline!
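The `expires_at` TTL field above also needs enforcement somewhere; if the store can't filter on it natively, a client-side sweep works. A sketch (the memory-dict shape and `drop_expired` helper are illustrative, not mem0 API):

```python
from datetime import datetime, timedelta, timezone


def drop_expired(memories, now=None):
    """Discard memories whose metadata expires_at (ISO-8601 string or None)
    lies in the past; None means the memory never expires."""
    now = now or datetime.now(timezone.utc)
    live = []
    for m in memories:
        exp = m.get("metadata", {}).get("expires_at")
        if exp is not None and datetime.fromisoformat(exp) <= now:
            continue  # TTL elapsed: skip this memory
        live.append(m)
    return live


now = datetime.now(timezone.utc)
mems = [
    {"text": "temporary promo code",
     "metadata": {"expires_at": (now - timedelta(days=1)).isoformat()}},
    {"text": "permanent preference", "metadata": {"expires_at": None}},
]
drop_expired(mems, now=now)  # keeps only the permanent preference
```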
---
Great use case! Metadata-rich memories are essential for educational apps.

**Current workaround: embed metadata in the text**

```python
import re

custom_prompt = """
Extract facts in this format:
[CATEGORY:progress|difficulty|misconception] [LEVEL:strong|medium|weak] fact text

Example: [CATEGORY:progress] [LEVEL:strong] Finished 50% of calculus chapter
"""

# Parse when retrieving
def parse_memory(text):
    match = re.match(r"\[CATEGORY:(\w+)\] \[LEVEL:(\w+)\] (.+)", text)
    if match is None:
        # Memory wasn't stored in the tagged format; return it untyped
        return {"category": None, "level": None, "text": text}
    return {
        "category": match.group(1),
        "level": match.group(2),
        "text": match.group(3),
    }
```

**Cleaner: post-process and store in the payload**

```python
class EnhancedMemory(Memory):
    def add(self, messages, user_id, **kwargs):
        # Extract with the custom prompt
        result = super().add(messages, user_id, **kwargs)

        # Parse and add metadata to the vector store payload
        for mem in result:
            parsed = self.extract_metadata(mem.text)
            self.vector_store.update_payload(
                mem.id,
                metadata=parsed
            )
        return result
```

**Ideal API (feature request):**

```python
memory.add(
    messages,
    user_id,
    extraction_schema={
        "text": str,
        "category": ["progress", "difficulty", "misconception"],
        "level": ["strong", "medium", "weak"]
    }
)
```

**For retrieval filtering:**

```python
memories = memory.search(
    query,
    user_id,
    filters={"category": "misconception", "level": "weak"}
)
```

We build educational AI at Revolution AI; structured metadata extraction would be a great addition to mem0.
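Until something like `extraction_schema` lands, the same contract can be enforced client-side during post-processing. A rough sketch, where a list value means "one of these strings" and a type value means an `isinstance` check (`validate_fact` is hypothetical, not a mem0 API):

```python
def validate_fact(fact: dict, schema: dict) -> dict:
    """Check an extracted fact against a lightweight schema: list values
    are enumerated choices, type values are isinstance constraints."""
    for key, rule in schema.items():
        if key not in fact:
            raise ValueError(f"missing field: {key}")
        value = fact[key]
        if isinstance(rule, list):
            if value not in rule:
                raise ValueError(f"{key}={value!r} not in {rule}")
        elif not isinstance(value, rule):
            raise TypeError(f"{key} must be {rule.__name__}")
    return fact


schema = {
    "text": str,
    "category": ["progress", "difficulty", "misconception"],
    "level": ["strong", "medium", "weak"],
}
validate_fact({"text": "Finished 50% of calculus chapter",
               "category": "progress", "level": "strong"}, schema)
```

Running this right after extraction catches LLM outputs that drift from the prompt format before they reach the vector store.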
---
Hi mem0 team,

I've been using `custom_fact_extraction_prompt` and really like how much control it gives over what gets stored as memory. I wanted to ask about a possible extension: allowing custom metadata to be attached to extracted facts.

In applications like personalized tutoring, memories represent a student's learning state, not just a static fact. Along with the extracted text, it's useful to capture:
For example:
Would it be feasible for `custom_fact_extraction_prompt` to optionally return metadata along with each fact? Something like:

```json
{
  "facts": [
    {
      "text": "Finished 50% of calculus chapter",
      "metadata": { "category": "progress", "level": "strong" }
    }
  ]
}
```

Alternatively, is there an existing or recommended pattern for attaching progress levels or custom categories to extracted memories?
Thanks!