At Glorics, our agents don't just execute tasks — they remember. They accumulate observations, learn from client feedback, and refine their behavior over weeks and months of operation. But storing memories is the easy part. Retrieving the right memory at the right moment — that's where the engineering gets serious.
This article takes you inside Agentics, the memory intelligence layer of the Glorics platform. We'll show you how 12 autonomous agents maintain, search, and cross-reference their memories using a hybrid architecture that combines keyword precision with semantic understanding. Let's walk through the building blocks one by one.
Before we talk about search, you need to understand where memories are stored. Every Glorics agent has a soul — a structured markdown document that contains everything the agent knows. Not a vague embedding blob, not a key-value store. A readable, editable, human-auditable document.
The soul is organized in three hierarchical levels, from the most general to the most specific: Identity, Global, and Project.
Each project-level soul contains six sections: Identity, Brand Voice, Domain Context, Rules, What I've Learned (auto-observations + human notes), and Performance Summary. The agent accumulates observations automatically — when a content piece gets scored, when a SERP position changes, when the human adds notes like "focus on implant content."
The soul is plain markdown in a textarea. Not a WYSIWYG editor, not a form wizard. This is a deliberate design choice — the same approach used by Claude Projects, Cursor rules, and every modern AI tool. Humans can read it, edit it, audit it. The agent's memory is never a black box.
The problem: until now, the entire soul was injected into the agent's prompt at every run. All observations, all notes, everything — regardless of relevance. The copywriter writing about dental implants received observations about orthodontics, whitening, and periodontics. The signal was buried in noise.
That's why we built the Soul Search System — the hybrid retrieval layer that sits on top of the soul. The soul stays the source of truth. Soul Search makes it searchable.
Keyword search is the most direct retrieval method. You look for exact matches between query terms and stored content. It's fast, it's precise on identifiers, and it's the backbone of every search system ever built.
At Glorics, we use BM25 through SQLite's FTS5 extension — the same algorithm that powers Elasticsearch and Lucene, but embedded in a single file with zero server overhead. BM25 doesn't just find matches — it ranks them by factoring in term frequency and document length:
score(D, Q) = Σ IDF(qi) × [ f(qi, D) × (k1 + 1) ] / [ f(qi, D) + k1 × (1 - b + b × |D|/avgdl) ]
Where:
f(qi, D) = frequency of term qi in document D
|D| = document length
avgdl = average document length
k1, b = tuning parameters (typically k1=1.2, b=0.75)
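As a sanity check, the formula can be computed directly in a few lines of Python. This is a stdlib-only sketch with an invented toy corpus; SQLite's built-in bm25() ranking uses its own IDF smoothing, so exact values differ, but the shape is the same.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query using the BM25 formula above."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    n = len(corpus)
    score = 0.0
    for q in query_terms:
        df = sum(1 for d in corpus if q in d)             # documents containing q
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)   # smoothed IDF (Lucene-style)
        f = doc.count(q)                                  # term frequency in this doc
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    "implant article scored 91 in the SEO audit".split(),
    "orthodontics piece underperformed this quarter".split(),
    "implant pricing page gained a featured snippet".split(),
]
scores = [bm25_score(["implant", "pricing"], d, corpus) for d in corpus]
# The pricing page matches both terms and ranks first; the orthodontics piece scores 0
```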
TF (Term Frequency) → more appearances = more relevant
IDF (Inverse Document Frequency) → rarer term = stronger signal

In Agentics, every soul chunk is indexed in an FTS5 virtual table. When the copywriter is about to write an article on "dental implant pricing", BM25 finds every observation that contains those exact words:
-- Agentics soul chunks indexed for BM25 search
CREATE VIRTUAL TABLE soul_chunks_fts USING fts5(
text,
content='soul_chunks',
content_rowid='chunk_id',
tokenize='porter unicode61' -- stemming + unicode support
);
-- Search the copywriter's observations about implant scores
SELECT sc.chunk_id, sc.agent_key, sc.text, rank
FROM soul_chunks_fts
JOIN soul_chunks sc ON soul_chunks_fts.rowid = sc.chunk_id
WHERE soul_chunks_fts MATCH 'implant score SEO'
ORDER BY rank;

Semantic search doesn't look for identical words — it looks for similar meanings. The text is converted into embeddings: vectors of numbers that capture semantic meaning. Two sentences that mean the same thing produce similar vectors, even if they share no words in common.
At Glorics, we run embeddings locally on our own server using sentence-transformers — no third-party API, no data leaving the infrastructure, zero per-token cost. Our server (128 GB RAM, 32 cores) handles this effortlessly:
from sentence_transformers import SentenceTransformer
# Loaded once at server startup — 420 MB, ~8ms per embedding
model = SentenceTransformer('all-mpnet-base-v2')
# The magic: similar meaning → similar vectors
v1 = model.encode("implant article SEO score 91")
v2 = model.encode("prosthesis content performed well")
v3 = model.encode("best restaurants in Paris")
# v1 and v2 are CLOSE (similar meaning)
# v1 and v3 are FAR APART (unrelated topics)
print(f"Dimensions: {len(v1)}") # 768

Glorics runs exclusively on Anthropic's Claude for reasoning (Haiku, Sonnet, Opus). Rather than adding another API dependency for embeddings, we run all-mpnet-base-v2 locally. Zero network latency, zero per-token cost, zero data leakage. The entire embedding pipeline stays on our infrastructure.
Once embeddings are generated, they're stored in sqlite-vec — a vector search extension for SQLite. Same file, same database, no vector server to manage. The agent's semantic index lives right next to its BM25 index; the vec0 schema is shown further down.
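To make the mechanics concrete, here is a stdlib-only stand-in for what sqlite-vec does natively: rank stored vectors by cosine similarity against a query vector. The three-dimensional vectors below are invented for illustration; real embeddings are 768-dimensional and the scan happens inside the vec0 virtual table.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 for identical direction, near 0 for unrelated vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy in-memory "index" mapping chunk text to a pretend embedding
index = {
    "implant article SEO score 91": [0.9, 0.1, 0.0],
    "prosthesis content performed well": [0.8, 0.3, 0.1],
    "best restaurants in Paris": [0.0, 0.1, 0.9],
}

query = [0.85, 0.2, 0.05]  # pretend embedding of "implant content performance"
ranked = sorted(index.items(),
                key=lambda item: cosine_similarity(query, item[1]),
                reverse=True)
# The two dental observations outrank the unrelated restaurant chunk
```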
Neither keyword search nor semantic search is sufficient on its own. Keyword search finds "SEO score 91" perfectly but misses "content performed well." Semantic search understands meaning but can't reliably match specific scores, dates, and names. The answer: run both in parallel and fuse the results.
Agentics uses weighted score fusion — each result receives a combined score that favors semantic understanding (70%) while preserving keyword precision (30%):
final_score = 0.7 × semantic_score + 0.3 × keyword_score
Observation | Semantic | BM25 | Fused score
-------------------------|----------|-------|----------------------------
"implant avg 91 vs 72" | 0.92 | 0.85 | 0.7×0.92 + 0.3×0.85 = 0.899 ← best
"prosthesis perf good" | 0.88 | 0.00 | 0.7×0.88 + 0.3×0.00 = 0.616
"orthodontics Q1 data" | 0.31 | 0.00 | 0.7×0.31 + 0.3×0.00 = 0.217 ← below threshold
"implant pricing trends" | 0.75 | 0.72 | 0.7×0.75 + 0.3×0.72 = 0.741
Minimum threshold: 0.35 — anything below is dropped
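The fusion rule and threshold can be sketched in a few lines. The scores are the ones from the table above; the fuse helper is illustrative, not the production soul_search implementation.

```python
SEMANTIC_WEIGHT, KEYWORD_WEIGHT, MIN_SCORE = 0.7, 0.3, 0.35

def fuse(semantic, keyword):
    """Weighted score fusion: a chunk missing from one search scores 0 on that side."""
    fused = {}
    for chunk in set(semantic) | set(keyword):
        fused[chunk] = (SEMANTIC_WEIGHT * semantic.get(chunk, 0.0)
                        + KEYWORD_WEIGHT * keyword.get(chunk, 0.0))
    # Drop anything below the threshold, best match first
    return sorted(((c, s) for c, s in fused.items() if s >= MIN_SCORE),
                  key=lambda pair: pair[1], reverse=True)

semantic = {"implant avg 91 vs 72": 0.92, "prosthesis perf good": 0.88,
            "orthodontics Q1 data": 0.31, "implant pricing trends": 0.75}
keyword = {"implant avg 91 vs 72": 0.85, "implant pricing trends": 0.72}
results = fuse(semantic, keyword)
# "implant avg 91 vs 72" leads at 0.899; "orthodontics Q1 data" is dropped
```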
-- Results appearing in BOTH searches are naturally favored

The alternative is Reciprocal Rank Fusion (RRF), which uses position instead of raw scores:
RRF_score(d) = Σ 1 / (k + rank(d)) (k = 60 standard)
Observation | Semantic rank | Keyword rank | RRF score
-------------------------|:------------:|:------------:|:---------:
"implant avg 91 vs 72" | #1 | #2 | 1/61 + 1/62 = 0.0325
"client prefers narrative" | #2 | #5 | 1/62 + 1/65 = 0.0315
"pricing featured snippet" | #4 | #1 | 1/64 + 1/61 = 0.0320

Agentics uses weighted score fusion because it preserves match strength. A near-perfect semantic match (0.92) scores very differently from a mediocre one (0.31). RRF flattens this distinction — rank #1 vs #4 doesn't tell you how much better the first result is. For agent memory retrieval, where the quality of the match directly impacts reasoning, preserving score magnitude matters.
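For comparison, RRF itself is only a few lines. The two ranked lists correspond to the table (best first, with filler entries for unnamed ranks); k=60 is the conventional constant.

```python
def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion: sum 1/(k + rank) over every list a document appears in."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

semantic_ranked = ["implant avg 91 vs 72", "client prefers narrative",
                   "other A", "pricing featured snippet"]
keyword_ranked = ["pricing featured snippet", "implant avg 91 vs 72",
                  "other B", "other C", "client prefers narrative"]
fused = rrf([semantic_ranked, keyword_ranked])
# Top result: "implant avg 91 vs 72" with 1/61 + 1/62 ≈ 0.0325
```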
Everything above comes together in a single system: the Soul Search layer. One SQLite file per project, sitting alongside the backend, containing the hybrid index over all agent souls in that project.
-- Soul chunks: markdown sections split into searchable blocks
CREATE TABLE soul_chunks (
chunk_id INTEGER PRIMARY KEY AUTOINCREMENT,
agent_key TEXT NOT NULL, -- 'copywriter', 'serp_scout', 'angela'...
section TEXT NOT NULL, -- 'auto_observations', 'human_notes', 'archive'
text TEXT NOT NULL,
tokens INTEGER NOT NULL,
source TEXT DEFAULT 'soul_md',
created_at TEXT NOT NULL,
embedding_hash TEXT -- SHA256: skip re-embedding if unchanged
);
-- BM25 keyword search (via FTS5)
CREATE VIRTUAL TABLE soul_chunks_fts USING fts5(
text, content='soul_chunks', content_rowid='chunk_id',
tokenize='porter unicode61'
);
-- Vector search (cosine similarity, local embeddings)
CREATE VIRTUAL TABLE soul_chunks_vec USING vec0(
chunk_id INTEGER PRIMARY KEY,
embedding FLOAT[768] -- all-mpnet-base-v2 = 768 dimensions
);

Run context: "Writing article: Guide implant dentaire 2026. Keywords: implant dentaire, prix implant"
Step 1 — Embed the run context (local model, ~8ms)
Step 2 — Parallel search (×4 candidate multiplier)
BM25 search → 20 candidates (Score = 1/(1+rank))
Vector search → 20 candidates (Score = 1 - cosine_distance)
Step 3 — Weighted fusion
score = 0.7 × semantic_score + 0.3 × keyword_score
(chunk missing from one search → score = 0 for that side)
Step 4 — Filtering
• Minimum threshold: 0.35
• Sort descending by fused score
• Cap at 5 results
Results:
[copywriter] "Implant articles avg 91 vs ortho 72" score: 0.87
[human_note] "Dr. Martin wants clinical case studies" score: 0.72
[serp_scout] "'implant dentaire prix' — featured snippet" score: 0.68
[copywriter] "Rating 5/5 on 'Guide implant 2026'" score: 0.61
[signal_analyst] "Competitor launched 3 articles on pricing" score: 0.45

If Agentics needs 5 final results, each search returns 20 candidates. This surplus gives the fusion step more material to identify cross-matched results — observations that appear in both keyword and semantic searches naturally rise to the top, because they score on both dimensions.
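Strung together, the four steps look roughly like this. The search inputs are stubbed; in production the candidates come from FTS5 and vec0, and the normalizations match the helpers shown just below.

```python
def hybrid_search(bm25_hits, vec_hits, limit=5, threshold=0.35):
    """bm25_hits: chunk ids in rank order; vec_hits: (chunk_id, cosine_distance) pairs."""
    keyword = {cid: 1.0 / (1 + rank) for rank, cid in enumerate(bm25_hits)}   # Step 2a
    semantic = {cid: 1.0 - dist for cid, dist in vec_hits}                    # Step 2b
    fused = {cid: 0.7 * semantic.get(cid, 0.0) + 0.3 * keyword.get(cid, 0.0)  # Step 3
             for cid in set(keyword) | set(semantic)}
    kept = [(cid, score) for cid, score in sorted(fused.items(), key=lambda p: -p[1])
            if score >= threshold]                                            # Step 4
    return kept[:limit]

# 20 candidates per side in production (the ×4 multiplier); 3 shown here
top = hybrid_search(bm25_hits=[11, 42, 7],
                    vec_hits=[(42, 0.08), (11, 0.35), (99, 0.70)])
# Chunk 42 appears in both searches and leads the fused ranking
```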
# BM25 rank → normalized 0–1
def normalize_bm25(rank):
return 1 / (1 + rank)
# rank=0 (best) → score=1.0
# rank=1 → score=0.5
# rank=9 → score=0.1
# Cosine distance → similarity
def cosine_to_similarity(distance):
return 1 - distance
# distance=0.0 (identical) → score=1.0
# distance=0.2 → score=0.8
# distance=0.5 → score=0.5

Here's what makes Agentics different from generic RAG systems. The soul is split into fixed sections (Identity, Brand Voice, Domain Context, Rules) and dynamic sections (observations, notes, archive). Fixed sections are always injected in full — they're short and always relevant. Dynamic sections are filtered through soul_search():
get_smart_soul_context(project_id, agent_key="copywriter", run_context="write implant article")
1. Inject FULL: Identity + Brand Voice + Domain Context + Rules
→ Always relevant, always short (~500 tokens total)
2. Call soul_search(run_context) for observations
→ Returns 5 most relevant observations across agents
→ Replaces the old "dump everything" approach
3. Inject FULL: Performance Summary
→ Short, always useful (~60 tokens)
Result: precise context (~800 tokens) instead of full dump (~2500 tokens)
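A sketch of that assembly step, with soul_search injected as a callable and hypothetical section strings (the real get_smart_soul_context lives in the Agentics backend):

```python
def get_smart_soul_context(fixed_sections, soul_search, run_context, perf_summary):
    """Assemble prompt context: full fixed sections + top-5 relevant observations."""
    parts = list(fixed_sections)            # Identity, Brand Voice, Domain Context, Rules
    hits = soul_search(run_context)[:5]     # dynamic sections, filtered by relevance
    parts.extend(f"[{agent}] {text}" for agent, text in hits)
    parts.append(perf_summary)              # short, always useful
    return "\n\n".join(parts)

# Stubbed search results for illustration
def fake_soul_search(query):
    return [("copywriter", "Implant articles avg 91 vs ortho 72"),
            ("human_note", "Dr. Martin wants clinical case studies")]

ctx = get_smart_soul_context(
    fixed_sections=["# Identity\nSEO copywriter for a dental clinic",
                    "# Rules\nNo unverified medical claims"],
    soul_search=fake_soul_search,
    run_context="write implant article",
    perf_summary="# Performance Summary\nAvg content score: 88",
)
```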
Same information value, 70% less noise.

For strategic agents like Angela (the orchestrator) and the Briefing Writer, Agentics goes further. These agents receive soul_search and soul_get as callable tools — they actively decide when and what to search, rather than passively receiving pre-filtered results:
# Angela decides to investigate a cross-agent pattern
SOUL_TOOLS = [
{
"name": "soul_search",
"description": "Search the collective memory of all agents on this project.",
"input_schema": {
"type": "object",
"properties": {
"query": { "type": "string" },
"agent": { "type": "string", "description": "Limit to one agent (optional)" }
},
"required": ["query"]
}
},
{
"name": "soul_get",
"description": "Retrieve the full text of a specific observation by ID.",
"input_schema": {
"type": "object",
"properties": {
"chunk_id": { "type": "integer" }
},
"required": ["chunk_id"]
}
}
]

This two-step pattern keeps the context window lean. The search returns short previews (200 chars). The agent only loads the full observation for what it actually needs. It's how a human uses a book index — scan the table of contents first, then open the right page.
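The preview-then-fetch behavior behind those two tools can be sketched like this; the chunk store is a toy dict and the handlers are illustrative, not the production tool implementations:

```python
PREVIEW_LEN = 200

# Toy chunk store: chunk_id -> (agent_key, full observation text)
CHUNKS = {
    7: ("serp_scout",
        "'implant dentaire prix' moved to position 3 and gained a featured snippet "
        "after the pricing table was added to the article in week 12. " * 3),
}

def soul_search_preview(chunk_ids):
    """Step 1: return short previews so the agent can decide what to load in full."""
    return [(cid, CHUNKS[cid][0], CHUNKS[cid][1][:PREVIEW_LEN]) for cid in chunk_ids]

def soul_get(chunk_id):
    """Step 2: load the complete observation only when the agent asks for it."""
    agent_key, text = CHUNKS[chunk_id]
    return {"chunk_id": chunk_id, "agent": agent_key, "text": text}

previews = soul_search_preview([7])
full = soul_get(7)
```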
This is where Agentics does something no other framework offers. Because all 12 agents' observations are indexed in the same SQLite file per project, any agent can search across the collective memory of all agents.
Concrete example: Angela is making weekly strategic decisions. She calls soul_search("keyword cannibalization implant") across all agents. Back come results from the SERP Scout (position data from 3 weeks ago), the Signal Analyst (competitor trend analysis), the Copywriter (which article topics overlapped), and a human note (the client wants to consolidate). Four agents, weeks of history, one query. No single agent had the full picture — but together, through the index, the pattern is clear.
This is enabled by a single architectural choice: one SQLite index per project, not per agent. Every observation from every agent is indexed in the same file. The agent_key column lets you filter to a single agent when needed, but the default is project-wide. The index doesn't care how many agents exist — add a 13th, a 25th agent, and their observations are automatically indexed and searchable.
One more critical piece. When agents accumulate more than 15 observations, a smart compression step summarizes the oldest ones into a few bullet points. This keeps the soul document readable. But the raw observations — with their specific scores, dates, and details — are archived in soul_observations_archive before compression happens. The archive is indexed in the SQLite hybrid index. So even observations that were compressed 3 months ago are still searchable through soul_search().
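A sketch of that compression step, with summarize as a stand-in for the real LLM summarization call; the 15-observation trigger is from the text, while keep is an assumed parameter:

```python
MAX_OBSERVATIONS = 15

def compress_soul(observations, archive, summarize, keep=10):
    """Archive the oldest observations, then replace them with one summary bullet."""
    if len(observations) <= MAX_OBSERVATIONS:
        return observations                  # nothing to compress yet
    old, recent = observations[:-keep], observations[-keep:]
    archive.extend(old)                      # raw details stay searchable via the index
    return [summarize(old)] + recent         # soul shows the summary + recent items

archive = []
observations = [f"observation {i}" for i in range(20)]
compressed = compress_soul(
    observations, archive,
    summarize=lambda old: f"Summary of {len(old)} older observations",
)
# 10 oldest entries archived; the soul keeps 1 summary + 10 recent observations
```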
The soul shows the summary. The archive preserves the details. The index makes everything searchable. Nothing is ever lost.
Agentics is what happens when you take memory retrieval seriously for production AI agents. Not RAG bolted onto a chatbot — a purpose-built system where 12 autonomous agents accumulate, compress, search, and cross-reference months of operational intelligence.
The Soul System provides structure — three hierarchical levels (Identity, Global, Project), six sections per agent, auto-learning from pipeline execution, smart compression that preserves key data, and human-editable markdown that's never a black box.
The hybrid search provides precision — BM25 for exact scores, dates, and identifiers; local semantic embeddings for meaning and synonyms; weighted 70/30 fusion that favors understanding while preserving literal accuracy. All running locally, ~15-20ms per search, zero API cost.
Cross-agent intelligence provides the breakthrough — one index per project means any agent can search across all 12 agents' history. Patterns that no single agent could detect emerge when the orchestrator correlates weeks of SERP data, content performance, competitor movements, and human feedback in a single query.
The observation archive ensures nothing is ever lost — compressions keep the soul readable, but raw observations are preserved and indexed. An agent can retrieve a specific data point from 6 months ago, long after it was compressed out of the visible soul.
This is what "agents that remember" actually means. Not a marketing claim — a searchable, cross-referenced, auditable memory system with hybrid retrieval, running on dedicated infrastructure with zero third-party dependencies.
We architect and deploy autonomous agent teams with production-grade memory systems for SEO and digital marketing.