At Glorics, our agents don't just execute tasks — they remember. They accumulate observations, learn from client feedback, and refine their behavior over weeks and months of operation. But storing memories is the easy part. Retrieving the right memory at the right moment — that's where the engineering gets serious.
This article takes you inside Agentics, the memory intelligence layer of the Glorics platform. We'll show you how 12 autonomous agents maintain, search, and cross-reference their memories using a hybrid architecture that combines keyword precision with semantic understanding. Let's walk through the building blocks one by one.
Before we talk about search, you need to understand where memories are stored. Every Glorics agent has a soul — a structured markdown document that contains everything the agent knows. Not a vague embedding blob, not a key-value store. A readable, editable, human-auditable document.
The soul is organized in three hierarchical levels, from the most general to the most specific: Identity, Global, and Project.
Each project-level soul contains six sections: Identity, Brand Voice, Domain Context, Rules, What I've Learned (auto-observations + human notes), and Performance Summary. The agent accumulates observations automatically — when a content piece gets scored, when a SERP position changes, when the human adds notes like "focus on implant content."
The soul is plain markdown in a textarea. Not a WYSIWYG editor, not a form wizard. This is a deliberate design choice — the same approach used by Claude Projects, Cursor rules, and every modern AI tool. Humans can read it, edit it, audit it. The agent's memory is never a black box.
The problem: until now, the entire soul was injected into the agent's prompt at every run. All observations, all notes, everything — regardless of relevance. The copywriter writing about dental implants received observations about orthodontics, whitening, and periodontics. The signal was buried in noise.
That's why we built the Soul Search System — the hybrid retrieval layer that sits on top of the soul. The soul stays the source of truth. Soul Search makes it searchable.
Keyword search is the most direct retrieval method. You look for exact matches between query terms and stored content. It's fast, it's precise on identifiers, and it's the backbone of every search system ever built.
At Glorics, we use BM25 through SQLite's FTS5 extension — the same algorithm that powers Elasticsearch and Lucene, but embedded in a single file with zero server overhead. BM25 doesn't just find matches — it ranks them by factoring in term frequency and document length:
score(D, Q) = Σ IDF(qi) × [ f(qi, D) × (k1 + 1) ] / [ f(qi, D) + k1 × (1 - b + b × |D|/avgdl) ]
Where:
f(qi, D) = frequency of term qi in document D
|D| = document length
avgdl = average document length
k1, b = tuning parameters (typically k1=1.2, b=0.75)
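As a sanity check, the formula can be computed directly in a few lines of Python. This is a stdlib-only sketch with an invented toy corpus; SQLite's built-in bm25() ranking uses its own IDF smoothing, so exact values differ, but the shape is the same.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query using the BM25 formula above."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    n = len(corpus)
    score = 0.0
    for q in query_terms:
        df = sum(1 for d in corpus if q in d)             # documents containing q
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)   # smoothed IDF (Lucene-style)
        f = doc.count(q)                                  # term frequency in this doc
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    "implant article scored 91 in the SEO audit".split(),
    "orthodontics piece underperformed this quarter".split(),
    "implant pricing page gained a featured snippet".split(),
]
scores = [bm25_score(["implant", "pricing"], d, corpus) for d in corpus]
# The pricing page matches both terms and ranks first; the orthodontics piece scores 0
```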
TF (Term Frequency) → more appearances = more relevant
IDF (Inverse Document Frequency) → rarer term = stronger signal

In Agentics, every soul chunk is indexed in an FTS5 virtual table. When the copywriter is about to write an article on "dental implant pricing", BM25 finds every observation that contains those exact words:
-- Agentics soul chunks indexed for BM25 search
CREATE VIRTUAL TABLE soul_chunks_fts USING fts5(
text,
content='soul_chunks',
content_rowid='chunk_id',
tokenize='porter unicode61' -- stemming + unicode support
);
-- Search the copywriter's observations about implant scores
SELECT sc.chunk_id, sc.agent_key, sc.text, rank
FROM soul_chunks_fts
JOIN soul_chunks sc ON soul_chunks_fts.rowid = sc.chunk_id
WHERE soul_chunks_fts MATCH 'implant score SEO'
ORDER BY rank;

Semantic search doesn't look for identical words — it looks for similar meanings. The text is converted into embeddings: vectors of numbers that capture semantic meaning. Two sentences that mean the same thing produce similar vectors, even if they share no words in common.
At Glorics, we run embeddings locally on our own server using sentence-transformers — no third-party API, no data leaving the infrastructure, zero per-token cost. Our server (128 GB RAM, 32 cores) handles this effortlessly:
from sentence_transformers import SentenceTransformer
# Loaded once at server startup — 420 MB, ~8ms per embedding
model = SentenceTransformer('all-mpnet-base-v2')
# The magic: similar meaning → similar vectors
v1 = model.encode("implant article SEO score 91")
v2 = model.encode("prosthesis content performed well")
v3 = model.encode("best restaurants in Paris")
# v1 and v2 are CLOSE (similar meaning)
# v1 and v3 are FAR APART (unrelated topics)
print(f"Dimensions: {len(v1)}") # 768

Glorics runs exclusively on Anthropic's Claude for reasoning (Haiku, Sonnet, Opus). Rather than adding another API dependency for embeddings, we run all-mpnet-base-v2 locally. Zero network latency, zero per-token cost, zero data leakage. The entire embedding pipeline stays on our infrastructure.
Once embeddings are generated, they're stored in sqlite-vec — a vector search extension for SQLite. Same file, same database, no vector server to manage. The agent's semantic index lives right next to its BM25 index; the vec0 schema is shown further down.
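To make the mechanics concrete, here is a stdlib-only stand-in for what sqlite-vec does natively: rank stored vectors by cosine similarity against a query vector. The three-dimensional vectors below are invented for illustration; real embeddings are 768-dimensional and the scan happens inside the vec0 virtual table.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 for identical direction, near 0 for unrelated vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy in-memory "index" mapping chunk text to a pretend embedding
index = {
    "implant article SEO score 91": [0.9, 0.1, 0.0],
    "prosthesis content performed well": [0.8, 0.3, 0.1],
    "best restaurants in Paris": [0.0, 0.1, 0.9],
}

query = [0.85, 0.2, 0.05]  # pretend embedding of "implant content performance"
ranked = sorted(index.items(),
                key=lambda item: cosine_similarity(query, item[1]),
                reverse=True)
# The two dental observations outrank the unrelated restaurant chunk
```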
Neither keyword search nor semantic search is sufficient on its own. Keyword search finds "SEO score 91" perfectly but misses "content performed well." Semantic search understands meaning but can't reliably match specific scores, dates, and names. The answer: run both in parallel and fuse the results.
Agentics uses weighted score fusion — each result receives a combined score that favors semantic understanding (70%) while preserving keyword precision (30%):
final_score = 0.7 × semantic_score + 0.3 × keyword_score
Observation | Semantic | BM25 | Fused score
-------------------------|----------|-------|----------------------------
"implant avg 91 vs 72" | 0.92 | 0.85 | 0.7×0.92 + 0.3×0.85 = 0.899 ← best
"prosthesis perf good" | 0.88 | 0.00 | 0.7×0.88 + 0.3×0.00 = 0.616
"orthodontics Q1 data" | 0.31 | 0.00 | 0.7×0.31 + 0.3×0.00 = 0.217 ← below threshold
"implant pricing trends" | 0.75 | 0.72 | 0.7×0.75 + 0.3×0.72 = 0.741
Minimum threshold: 0.35 — anything below is dropped
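The fusion rule and threshold can be sketched in a few lines. The scores are the ones from the table above; the fuse helper is illustrative, not the production soul_search implementation.

```python
SEMANTIC_WEIGHT, KEYWORD_WEIGHT, MIN_SCORE = 0.7, 0.3, 0.35

def fuse(semantic, keyword):
    """Weighted score fusion: a chunk missing from one search scores 0 on that side."""
    fused = {}
    for chunk in set(semantic) | set(keyword):
        fused[chunk] = (SEMANTIC_WEIGHT * semantic.get(chunk, 0.0)
                        + KEYWORD_WEIGHT * keyword.get(chunk, 0.0))
    # Drop anything below the threshold, best match first
    return sorted(((c, s) for c, s in fused.items() if s >= MIN_SCORE),
                  key=lambda pair: pair[1], reverse=True)

semantic = {"implant avg 91 vs 72": 0.92, "prosthesis perf good": 0.88,
            "orthodontics Q1 data": 0.31, "implant pricing trends": 0.75}
keyword = {"implant avg 91 vs 72": 0.85, "implant pricing trends": 0.72}
results = fuse(semantic, keyword)
# "implant avg 91 vs 72" leads at 0.899; "orthodontics Q1 data" is dropped
```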
-- Results appearing in BOTH searches are naturally favored

The alternative is Reciprocal Rank Fusion (RRF), which uses position instead of raw scores:
RRF_score(d) = Σ 1 / (k + rank(d)) (k = 60 standard)
Observation | Semantic rank | Keyword rank | RRF score
-------------------------|:------------:|:------------:|:---------:
"implant avg 91 vs 72" | #1 | #2 | 1/61 + 1/62 = 0.0325
"client prefers narrative" | #2 | #5 | 1/62 + 1/65 = 0.0315
"pricing featured snippet" | #4 | #1 | 1/64 + 1/61 = 0.0320

Agentics uses weighted score fusion because it preserves match strength. A near-perfect semantic match (0.92) scores very differently from a mediocre one (0.31). RRF flattens this distinction — rank #1 vs #4 doesn't tell you how much better the first result is. For agent memory retrieval, where the quality of the match directly impacts reasoning, preserving score magnitude matters.
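For comparison, RRF itself is only a few lines. The two ranked lists correspond to the table (best first, with filler entries for unnamed ranks); k=60 is the conventional constant.

```python
def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion: sum 1/(k + rank) over every list a document appears in."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

semantic_ranked = ["implant avg 91 vs 72", "client prefers narrative",
                   "other A", "pricing featured snippet"]
keyword_ranked = ["pricing featured snippet", "implant avg 91 vs 72",
                  "other B", "other C", "client prefers narrative"]
fused = rrf([semantic_ranked, keyword_ranked])
# Top result: "implant avg 91 vs 72" with 1/61 + 1/62 ≈ 0.0325
```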
Everything above comes together in a single system: the Soul Search layer. One SQLite file per project, sitting alongside the backend, containing the hybrid index over all agent souls in that project.
-- Soul chunks: markdown sections split into searchable blocks
CREATE TABLE soul_chunks (
chunk_id INTEGER PRIMARY KEY AUTOINCREMENT,
agent_key TEXT NOT NULL, -- 'copywriter', 'serp_scout', 'angela'...
section TEXT NOT NULL, -- 'auto_observations', 'human_notes', 'archive'
text TEXT NOT NULL,
tokens INTEGER NOT NULL,
source TEXT DEFAULT 'soul_md',
created_at TEXT NOT NULL,
embedding_hash TEXT -- SHA256: skip re-embedding if unchanged
);
-- BM25 keyword search (via FTS5)
CREATE VIRTUAL TABLE soul_chunks_fts USING fts5(
text, content='soul_chunks', content_rowid='chunk_id',
tokenize='porter unicode61'
);
-- Vector search (cosine similarity, local embeddings)
CREATE VIRTUAL TABLE soul_chunks_vec USING vec0(
chunk_id INTEGER PRIMARY KEY,
embedding FLOAT[768] -- all-mpnet-base-v2 = 768 dimensions
);

Run context: "Writing article: Guide implant dentaire 2026. Keywords: implant dentaire, prix implant"
Step 1 — Embed the run context (local model, ~8ms)
Step 2 — Parallel search (×4 candidate multiplier)
BM25 search → 20 candidates (Score = 1/(1+rank))
Vector search → 20 candidates (Score = 1 - cosine_distance)
Step 3 — Weighted fusion
score = 0.7 × semantic_score + 0.3 × keyword_score
(chunk missing from one search → score = 0 for that side)
Step 4 — Filtering
• Minimum threshold: 0.35
• Sort descending by fused score
• Cap at 5 results
Results:
[copywriter] "Implant articles avg 91 vs ortho 72" score: 0.87
[human_note] "Dr. Martin wants clinical case studies" score: 0.72
[serp_scout] "'implant dentaire prix' — featured snippet" score: 0.68
[copywriter] "Rating 5/5 on 'Guide implant 2026'" score: 0.61
[signal_analyst] "Competitor launched 3 articles on pricing" score: 0.45

If Agentics needs 5 final results, each search returns 20 candidates. This surplus gives the fusion step more material to identify cross-matched results — observations that appear in both keyword and semantic searches naturally rise to the top, because they score on both dimensions.
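Strung together, the four steps look roughly like this. The search inputs are stubbed; in production the candidates come from FTS5 and vec0, and the normalizations match the helpers shown just below.

```python
def hybrid_search(bm25_hits, vec_hits, limit=5, threshold=0.35):
    """bm25_hits: chunk ids in rank order; vec_hits: (chunk_id, cosine_distance) pairs."""
    keyword = {cid: 1.0 / (1 + rank) for rank, cid in enumerate(bm25_hits)}   # Step 2a
    semantic = {cid: 1.0 - dist for cid, dist in vec_hits}                    # Step 2b
    fused = {cid: 0.7 * semantic.get(cid, 0.0) + 0.3 * keyword.get(cid, 0.0)  # Step 3
             for cid in set(keyword) | set(semantic)}
    kept = [(cid, score) for cid, score in sorted(fused.items(), key=lambda p: -p[1])
            if score >= threshold]                                            # Step 4
    return kept[:limit]

# 20 candidates per side in production (the ×4 multiplier); 3 shown here
top = hybrid_search(bm25_hits=[11, 42, 7],
                    vec_hits=[(42, 0.08), (11, 0.35), (99, 0.70)])
# Chunk 42 appears in both searches and leads the fused ranking
```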
# BM25 rank → normalized 0–1
def normalize_bm25(rank):
return 1 / (1 + rank)
# rank=0 (best) → score=1.0
# rank=1 → score=0.5
# rank=9 → score=0.1
# Cosine distance → similarity
def cosine_to_similarity(distance):
return 1 - distance
# distance=0.0 (identical) → score=1.0
# distance=0.2 → score=0.8
# distance=0.5 → score=0.5

Here's what makes Agentics different from generic RAG systems. The soul is split into fixed sections (Identity, Brand Voice, Domain Context, Rules) and dynamic sections (observations, notes, archive). Fixed sections are always injected in full — they're short and always relevant. Dynamic sections are filtered through soul_search():
get_smart_soul_context(project_id, agent_key="copywriter", run_context="write implant article")
1. Inject FULL: Identity + Brand Voice + Domain Context + Rules
→ Always relevant, always short (~500 tokens total)
2. Call soul_search(run_context) for observations
→ Returns 5 most relevant observations across agents
→ Replaces the old "dump everything" approach
3. Inject FULL: Performance Summary
→ Short, always useful (~60 tokens)
Result: precise context (~800 tokens) instead of full dump (~2500 tokens)
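A sketch of that assembly step, with soul_search injected as a callable and hypothetical section strings (the real get_smart_soul_context lives in the Agentics backend):

```python
def get_smart_soul_context(fixed_sections, soul_search, run_context, perf_summary):
    """Assemble prompt context: full fixed sections + top-5 relevant observations."""
    parts = list(fixed_sections)            # Identity, Brand Voice, Domain Context, Rules
    hits = soul_search(run_context)[:5]     # dynamic sections, filtered by relevance
    parts.extend(f"[{agent}] {text}" for agent, text in hits)
    parts.append(perf_summary)              # short, always useful
    return "\n\n".join(parts)

# Stubbed search results for illustration
def fake_soul_search(query):
    return [("copywriter", "Implant articles avg 91 vs ortho 72"),
            ("human_note", "Dr. Martin wants clinical case studies")]

ctx = get_smart_soul_context(
    fixed_sections=["# Identity\nSEO copywriter for a dental clinic",
                    "# Rules\nNo unverified medical claims"],
    soul_search=fake_soul_search,
    run_context="write implant article",
    perf_summary="# Performance Summary\nAvg content score: 88",
)
```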
Same information value, 70% less noise.

For strategic agents like Angela (the orchestrator) and the Briefing Writer, Agentics goes further. These agents receive soul_search and soul_get as callable tools — they actively decide when and what to search, rather than passively receiving pre-filtered results:
# Angela decides to investigate a cross-agent pattern
SOUL_TOOLS = [
{
"name": "soul_search",
"description": "Search the collective memory of all agents on this project.",
"input_schema": {
"type": "object",
"properties": {
"query": { "type": "string" },
"agent": { "type": "string", "description": "Limit to one agent (optional)" }
},
"required": ["query"]
}
},
{
"name": "soul_get",
"description": "Retrieve the full text of a specific observation by ID.",
"input_schema": {
"type": "object",
"properties": {
"chunk_id": { "type": "integer" }
},
"required": ["chunk_id"]
}
}
]

This two-step pattern keeps the context window lean. The search returns short previews (200 chars). The agent only loads the full observation for what it actually needs. It's how a human uses a book index — scan the table of contents first, then open the right page.
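The preview-then-fetch behavior behind those two tools can be sketched like this; the chunk store is a toy dict and the handlers are illustrative, not the production tool implementations:

```python
PREVIEW_LEN = 200

# Toy chunk store: chunk_id -> (agent_key, full observation text)
CHUNKS = {
    7: ("serp_scout",
        "'implant dentaire prix' moved to position 3 and gained a featured snippet "
        "after the pricing table was added to the article in week 12. " * 3),
}

def soul_search_preview(chunk_ids):
    """Step 1: return short previews so the agent can decide what to load in full."""
    return [(cid, CHUNKS[cid][0], CHUNKS[cid][1][:PREVIEW_LEN]) for cid in chunk_ids]

def soul_get(chunk_id):
    """Step 2: load the complete observation only when the agent asks for it."""
    agent_key, text = CHUNKS[chunk_id]
    return {"chunk_id": chunk_id, "agent": agent_key, "text": text}

previews = soul_search_preview([7])
full = soul_get(7)
```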
This is where Agentics does something no other framework offers. Because all 12 agents' observations are indexed in the same SQLite file per project, any agent can search across the collective memory of all agents.
Concrete example: Angela is making weekly strategic decisions. She calls soul_search("keyword cannibalization implant") across all agents. Back come results from the SERP Scout (position data from 3 weeks ago), the Signal Analyst (competitor trend analysis), the Copywriter (which article topics overlapped), and a human note (the client wants to consolidate). Four agents, weeks of history, one query. No single agent had the full picture — but together, through the index, the pattern is clear.
This is enabled by a single architectural choice: one SQLite index per project, not per agent. Every observation from every agent is indexed in the same file. The agent_key column lets you filter to a single agent when needed, but the default is project-wide. The index doesn't care how many agents exist — add a 13th, a 25th agent, and their observations are automatically indexed and searchable.
One more critical piece. When agents accumulate more than 15 observations, a smart compression step summarizes the oldest ones into a few bullet points. This keeps the soul document readable. But the raw observations — with their specific scores, dates, and details — are archived in soul_observations_archive before compression happens. The archive is indexed in the SQLite hybrid index. So even observations that were compressed 3 months ago are still searchable through soul_search().
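A sketch of that compression step, with summarize as a stand-in for the real LLM summarization call; the 15-observation trigger is from the text, while keep is an assumed parameter:

```python
MAX_OBSERVATIONS = 15

def compress_soul(observations, archive, summarize, keep=10):
    """Archive the oldest observations, then replace them with one summary bullet."""
    if len(observations) <= MAX_OBSERVATIONS:
        return observations                  # nothing to compress yet
    old, recent = observations[:-keep], observations[-keep:]
    archive.extend(old)                      # raw details stay searchable via the index
    return [summarize(old)] + recent         # soul shows the summary + recent items

archive = []
observations = [f"observation {i}" for i in range(20)]
compressed = compress_soul(
    observations, archive,
    summarize=lambda old: f"Summary of {len(old)} older observations",
)
# 10 oldest entries archived; the soul keeps 1 summary + 10 recent observations
```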
The soul shows the summary. The archive preserves the details. The index makes everything searchable. Nothing is ever lost.
Agentics is what happens when you take memory retrieval seriously for production AI agents. Not RAG bolted onto a chatbot — a purpose-built system where 12 autonomous agents accumulate, compress, search, and cross-reference months of operational intelligence.
The Soul System provides structure — three hierarchical levels (Identity, Global, Project), six sections per agent, auto-learning from pipeline execution, smart compression that preserves key data, and human-editable markdown that's never a black box.
The hybrid search provides precision — BM25 for exact scores, dates, and identifiers; local semantic embeddings for meaning and synonyms; weighted 70/30 fusion that favors understanding while preserving literal accuracy. All running locally, ~15-20ms per search, zero API cost.
Cross-agent intelligence provides the breakthrough — one index per project means any agent can search across all 12 agents' history. Patterns that no single agent could detect emerge when the orchestrator correlates weeks of SERP data, content performance, competitor movements, and human feedback in a single query.
The observation archive ensures nothing is ever lost — compressions keep the soul readable, but raw observations are preserved and indexed. An agent can retrieve a specific data point from 6 months ago, long after it was compressed out of the visible soul.
This is what "agents that remember" actually means. Not a marketing claim — a searchable, cross-referenced, auditable memory system with hybrid retrieval, running on dedicated infrastructure with zero third-party dependencies.
We architect and deploy autonomous agent teams with production-grade memory systems for SEO and digital marketing.