You ask the LLM for “DRAINS” relationships.
You get:
drains, depletes, exhausts, causes_fatigue,
emotionally_draining, negatively_impacts,
energy_draining, leads_to_exhaustion,
saps_vitality, wears_out, causes_drain...
Eleven variations. Per concept. Per run.
If your application depends on consistent output from an LLM, you have a problem. I learned this the hard way while building Sentinel, a CLI tool that uses Cognee to detect energy conflicts in personal schedules.
This article shares the pattern I developed to normalise chaotic LLM output into predictable, application-ready data.
The Problem: LLMs Don’t Follow Instructions
I was building a knowledge graph where an activity DRAINS energy or REQUIRES focus. Simple enough. I wrote a custom extraction prompt:
**REQUIRED RELATIONSHIP TYPES** (use ONLY these exact names):
- DRAINS: Activity depletes energy/focus/motivation
- REQUIRES: Activity needs energy/focus/resources
The LLM nodded along and generated… whatever it felt like.
My collision detection algorithm expected DRAINS. Cognee’s LLM returned is_emotionally_draining. My BFS traversal found nothing. Tests passed (mocks are liars). Production was broken.
The harsh truth: Prompting alone gets you ~70% consistency. For the remaining 30%, you need a normalisation layer.
Why Prompting Isn’t Enough
I tried harder. I added examples. I used few-shot prompting. I YELLED IN CAPS.
**CRITICAL**: Use ONLY these relationship types:
- DRAINS (not "depletes", not "exhausts", not "causes_fatigue")
Result: The LLM now generated DRAINS 85% of the time. But also drains_energy, energy_draining, and my personal favourite: negatively_impacts_emotional_state.
The LLM understands semantics, not syntax. It knows these concepts are equivalent. It doesn’t care about your string matching.
My detection rate: 15-20% of actual collisions found.
The Solution: 3-Tier Normalisation
The pattern I landed on processes LLM output through three increasingly fuzzy matching tiers:
┌─────────────────────────────────────────────────────────┐
│ LLM Output │
│ "totally_exhausting_day" │
└─────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ TIER 1: Exact Match (O(1)) │
│ ─────────────────────────── │
│ Dictionary lookup. Instant. │
│ "drains" → DRAINS ✓ │
│ "totally_exhausting_day" → miss, try next tier │
└─────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ TIER 2: Keyword Match (O(n)) │
│ ──────────────────────────── │
│ Check if input contains known stems. │
│ "totally_exhausting_day" contains "exhaust" → DRAINS ✓ │
└─────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ TIER 3: Fuzzy Match (O(n×m)) │
│ ───────────────────────────── │
│ RapidFuzz similarity scoring against candidates. │
│ "reduces vitality" ~= "reduces energy" (82%) → DRAINS │
└─────────────────────────────────────────────────────────┘
Key insight: Most lookups hit Tier 1. Only novel variations fall through to expensive fuzzy matching. Performance stays fast; coverage stays complete.
Implementation
Tier 1: Exact Match Dictionary
The fast path. Cover every variation you’ve ever seen:
RELATION_TYPE_MAP: dict[str, str] = {
    # DRAINS mappings
    "drains": "DRAINS",
    "depletes": "DRAINS",
    "exhausts": "DRAINS",
    "tires": "DRAINS",
    "fatigues": "DRAINS",
    "causes_fatigue": "DRAINS",
    "drains_energy": "DRAINS",
    "is_emotionally_draining": "DRAINS",
    "emotionally_draining": "DRAINS",
    "causes_exhaustion": "DRAINS",
    "negatively_impacts": "DRAINS",
    "leads_to_exhaustion": "DRAINS",
    # ... 70+ more entries across all types
    # REQUIRES mappings
    "requires": "REQUIRES",
    "needs": "REQUIRES",
    "demands": "REQUIRES",
    "depends_on": "REQUIRES",
    "requires_focus": "REQUIRES",
    "needs_energy": "REQUIRES",
    # ...
}
Lookup:
def map_relation_type(raw_type: str) -> str | None:
    normalized = raw_type.strip().lower()
    # Tier 1: Exact match (O(1))
    if normalized in RELATION_TYPE_MAP:
        return RELATION_TYPE_MAP[normalized]
    # Fall through to Tier 2...
When to grow this dictionary: Every time you see a new variation in logs, add it. This dictionary is your institutional knowledge of LLM quirks.
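If your logger emits a consistent message for unmatched types (the `Unknown relation type:` warning shown later in the mapping function), a few lines of stdlib Python can mine the logs for promotion candidates. A sketch, assuming that log line format:

```python
# Sketch: mine application logs for unmatched relation types so the most
# frequent ones can be promoted into RELATION_TYPE_MAP. The log line format
# is an assumption -- match it to whatever your logger actually emits.
import re
from collections import Counter

LOG_PATTERN = re.compile(r"Unknown relation type: (\S+)")

def unmatched_relation_counts(log_lines: list[str]) -> Counter:
    """Count how often each unmapped relation type appears in the logs."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "WARNING Unknown relation type: saps_vitality",
    "WARNING Unknown relation type: saps_vitality",
    "WARNING Unknown relation type: wears_down",
]
top = unmatched_relation_counts(sample).most_common(1)
# The most frequent offender is the best candidate for a new exact mapping.
```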
Tier 2: Keyword Matching
When an exact match fails, check for semantic keywords. Use word stems to catch variations:
SEMANTIC_KEYWORDS: dict[str, list[str]] = {
    "DRAINS": [
        "drain",    # matches: draining, drained, drains
        "exhaust",  # matches: exhausting, exhausted, exhaustion
        "deplet",   # matches: depleting, depleted, depletion
        "fatigue",  # matches: fatiguing, fatigued
        "tire",     # matches: tiring, tired
        "sap",      # matches: sapping, sapped
        "stress",   # matches: stressing, stressed, stressful
    ],
    "REQUIRES": [
        "require",  # matches: requiring, required, requirement
        "need",     # matches: needing, needed, needs
        "demand",   # matches: demanding, demanded
        "depend",   # matches: depending, dependent, dependency
    ],
    "CONFLICTS_WITH": [
        "conflict",  # matches: conflicting, conflicted
        "clash",     # matches: clashing, clashed
        "interfer",  # matches: interfering, interference
        "hinder",    # matches: hindering, hindered
    ],
}
def _keyword_match_relation(relation_type: str) -> str | None:
    normalized = relation_type.lower()
    for canonical_type, keywords in SEMANTIC_KEYWORDS.items():
        for keyword in keywords:
            if keyword in normalized:
                return canonical_type
    return None
Why stems work: The LLM might say energy_depleting or causes_depletion or depleted_state. The stem deplet catches all three.
Tier 3: Fuzzy Matching with RapidFuzz
For truly novel outputs, use semantic similarity scoring:
from rapidfuzz import fuzz, process
FUZZY_CANDIDATES: dict[str, list[str]] = {
    "DRAINS": [
        "drains",
        "drains energy",
        "emotionally draining",
        "causes exhaustion",
        "leads to fatigue",
        "reduces energy",
        "saps energy",
        "wears out",
    ],
    "REQUIRES": [
        "requires",
        "needs",
        "demands",
        "depends on",
        "prerequisite for",
        "essential for",
    ],
    # ...
}
DEFAULT_FUZZY_THRESHOLD: int = 50 # Tune based on your domain
def _fuzzy_match_relation(
    relation_type: str,
    threshold: int = DEFAULT_FUZZY_THRESHOLD,
) -> str | None:
    # Normalize: lowercase and spaces instead of underscores
    normalized = relation_type.lower().replace("_", " ")
    best_match: str | None = None
    best_score: float = 0.0
    for canonical_type, candidates in FUZZY_CANDIDATES.items():
        result = process.extractOne(
            normalized,
            candidates,
            scorer=fuzz.WRatio,
        )
        if result is not None:
            match_str, score, _ = result
            if score > best_score and score >= threshold:
                best_score = score
                best_match = canonical_type
    return best_match
Why RapidFuzz? It’s fast (written in C++), handles partial matches well, and WRatio balances token order flexibility with precision.
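If you want to see why a mid-range threshold works before taking on the dependency, a rough stand-in using stdlib `difflib` makes the point (its 0-1 ratios are not WRatio's 0-100 scores, so the absolute numbers differ; the relative ordering is what matters):

```python
# Rough stand-in for RapidFuzz scoring using stdlib difflib. Ratios are on
# a 0-1 scale (WRatio uses 0-100), so only the relative ordering carries over.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

close = similarity("reduces vitality", "reduces energy")
far = similarity("reduces vitality", "prerequisite for")
assert close > far  # the gap between these two is what the threshold exploits
```

A near-synonym pair scores well above an unrelated pair; the threshold simply draws a line inside that gap.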
Putting It Together
The main mapping function chains all three tiers:
def _map_cognee_relation_to_edge(relation: dict) -> Edge | None:
    relation_type = relation.get("type", "").lower()
    # Tier 1: Exact match (fastest)
    edge_type = RELATION_TYPE_MAP.get(relation_type)
    if edge_type:
        return Edge(
            relationship=edge_type,
            metadata={"match_tier": "exact"},
            # ...
        )
    # Tier 2: Keyword match
    edge_type = _keyword_match_relation(relation_type)
    if edge_type:
        return Edge(
            relationship=edge_type,
            metadata={"match_tier": "keyword"},
            # ...
        )
    # Tier 3: Fuzzy match (slowest, most flexible)
    edge_type = _fuzzy_match_relation(relation_type)
    if edge_type:
        return Edge(
            relationship=edge_type,
            metadata={"match_tier": "fuzzy"},
            # ...
        )
    # No match - log and filter out
    logger.warning("Unknown relation type: %s", relation_type)
    return None
Notice the metadata. Tracking which tier matched helps you:
- Debug why something matched (or didn’t)
- Know when to add new exact mappings (if Tier 3 hits the same input repeatedly)
- Monitor your coverage over time
Code simplified for clarity. The actual implementation handles full Edge construction with source/target IDs and confidence scores.
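Acting on that metadata can be as simple as a `Counter` over a batch of mapped edges. A hypothetical sketch (not Sentinel's actual monitoring code; edges are plain dicts here for self-containment):

```python
# Hypothetical monitoring sketch: tally the match_tier metadata over a batch
# of mapped edges, represented as plain dicts for self-containment.
from collections import Counter

def tier_report(edges: list[dict]) -> Counter:
    return Counter(edge["metadata"]["match_tier"] for edge in edges)

batch = [
    {"metadata": {"match_tier": "exact"}},
    {"metadata": {"match_tier": "exact"}},
    {"metadata": {"match_tier": "fuzzy"}},
]
report = tier_report(batch)
# A rising "fuzzy" count for the same inputs means the exact-match
# dictionary needs new entries.
```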
Testing the Pattern
Parametrized tests cover all tiers:
import pytest

@pytest.mark.parametrize("input_type,expected_type,expected_tier", [
    # Tier 1: Exact matches
    ("drains", "DRAINS", "exact"),
    ("requires", "REQUIRES", "exact"),
    ("emotionally_draining", "DRAINS", "exact"),
    # Tier 2: Keyword matches
    ("energy_depleting", "DRAINS", "keyword"),
    ("absolutely_exhausting", "DRAINS", "keyword"),
    ("strongly_dependent", "REQUIRES", "keyword"),
    # Tier 3: Fuzzy matches (no exact or keyword match)
    ("reduces vitality", "DRAINS", "fuzzy"),
    ("critical for success", "REQUIRES", "fuzzy"),
])
def test_3_tier_mapping(input_type, expected_type, expected_tier):
    result = _map_cognee_relation_to_edge({
        "type": input_type,
        "source_id": "a",
        "target_id": "b",
    })
    assert result is not None
    assert result.relationship == expected_type
    assert result.metadata["match_tier"] == expected_tier
Critical: Add real LLM outputs to this test as you discover them. Your test suite becomes documentation of LLM behaviour.
Results
Before the 3-tier pattern:
- Collision detection rate: 15-20%
- Unmatched relation types: 11+ per run
After:
- Collision detection rate: 100%
- Unmatched relation types: 0
- Performance impact: Negligible (most lookups hit Tier 1’s O(1) dictionary)
The journey from broken to working required:
- Custom extraction prompt (got edge types right)
- 3-tier normalisation (got relation mapping right)
- Semantic node consolidation (got node matching right)
Each layer catches what the previous layer misses.
Note: Semantic node consolidation is a separate module that merges equivalent nodes (e.g., “emotional_exhaustion” and “low_energy”) using similar RapidFuzz techniques. The 3-tier pattern handles relation types; consolidation handles node identity. Both are needed for reliable graph traversal.
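The consolidation module itself is out of scope for this article, but the core move looks like this sketch: fold an incoming node label into an existing canonical label when similarity clears a threshold (stdlib `difflib` stands in for RapidFuzz, and the 0.75 threshold is an illustrative guess):

```python
# Sketch of node consolidation: map an incoming label onto an existing
# canonical label when similarity clears a threshold. stdlib difflib stands
# in for RapidFuzz here, and the 0.75 threshold is an illustrative guess.
from difflib import SequenceMatcher

def consolidate(label: str, canonical: list[str], threshold: float = 0.75) -> str:
    norm = label.lower().replace("_", " ")
    best, best_score = label, 0.0
    for candidate in canonical:
        score = SequenceMatcher(
            None, norm, candidate.lower().replace("_", " ")
        ).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else label

canon = ["emotional_exhaustion", "low_energy"]
assert consolidate("emotional exhaustion", canon) == "emotional_exhaustion"
assert consolidate("deadline", canon) == "deadline"  # no match: keep original
```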
When to Use This Pattern
Use 3-tier normalisation when:
- Your application depends on categorical LLM output
- You’re building knowledge graphs, classification systems, or structured extraction
- Prompt engineering alone isn’t achieving consistency
- You need to handle novel variations gracefully
Consider alternatives when:
- Output is free-form text (summarization, chat)
- You control the training/fine-tuning
- Exact matching is acceptable (you can reject unknowns)
Key Takeaways
- Prompting is necessary but not sufficient. Even great prompts top out around 70-85% consistency.
- Build the fast path first. An exact-match dictionary handles most cases in O(1).
- Use word stems for keyword matching. Stems catch inflections without regex complexity.
- Fuzzy matching is your safety net. RapidFuzz with tunable thresholds handles novel outputs.
- Track which tier matched. Metadata enables debugging and coverage monitoring.
- Grow your dictionaries from production. Every new variation you see should become an exact match.
Resources
- Sentinel on GitHub – The project where this pattern was developed
- Cognee – Knowledge graph library for LLM applications
- RapidFuzz – Fast fuzzy string matching
Related Articles
- Why Your Calendar App Misses the Real Conflicts – The “why” behind Sentinel (general audience)
- Building a CLI Tool with Cognee: Lessons from 5 Epics – Cognee-specific integration lessons
Built while participating in the Cognee Mini Challenge 2026. The full implementation is in src/sentinel/core/engine.py.
