Taming LLM Output Chaos: A 3-Tier Normalisation Pattern


You ask the LLM for “DRAINS” relationships.

You get:

drains, depletes, exhausts, causes_fatigue,
emotionally_draining, negatively_impacts,
energy_draining, leads_to_exhaustion,
saps_vitality, wears_out, causes_drain...

Eleven variations. Per concept. Per run.

If your application depends on consistent output from an LLM, you have a problem. I learned this the hard way while building Sentinel, a CLI tool that uses Cognee to detect energy conflicts in personal schedules.

This article shares the pattern I developed to normalise chaotic LLM output into predictable, application-ready data.

The Problem: LLMs Don’t Follow Instructions

I was building a knowledge graph where activities could DRAINS energy or REQUIRES focus. Simple enough. I wrote a custom extraction prompt:

**REQUIRED RELATIONSHIP TYPES** (use ONLY these exact names):
- DRAINS: Activity depletes energy/focus/motivation
- REQUIRES: Activity needs energy/focus/resources

The LLM nodded along and generated… whatever it felt like.

My collision detection algorithm expected DRAINS. Cognee’s LLM returned is_emotionally_draining. My BFS traversal found nothing. Tests passed (mocks are liars). Production was broken.

The harsh truth: Prompting alone gets you ~70% consistency. For the remaining 30%, you need a normalisation layer.

Why Prompting Isn’t Enough

I tried harder. I added examples. I used few-shot prompting. I YELLED IN CAPS.

**CRITICAL**: Use ONLY these relationship types:
- DRAINS (not "depletes", not "exhausts", not "causes_fatigue")

Result: The LLM now generated DRAINS 85% of the time. But also drains_energy, energy_draining, and my personal favourite: negatively_impacts_emotional_state.

The LLM understands semantics, not syntax. It knows these concepts are equivalent. It doesn’t care about your string matching.

My detection rate: 15-20% of actual collisions found.

The Solution: 3-Tier Normalisation

The pattern I landed on processes LLM output through three increasingly fuzzy matching tiers:

┌─────────────────────────────────────────────────────────┐
│                    LLM Output                           │
│              "totally_exhausting_day"                   │
└─────────────────────┬───────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────┐
│  TIER 1: Exact Match (O(1))                             │
│  ───────────────────────────                            │
│  Dictionary lookup. Instant.                            │
│  "drains" → DRAINS ✓                                    │
│  "totally_exhausting_day" → miss, try next tier         │
└─────────────────────┬───────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────┐
│  TIER 2: Keyword Match (O(n))                           │
│  ────────────────────────────                           │
│  Check if input contains known stems.                   │
│  "totally_exhausting_day" contains "exhaust" → DRAINS ✓ │
└─────────────────────┬───────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────┐
│  TIER 3: Fuzzy Match (O(n×m))                           │
│  ─────────────────────────────                          │
│  RapidFuzz similarity scoring against candidates.       │
│  "reduces vitality" ~= "reduces energy" (82%) → DRAINS  │
└─────────────────────────────────────────────────────────┘

Key insight: Most lookups hit Tier 1. Only novel variations fall through to expensive fuzzy matching. Performance stays fast; coverage stays complete.

Implementation

Tier 1: Exact Match Dictionary

The fast path. Cover every variation you’ve ever seen:

RELATION_TYPE_MAP: dict[str, str] = {
    # DRAINS mappings
    "drains": "DRAINS",
    "depletes": "DRAINS",
    "exhausts": "DRAINS",
    "tires": "DRAINS",
    "fatigues": "DRAINS",
    "causes_fatigue": "DRAINS",
    "drains_energy": "DRAINS",
    "is_emotionally_draining": "DRAINS",
    "emotionally_draining": "DRAINS",
    "causes_exhaustion": "DRAINS",
    "negatively_impacts": "DRAINS",
    "leads_to_exhaustion": "DRAINS",
    # ... 70+ more entries across all types

    # REQUIRES mappings
    "requires": "REQUIRES",
    "needs": "REQUIRES",
    "demands": "REQUIRES",
    "depends_on": "REQUIRES",
    "requires_focus": "REQUIRES",
    "needs_energy": "REQUIRES",
    # ...
}

Lookup:

def map_relation_type(raw_type: str) -> str | None:
    normalized = raw_type.lower()

    # Tier 1: Exact match (O(1))
    if normalized in RELATION_TYPE_MAP:
        return RELATION_TYPE_MAP[normalized]

    # Tier 1 miss: fall through to Tier 2 (keyword matching)
    return None

When to grow this dictionary: Every time you see a new variation in logs, add it. This dictionary is your institutional knowledge of LLM quirks.
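That log-mining loop can be as simple as a frequency count over the fallback's warning lines. A minimal sketch (the exact log format and helper name are assumptions for illustration; the most frequent unmatched types are your best candidates for new map entries):

```python
import re
from collections import Counter

def unmatched_type_counts(log_text: str) -> list[tuple[str, int]]:
    """Rank 'Unknown relation type' warnings by frequency.

    Assumes log lines like:
    'WARNING Unknown relation type: saps_vitality'
    """
    hits = re.findall(r"Unknown relation type: (\S+)", log_text)
    return Counter(hits).most_common()

logs = """\
WARNING Unknown relation type: saps_vitality
WARNING Unknown relation type: wears_out
WARNING Unknown relation type: saps_vitality
"""
print(unmatched_type_counts(logs))
# → [('saps_vitality', 2), ('wears_out', 1)]
```

Run this over a week of production logs and the top entries tell you exactly which exact-match mappings to add next.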

Tier 2: Keyword Matching

When an exact match fails, check for semantic keywords. Use word stems to catch variations:

SEMANTIC_KEYWORDS: dict[str, list[str]] = {
    "DRAINS": [
        "drain",     # matches: draining, drained, drains
        "exhaust",   # matches: exhausting, exhausted, exhaustion
        "deplet",    # matches: depleting, depleted, depletion
        "fatigue",   # matches: fatiguing, fatigued
        "tire",      # matches: tiring, tired
        "sap",       # matches: sapping, sapped
        "stress",    # matches: stressing, stressed, stressful
    ],
    "REQUIRES": [
        "require",   # matches: requiring, required, requirement
        "need",      # matches: needing, needed, needs
        "demand",    # matches: demanding, demanded
        "depend",    # matches: depending, dependent, dependency
    ],
    "CONFLICTS_WITH": [
        "conflict",  # matches: conflicting, conflicted
        "clash",     # matches: clashing, clashed
        "interfer",  # matches: interfering, interference
        "hinder",    # matches: hindering, hindered
    ],
}

def _keyword_match_relation(relation_type: str) -> str | None:
    normalized = relation_type.lower()

    for canonical_type, keywords in SEMANTIC_KEYWORDS.items():
        for keyword in keywords:
            if keyword in normalized:
                return canonical_type

    return None

Why stems work: The LLM might say energy_depleting or causes_depletion or depleted_state. The stem deplet catches all three.

Tier 3: Fuzzy Matching with RapidFuzz

For truly novel outputs, use semantic similarity scoring:

from rapidfuzz import fuzz, process

FUZZY_CANDIDATES: dict[str, list[str]] = {
    "DRAINS": [
        "drains",
        "drains energy",
        "emotionally draining",
        "causes exhaustion",
        "leads to fatigue",
        "reduces energy",
        "saps energy",
        "wears out",
    ],
    "REQUIRES": [
        "requires",
        "needs",
        "demands",
        "depends on",
        "prerequisite for",
        "essential for",
    ],
    # ...
}

DEFAULT_FUZZY_THRESHOLD: int = 50  # Tune based on your domain

def _fuzzy_match_relation(
    relation_type: str,
    threshold: int = DEFAULT_FUZZY_THRESHOLD
) -> str | None:
    # Normalize: lowercase and spaces instead of underscores
    normalized = relation_type.lower().replace("_", " ")

    best_match: str | None = None
    best_score: float = 0.0

    for canonical_type, candidates in FUZZY_CANDIDATES.items():
        result = process.extractOne(
            normalized,
            candidates,
            scorer=fuzz.WRatio
        )
        if result is not None:
            match_str, score, _ = result
            if score > best_score and score >= threshold:
                best_score = score
                best_match = canonical_type

    return best_match

Why RapidFuzz? It’s fast (written in C++), handles partial matches well, and WRatio balances token order flexibility with precision.
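If you just want to eyeball similarity scores while tuning the threshold, the standard library's difflib makes a rough stand-in for quick experiments (its ratios are not WRatio scores, so thresholds won't transfer directly - this is a sketch, not the pattern's actual scorer):

```python
from difflib import SequenceMatcher

def rough_score(a: str, b: str) -> int:
    """0-100 similarity, roughly comparable in spirit to a fuzz ratio."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)

# Compare one raw LLM output against each DRAINS candidate
for candidate in ("reduces energy", "saps energy", "wears out"):
    print(candidate, rough_score("reduces vitality", candidate))
```

Printing scores for a handful of real unmatched outputs is usually enough to see where your threshold should sit before committing to a number.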

Putting It Together

The main mapping function chains all three tiers:

def _map_cognee_relation_to_edge(relation: dict) -> Edge | None:
    relation_type = relation.get("type", "").lower()

    # Tier 1: Exact match (fastest)
    edge_type = RELATION_TYPE_MAP.get(relation_type)
    if edge_type:
        return Edge(
            relationship=edge_type,
            metadata={"match_tier": "exact"},
            # ...
        )

    # Tier 2: Keyword match
    edge_type = _keyword_match_relation(relation_type)
    if edge_type:
        return Edge(
            relationship=edge_type,
            metadata={"match_tier": "keyword"},
            # ...
        )

    # Tier 3: Fuzzy match (slowest, most flexible)
    edge_type = _fuzzy_match_relation(relation_type)
    if edge_type:
        return Edge(
            relationship=edge_type,
            metadata={"match_tier": "fuzzy"},
            # ...
        )

    # No match - log and filter out
    logger.warning("Unknown relation type: %s", relation_type)
    return None

Notice the metadata. Tracking which tier matched helps you:

  1. Debug why something matched (or didn’t)
  2. Know when to add new exact mappings (if Tier 3 hits the same input repeatedly)
  3. Monitor your coverage over time

Code simplified for clarity. The actual implementation handles full Edge construction with source/target IDs and confidence scores.
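For context, the snippets above assume some Edge container. One minimal shape it could take (an illustrative sketch, not the project's actual class):

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    relationship: str                  # canonical type: DRAINS, REQUIRES, ...
    source_id: str = ""
    target_id: str = ""
    metadata: dict = field(default_factory=dict)  # e.g. {"match_tier": "exact"}

edge = Edge(
    relationship="DRAINS",
    source_id="late_meeting",
    target_id="focus",
    metadata={"match_tier": "keyword"},
)
assert edge.metadata["match_tier"] == "keyword"
```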

Testing the Pattern

Parametrized tests cover all tiers:

import pytest

@pytest.mark.parametrize("input_type,expected_type,expected_tier", [
    # Tier 1: Exact matches
    ("drains", "DRAINS", "exact"),
    ("requires", "REQUIRES", "exact"),
    ("emotionally_draining", "DRAINS", "exact"),

    # Tier 2: Keyword matches
    ("energy_depleting", "DRAINS", "keyword"),
    ("absolutely_exhausting", "DRAINS", "keyword"),
    ("strongly_dependent", "REQUIRES", "keyword"),

    # Tier 3: Fuzzy matches (no exact or keyword match)
    ("reduces vitality", "DRAINS", "fuzzy"),
    ("critical for success", "REQUIRES", "fuzzy"),
])
def test_3_tier_mapping(input_type, expected_type, expected_tier):
    result = _map_cognee_relation_to_edge({
        "type": input_type,
        "source_id": "a",
        "target_id": "b",
    })

    assert result is not None
    assert result.relationship == expected_type
    assert result.metadata["match_tier"] == expected_tier

Critical: Add real LLM outputs to this test as you discover them. Your test suite becomes documentation of LLM behaviour.

Results

Before the 3-tier pattern:

  • Collision detection rate: 15-20%
  • Unmatched relation types: 11+ per run

After:

  • Collision detection rate: 100%
  • Unmatched relation types: 0
  • Performance impact: Negligible (most lookups hit Tier 1’s O(1) dictionary)

The journey from broken to working required:

  1. Custom extraction prompt (got edge types right)
  2. 3-tier normalisation (got relation mapping right)
  3. Semantic node consolidation (got node matching right)

Each layer catches what the previous layer misses.

Note: Semantic node consolidation is a separate module that merges equivalent nodes (e.g., “emotional_exhaustion” and “low_energy”) using similar RapidFuzz techniques. The 3-tier pattern handles relation types; consolidation handles node identity. Both are needed for reliable graph traversal.
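As a rough illustration of that consolidation step, here is a hypothetical sketch using stdlib difflib in place of RapidFuzz (the function name, greedy strategy, and threshold are all assumptions, not the actual module):

```python
from difflib import SequenceMatcher

def consolidate_nodes(names: list[str], threshold: float = 0.8) -> dict[str, str]:
    """Map each node name to a canonical representative.

    Greedy single pass: a name joins the first existing canonical
    name it resembles closely enough; otherwise it becomes
    canonical itself.
    """
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for name in names:
        key = name.lower().replace("_", " ")
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, key, c.lower().replace("_", " ")).ratio() >= threshold),
            None,
        )
        mapping[name] = match if match else name
        if match is None:
            canonical.append(name)
    return mapping

print(consolidate_nodes(["low_energy", "low energy", "deep_work"]))
# → {'low_energy': 'low_energy', 'low energy': 'low_energy', 'deep_work': 'deep_work'}
```

Note that string similarity alone won't merge semantically equivalent but lexically distant names; that part of the real module needs richer candidate lists, as with the relation tiers.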

When to Use This Pattern

Use 3-tier normalisation when:

  • Your application depends on categorical LLM output
  • You’re building knowledge graphs, classification systems, or structured extraction
  • Prompt engineering alone isn’t achieving consistency
  • You need to handle novel variations gracefully

Consider alternatives when:

  • Output is free-form text (summarization, chat)
  • You control the training/fine-tuning
  • Exact matching is acceptable (you can reject unknowns)

Key Takeaways

  1. Prompting is necessary but not sufficient. Even great prompts get ~70-85% consistency.

  2. Build the fast path first. Exact match dictionary handles most cases in O(1).

  3. Use word stems for keyword matching. Catches inflections without regex complexity.

  4. Fuzzy matching is your safety net. RapidFuzz with tunable thresholds handles novel outputs.

  5. Track which tier matched. Metadata enables debugging and coverage monitoring.

  6. Grow your dictionaries from production. Every new variation you see should become an exact match.

Resources

  • Sentinel on GitHub – The project where this pattern was developed
  • Cognee – Knowledge graph library for LLM applications
  • RapidFuzz – Fast fuzzy string matching

Built while participating in the Cognee Mini Challenge 2026. The full implementation is in src/sentinel/core/engine.py.
