# Pre-Execution Gates: How to Block Before You Execute (Part 2/3)

This is Part 2 of a three-part series on AI governance architecture. In Part 1, we explored why signed receipts can’t solve the negative proof problem—the challenge of proving that unauthorized actions didn’t happen. Today, we’ll examine the architectural pattern that does solve it: pre-execution gates that evaluate governance policy before any AI execution occurs.

Note: This series explores architectural patterns for AI governance based on regulatory requirements and cryptographic best practices. Code examples are simplified illustrations for educational purposes, not production implementations. The patterns discussed apply broadly across different tech stacks and deployment environments.

In Part 1, we established that receipt-based governance systems face a fundamental limitation. They’re excellent at proving what happened, but they cannot prove what didn’t happen. When HIPAA requires that you prevent unauthorized PHI access, or when PCI DSS mandates preventing cardholder data access beyond need-to-know, receipts showing proper access don’t address the core requirement. The regulation isn’t asking for detection—it’s demanding prevention.

The architectural pattern that solves this problem is conceptually straightforward but requires rethinking where governance evaluation occurs in your AI request flow. Instead of logging decisions after execution completes, you evaluate governance policy before execution begins. The AI request cannot proceed until that evaluation completes. If the policy says DENY, execution is blocked. The model never gets called, the tool never gets invoked, the data never gets accessed.

This might sound like a small change in sequencing, but it creates a fundamentally different kind of evidence artifact. Instead of a receipt proving “here’s what happened,” you get a denial proof demonstrating “here’s what was prevented from happening.” That distinction is what makes negative proofs possible.

## Understanding the Timeline Difference

The clearest way to see why this matters is to compare the execution timelines side by side. Let’s start with how a receipt-based system handles a request.

In a receipt-based architecture, the sequence looks like this:

1. Your request arrives at the AI system’s entry point. Maybe that’s an API endpoint, maybe it’s a message queue, maybe it’s a function call inside your application code.
2. Wherever it enters, the system immediately begins processing it. The AI model gets invoked with the request payload.
3. The model generates a response based on its training and the input it received.
4. Your application processes that response and potentially takes actions based on it: updating a database, calling external APIs, returning results to a user.
5. Only after all of that execution completes does the governance layer get involved. It creates a receipt documenting what just happened.
6. That receipt gets signed cryptographically to prevent tampering, then stored in your audit log for future review.

Notice what this means: execution happened first, then governance was applied. The system evaluated “did this request follow policy?” after the request had already completed. If the answer turns out to be no, you have a receipt documenting the policy violation, but the violation itself already occurred. The unauthorized data access already happened, the prohibited action already executed, the boundary already got crossed.

Now let’s look at a pre-execution gate architecture:

1. The request arrives at your system’s entry point, just as before.
2. Before any execution occurs (before the AI model gets called, before any tools get invoked), the request passes through a governance evaluation layer.
3. This layer loads the policy that applies to this request, which might be tenant-specific, folder-specific, or role-specific depending on your system design.
4. It evaluates whether the request should be allowed based on that policy.
5. If the policy returns ALLOW, execution proceeds normally and the system generates a receipt just like the receipt-based architecture would.
6. If the policy returns DENY, execution is blocked entirely. The model call never happens, the tool invocation never occurs, the data access is prevented. Instead of a receipt for a completed action, the system generates a denial proof showing that the governance layer blocked an unauthorized request.

The critical architectural difference is in what can happen after the policy evaluation. In a receipt-based system, execution already occurred, so a DENY decision is just creating documentation of a violation. In a gate-based system, execution hasn’t happened yet, so a DENY decision actually prevents the violation from occurring. That’s the shift from detection to prevention.

## What This Looks Like in Code

Let’s make this concrete with a simplified implementation. Here’s what a pre-execution gate looks like in a serverless AI architecture running on AWS Lambda and Bedrock. The specifics of the cloud platform don’t matter much—the pattern works equally well on other infrastructure. What matters is the sequence of operations and where governance evaluation occurs relative to execution.

```python
async def handle_ai_request(request, context):
    """
    ExecutionRouter - this runs BEFORE any AI execution.
    Every AI request passes through here with no bypass paths.
    """

    # Step 1: Authenticate the caller
    # We need to know who's making this request before we can evaluate
    # whether they're allowed to do what they're asking for
    caller_identity = validate_jwt(request.headers['Authorization'])

    # Step 2: Resolve tenant and folder context
    # Governance policies are scoped to organizational boundaries,
    # so we need to know which tenant and folder this request belongs to
    tenant_id = caller_identity.tenant_id
    folder_id = request.body.get('folder_id')

    # Step 3: Load the governing policy
    # Policies are versioned immutably so we can prove which rules
    # were in effect when decisions were made
    policy = get_policy(tenant_id, folder_id)

    # Step 4: Evaluate request against policy BEFORE execution
    # This is the pre-execution gate - nothing proceeds until this completes
    decision = evaluate_policy(
        request=request.body,
        policy=policy,
        caller=caller_identity
    )

    # Step 5a: If DENY, block execution and generate denial proof
    # Note that invoke_bedrock_model is never called in this branch
    if decision.verdict == 'DENY':
        # Create proof showing what was prevented
        denial_proof = generate_denial_proof(
            request_hash=hash_request(request.body),
            policy_version=policy.version_hash,
            rule_fired=decision.rule_id,
            timestamp=utcnow(),
            reason=decision.reason
        )

        # Sign with KMS to make tampering detectable
        signed_proof = kms_sign(denial_proof)

        # Store in audit ledger for compliance queries
        store_denial(signed_proof)

        # Return 403 with the signed proof
        # The caller gets evidence that governance prevented their request
        return {
            'statusCode': 403,
            'body': {
                'error': 'Governance policy denied this request',
                'denial_proof': signed_proof,
                'reason': decision.reason
            }
        }

    # Step 5b: If ALLOW, now we can proceed with execution
    # This is the only code path that reaches the model
    result = await invoke_bedrock_model(request.body)

    # Step 6: Generate receipt for allowed execution
    # This works just like receipt-based systems for allowed requests
    receipt = generate_receipt(
        request=request.body,
        response=result,
        policy_version=policy.version_hash,
        timestamp=utcnow()
    )

    signed_receipt = kms_sign(receipt)
    store_receipt(signed_receipt)

    return {
        'statusCode': 200,
        'body': result,
        'headers': {
            'X-Governance-Receipt': signed_receipt.id
        }
    }
```

The key architectural constraint is that model execution must be unreachable if policy evaluation returns DENY. How you enforce this depends on your infrastructure—it might be IAM policies preventing direct model API access, network segmentation that requires routing through the governance layer, or application-level controls that make the execution path conditional on policy decisions. The critical requirement is that there’s no code path, no bypass route, and no error handler that circumvents the gate.

In cloud environments, this typically means using your platform’s access control systems to enforce the constraint. Even if a developer tried to call the model directly from elsewhere in your codebase, the infrastructure access policies would prevent it because model APIs are only accessible through the governance router.
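On AWS, one way to express that constraint is an IAM policy that denies the Bedrock model-invocation actions to every principal except the governance router's execution role. Here's a hedged sketch of what such a policy document might look like, written as a Python dict; the role ARN and account ID are placeholders, and a real deployment would need to verify the exact actions and condition keys against current IAM documentation:

```python
# Placeholder ARN for the governance router's execution role (illustrative only)
GOVERNANCE_ROUTER_ROLE = "arn:aws:iam::123456789012:role/governance-router"

# Explicit Deny on model invocation for every principal except the router.
# A Condition on aws:PrincipalArn is one common way to express
# "only this role may call the model APIs".
model_access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyDirectModelAccess",
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:PrincipalArn": GOVERNANCE_ROUTER_ROLE}
            },
        }
    ],
}
```

Because IAM explicit denies override allows, a developer who calls the model API from some other Lambda or script gets an access error, regardless of what other permissions their role carries.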

This structural enforcement is fundamentally different from adding logging to an existing execution flow. Many organizations start with a working AI system, then add governance by wrapping function calls in logging statements. That approach creates receipts but doesn’t create gates. The gates pattern requires that governance evaluation be mandatory and blocking, not optional and observational.

## Why Determinism Becomes Essential

Once you implement pre-execution gates, you inherit a new requirement that receipt-based systems can often ignore: your policy evaluation must be deterministic. If you evaluate the same request against the same policy twice, you must get the same decision both times. No randomness, no time-dependent logic that might produce different results on different days, no external API calls that might return different data.

This matters because deterministic evaluation enables replay verification, which is how you prove to an auditor that a denial actually happened and wasn’t fabricated. The verification process works like this.

An auditor pulls up one of your denial proofs and wants to verify its authenticity. They start by retrieving the policy version that was in effect when the denial occurred. Your system stored that policy immutably, so they get exactly the same policy document that was used for the original decision. Next, they retrieve the original request, or at least a hash of it that’s included in the denial proof. Then comes the crucial step: they re-run the policy evaluation using the original request and the original policy. If your policy engine is deterministic, this replay evaluation must produce the same DENY decision with the same reason code.

If the replay produces a different decision, something is wrong. Either the policy was mutated after the fact, which should be impossible if you’re versioning policies immutably, or the governance engine itself is non-deterministic, which means you can’t trust any of its decisions. The determinism requirement is what makes denial proofs verifiable and therefore trustworthy.
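The replay check itself is only a few lines once the evaluator is deterministic. Here's a minimal sketch, with an inline stand-in evaluator modeled on the folder-isolation rule; all names and field layouts are illustrative, not a fixed API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Decision:
    verdict: str
    rule_id: Optional[str] = None
    reason: Optional[str] = None

def evaluate_policy(request: dict, policy: dict) -> Decision:
    # Deterministic stand-in: deny cross-folder PHI access.
    # Same request + same policy always yields the same Decision.
    if (request["folder_id"] != request["target_folder_id"]
            and request["target_data_class"] == "PHI"):
        return Decision("DENY",
                        rule_id="prevent_cross_folder_phi_access",
                        reason="Cross-folder PHI access denied")
    return Decision("ALLOW")

def verify_denial_proof(proof: dict, original_request: dict,
                        policy: dict) -> bool:
    """Re-run the evaluation with the archived inputs and confirm it
    reproduces the denial recorded in the proof."""
    replay = evaluate_policy(original_request, policy)
    return (replay.verdict == "DENY"
            and replay.rule_id == proof["rule_fired"])
```

An auditor running `verify_denial_proof` with the archived request and the immutably versioned policy should always get `True`; a `False` means either the policy was mutated or the engine is non-deterministic.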

Receipt-based systems can often get away with non-deterministic logging because they’re just documenting what happened, not making enforce-or-allow decisions that need to be reproducible. But once you’re blocking execution based on policy evaluation, reproducibility becomes mandatory. An auditor needs to be able to confirm that the policy would still produce a DENY decision if evaluated again with the same inputs.

Here’s what a deterministic policy evaluation looks like for the cross-patient PHI access scenario from Part 1:

```python
def evaluate_folder_isolation_policy(request, policy):
    """
    Deterministic evaluation - same request + same policy = same decision.
    No external API calls, no time-dependent logic, no random values.
    """

    # Extract request context
    source_folder = request.folder_id
    target_folder = request.target_folder_id
    data_classification = request.target_data_class

    # Load policy rule (from the policy document, not external system)
    rule = policy.get_rule('prevent_cross_folder_phi_access')

    # Evaluate deterministically
    if source_folder != target_folder and data_classification == 'PHI':
        return Decision(
            verdict='DENY',
            rule_id=rule.id,
            reason=f"Cross-folder PHI access denied per {rule.regulatory_basis}",
            policy_version=policy.version_hash
        )

    return Decision(verdict='ALLOW')
```

Notice what this policy doesn’t do. It doesn’t call an external API to check whether cross-folder access is allowed. It doesn’t query a database to see if there’s an active sharing relationship. It doesn’t check the current time to see if we’re in an allowed time window. All of those patterns would make the policy evaluation non-deterministic, which would break replay verification. Instead, the policy rule is self-contained: it examines the request itself and makes a decision based solely on the data in that request and the rules in the policy document.

This doesn’t mean you can’t have sophisticated governance logic. You can absolutely have complex rules that consider many factors. But those factors need to come from the request context or from the policy document itself, not from external state that might change between the original evaluation and a replay verification.
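To make that concrete, here's a sketch of a multi-factor rule that stays deterministic because every input it consults lives either in the request payload or in the policy document. The field names and rule structure are invented for illustration:

```python
def evaluate_export_policy(request: dict, policy: dict) -> str:
    """Multi-factor but still deterministic: every factor comes from the
    request payload or the policy document, never from external state."""
    rules = policy["export_rules"]

    # Factor 1: classification ceiling defined in the policy document
    if request["data_class"] in rules["blocked_classes"]:
        return "DENY"

    # Factor 2: export-size budget carried in the request context
    if request["row_count"] > rules["max_export_rows"]:
        return "DENY"

    # Factor 3: destination allow-list from the policy document
    if request["destination"] not in rules["allowed_destinations"]:
        return "DENY"

    return "ALLOW"
```

If a factor like "is there an active sharing relationship?" matters, the pattern is to resolve it upstream and stamp the answer into the request context before the gate runs, so the evaluation itself stays replayable.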

## Solving the Performance Problem

The obvious concern with pre-execution gates is latency. If every AI request has to pass through a policy evaluation layer before execution can begin, doesn’t that add overhead that might be unacceptable for latency-sensitive applications?

Yes, it does add overhead. That’s not something to handwave away—it’s a real tradeoff that you need to account for in your architecture. But the overhead is manageable if you design your policy evaluation with performance in mind.

The pattern that works well in practice is fast-path synchronous evaluation with async fallback. You try to evaluate the policy synchronously with a tight timeout, typically 50 milliseconds or less. Most governance rules are simple enough that they evaluate in single-digit milliseconds: folder isolation checks, budget verifications, PII masking rules. These run fast because they’re just comparing values from the request against thresholds or patterns defined in the policy.

If the fast-path evaluation completes within your timeout, you get a decision immediately and execution proceeds with minimal added latency. But if the policy evaluation times out—maybe because the policy is complex, maybe because it requires some expensive computation—you fall back to async evaluation. The system enqueues the evaluation as a background job, returns a provisional ALLOW to let execution proceed, but flags the result for review.

Here’s what that looks like in code:

```python
async def evaluate_policy_with_fallback(request, policy, caller):
    """
    Try fast synchronous evaluation first, fall back to async if needed.
    Most requests take the fast path. Complex policies hit async fallback.
    """

    try:
        # Fast path: evaluate with 50ms timeout
        # This handles 95%+ of requests in production
        decision = await evaluate_policy_fast(
            request=request,
            policy=policy,
            timeout_ms=50
        )
        return decision

    except TimeoutError:
        # Fast path timed out, use async fallback
        # This is rare but necessary for complex policies
        job_id = enqueue_async_evaluation(request, policy)

        # Return provisional ALLOW so execution isn't blocked
        # But flag this for later review when async eval completes
        return Decision(
            verdict='ALLOW',
            provisional=True,
            async_job_id=job_id,
            reason='Policy evaluation delegated to async worker'
        )
```

The async fallback pattern means you’re not blocking execution indefinitely waiting for slow policy evaluations to complete. But you’re also not just giving up on governance for complex policies. If the async evaluation later returns DENY, that gets surfaced as a compliance alert that your security team can investigate. This is still better than having no gate at all, because the decision is still evaluated and logged even if it can’t be enforced in real time.
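The reconciliation step on the worker side can be sketched like this; the helper names and record shape are illustrative, and the evaluate/alert/store callables stand in for whatever queue consumer and alerting system you use:

```python
def reconcile_async_evaluation(job, evaluate, alert, store):
    """Background worker: finish the slow policy evaluation, and if it
    contradicts the provisional ALLOW, raise a compliance alert."""
    verdict = evaluate(job["request"], job["policy"])

    record = {
        "job_id": job["id"],
        "verdict": verdict,
        "provisional_allow": True,  # execution already proceeded
    }
    store(record)

    if verdict == "DENY":
        # Execution ran under the provisional ALLOW, so this is a
        # detective finding for the security team, not a prevention.
        alert({
            "job_id": job["id"],
            "severity": "high",
            "summary": "Provisional ALLOW contradicted by async DENY",
        })

    return record
```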

Many organizations run both patterns in parallel during initial rollout to reduce risk. They start with observer mode on all surfaces: the gate evaluates policy but always returns ALLOW, so nothing gets blocked while they validate that policy rules are working correctly. Denials are logged with full denial proofs, but execution proceeds. This lets you build confidence in your policies without risking production breakage.

Once you’ve validated that observer mode is working well, you enable enforcer mode selectively. Typically organizations start with high-risk surfaces like data export and cross-tenant access where the blast radius of blocking something incorrectly is manageable and the security benefit of enforcement is high. Lower-risk surfaces like model selection or tool invocation might stay in observer mode longer while you refine the policies.
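The observer/enforcer split often reduces to a single mode flag at the point where the gate decides whether a DENY actually blocks. A minimal sketch, with illustrative names:

```python
from enum import Enum

class GateMode(Enum):
    OBSERVER = "observer"   # log denials with full proofs, never block
    ENFORCER = "enforcer"   # denials actually block execution

def apply_gate(verdict: str, mode: GateMode, log_denial) -> bool:
    """Return True if execution may proceed.

    In observer mode a DENY is logged (with its denial proof) but
    execution continues; in enforcer mode the same DENY blocks it.
    """
    if verdict == "DENY":
        log_denial()  # denial proof is generated and stored either way
        return mode is GateMode.OBSERVER
    return True
```

Because the denial proof is produced in both modes, flipping a surface from observer to enforcer changes only whether execution is blocked, not what evidence you collect, which is what makes the staged rollout low-risk.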

## What We’ve Established

At this point, we’ve covered the core concepts of pre-execution gates: they evaluate policy before execution rather than after, they create denial proofs rather than just violation receipts, they require deterministic policy evaluation to enable replay verification, and they can be implemented with acceptable performance overhead using fast-path evaluation and async fallback.

What we haven’t covered yet is how to actually build a complete pre-execution gate system in production. That’s what Part 3 will tackle: a layered reference architecture that shows you exactly which components you need, how they fit together, what each layer is responsible for, and when you can get away with simpler receipt-based systems versus when pre-execution gates become mandatory.

We’ll also explore the policy design principles that make gates practical to operate. Not every governance rule belongs in a pre-execution gate. Some controls are better implemented as detective measures that analyze patterns over time. Figuring out which goes where is part of building a governance architecture that’s both secure and operationally sustainable.

Read Part 1: The Negative Proof Problem in AI Governance

Read Part 3: Building a Production-Ready AI Governance Stack [coming soon]
