The Pitfalls of Test Coverage: Introducing Mutation Testing with Stryker and Cosmic Ray

Overview

  • Goal: Overcome the limitations of code coverage metrics by introducing ‘Mutation Testing’ to verify whether test code actually catches errors in business logic.
  • Scope: Core modules of the enterprise orchestrator project (Ochestrator) in both Frontend (TypeScript) and Backend (Python).
  • Expected Results: Improve code stability and test reliability by securing a ‘Mutation Score’ beyond simple line coverage.

We often believe that high test coverage means safe code. However, it’s difficult to answer the question: “Who tests the tests?” Tests that simply execute code without proper assertions still contribute to coverage metrics. To solve this ‘coverage trap’, we introduced mutation testing.
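
To make the trap concrete, here is a minimal Python sketch (the clamp function and both tests are hypothetical, not from the project). The first test executes every line, so line coverage reports 100%, yet it asserts nothing and would still pass if the logic were broken; only the second test can kill mutants.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Clamp value into the inclusive range [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value

def test_clamp_coverage_only():
    # Hits every line -> 100% line coverage...
    clamp(-1, 0, 10)
    clamp(99, 0, 10)
    clamp(5, 0, 10)
    # ...but asserts nothing: a mutant flipping '<' to '<=' still passes.

def test_clamp_actually_verifies():
    # Assertions pin the behaviour down, so such mutants are killed.
    assert clamp(-1, 0, 10) == 0
    assert clamp(99, 0, 10) == 10
    assert clamp(5, 0, 10) == 5
```

Both tests report identical coverage; only a mutation-testing tool exposes that the first one verifies nothing.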

Mutation Testing Flow
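
Conceptually the flow is: generate a mutant of the production code, rerun the test suite, and record the outcome. A killed mutant (some test fails) is good; a survived mutant (all tests still pass) exposes a verification gap. The loop can be sketched in a few lines (a simplified illustration, not how Stryker or Cosmic Ray are actually implemented):

```python
from typing import Callable, List

def original(a: int, b: int) -> bool:
    return a < b

# Two typical mutants of the comparison above.
MUTANTS: List[Callable[[int, int], bool]] = [
    lambda a, b: a <= b,   # boundary mutant: '<' -> '<='
    lambda a, b: False,    # condition replaced with a constant
]

def suite(fn: Callable[[int, int], bool]) -> bool:
    """Return True if all tests pass for the given implementation."""
    return fn(1, 2) is True and fn(2, 1) is False and fn(2, 2) is False

assert suite(original)  # the real implementation must pass first

killed = sum(1 for mutant in MUTANTS if not suite(mutant))
score = killed / len(MUTANTS) * 100
print(f"mutation score: {score:.0f}%")  # prints "mutation score: 100%"
```

Note that the suite only kills both mutants because it includes the boundary case fn(2, 2); drop that assertion and the first mutant survives.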

Implementation

1. TypeScript Environment: Introducing Stryker Mutator

For the TypeScript environment, including frontend and common utilities, we chose Stryker. It integrates well with Vitest and is easy to configure.

  • Tech Stack: TypeScript, Vitest, Stryker Mutator
  • Key Configuration (stryker.config.json):
  {
    "testRunner": "vitest",
    "reporters": ["html", "clear-text", "progress"],
    "concurrency": 4,
    "incremental": true,
    "mutate": [
      "src/utils/**/*.ts",
      "src/services/**/*.ts"
    ]
  }

We enabled the incremental option so that repeat runs only re-test mutants affected by changed files, rather than mutating the whole codebase every time.

2. Python Environment: Introducing Cosmic Ray

For the backend environment, we introduced Cosmic Ray. It generates powerful mutations by manipulating the AST (Abstract Syntax Tree) of the Python source before each test run.
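
The core idea of AST-based mutation can be sketched with the standard-library ast module (an illustration of the technique, not Cosmic Ray's internal code; needs_more_memory is a hypothetical function):

```python
import ast

SOURCE = """
def needs_more_memory(available, required):
    return available < required
"""

class FlipLt(ast.NodeTransformer):
    """Mutation operator: replace '<' with '<=' to create a boundary mutant."""
    def visit_Compare(self, node):
        node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op
                    for op in node.ops]
        return node

# Parse -> mutate the tree -> recompile -> extract the mutant function.
tree = FlipLt().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)
namespace = {}
exec(compile(tree, "<mutant>", "exec"), namespace)
mutant = namespace["needs_more_memory"]

# Boundary input: the original returns False, the mutant returns True.
print(mutant(4, 4))  # prints True
```

A test suite without an exact-boundary case cannot tell the mutant from the original, which is precisely the gap mutation testing reveals.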

  • Tech Stack: Python, Pytest, Cosmic Ray, Docker
  • Execution Architecture: Since mutation testing consumes significant computational resources, we configured it to run in parallel across multiple workers using Docker.
  # Partial docker-compose.test.yaml
  cosmic-worker-1:
    command: uv run cosmic-ray worker cosmic.sqlite
  cosmic-runner:
    depends_on: [cosmic-worker-1, cosmic-worker-2]
    command: >
      sh -c "uv run cosmic-ray init cosmic-ray.toml cosmic.sqlite &&
             uv run cosmic-ray exec cosmic-ray.toml cosmic.sqlite"
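
For reference, the session configuration might look roughly like this (a sketch following Cosmic Ray's documented TOML layout; the paths, timeout, and test command are placeholders, and the distributor section must match however the workers are wired up):

```toml
# cosmic-ray.toml (illustrative)
[cosmic-ray]
module-path = "src/"
timeout = 30.0
excluded-modules = []
test-command = "uv run pytest -x tests/"

[cosmic-ray.distributor]
name = "local"
```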

Debugging/Challenges

Real-world Case: Survived Mutants in videoSplitter.ts

The most interesting case was videoSplitter.ts, which handles video splitting. This file had over 95% line coverage, but Stryker revealed shocking results.

  • Problem Statement:
    A large number of mutants survived in the logic that checks available memory.
  // Original Code
  if (availableMemory < requiredMemory) {
    throw new Error("Insufficient memory.");
  }

Even when Stryker mutated this condition to if (false) or if (availableMemory <= requiredMemory), every existing test still passed.

  • Root Cause Analysis:
    Existing tests focused only on "whether an error occurs," missing boundary value tests for exactly which conditions trigger the error. In other words, coverage was high, but the actual logic wasn't being thoroughly verified.

  • Solution:
    To 'kill' the surviving mutants, we reinforced the test cases with boundary value analysis.

  test('Boundary value verification for memory', () => {
    // checkAvailableMemory is a hypothetical wrapper around the guard above
    const requiredMemory = 1024;
    // Exactly equal: the strict '<' check must NOT throw
    expect(() => checkAvailableMemory(1024, requiredMemory)).not.toThrow();
    // Slightly less than required: must throw
    expect(() => checkAvailableMemory(1023, requiredMemory)).toThrow('Insufficient memory.');
  });

Results

  • Achievements:

    • Discovered 12 survived mutants in core utility modules and killed them with reinforced tests.
    • Elevated test code from simply 'executing' code to truly 'verifying' it.
  • Key Metrics:

    • Mutation Score: Improved from an initial 62% to 88%.
    • Reliability: Prevented potential regression bugs by running test:mutation scripts before deployment.
  • User Feedback: Positive reactions from team members: "I can now refactor with confidence, trusting our tests."

Key Takeaways

  • Coverage is just the beginning: Line coverage only tells you 'what is not tested,' not the 'quality of what is tested.'
  • Mutation testing is expensive but worth it: Although it takes time (up to tens of minutes for full execution), it's essential for core business logic or complex utilities.
  • Incremental Adoption: Rather than applying it to all code at once, it's important to build success stories by starting with core modules like videoSplitter.ts.
