Note: This article was researched and drafted with AI assistance (Claude Sonnet 4.6 via Claude Code). All claims about specific repository policies are illustrative; readers should verify current state before acting on them.
Why you should scan a repo before you fork it
You found an issue. You know exactly how to fix it. You fork the repo, write the code, open a pull request — and it gets closed in minutes, not by a human, but by an automated workflow you never knew existed. No review. No feedback. Just a bot verdict and a “wasted” label.
This scenario has become noticeably more common in 2025 and 2026. A growing number of open-source maintainers have responded to the flood of low-quality, AI-generated contributions by deploying automated trust-gate systems directly in their CI pipelines. These gates can reject a PR silently — or with a curt machine-generated comment — based on signals that have nothing to do with whether your code is correct. They evaluate who contributed and how, not just what was contributed.
The cost is asymmetric. A maintainer’s automated rejection takes milliseconds. The contributor’s lost time — understanding the codebase, writing tests, drafting a good PR description — might be hours or days. Pre-fork due diligence costs five minutes. Doing it consistently is one of the highest-leverage habits an AI-assisted contributor can develop in 2026.
Common rejection vectors
Automated trust-gate workflows
The most aggressive rejection mechanism is a CI workflow that evaluates the contributor’s account history before it evaluates the code. These tools look at signals like global merge ratio (how many of your past PRs across all of GitHub were merged versus closed), account age, and contribution velocity. If your profile doesn’t meet the threshold, the workflow closes the PR automatically and may apply a label like suspicious-author or spam-likely.
These workflows are usually small GitHub Actions that run on pull_request events. They’re often invisible from the repo’s front page — you have to look inside .github/workflows/ to find them. Common identifiers include step names or action references containing strings like trust-score, min-global-merge-ratio, or references to community-maintained anti-spam action collections. A new GitHub account used primarily for AI-assisted contribution is exactly the profile these tools are tuned to catch.
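To make the merge-ratio signal concrete, here is a minimal sketch of how such a gate might score an account. The function names and the 0.5 threshold are hypothetical, not taken from any specific action; a real gate would fetch the author's PR history via the GitHub search API (`is:pr author:USER`) rather than receive it as a list.

```python
# Hypothetical sketch of a merge-ratio trust gate. A real action would
# fetch the author's PRs via the GitHub search API; here we just score
# an already-fetched list of PR records.

def global_merge_ratio(prs: list[dict]) -> float:
    """Fraction of closed PRs that were merged, across all repos."""
    closed = [p for p in prs if p["state"] == "closed"]
    if not closed:
        return 0.0  # no history at all: a fresh account scores worst
    merged = [p for p in closed if p.get("merged", False)]
    return len(merged) / len(closed)

def passes_gate(prs: list[dict], min_ratio: float = 0.5) -> bool:
    """Mimics a 'min-global-merge-ratio' style check."""
    return global_merge_ratio(prs) >= min_ratio

history = [
    {"state": "closed", "merged": True},
    {"state": "closed", "merged": False},
    {"state": "closed", "merged": False},
    {"state": "open"},
]
print(global_merge_ratio(history))  # 1 merged of 3 closed
print(passes_gate(history))
```

Note how an empty history scores 0.0: a brand-new account fails this kind of gate by construction, regardless of code quality.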
Anti-slop quality-gate workflows
A second category focuses on content quality rather than account history. These workflows look for statistical signals associated with machine-generated text — unusual vocabulary distributions, patterns common in LLM output, or structural anti-patterns in commit messages and PR descriptions. The term “slop” has become shorthand for this class of low-effort generated content in OSS communities. Workflows in this family typically reference action names or step IDs containing anti-slop or similar identifiers.
It is worth noting that a well-crafted, human-reviewed AI-assisted contribution can pass these checks — but only if the contributor has actually read and understood the code before submitting. Blind “generate and submit” workflows are what these gates are designed to block.
Explicit AI bans in contribution documentation
The third category is simpler to detect but easier to overlook: written policy. Many maintainers have added explicit clauses to CONTRIBUTING.md, PR templates, or even the main README.md stating that AI-generated or AI-assisted contributions are not accepted. Language varies:
"AI tools are not permitted"-
"no AI"/"ban AI"/"prohibit AI" -
"LLM not allowed"/"Copilot is not allowed" "all submissions must be human-written""human-authored contributions only"
Some policies stop short of an outright ban but require disclosure: "disclose AI" or "AI disclosure required". These MEDIUM-severity signals are worth reading carefully — a disclosure requirement is very different from a ban, but missing it can still get your PR closed.
Rejection-signal labels
Finally, some repos attach labels that serve as a public ledger of past rejections. Labels like no-ai, ai-rejected, human-only, ai-ban, and ai-generated-rejected are visible on closed PRs and on the label list itself. A repo with fifty closed PRs all tagged ai-generated-rejected is telling you something important about maintainer tolerance, regardless of what the written policy says.
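One way to quantify this signal is to tally rejection labels across recently closed PRs. The sketch below assumes you have already fetched the closed-PR list as JSON (for example via `gh pr list --state closed --json labels`); the helper function and the label set are illustrative, not part of any tool.

```python
import json
from collections import Counter

# Labels commonly used as public rejection markers (illustrative set)
REJECTION_LABELS = {"no-ai", "ai-rejected", "human-only", "ai-ban",
                    "ai-generated-rejected"}

def count_rejection_labels(closed_prs_json: str) -> Counter:
    """Tally rejection-signal labels across closed PRs.

    Expects the JSON shape produced by:
      gh pr list --state closed --json labels
    """
    prs = json.loads(closed_prs_json)
    hits = Counter()
    for pr in prs:
        for label in pr.get("labels", []):
            name = label["name"].lower()
            if name in REJECTION_LABELS:
                hits[name] += 1
    return hits

sample = json.dumps([
    {"labels": [{"name": "ai-generated-rejected"}]},
    {"labels": [{"name": "bug"}]},
    {"labels": [{"name": "ai-generated-rejected"}, {"name": "no-ai"}]},
])
print(count_rejection_labels(sample))
```

A high count relative to the total number of closed PRs is the signal to watch; a single tagged PR may be an outlier, dozens are a pattern.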
The manual scan workflow
You can run a quick scan by hand using the GitHub CLI (gh). The following three steps cover the main surface areas.
Step 1 — Check workflow files for trust-gate patterns:
```sh
# List all workflow file names, then inspect suspicious ones
gh api repos/OWNER/REPO/contents/.github/workflows \
  --jq '.[].name'

# Fetch the content of a specific workflow and grep for known patterns
gh api repos/OWNER/REPO/contents/.github/workflows/pr-check.yml \
  --jq '.content' | base64 -d |
  grep -iE 'trust-score|anti-slop|min-global-merge-ratio|fossier'
```
Step 2 — Scan CONTRIBUTING.md for policy language:
```sh
# Fetch CONTRIBUTING.md and search for AI-related policy keywords
gh api repos/OWNER/REPO/contents/CONTRIBUTING.md \
  --jq '.content' | base64 -d |
  grep -iE 'no.?ai|ai.is.not.allowed|ai.tools|human.authored|human.written|llm.not.allowed|disclose.ai|ban.ai|prohibit.ai|reject.ai'
```
Step 3 — Inspect repository labels:
```sh
# List all labels; look for rejection-signal names
gh label list --repo OWNER/REPO |
  grep -iE 'no-ai|ai-rejected|human-only|ai-ban|ai-generated'
```
Running all three before you fork gives you a solid picture in under a minute. The limitation is that you have to remember to do it, and you need to know what patterns to look for. That’s the gap the tool below is designed to close.
Automating with gh-pr-trust-scan
gh-pr-trust-scan is a small Python CLI that wraps the three manual steps above into a single command, applies a curated set of detection patterns, and produces a machine-readable verdict. It was built specifically to answer one question: “Will this project reject my AI-assisted PR on policy grounds before anyone looks at the code?”
Installing the tool
```sh
# Recommended: isolated environment via pipx
pipx install gh-pr-trust-scan

# Or with pip
pip install gh-pr-trust-scan
```
Note: The package is not yet published to PyPI (coming soon); the pipx/pip commands above will work once it is. Until then, install from source:
```sh
git clone https://github.com/taiman724/gh-pr-trust-scan
cd gh-pr-trust-scan
pip install -e ".[dev]"
```
Requirements: Python 3.10+ and the GitHub CLI (gh) authenticated via gh auth login.
Running a scan
```sh
# Basic scan — prints a human-readable verdict
gh-pr-trust-scan owner/repo

# Full GitHub URL also works
gh-pr-trust-scan https://github.com/owner/repo

# JSON output for scripting or CI integration
gh-pr-trust-scan owner/repo --json
```
The tool produces one of three verdicts:
| Verdict | When it fires |
|---|---|
| SAFE | No explicit AI contribution policy detected (all findings LOW or none) |
| WARN | Discouraging policy language or rejection labels found, but no automated gate |
| AVOID | At least one HIGH-severity finding — an automated rejection gate is present |
A SAFE verdict on a repo with an actively maintained codebase and no policy signals is a reasonable green light. A WARN verdict calls for reading the actual CONTRIBUTING.md carefully before investing time. An AVOID verdict means a bot will likely close your PR before a human sees it.
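The verdict rule described above can be sketched as a small pure function. This is a restatement of the table's logic, not the tool's actual implementation; the function name is ours.

```python
# Sketch of the verdict rule: AVOID on any HIGH finding, WARN on any
# MEDIUM, otherwise SAFE (LOW findings or none at all).

def derive_verdict(findings: list[dict]) -> str:
    severities = {f["severity"] for f in findings}
    if "HIGH" in severities:
        return "AVOID"
    if "MEDIUM" in severities:
        return "WARN"
    return "SAFE"

print(derive_verdict([]))                        # SAFE
print(derive_verdict([{"severity": "MEDIUM"}]))  # WARN
print(derive_verdict([{"severity": "MEDIUM"},
                      {"severity": "HIGH"}]))    # AVOID
```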
Here is what the output looks like for a repo with multiple signals:
```
Scanning example-org/example-repo ...

Repo:    example-org/example-repo
Verdict: AVOID (trust-gate detected)

Findings:
  [HIGH  ] Trust-score gate detected in workflow (.github/workflows/pr-review.yml)
  [MEDIUM] 'human-written' requirement found (line 18): All submissions must be human-written. (CONTRIBUTING.md)
  [MEDIUM] Label 'human-only' found

Stats:
  Last commit: 1 day ago
  Open PRs: 23
  Closed-no-merge PRs (last 30): 9
```
And the equivalent JSON, useful for scripting:
```json
{
  "repo": "example-org/example-repo",
  "verdict": "AVOID",
  "findings": [
    {
      "severity": "HIGH",
      "category": "trust_gate",
      "evidence": "trust-score gate detected in workflow",
      "file": ".github/workflows/pr-review.yml"
    },
    {
      "severity": "MEDIUM",
      "category": "human_only_requirement",
      "evidence": "'human-written' requirement found (line 18)",
      "file": "CONTRIBUTING.md"
    },
    {
      "severity": "MEDIUM",
      "category": "label",
      "evidence": "Label 'human-only' found",
      "file": "labels"
    }
  ],
  "stats": {
    "last_commit": "1 day ago",
    "open_prs": 23,
    "closed_no_merge_last_30d": 9,
    "flagged_closed_prs": 0
  }
}
```
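A script or CI job can act on this JSON directly. The sketch below maps verdicts to exit codes so a pipeline step can fail fast on AVOID; the field names match the tool's JSON output, but the exit-code policy and function name are our own choices.

```python
import json

# Map scan verdicts to exit codes so a CI job can fail fast on AVOID.
# Field names match the tool's JSON output; the policy is ours.
EXIT_CODES = {"SAFE": 0, "WARN": 0, "AVOID": 1}

def verdict_exit_code(scan_json: str) -> int:
    report = json.loads(scan_json)
    # Unknown or future verdict values fail closed rather than open.
    return EXIT_CODES.get(report["verdict"], 1)

sample = '{"repo": "example-org/example-repo", "verdict": "AVOID", "findings": [], "stats": {}}'
print(verdict_exit_code(sample))  # 1
```

Wired into a shell pipeline, this would look something like `gh-pr-trust-scan owner/repo --json | python check_verdict.py`, where `check_verdict.py` reads stdin and calls `sys.exit(verdict_exit_code(...))`.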
Adding custom patterns
All detection keywords live in a single file: src/gh_pr_trust_scan/patterns.py. Adding a new trust-gate or policy phrase requires no changes to the scanner logic — just append an entry to the appropriate list:
```python
# patterns.py — adding a custom workflow pattern
WORKFLOW_PATTERNS.append({
    "pattern": r"my-org/custom-trust-gate-action",
    "severity": "HIGH",
    "category": "trust_gate",
    "description": "Custom trust gate action detected",
})

# Adding a new text-file pattern (e.g. a new policy phrase)
TEXT_PATTERNS_HIGH.append({
    "pattern": r"\bno\s+generated\s+code\b",
    "severity": "HIGH",
    "category": "ai_ban_explicit",
    "description": "'no generated code' policy found",
})
```
The pattern values are Python regexes compiled case-insensitively, so you can handle variations with standard regex syntax. The community is especially interested in patterns for emerging tools and newly observed policy phrases — if you encounter a rejection mechanism that the tool misses, a PR adding the pattern is a concise and high-value contribution.
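To illustrate how a pattern list like this gets applied, here is a minimal self-contained sketch: compile each entry case-insensitively and collect matches as findings. The `scan_text` helper and the sample entries are illustrative, not the tool's actual scanner.

```python
import re

# Illustrative pattern entries in the same shape as patterns.py uses
PATTERNS = [
    {"pattern": r"\bno\s+generated\s+code\b", "severity": "HIGH",
     "category": "ai_ban_explicit",
     "description": "'no generated code' policy found"},
    {"pattern": r"disclose\s+ai", "severity": "MEDIUM",
     "category": "disclosure_required",
     "description": "AI disclosure requirement"},
]

def scan_text(text: str, patterns: list[dict]) -> list[dict]:
    """Apply each pattern case-insensitively; return matching findings."""
    findings = []
    for entry in patterns:
        if re.search(entry["pattern"], text, re.IGNORECASE):
            findings.append({"severity": entry["severity"],
                             "category": entry["category"],
                             "evidence": entry["description"]})
    return findings

doc = "We accept NO generated code. Please disclose AI usage in your PR."
for f in scan_text(doc, PATTERNS):
    print(f["severity"], f["evidence"])
```

Because matching is case-insensitive, "NO generated code" and "no Generated Code" both trip the same entry; variations in spacing are handled by `\s+` rather than separate patterns.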
Closing thoughts
gh-pr-trust-scan is a static signal detector. It catches what is written down and what is visible in the repository’s public API. It cannot tell you whether a maintainer will appreciate your change, whether the project is actively maintained, or whether your implementation approach aligns with the project’s unstated conventions. Those questions still require reading the repo: scanning open issues, reviewing recent merged PRs, and — when in doubt — opening an issue to discuss before writing code.
The broader advice stands regardless of what tools you use: invest a few minutes of research before you invest hours of implementation. OSS contribution policies are increasingly explicit and machine-enforced. Treating due diligence as part of your workflow, rather than an afterthought, is what separates PRs that get merged from PRs that get closed by bots.
Contributions to gh-pr-trust-scan are welcome. The highest-value PRs are new detection patterns for trust-gate tools or policy language not yet covered. If you encounter a rejection signal that the tool misses, please open an issue first — especially for patterns that touch specific third-party tools, where context matters.
Pattern data and tool behavior are based on the gh-pr-trust-scan codebase as of May 2026. Repository policies change — always verify current CONTRIBUTING.md content before acting on a scan result.