[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"skill-NousResearch-hermes-agent-optional-skills-security-oss-forensics":3},{"error":4,"detail":5,"metadata":65,"markdownContent":68,"rawMarkdown":62},false,{"repo_full_name":6,"owner":7,"repo_name":8,"repo_forks":9,"skill_path":10,"repo_stars":11,"name":12,"category_id":13,"description":14,"file_tree":15,"skill_md_content":62,"skill_id":63,"skill_key":64},"NousResearch/hermes-agent","NousResearch","hermes-agent",23195,"optional-skills/security/oss-forensics",147649,"oss-forensics",9,"Supply chain investigation, evidence recovery, and forensic analysis for GitHub repositories.\nCovers deleted commit recovery, force-push detection, IOC extraction, multi-source evidence\ncollection, hypothesis formation/validation, and structured forensic reporting.\nInspired by RAPTOR's 1800+ line OSS Forensics system.",[16,21,42,50],{"name":17,"path":18,"size":19,"type":20},"SKILL.md","optional-skills/security/oss-forensics/SKILL.md",19960,"file",{"name":22,"path":23,"type":24,"children":25},"references","optional-skills/security/oss-forensics/references","folder",[26,30,34,38],{"name":27,"path":28,"size":29,"type":20},"evidence-types.md","optional-skills/security/oss-forensics/references/evidence-types.md",5113,{"name":31,"path":32,"size":33,"type":20},"github-archive-guide.md","optional-skills/security/oss-forensics/references/github-archive-guide.md",5996,{"name":35,"path":36,"size":37,"type":20},"investigation-templates.md","optional-skills/security/oss-forensics/references/investigation-templates.md",5553,{"name":39,"path":40,"size":41,"type":20},"recovery-techniques.md","optional-skills/security/oss-forensics/references/recovery-techniques.md",5767,{"name":43,"path":44,"type":24,"children":45},"scripts","optional-skills/security/oss-forensics/scripts",[46],{"name":47,"path":48,"size":49,"type":20},"evidence-store.py","optional-skills/security/oss-forensics/scripts/evidence-store.py",12108,{"name":51,"path":52,"type":24,"chi
ldren":53},"templates","optional-skills/security/oss-forensics/templates",[54,58],{"name":55,"path":56,"size":57,"type":20},"forensic-report.md","optional-skills/security/oss-forensics/templates/forensic-report.md",4826,{"name":59,"path":60,"size":61,"type":20},"malicious-package-report.md","optional-skills/security/oss-forensics/templates/malicious-package-report.md",1357,"---\nname: oss-forensics\ndescription: |\n  Supply chain investigation, evidence recovery, and forensic analysis for GitHub repositories.\n  Covers deleted commit recovery, force-push detection, IOC extraction, multi-source evidence\n  collection, hypothesis formation/validation, and structured forensic reporting.\n  Inspired by RAPTOR's 1800+ line OSS Forensics system.\nplatforms: [linux, macos, windows]\ncategory: security\ntriggers:\n  - \"investigate this repository\"\n  - \"investigate [owner/repo]\"\n  - \"check for supply chain compromise\"\n  - \"recover deleted commits\"\n  - \"forensic analysis of [owner/repo]\"\n  - \"was this repo compromised\"\n  - \"supply chain attack\"\n  - \"suspicious commit\"\n  - \"force push detected\"\n  - \"IOC extraction\"\ntoolsets:\n  - terminal\n  - web\n  - file\n  - delegation\n---\n\n# OSS Security Forensics Skill\n\nA 7-phase multi-agent investigation framework for researching open-source supply chain attacks.\nAdapted from RAPTOR's forensics system. Covers GitHub Archive, Wayback Machine, GitHub API,\nlocal git analysis, IOC extraction, evidence-backed hypothesis formation and validation,\nand final forensic report generation.\n\n---\n\n## ⚠️ Anti-Hallucination Guardrails\n\nRead these before every investigation step. Violating them invalidates the report.\n\n1. **Evidence-First Rule**: Every claim in any report, hypothesis, or summary MUST cite at least one evidence ID (`EV-XXXX`). Assertions without citations are forbidden.\n2. **STAY IN YOUR LANE**: Each sub-agent (investigator) has a single data source. Do NOT mix sources. 
The GH Archive investigator does not query the GitHub API, and vice versa. Role boundaries are hard.\n3. **Fact vs. Hypothesis Separation**: Mark all unverified inferences with `[HYPOTHESIS]`. Only statements verified against original sources may be stated as facts.\n4. **No Evidence Fabrication**: The hypothesis validator MUST mechanically check that every cited evidence ID actually exists in the evidence store before accepting a hypothesis.\n5. **Proof-Required Disproval**: A hypothesis cannot be dismissed without a specific, evidence-backed counter-argument. \"No evidence found\" is not sufficient to disprove—it only makes a hypothesis inconclusive.\n6. **SHA/URL Double-Verification**: Any commit SHA, URL, or external identifier cited as evidence must be independently confirmed from at least two sources before being marked as verified.\n7. **Suspicious Code Rule**: Never run code found inside the investigated repository locally. Analyze statically only, or use `execute_code` in a sandboxed environment.\n8. **Secret Redaction**: Any API keys, tokens, or credentials discovered during investigation must be redacted in the final report. Log them internally only.\n\n---\n\n## Example Scenarios\n\n- **Scenario A: Dependency Confusion**: A malicious package `internal-lib-v2` is uploaded to NPM with a higher version than the internal one. The investigator must track when this package was first seen and if any PushEvents in the target repo updated `package.json` to this version.\n- **Scenario B: Maintainer Takeover**: A long-term contributor's account is used to push a backdoored `.github/workflows/build.yml`. The investigator looks for PushEvents from this user after a long period of inactivity or from a new IP/location (if detectable via BigQuery).\n- **Scenario C: Force-Push Hide**: A developer accidentally commits a production secret, then force-pushes to \"fix\" it. 
The investigator uses `git fsck` and GH Archive to recover the original commit SHA and verify what was leaked.\n\n---\n\n> **Path convention**: Throughout this skill, `SKILL_DIR` refers to the root of this skill's\n> installation directory (the folder containing this `SKILL.md`). When the skill is loaded,\n> resolve `SKILL_DIR` to the actual path — e.g. `~/.hermes/skills/security/oss-forensics/`\n> or the `optional-skills/` equivalent. All script and template references are relative to it.\n\n## Phase 0: Initialization\n\n1. Create investigation working directory:\n   ```bash\n   mkdir investigation_$(echo \"REPO_NAME\" | tr '/' '_')\n   cd investigation_$(echo \"REPO_NAME\" | tr '/' '_')\n   ```\n2. Initialize the evidence store:\n   ```bash\n   python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list\n   ```\n3. Copy the forensic report template:\n   ```bash\n   cp SKILL_DIR/templates/forensic-report.md ./investigation-report.md\n   ```\n4. Create an `iocs.md` file to track Indicators of Compromise as they are discovered.\n5. Record the investigation start time, target repository, and stated investigation goal.\n\n---\n\n## Phase 1: Prompt Parsing and IOC Extraction\n\n**Goal**: Extract all structured investigative targets from the user's request.\n\n**Actions**:\n- Parse the user prompt and extract:\n  - Target repository (`owner/repo`)\n  - Target actors (GitHub handles, email addresses)\n  - Time window of interest (commit date ranges, PR timestamps)\n  - Provided Indicators of Compromise: commit SHAs, file paths, package names, IP addresses, domains, API keys/tokens, malicious URLs\n  - Any linked vendor security reports or blog posts\n\n**Tools**: Reasoning only, or `execute_code` for regex extraction from large text blocks.\n\n**Output**: Populate `iocs.md` with extracted IOCs. 
Each IOC must have:\n- Type (from: COMMIT_SHA, FILE_PATH, API_KEY, SECRET, IP_ADDRESS, DOMAIN, PACKAGE_NAME, ACTOR_USERNAME, MALICIOUS_URL, OTHER)\n- Value\n- Source (user-provided, inferred)\n\n**Reference**: See [evidence-types.md](./references/evidence-types.md) for IOC taxonomy.\n\n---\n\n## Phase 2: Parallel Evidence Collection\n\nSpawn up to 5 specialist investigator sub-agents using `delegate_task` (batch mode, max 3 concurrent). Each investigator has a **single data source** and must not mix sources.\n\n> **Orchestrator note**: Pass the IOC list from Phase 1 and the investigation time window in the `context` field of each delegated task.\n\n---\n\n### Investigator 1: Local Git Investigator\n\n**ROLE BOUNDARY**: You query the LOCAL GIT REPOSITORY ONLY. Do not call any external APIs.\n\n**Actions**:\n```bash\n# Clone repository\ngit clone https://github.com/OWNER/REPO.git target_repo && cd target_repo\n\n# Full commit log with stats\ngit log --all --full-history --stat --format=\"%H|%ae|%an|%ai|%s\" > ../git_log.txt\n\n# Detect force-push evidence (orphaned/dangling commits)\ngit fsck --lost-found --unreachable 2>&1 | grep commit > ../dangling_commits.txt\n\n# Check reflog for rewritten history\ngit reflog --all > ../reflog.txt\n\n# List ALL branches including deleted remote refs\ngit branch -a -v > ../branches.txt\n\n# Find suspicious large binary additions\ngit log --all --diff-filter=A --name-only --format=\"%H %ai\" -- \"*.so\" \"*.dll\" \"*.exe\" \"*.bin\" > ../binary_additions.txt\n\n# Check for GPG signature anomalies\ngit log --show-signature --format=\"%H %ai %aN\" > ../signature_check.txt 2>&1\n```\n\n**Evidence to collect** (add via `python3 SKILL_DIR/scripts/evidence-store.py add`):\n- Each dangling commit SHA → type: `git`\n- Force-push evidence (reflog showing history rewrite) → type: `git`\n- Unsigned commits from verified contributors → type: `git`\n- Suspicious binary file additions → type: `git`\n\n**Reference**: See 
[recovery-techniques.md](./references/recovery-techniques.md) for accessing force-pushed commits.\n\n---\n\n### Investigator 2: GitHub API Investigator\n\n**ROLE BOUNDARY**: You query the GITHUB REST API ONLY. Do not run git commands locally.\n\n**Actions**:\n```bash\n# Commits (paginated)\ncurl -s \"https://api.github.com/repos/OWNER/REPO/commits?per_page=100\" > api_commits.json\n\n# Pull Requests including closed/deleted\ncurl -s \"https://api.github.com/repos/OWNER/REPO/pulls?state=all&per_page=100\" > api_prs.json\n\n# Issues\ncurl -s \"https://api.github.com/repos/OWNER/REPO/issues?state=all&per_page=100\" > api_issues.json\n\n# Contributors and collaborator changes\ncurl -s \"https://api.github.com/repos/OWNER/REPO/contributors\" > api_contributors.json\n\n# Repository events (last 300)\ncurl -s \"https://api.github.com/repos/OWNER/REPO/events?per_page=100\" > api_events.json\n\n# Check specific suspicious commit SHA details\ncurl -s \"https://api.github.com/repos/OWNER/REPO/git/commits/SHA\" > commit_detail.json\n\n# Releases\ncurl -s \"https://api.github.com/repos/OWNER/REPO/releases?per_page=100\" > api_releases.json\n\n# Check if a specific commit exists (force-pushed commits may 404 on commits/ but succeed on git/commits/)\ncurl -s \"https://api.github.com/repos/OWNER/REPO/commits/SHA\" | jq .sha\n```\n\n**Cross-reference targets** (flag discrepancies as evidence):\n- PR exists in archive but missing from API → evidence of deletion\n- Contributor in archive events but not in contributors list → evidence of permission revocation\n- Commit in archive PushEvents but not in API commit list → evidence of force-push/deletion\n\n**Reference**: See [evidence-types.md](./references/evidence-types.md) for GH event types.\n\n---\n\n### Investigator 3: Wayback Machine Investigator\n\n**ROLE BOUNDARY**: You query the WAYBACK MACHINE CDX API ONLY. 
Do not use the GitHub API.\n\n**Goal**: Recover deleted GitHub pages (READMEs, issues, PRs, releases, wiki pages).\n\n**Actions**:\n```bash\n# Search for archived snapshots of the repo main page\ncurl -s \"https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO&output=json&limit=100&from=YYYYMMDD&to=YYYYMMDD\" > wayback_main.json\n\n# Search for a specific deleted issue\ncurl -s \"https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/issues/NUM&output=json&limit=50\" > wayback_issue_NUM.json\n\n# Search for a specific deleted PR\ncurl -s \"https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/pull/NUM&output=json&limit=50\" > wayback_pr_NUM.json\n\n# Fetch the best snapshot of a page\n# Use the Wayback Machine URL: https://web.archive.org/web/TIMESTAMP/ORIGINAL_URL\n# Example: https://web.archive.org/web/20240101000000*/github.com/OWNER/REPO\n\n# Advanced: Search for deleted releases/tags\ncurl -s \"https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/releases/tag/*&output=json\" > wayback_tags.json\n\n# Advanced: Search for historical wiki changes\ncurl -s \"https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/wiki/*&output=json\" > wayback_wiki.json\n```\n\n**Evidence to collect**:\n- Archived snapshots of deleted issues/PRs with their content\n- Historical README versions showing changes\n- Evidence of content present in archive but missing from current GitHub state\n\n**Reference**: See [github-archive-guide.md](./references/github-archive-guide.md) for CDX API parameters.\n\n---\n\n### Investigator 4: GH Archive / BigQuery Investigator\n\n**ROLE BOUNDARY**: You query GITHUB ARCHIVE via BIGQUERY ONLY. This is a tamper-proof record of all public GitHub events.\n\n> **Prerequisites**: Requires Google Cloud credentials with BigQuery access (`gcloud auth application-default login`). If unavailable, skip this investigator and note it in the report.\n\n**Cost Optimization Rules** (MANDATORY):\n1. 
ALWAYS run a `--dry_run` before every query to estimate cost.\n2. Use `_TABLE_SUFFIX` to filter by date range and minimize scanned data.\n3. Only SELECT the columns you need.\n4. Add a LIMIT unless aggregating.\n\n```bash\n# Template: safe BigQuery query for PushEvents to OWNER/REPO\n# Note: payload is stored as a JSON string in the public githubarchive tables,\n# so its fields are read with JSON_EXTRACT / JSON_EXTRACT_SCALAR\nbq query --use_legacy_sql=false --dry_run \"\nSELECT created_at, actor.login,\n       JSON_EXTRACT_SCALAR(payload, '$.before') AS before_sha,\n       JSON_EXTRACT_SCALAR(payload, '$.head') AS head_sha,\n       JSON_EXTRACT_SCALAR(payload, '$.size') AS size,\n       JSON_EXTRACT_SCALAR(payload, '$.distinct_size') AS distinct_size,\n       JSON_EXTRACT(payload, '$.commits') AS commits\nFROM \\`githubarchive.month.*\\`\nWHERE _TABLE_SUFFIX BETWEEN 'YYYYMM' AND 'YYYYMM'\n  AND type = 'PushEvent'\n  AND repo.name = 'OWNER/REPO'\nLIMIT 1000\n\"\n# If cost is acceptable, re-run without --dry_run\n\n# Detect force-pushes: PushEvents with distinct_size = 0 but size > 0 suggest rewritten history\n# JSON_EXTRACT_SCALAR(payload, '$.distinct_size') = '0' AND JSON_EXTRACT_SCALAR(payload, '$.size') != '0' → force push indicator\n\n# Check for deleted branch events (dry-run this query first as well)\nbq query --use_legacy_sql=false \"\nSELECT created_at, actor.login,\n       JSON_EXTRACT_SCALAR(payload, '$.ref') AS ref,\n       JSON_EXTRACT_SCALAR(payload, '$.ref_type') AS ref_type\nFROM \\`githubarchive.month.*\\`\nWHERE _TABLE_SUFFIX BETWEEN 'YYYYMM' AND 'YYYYMM'\n  AND type = 'DeleteEvent'\n  AND repo.name = 'OWNER/REPO'\nLIMIT 200\n\"\n```\n\n**Evidence to collect**:\n- Force-push events (payload.size > 0, payload.distinct_size = 0)\n- DeleteEvents for branches/tags\n- WorkflowRunEvents for suspicious CI/CD automation\n- PushEvents that precede a \"gap\" in the git log (evidence of rewrite)\n\n**Reference**: See [github-archive-guide.md](./references/github-archive-guide.md) for all 12 event types and query patterns.\n\n---\n\n### Investigator 5: IOC Enrichment Investigator\n\n**ROLE BOUNDARY**: You enrich EXISTING IOCs from Phase 1 using passive public sources ONLY. 
Do not execute any code from the target repository.\n\n**Actions**:\n- For each commit SHA: attempt recovery via direct GitHub URL (`github.com/OWNER/REPO/commit/SHA.patch`)\n- For each domain/IP: check passive DNS, WHOIS records (via `web_extract` on public WHOIS services)\n- For each package name: check npm/PyPI for matching malicious package reports\n- For each actor username: check GitHub profile, contribution history, account age\n- Recover force-pushed commits using 3 methods (see [recovery-techniques.md](./references/recovery-techniques.md))\n\n---\n\n## Phase 3: Evidence Consolidation\n\nAfter all investigators complete:\n\n1. Run `python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list` to see all collected evidence.\n2. For each piece of evidence, verify the `content_sha256` hash matches the original source.\n3. Group evidence by:\n   - **Timeline**: Sort all timestamped evidence chronologically\n   - **Actor**: Group by GitHub handle or email\n   - **IOC**: Link evidence to the IOC it relates to\n4. Identify **discrepancies**: items present in one source but absent in another (key deletion indicators).\n5. 
Flag evidence as `[VERIFIED]` (confirmed from 2+ independent sources) or `[UNVERIFIED]` (single source only).\n\n---\n\n## Phase 4: Hypothesis Formation\n\nA hypothesis must:\n- State a specific claim (e.g., \"Actor X force-pushed to BRANCH on DATE to erase commit SHA\")\n- Cite at least 2 evidence IDs that support it (`EV-XXXX`, `EV-YYYY`)\n- Identify what evidence would disprove it\n- Be labeled `[HYPOTHESIS]` until validated\n\n**Common hypothesis templates** (see [investigation-templates.md](./references/investigation-templates.md)):\n- Maintainer Compromise: legitimate account used post-takeover to inject malicious code\n- Dependency Confusion: package name squatting to intercept installs\n- CI/CD Injection: malicious workflow changes to run code during builds\n- Typosquatting: near-identical package name targeting misspellers\n- Credential Leak: token/key accidentally committed then force-pushed to erase\n\nFor each hypothesis, spawn a `delegate_task` sub-agent to attempt to find disconfirming evidence before confirming.\n\n---\n\n## Phase 5: Hypothesis Validation\n\nThe validator sub-agent MUST mechanically check:\n\n1. For each hypothesis, extract all cited evidence IDs.\n2. Verify each ID exists in `evidence.json` (hard failure if any ID is missing → hypothesis rejected as potentially fabricated).\n3. Verify each `[VERIFIED]` piece of evidence was confirmed from 2+ sources.\n4. Check logical consistency: does the timeline depicted by the evidence support the hypothesis?\n5. 
Check for alternative explanations: could the same evidence pattern arise from a benign cause?\n\n**Output**:\n- `VALIDATED`: All evidence cited, verified, logically consistent, no plausible alternative explanation.\n- `INCONCLUSIVE`: Evidence supports hypothesis but alternative explanations exist or evidence is insufficient.\n- `REJECTED`: Missing evidence IDs, unverified evidence cited as fact, logical inconsistency detected.\n\nRejected hypotheses feed back into Phase 4 for refinement (max 3 iterations).\n\n---\n\n## Phase 6: Final Report Generation\n\nPopulate `investigation-report.md` using the template in [forensic-report.md](./templates/forensic-report.md).\n\n**Mandatory sections**:\n- Executive Summary: one-paragraph verdict (Compromised / Clean / Inconclusive) with confidence level\n- Timeline: chronological reconstruction of all significant events with evidence citations\n- Validated Hypotheses: each with status and supporting evidence IDs\n- Evidence Registry: table of all `EV-XXXX` entries with source, type, and verification status\n- IOC List: all extracted and enriched Indicators of Compromise\n- Chain of Custody: how evidence was collected, from what sources, at what timestamps\n- Recommendations: immediate mitigations if compromise detected; monitoring recommendations\n\n**Report rules**:\n- Every factual claim must have at least one `[EV-XXXX]` citation\n- Executive Summary must state confidence level (High / Medium / Low)\n- All secrets/credentials must be redacted to `[REDACTED]`\n\n---\n\n## Phase 7: Completion\n\n1. Run final evidence count: `python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list`\n2. Archive the full investigation directory.\n3. If compromise is confirmed:\n   - List immediate mitigations (rotate credentials, pin dependency hashes, notify affected users)\n   - Identify affected versions/packages\n   - Note disclosure obligations (if a public package: coordinate with the package registry)\n4. 
Present the final `investigation-report.md` to the user.\n\n---\n\n## Ethical Use Guidelines\n\nThis skill is designed for **defensive security investigation** — protecting open-source software from supply chain attacks. It must not be used for:\n\n- **Harassment or stalking** of contributors or maintainers\n- **Doxing** — correlating GitHub activity to real identities for malicious purposes\n- **Competitive intelligence** — investigating proprietary or internal repositories without authorization\n- **False accusations** — publishing investigation results without validated evidence (see anti-hallucination guardrails)\n\nInvestigations should be conducted with the principle of **minimal intrusion**: collect only the evidence necessary to validate or refute the hypothesis. When publishing results, follow responsible disclosure practices and coordinate with affected maintainers before public disclosure.\n\nIf the investigation reveals a genuine compromise, follow the coordinated vulnerability disclosure process:\n1. Notify the repository maintainers privately first\n2. Allow reasonable time for remediation (typically 90 days)\n3. Coordinate with package registries (npm, PyPI, etc.) if published packages are affected\n4. 
File a CVE if appropriate\n\n---\n\n## API Rate Limiting\n\nThe GitHub REST API enforces rate limits that will interrupt large investigations if not managed.\n\n**Authenticated requests**: 5,000/hour (requires `GITHUB_TOKEN` env var or `gh` CLI auth)\n**Unauthenticated requests**: 60/hour (unusable for investigations)\n\n**Best practices**:\n- Always authenticate: `export GITHUB_TOKEN=ghp_...` or use `gh` CLI (auto-authenticates)\n- Use conditional requests (`If-None-Match` / `If-Modified-Since` headers) to avoid consuming quota on unchanged data\n- For paginated endpoints, fetch all pages in sequence — don't parallelize against the same endpoint\n- Check the `X-RateLimit-Remaining` header; if it is below 100, pause until the `X-RateLimit-Reset` timestamp\n- BigQuery has its own quotas and billing (the free tier covers 1 TiB of query processing per month); always dry-run first\n- Wayback Machine CDX API: no formal rate limit, but be courteous (1-2 req/sec max)\n\nIf rate-limited mid-investigation, record the partial results in the evidence store and note the limitation in the report.\n\n---\n\n## Reference Materials\n\n- [github-archive-guide.md](./references/github-archive-guide.md) — BigQuery queries, CDX API, 12 event types\n- [evidence-types.md](./references/evidence-types.md) — IOC taxonomy, evidence source types, observation types\n- [recovery-techniques.md](./references/recovery-techniques.md) — Recovering deleted commits, PRs, issues\n- [investigation-templates.md](./references/investigation-templates.md) — Pre-built hypothesis templates per attack type\n- [evidence-store.py](./scripts/evidence-store.py) — CLI tool for managing the evidence JSON store\n- [forensic-report.md](./templates/forensic-report.md) — Structured report template\n","3eaf90d9-ef99-5bc1-9e71-3bdad461e97a","NousResearch-hermes-agent-optional-skills-security-oss-forensics",{"name":12,"description":14,"platforms":66,"category":67},"[linux, macos, windows]","security","\u003Ch1>OSS Security Forensics Skill\u003C/h1>\n\u003Cp>A 7-phase multi-agent 
investigation framework for researching open-source supply chain attacks.\nAdapted from RAPTOR&#39;s forensics system. Covers GitHub Archive, Wayback Machine, GitHub API,\nlocal git analysis, IOC extraction, evidence-backed hypothesis formation and validation,\nand final forensic report generation.\u003C/p>\n\u003Chr>\n\u003Ch2>⚠️ Anti-Hallucination Guardrails\u003C/h2>\n\u003Cp>Read these before every investigation step. Violating them invalidates the report.\u003C/p>\n\u003Col>\n\u003Cli>\u003Cstrong>Evidence-First Rule\u003C/strong>: Every claim in any report, hypothesis, or summary MUST cite at least one evidence ID (\u003Ccode>EV-XXXX\u003C/code>). Assertions without citations are forbidden.\u003C/li>\n\u003Cli>\u003Cstrong>STAY IN YOUR LANE\u003C/strong>: Each sub-agent (investigator) has a single data source. Do NOT mix sources. The GH Archive investigator does not query the GitHub API, and vice versa. Role boundaries are hard.\u003C/li>\n\u003Cli>\u003Cstrong>Fact vs. Hypothesis Separation\u003C/strong>: Mark all unverified inferences with \u003Ccode>[HYPOTHESIS]\u003C/code>. Only statements verified against original sources may be stated as facts.\u003C/li>\n\u003Cli>\u003Cstrong>No Evidence Fabrication\u003C/strong>: The hypothesis validator MUST mechanically check that every cited evidence ID actually exists in the evidence store before accepting a hypothesis.\u003C/li>\n\u003Cli>\u003Cstrong>Proof-Required Disproval\u003C/strong>: A hypothesis cannot be dismissed without a specific, evidence-backed counter-argument. 
&quot;No evidence found&quot; is not sufficient to disprove—it only makes a hypothesis inconclusive.\u003C/li>\n\u003Cli>\u003Cstrong>SHA/URL Double-Verification\u003C/strong>: Any commit SHA, URL, or external identifier cited as evidence must be independently confirmed from at least two sources before being marked as verified.\u003C/li>\n\u003Cli>\u003Cstrong>Suspicious Code Rule\u003C/strong>: Never run code found inside the investigated repository locally. Analyze statically only, or use \u003Ccode>execute_code\u003C/code> in a sandboxed environment.\u003C/li>\n\u003Cli>\u003Cstrong>Secret Redaction\u003C/strong>: Any API keys, tokens, or credentials discovered during investigation must be redacted in the final report. Log them internally only.\u003C/li>\n\u003C/ol>\n\u003Chr>\n\u003Ch2>Example Scenarios\u003C/h2>\n\u003Cul>\n\u003Cli>\u003Cstrong>Scenario A: Dependency Confusion\u003C/strong>: A malicious package \u003Ccode>internal-lib-v2\u003C/code> is uploaded to NPM with a higher version than the internal one. The investigator must track when this package was first seen and if any PushEvents in the target repo updated \u003Ccode>package.json\u003C/code> to this version.\u003C/li>\n\u003Cli>\u003Cstrong>Scenario B: Maintainer Takeover\u003C/strong>: A long-term contributor&#39;s account is used to push a backdoored \u003Ccode>.github/workflows/build.yml\u003C/code>. The investigator looks for PushEvents from this user after a long period of inactivity or from a new IP/location (if detectable via BigQuery).\u003C/li>\n\u003Cli>\u003Cstrong>Scenario C: Force-Push Hide\u003C/strong>: A developer accidentally commits a production secret, then force-pushes to &quot;fix&quot; it. 
The investigator uses \u003Ccode>git fsck\u003C/code> and GH Archive to recover the original commit SHA and verify what was leaked.\u003C/li>\n\u003C/ul>\n\u003Chr>\n\u003Cblockquote>\n\u003Cp>\u003Cstrong>Path convention\u003C/strong>: Throughout this skill, \u003Ccode>SKILL_DIR\u003C/code> refers to the root of this skill&#39;s\ninstallation directory (the folder containing this \u003Ccode>SKILL.md\u003C/code>). When the skill is loaded,\nresolve \u003Ccode>SKILL_DIR\u003C/code> to the actual path — e.g. \u003Ccode>~/.hermes/skills/security/oss-forensics/\u003C/code>\nor the \u003Ccode>optional-skills/\u003C/code> equivalent. All script and template references are relative to it.\u003C/p>\n\u003C/blockquote>\n\u003Ch2>Phase 0: Initialization\u003C/h2>\n\u003Col>\n\u003Cli>Create investigation working directory:\u003Cdiv class=\"md-code-block\">\u003Cdiv class=\"md-code-lang\">bash\u003C/div>\u003Cpre>\u003Ccode class=\"hljs language-bash\">\u003Cspan class=\"hljs-built_in\">mkdir\u003C/span> investigation_$(\u003Cspan class=\"hljs-built_in\">echo\u003C/span> \u003Cspan class=\"hljs-string\">&quot;REPO_NAME&quot;\u003C/span> | \u003Cspan class=\"hljs-built_in\">tr\u003C/span> \u003Cspan class=\"hljs-string\">&#x27;/&#x27;\u003C/span> \u003Cspan class=\"hljs-string\">&#x27;_&#x27;\u003C/span>)\n\u003Cspan class=\"hljs-built_in\">cd\u003C/span> investigation_$(\u003Cspan class=\"hljs-built_in\">echo\u003C/span> \u003Cspan class=\"hljs-string\">&quot;REPO_NAME&quot;\u003C/span> | \u003Cspan class=\"hljs-built_in\">tr\u003C/span> \u003Cspan class=\"hljs-string\">&#x27;/&#x27;\u003C/span> \u003Cspan class=\"hljs-string\">&#x27;_&#x27;\u003C/span>)\u003C/code>\u003C/pre>\u003C/div>\u003C/li>\n\u003Cli>Initialize the evidence store:\u003Cdiv class=\"md-code-block\">\u003Cdiv class=\"md-code-lang\">bash\u003C/div>\u003Cpre>\u003Ccode class=\"hljs language-bash\">python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json 
list\u003C/code>\u003C/pre>\u003C/div>\u003C/li>\n\u003Cli>Copy the forensic report template:\u003Cdiv class=\"md-code-block\">\u003Cdiv class=\"md-code-lang\">bash\u003C/div>\u003Cpre>\u003Ccode class=\"hljs language-bash\">\u003Cspan class=\"hljs-built_in\">cp\u003C/span> SKILL_DIR/templates/forensic-report.md ./investigation-report.md\u003C/code>\u003C/pre>\u003C/div>\u003C/li>\n\u003Cli>Create an \u003Ccode>iocs.md\u003C/code> file to track Indicators of Compromise as they are discovered.\u003C/li>\n\u003Cli>Record the investigation start time, target repository, and stated investigation goal.\u003C/li>\n\u003C/ol>\n\u003Chr>\n\u003Ch2>Phase 1: Prompt Parsing and IOC Extraction\u003C/h2>\n\u003Cp>\u003Cstrong>Goal\u003C/strong>: Extract all structured investigative targets from the user&#39;s request.\u003C/p>\n\u003Cp>\u003Cstrong>Actions\u003C/strong>:\u003C/p>\n\u003Cul>\n\u003Cli>Parse the user prompt and extract:\u003Cul>\n\u003Cli>Target repository (\u003Ccode>owner/repo\u003C/code>)\u003C/li>\n\u003Cli>Target actors (GitHub handles, email addresses)\u003C/li>\n\u003Cli>Time window of interest (commit date ranges, PR timestamps)\u003C/li>\n\u003Cli>Provided Indicators of Compromise: commit SHAs, file paths, package names, IP addresses, domains, API keys/tokens, malicious URLs\u003C/li>\n\u003Cli>Any linked vendor security reports or blog posts\u003C/li>\n\u003C/ul>\n\u003C/li>\n\u003C/ul>\n\u003Cp>\u003Cstrong>Tools\u003C/strong>: Reasoning only, or \u003Ccode>execute_code\u003C/code> for regex extraction from large text blocks.\u003C/p>\n\u003Cp>\u003Cstrong>Output\u003C/strong>: Populate \u003Ccode>iocs.md\u003C/code> with extracted IOCs. 
Each IOC must have:\u003C/p>\n\u003Cul>\n\u003Cli>Type (from: COMMIT_SHA, FILE_PATH, API_KEY, SECRET, IP_ADDRESS, DOMAIN, PACKAGE_NAME, ACTOR_USERNAME, MALICIOUS_URL, OTHER)\u003C/li>\n\u003Cli>Value\u003C/li>\n\u003Cli>Source (user-provided, inferred)\u003C/li>\n\u003C/ul>\n\u003Cp>\u003Cstrong>Reference\u003C/strong>: See \u003Ca href=\"./references/evidence-types.md\">evidence-types.md\u003C/a> for IOC taxonomy.\u003C/p>\n\u003Chr>\n\u003Ch2>Phase 2: Parallel Evidence Collection\u003C/h2>\n\u003Cp>Spawn up to 5 specialist investigator sub-agents using \u003Ccode>delegate_task\u003C/code> (batch mode, max 3 concurrent). Each investigator has a \u003Cstrong>single data source\u003C/strong> and must not mix sources.\u003C/p>\n\u003Cblockquote>\n\u003Cp>\u003Cstrong>Orchestrator note\u003C/strong>: Pass the IOC list from Phase 1 and the investigation time window in the \u003Ccode>context\u003C/code> field of each delegated task.\u003C/p>\n\u003C/blockquote>\n\u003Chr>\n\u003Ch3>Investigator 1: Local Git Investigator\u003C/h3>\n\u003Cp>\u003Cstrong>ROLE BOUNDARY\u003C/strong>: You query the LOCAL GIT REPOSITORY ONLY. 
Do not call any external APIs.\u003C/p>\n\u003Cp>\u003Cstrong>Actions\u003C/strong>:\u003C/p>\n\u003Cdiv class=\"md-code-block\">\u003Cdiv class=\"md-code-lang\">bash\u003C/div>\u003Cpre>\u003Ccode class=\"hljs language-bash\">\u003Cspan class=\"hljs-comment\"># Clone repository\u003C/span>\ngit \u003Cspan class=\"hljs-built_in\">clone\u003C/span> https://github.com/OWNER/REPO.git target_repo &amp;&amp; \u003Cspan class=\"hljs-built_in\">cd\u003C/span> target_repo\n\n\u003Cspan class=\"hljs-comment\"># Full commit log with stats\u003C/span>\ngit \u003Cspan class=\"hljs-built_in\">log\u003C/span> --all --full-history --\u003Cspan class=\"hljs-built_in\">stat\u003C/span> --format=\u003Cspan class=\"hljs-string\">&quot;%H|%ae|%an|%ai|%s&quot;\u003C/span> &gt; ../git_log.txt\n\n\u003Cspan class=\"hljs-comment\"># Detect force-push evidence (orphaned/dangling commits)\u003C/span>\ngit fsck --lost-found --unreachable 2&gt;&amp;1 | grep commit &gt; ../dangling_commits.txt\n\n\u003Cspan class=\"hljs-comment\"># Check reflog for rewritten history\u003C/span>\ngit reflog --all &gt; ../reflog.txt\n\n\u003Cspan class=\"hljs-comment\"># List ALL branches including deleted remote refs\u003C/span>\ngit branch -a -v &gt; ../branches.txt\n\n\u003Cspan class=\"hljs-comment\"># Find suspicious large binary additions\u003C/span>\ngit \u003Cspan class=\"hljs-built_in\">log\u003C/span> --all --diff-filter=A --name-only --format=\u003Cspan class=\"hljs-string\">&quot;%H %ai&quot;\u003C/span> -- \u003Cspan class=\"hljs-string\">&quot;*.so&quot;\u003C/span> \u003Cspan class=\"hljs-string\">&quot;*.dll&quot;\u003C/span> \u003Cspan class=\"hljs-string\">&quot;*.exe&quot;\u003C/span> \u003Cspan class=\"hljs-string\">&quot;*.bin&quot;\u003C/span> &gt; ../binary_additions.txt\n\n\u003Cspan class=\"hljs-comment\"># Check for GPG signature anomalies\u003C/span>\ngit \u003Cspan class=\"hljs-built_in\">log\u003C/span> --show-signature --format=\u003Cspan class=\"hljs-string\">&quot;%H %ai 
%aN&quot;\u003C/span> &gt; ../signature_check.txt 2&gt;&amp;1\u003C/code>\u003C/pre>\u003C/div>\u003Cp>\u003Cstrong>Evidence to collect\u003C/strong> (add via \u003Ccode>python3 SKILL_DIR/scripts/evidence-store.py add\u003C/code>):\u003C/p>\n\u003Cul>\n\u003Cli>Each dangling commit SHA → type: \u003Ccode>git\u003C/code>\u003C/li>\n\u003Cli>Force-push evidence (reflog showing history rewrite) → type: \u003Ccode>git\u003C/code>\u003C/li>\n\u003Cli>Unsigned commits from verified contributors → type: \u003Ccode>git\u003C/code>\u003C/li>\n\u003Cli>Suspicious binary file additions → type: \u003Ccode>git\u003C/code>\u003C/li>\n\u003C/ul>\n\u003Cp>\u003Cstrong>Reference\u003C/strong>: See \u003Ca href=\"./references/recovery-techniques.md\">recovery-techniques.md\u003C/a> for accessing force-pushed commits.\u003C/p>\n\u003Chr>\n\u003Ch3>Investigator 2: GitHub API Investigator\u003C/h3>\n\u003Cp>\u003Cstrong>ROLE BOUNDARY\u003C/strong>: You query the GITHUB REST API ONLY. Do not run git commands locally.\u003C/p>\n\u003Cp>\u003Cstrong>Actions\u003C/strong>:\u003C/p>\n\u003Cdiv class=\"md-code-block\">\u003Cdiv class=\"md-code-lang\">bash\u003C/div>\u003Cpre>\u003Ccode class=\"hljs language-bash\">\u003Cspan class=\"hljs-comment\"># Commits (paginated)\u003C/span>\ncurl -s \u003Cspan class=\"hljs-string\">&quot;https://api.github.com/repos/OWNER/REPO/commits?per_page=100&quot;\u003C/span> &gt; api_commits.json\n\n\u003Cspan class=\"hljs-comment\"># Pull Requests including closed/deleted\u003C/span>\ncurl -s \u003Cspan class=\"hljs-string\">&quot;https://api.github.com/repos/OWNER/REPO/pulls?state=all&amp;per_page=100&quot;\u003C/span> &gt; api_prs.json\n\n\u003Cspan class=\"hljs-comment\"># Issues\u003C/span>\ncurl -s \u003Cspan class=\"hljs-string\">&quot;https://api.github.com/repos/OWNER/REPO/issues?state=all&amp;per_page=100&quot;\u003C/span> &gt; api_issues.json\n\n\u003Cspan class=\"hljs-comment\"># Contributors and collaborator changes\u003C/span>\ncurl -s \u003Cspan 
class=\"hljs-string\">&quot;https://api.github.com/repos/OWNER/REPO/contributors&quot;\u003C/span> &gt; api_contributors.json\n\n\u003Cspan class=\"hljs-comment\"># Repository events (last 300)\u003C/span>\ncurl -s \u003Cspan class=\"hljs-string\">&quot;https://api.github.com/repos/OWNER/REPO/events?per_page=100&quot;\u003C/span> &gt; api_events.json\n\n\u003Cspan class=\"hljs-comment\"># Check specific suspicious commit SHA details\u003C/span>\ncurl -s \u003Cspan class=\"hljs-string\">&quot;https://api.github.com/repos/OWNER/REPO/git/commits/SHA&quot;\u003C/span> &gt; commit_detail.json\n\n\u003Cspan class=\"hljs-comment\"># Releases\u003C/span>\ncurl -s \u003Cspan class=\"hljs-string\">&quot;https://api.github.com/repos/OWNER/REPO/releases?per_page=100&quot;\u003C/span> &gt; api_releases.json\n\n\u003Cspan class=\"hljs-comment\"># Check if a specific commit exists (force-pushed commits may 404 on commits/ but succeed on git/commits/)\u003C/span>\ncurl -s \u003Cspan class=\"hljs-string\">&quot;https://api.github.com/repos/OWNER/REPO/commits/SHA&quot;\u003C/span> | jq .sha\u003C/code>\u003C/pre>\u003C/div>\u003Cp>\u003Cstrong>Cross-reference targets\u003C/strong> (flag discrepancies as evidence):\u003C/p>\n\u003Cul>\n\u003Cli>PR exists in archive but missing from API → evidence of deletion\u003C/li>\n\u003Cli>Contributor in archive events but not in contributors list → evidence of permission revocation\u003C/li>\n\u003Cli>Commit in archive PushEvents but not in API commit list → evidence of force-push/deletion\u003C/li>\n\u003C/ul>\n\u003Cp>\u003Cstrong>Reference\u003C/strong>: See \u003Ca href=\"./references/evidence-types.md\">evidence-types.md\u003C/a> for GH event types.\u003C/p>\n\u003Chr>\n\u003Ch3>Investigator 3: Wayback Machine Investigator\u003C/h3>\n\u003Cp>\u003Cstrong>ROLE BOUNDARY\u003C/strong>: You query the WAYBACK MACHINE CDX API ONLY. 
Do not use the GitHub API.\u003C/p>\n\u003Cp>\u003Cstrong>Goal\u003C/strong>: Recover deleted GitHub pages (READMEs, issues, PRs, releases, wiki pages).\u003C/p>\n\u003Cp>\u003Cstrong>Actions\u003C/strong>:\u003C/p>\n\u003Cdiv class=\"md-code-block\">\u003Cdiv class=\"md-code-lang\">bash\u003C/div>\u003Cpre>\u003Ccode class=\"hljs language-bash\">\u003Cspan class=\"hljs-comment\"># Search for archived snapshots of the repo main page\u003C/span>\ncurl -s \u003Cspan class=\"hljs-string\">&quot;https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO&amp;output=json&amp;limit=100&amp;from=YYYYMMDD&amp;to=YYYYMMDD&quot;\u003C/span> &gt; wayback_main.json\n\n\u003Cspan class=\"hljs-comment\"># Search for a specific deleted issue\u003C/span>\ncurl -s \u003Cspan class=\"hljs-string\">&quot;https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/issues/NUM&amp;output=json&amp;limit=50&quot;\u003C/span> &gt; wayback_issue_NUM.json\n\n\u003Cspan class=\"hljs-comment\"># Search for a specific deleted PR\u003C/span>\ncurl -s \u003Cspan class=\"hljs-string\">&quot;https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/pull/NUM&amp;output=json&amp;limit=50&quot;\u003C/span> &gt; wayback_pr_NUM.json\n\n\u003Cspan class=\"hljs-comment\"># Fetch the best snapshot of a page\u003C/span>\n\u003Cspan class=\"hljs-comment\"># Use the Wayback Machine URL: https://web.archive.org/web/TIMESTAMP/ORIGINAL_URL\u003C/span>\n\u003Cspan class=\"hljs-comment\"># Example: https://web.archive.org/web/20240101000000*/github.com/OWNER/REPO\u003C/span>\n\n\u003Cspan class=\"hljs-comment\"># Advanced: Search for deleted releases/tags\u003C/span>\ncurl -s \u003Cspan class=\"hljs-string\">&quot;https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/releases/tag/*&amp;output=json&quot;\u003C/span> &gt; wayback_tags.json\n\n\u003Cspan class=\"hljs-comment\"># Advanced: Search for historical wiki changes\u003C/span>\ncurl -s \u003Cspan 
class=\"hljs-string\">&quot;https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/wiki/*&amp;output=json&quot;\u003C/span> &gt; wayback_wiki.json\u003C/code>\u003C/pre>\u003C/div>\u003Cp>\u003Cstrong>Evidence to collect\u003C/strong>:\u003C/p>\n\u003Cul>\n\u003Cli>Archived snapshots of deleted issues/PRs with their content\u003C/li>\n\u003Cli>Historical README versions showing changes\u003C/li>\n\u003Cli>Evidence of content present in archive but missing from current GitHub state\u003C/li>\n\u003C/ul>\n\u003Cp>\u003Cstrong>Reference\u003C/strong>: See \u003Ca href=\"./references/github-archive-guide.md\">github-archive-guide.md\u003C/a> for CDX API parameters.\u003C/p>\n\u003Chr>\n\u003Ch3>Investigator 4: GH Archive / BigQuery Investigator\u003C/h3>\n\u003Cp>\u003Cstrong>ROLE BOUNDARY\u003C/strong>: You query GITHUB ARCHIVE via BIGQUERY ONLY. This is a tamper-proof record of all public GitHub events.\u003C/p>\n\u003Cblockquote>\n\u003Cp>\u003Cstrong>Prerequisites\u003C/strong>: Requires Google Cloud credentials with BigQuery access (\u003Ccode>gcloud auth application-default login\u003C/code>). 
If unavailable, skip this investigator and note it in the report.\u003C/p>\n\u003C/blockquote>\n\u003Cp>\u003Cstrong>Cost Optimization Rules\u003C/strong> (MANDATORY):\u003C/p>\n\u003Col>\n\u003Cli>ALWAYS run a \u003Ccode>--dry_run\u003C/code> before every query to estimate cost.\u003C/li>\n\u003Cli>Use \u003Ccode>_TABLE_SUFFIX\u003C/code> to filter by date range and minimize scanned data.\u003C/li>\n\u003Cli>Only SELECT the columns you need.\u003C/li>\n\u003Cli>Add a LIMIT unless aggregating.\u003C/li>\n\u003C/ol>\n\u003Cdiv class=\"md-code-block\">\u003Cdiv class=\"md-code-lang\">bash\u003C/div>\u003Cpre>\u003Ccode class=\"hljs language-bash\">\u003Cspan class=\"hljs-comment\"># Template: safe BigQuery query for PushEvents to OWNER/REPO\u003C/span>\nbq query --use_legacy_sql=\u003Cspan class=\"hljs-literal\">false\u003C/span> --dry_run \u003Cspan class=\"hljs-string\">&quot;\nSELECT created_at, actor.login, payload.commits, payload.before, payload.head,\n       payload.size, payload.distinct_size\nFROM \\`githubarchive.month.*\\`\nWHERE _TABLE_SUFFIX BETWEEN &#x27;YYYYMM&#x27; AND &#x27;YYYYMM&#x27;\n  AND type = &#x27;PushEvent&#x27;\n  AND repo.name = &#x27;OWNER/REPO&#x27;\nLIMIT 1000\n&quot;\u003C/span>\n\u003Cspan class=\"hljs-comment\"># If cost is acceptable, re-run without --dry_run\u003C/span>\n\n\u003Cspan class=\"hljs-comment\"># Detect force-pushes: zero-distinct_size PushEvents mean commits were force-erased\u003C/span>\n\u003Cspan class=\"hljs-comment\"># payload.distinct_size = 0 AND payload.size &gt; 0 → force push indicator\u003C/span>\n\n\u003Cspan class=\"hljs-comment\"># Check for deleted branch events\u003C/span>\nbq query --use_legacy_sql=\u003Cspan class=\"hljs-literal\">false\u003C/span> \u003Cspan class=\"hljs-string\">&quot;\nSELECT created_at, actor.login, payload.ref, payload.ref_type\nFROM \\`githubarchive.month.*\\`\nWHERE _TABLE_SUFFIX BETWEEN &#x27;YYYYMM&#x27; AND &#x27;YYYYMM&#x27;\n  AND type = &#x27;DeleteEvent&#x27;\n  AND 
repo.name = &#x27;OWNER/REPO&#x27;\nLIMIT 200\n&quot;\u003C/span>\u003C/code>\u003C/pre>\u003C/div>\u003Cp>\u003Cstrong>Evidence to collect\u003C/strong>:\u003C/p>\n\u003Cul>\n\u003Cli>Force-push events (payload.size &gt; 0, payload.distinct_size = 0)\u003C/li>\n\u003Cli>DeleteEvents for branches/tags\u003C/li>\n\u003Cli>WorkflowRunEvents for suspicious CI/CD automation\u003C/li>\n\u003Cli>PushEvents that precede a &quot;gap&quot; in the git log (evidence of rewrite)\u003C/li>\n\u003C/ul>\n\u003Cp>\u003Cstrong>Reference\u003C/strong>: See \u003Ca href=\"./references/github-archive-guide.md\">github-archive-guide.md\u003C/a> for all 12 event types and query patterns.\u003C/p>\n\u003Chr>\n\u003Ch3>Investigator 5: IOC Enrichment Investigator\u003C/h3>\n\u003Cp>\u003Cstrong>ROLE BOUNDARY\u003C/strong>: You enrich EXISTING IOCs from Phase 1 using passive public sources ONLY. Do not execute any code from the target repository.\u003C/p>\n\u003Cp>\u003Cstrong>Actions\u003C/strong>:\u003C/p>\n\u003Cul>\n\u003Cli>For each commit SHA: attempt recovery via direct GitHub URL (\u003Ccode>github.com/OWNER/REPO/commit/SHA.patch\u003C/code>)\u003C/li>\n\u003Cli>For each domain/IP: check passive DNS, WHOIS records (via \u003Ccode>web_extract\u003C/code> on public WHOIS services)\u003C/li>\n\u003Cli>For each package name: check npm/PyPI for matching malicious package reports\u003C/li>\n\u003Cli>For each actor username: check GitHub profile, contribution history, account age\u003C/li>\n\u003Cli>Recover force-pushed commits using 3 methods (see \u003Ca href=\"./references/recovery-techniques.md\">recovery-techniques.md\u003C/a>)\u003C/li>\n\u003C/ul>\n\u003Chr>\n\u003Ch2>Phase 3: Evidence Consolidation\u003C/h2>\n\u003Cp>After all investigators complete:\u003C/p>\n\u003Col>\n\u003Cli>Run \u003Ccode>python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list\u003C/code> to see all collected evidence.\u003C/li>\n\u003Cli>For each piece of evidence, verify the 
\u003Ccode>content_sha256\u003C/code> hash matches the original source.\u003C/li>\n\u003Cli>Group evidence by:\u003Cul>\n\u003Cli>\u003Cstrong>Timeline\u003C/strong>: Sort all timestamped evidence chronologically\u003C/li>\n\u003Cli>\u003Cstrong>Actor\u003C/strong>: Group by GitHub handle or email\u003C/li>\n\u003Cli>\u003Cstrong>IOC\u003C/strong>: Link evidence to the IOC it relates to\u003C/li>\n\u003C/ul>\n\u003C/li>\n\u003Cli>Identify \u003Cstrong>discrepancies\u003C/strong>: items present in one source but absent in another (key deletion indicators).\u003C/li>\n\u003Cli>Flag evidence as \u003Ccode>[VERIFIED]\u003C/code> (confirmed from 2+ independent sources) or \u003Ccode>[UNVERIFIED]\u003C/code> (single source only).\u003C/li>\n\u003C/ol>\n\u003Chr>\n\u003Ch2>Phase 4: Hypothesis Formation\u003C/h2>\n\u003Cp>A hypothesis must:\u003C/p>\n\u003Cul>\n\u003Cli>State a specific claim (e.g., &quot;Actor X force-pushed to BRANCH on DATE to erase commit SHA&quot;)\u003C/li>\n\u003Cli>Cite at least 2 evidence IDs that support it (\u003Ccode>EV-XXXX\u003C/code>, \u003Ccode>EV-YYYY\u003C/code>)\u003C/li>\n\u003Cli>Identify what evidence would disprove it\u003C/li>\n\u003Cli>Be labeled \u003Ccode>[HYPOTHESIS]\u003C/code> until validated\u003C/li>\n\u003C/ul>\n\u003Cp>\u003Cstrong>Common hypothesis templates\u003C/strong> (see \u003Ca href=\"./references/investigation-templates.md\">investigation-templates.md\u003C/a>):\u003C/p>\n\u003Cul>\n\u003Cli>Maintainer Compromise: legitimate account used post-takeover to inject malicious code\u003C/li>\n\u003Cli>Dependency Confusion: package name squatting to intercept installs\u003C/li>\n\u003Cli>CI/CD Injection: malicious workflow changes to run code during builds\u003C/li>\n\u003Cli>Typosquatting: near-identical package name targeting misspellers\u003C/li>\n\u003Cli>Credential Leak: token/key accidentally committed then force-pushed to erase\u003C/li>\n\u003C/ul>\n\u003Cp>For each hypothesis, spawn a 
\u003Ccode>delegate_task\u003C/code> sub-agent to attempt to find disconfirming evidence before confirming.\u003C/p>\n\u003Chr>\n\u003Ch2>Phase 5: Hypothesis Validation\u003C/h2>\n\u003Cp>The validator sub-agent MUST mechanically check:\u003C/p>\n\u003Col>\n\u003Cli>For each hypothesis, extract all cited evidence IDs.\u003C/li>\n\u003Cli>Verify each ID exists in \u003Ccode>evidence.json\u003C/code> (hard failure if any ID is missing → hypothesis rejected as potentially fabricated).\u003C/li>\n\u003Cli>Verify each \u003Ccode>[VERIFIED]\u003C/code> piece of evidence was confirmed from 2+ sources.\u003C/li>\n\u003Cli>Check logical consistency: does the timeline depicted by the evidence support the hypothesis?\u003C/li>\n\u003Cli>Check for alternative explanations: could the same evidence pattern arise from a benign cause?\u003C/li>\n\u003C/ol>\n\u003Cp>\u003Cstrong>Output\u003C/strong>:\u003C/p>\n\u003Cul>\n\u003Cli>\u003Ccode>VALIDATED\u003C/code>: All evidence cited, verified, logically consistent, no plausible alternative explanation.\u003C/li>\n\u003Cli>\u003Ccode>INCONCLUSIVE\u003C/code>: Evidence supports hypothesis but alternative explanations exist or evidence is insufficient.\u003C/li>\n\u003Cli>\u003Ccode>REJECTED\u003C/code>: Missing evidence IDs, unverified evidence cited as fact, logical inconsistency detected.\u003C/li>\n\u003C/ul>\n\u003Cp>Rejected hypotheses feed back into Phase 4 for refinement (max 3 iterations).\u003C/p>\n\u003Chr>\n\u003Ch2>Phase 6: Final Report Generation\u003C/h2>\n\u003Cp>Populate \u003Ccode>investigation-report.md\u003C/code> using the template in \u003Ca href=\"./templates/forensic-report.md\">forensic-report.md\u003C/a>.\u003C/p>\n\u003Cp>\u003Cstrong>Mandatory sections\u003C/strong>:\u003C/p>\n\u003Cul>\n\u003Cli>Executive Summary: one-paragraph verdict (Compromised / Clean / Inconclusive) with confidence level\u003C/li>\n\u003Cli>Timeline: chronological reconstruction of all significant events with evidence 
citations\u003C/li>\n\u003Cli>Validated Hypotheses: each with status and supporting evidence IDs\u003C/li>\n\u003Cli>Evidence Registry: table of all \u003Ccode>EV-XXXX\u003C/code> entries with source, type, and verification status\u003C/li>\n\u003Cli>IOC List: all extracted and enriched Indicators of Compromise\u003C/li>\n\u003Cli>Chain of Custody: how evidence was collected, from what sources, at what timestamps\u003C/li>\n\u003Cli>Recommendations: immediate mitigations if compromise detected; monitoring recommendations\u003C/li>\n\u003C/ul>\n\u003Cp>\u003Cstrong>Report rules\u003C/strong>:\u003C/p>\n\u003Cul>\n\u003Cli>Every factual claim must have at least one \u003Ccode>[EV-XXXX]\u003C/code> citation\u003C/li>\n\u003Cli>Executive Summary must state confidence level (High / Medium / Low)\u003C/li>\n\u003Cli>All secrets/credentials must be redacted to \u003Ccode>[REDACTED]\u003C/code>\u003C/li>\n\u003C/ul>\n\u003Chr>\n\u003Ch2>Phase 7: Completion\u003C/h2>\n\u003Col>\n\u003Cli>Run final evidence count: \u003Ccode>python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list\u003C/code>\u003C/li>\n\u003Cli>Archive the full investigation directory.\u003C/li>\n\u003Cli>If compromise is confirmed:\u003Cul>\n\u003Cli>List immediate mitigations (rotate credentials, pin dependency hashes, notify affected users)\u003C/li>\n\u003Cli>Identify affected versions/packages\u003C/li>\n\u003Cli>Note disclosure obligations (if a public package: coordinate with the package registry)\u003C/li>\n\u003C/ul>\n\u003C/li>\n\u003Cli>Present the final \u003Ccode>investigation-report.md\u003C/code> to the user.\u003C/li>\n\u003C/ol>\n\u003Chr>\n\u003Ch2>Ethical Use Guidelines\u003C/h2>\n\u003Cp>This skill is designed for \u003Cstrong>defensive security investigation\u003C/strong> — protecting open-source software from supply chain attacks. 
It must not be used for:\u003C/p>\n\u003Cul>\n\u003Cli>\u003Cstrong>Harassment or stalking\u003C/strong> of contributors or maintainers\u003C/li>\n\u003Cli>\u003Cstrong>Doxing\u003C/strong> — correlating GitHub activity to real identities for malicious purposes\u003C/li>\n\u003Cli>\u003Cstrong>Competitive intelligence\u003C/strong> — investigating proprietary or internal repositories without authorization\u003C/li>\n\u003Cli>\u003Cstrong>False accusations\u003C/strong> — publishing investigation results without validated evidence (see anti-hallucination guardrails)\u003C/li>\n\u003C/ul>\n\u003Cp>Investigations should be conducted with the principle of \u003Cstrong>minimal intrusion\u003C/strong>: collect only the evidence necessary to validate or refute the hypothesis. When publishing results, follow responsible disclosure practices and coordinate with affected maintainers before public disclosure.\u003C/p>\n\u003Cp>If the investigation reveals a genuine compromise, follow the coordinated vulnerability disclosure process:\u003C/p>\n\u003Col>\n\u003Cli>Notify the repository maintainers privately first\u003C/li>\n\u003Cli>Allow reasonable time for remediation (typically 90 days)\u003C/li>\n\u003Cli>Coordinate with package registries (npm, PyPI, etc.) 
if published packages are affected\u003C/li>\n\u003Cli>File a CVE if appropriate\u003C/li>\n\u003C/ol>\n\u003Chr>\n\u003Ch2>API Rate Limiting\u003C/h2>\n\u003Cp>The GitHub REST API enforces rate limits that will interrupt large investigations if not managed.\u003C/p>\n\u003Cp>\u003Cstrong>Authenticated requests\u003C/strong>: 5,000/hour (requires \u003Ccode>GITHUB_TOKEN\u003C/code> env var or \u003Ccode>gh\u003C/code> CLI auth)\n\u003Cstrong>Unauthenticated requests\u003C/strong>: 60/hour (unusable for investigations)\u003C/p>\n\u003Cp>\u003Cstrong>Best practices\u003C/strong>:\u003C/p>\n\u003Cul>\n\u003Cli>Always authenticate: \u003Ccode>export GITHUB_TOKEN=ghp_...\u003C/code> or use \u003Ccode>gh\u003C/code> CLI (auto-authenticates)\u003C/li>\n\u003Cli>Use conditional requests (\u003Ccode>If-None-Match\u003C/code> / \u003Ccode>If-Modified-Since\u003C/code> headers) to avoid consuming quota on unchanged data\u003C/li>\n\u003Cli>For paginated endpoints, fetch all pages in sequence — don&#39;t parallelize against the same endpoint\u003C/li>\n\u003Cli>Check the \u003Ccode>X-RateLimit-Remaining\u003C/code> header; if it is below 100, pause until the \u003Ccode>X-RateLimit-Reset\u003C/code> timestamp\u003C/li>\n\u003Cli>BigQuery has its own quotas (the free tier covers 1 TiB of query processing per month); always dry-run first\u003C/li>\n\u003Cli>Wayback Machine CDX API: no formal rate limit, but be courteous (1-2 req/sec max)\u003C/li>\n\u003C/ul>\n\u003Cp>If rate-limited mid-investigation, record the partial results in the evidence store and note the limitation in the report.\u003C/p>\n\u003Chr>\n\u003Ch2>Reference Materials\u003C/h2>\n\u003Cul>\n\u003Cli>\u003Ca href=\"./references/github-archive-guide.md\">github-archive-guide.md\u003C/a> — BigQuery queries, CDX API, 12 event types\u003C/li>\n\u003Cli>\u003Ca href=\"./references/evidence-types.md\">evidence-types.md\u003C/a> — IOC taxonomy, evidence source types, observation types\u003C/li>\n\u003Cli>\u003Ca 
href=\"./references/recovery-techniques.md\">recovery-techniques.md\u003C/a> — Recovering deleted commits, PRs, issues\u003C/li>\n\u003Cli>\u003Ca href=\"./references/investigation-templates.md\">investigation-templates.md\u003C/a> — Pre-built hypothesis templates per attack type\u003C/li>\n\u003Cli>\u003Ca href=\"./scripts/evidence-store.py\">evidence-store.py\u003C/a> — CLI tool for managing the evidence JSON store\u003C/li>\n\u003Cli>\u003Ca href=\"./templates/forensic-report.md\">forensic-report.md\u003C/a> — Structured report template\u003C/li>\n\u003C/ul>\n"]