opensourcesecurity/claude-skill-portfolio-reviewer

A Claude Code skill that reviews a software portfolio via three parallel sub-agents, each applying a different engineering lens.

Python 100%

skip 6584e8b91d Remove sample-output.md; will replace with updated version after portfolio refresh		2026-05-27 10:31:56 -06:00
.claude	Add README and requirements.txt; strip em-dashes; clean sample output	2026-05-27 09:45:26 -06:00
evals	Initial commit: portfolio reviewer skill with three sub-agents and eval harness	2026-05-27 09:28:08 -06:00
portfolio_cache	Initial commit: portfolio reviewer skill with three sub-agents and eval harness	2026-05-27 09:28:08 -06:00
scripts	Add README and requirements.txt; strip em-dashes; clean sample output	2026-05-27 09:45:26 -06:00
.gitignore	Initial commit: portfolio reviewer skill with three sub-agents and eval harness	2026-05-27 09:28:08 -06:00
README.md	Use skip.star@ for bearer token contact in README	2026-05-27 09:51:59 -06:00
requirements.txt	Add README and requirements.txt; strip em-dashes; clean sample output	2026-05-27 09:45:26 -06:00

claude-skill-portfolio-reviewer

A Claude Code skill that reviews a software portfolio by dispatching three specialist sub-agents in parallel, each evaluating a different project through a different engineering lens, then synthesizing their structured findings into a single candidate assessment.

Built as a portfolio piece for Forward Deployed Engineer (FDE) applications. The architecture is the demo.

The demo

Clone this repo, run claude inside it, and type:

Review this candidate's portfolio.

About two and a half minutes later, you get an evidence-cited assessment across three engineering domains - production AI, web application architecture, and security engineering - with a synthesis section a hiring manager could quote. See examples/sample-output.md for the actual output of a real run.

The demo works because three reviewer sub-agents fire in parallel against pre-cached project snapshots, each with its own context window, its own scoped system prompt, and its own evaluation lens. They return structured findings in a uniform shape, and the parent Claude synthesizes them. No human in the loop.

Try it

You need Claude Code installed and authenticated (claude --version should work).

git clone https://git.opensourcesecurity.net/opensourcesecurity/claude-skill-portfolio-reviewer.git
cd claude-skill-portfolio-reviewer
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
claude

Then at the Claude Code prompt:

Review this candidate's portfolio.

The cached project snapshots in portfolio_cache/ are committed, so the demo runs zero-setup. If you want to regenerate the cache from the live MCP server, see Regenerating the cache below.

What this skill does

The portfolio-reviewer skill orchestrates three reviewer sub-agents, each scoped to one project and one engineering lens:

Sub-agent	Project	Lens
`rag-reviewer`	`rag_ai`	Production AI engineering - multi-tenancy, error handling, prompt engineering, monitoring, Zammad integration
`opensupport-reviewer`	`opensupport`	Web application architecture - Elixir/OTP idiomatic use, LiveView design, data model, testing discipline, dependency posture
`glassbox-reviewer`	`glassbox`	Security engineering - threat model clarity, defense in depth, whitelist posture, red team rigor, AppArmor/chroot quality

Each reviewer follows the same six-part contract: scope (what to look at), tools (how to look), method (in what order), dimensions (what to evaluate), output format (how to report), and hard rules (what's required). That uniformity is what makes the synthesis tractable - all three return the same shape, so the parent can compose without case-handling.
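To make the "same shape" point concrete, here is a minimal sketch of what a uniform return value could look like. The class and field names (`Finding`, `ReviewerResult`, `dimension`, `assessment`, `evidence`) are illustrative assumptions; the real contract lives in the sub-agent markdown files and may differ.

```python
from dataclasses import dataclass, field


# Hypothetical shape for one reviewer's return value. The field names are
# assumptions for illustration, not the format the sub-agents actually emit.
@dataclass
class Finding:
    dimension: str   # e.g. "multi-tenancy"
    assessment: str  # the reviewer's judgment on that dimension
    evidence: str    # citation into the cached snapshot


@dataclass
class ReviewerResult:
    reviewer: str
    project: str
    lens: str
    findings: list[Finding] = field(default_factory=list)


def synthesize(results: list[ReviewerResult]) -> str:
    """Compose one report from uniformly shaped results, with no per-reviewer branches."""
    lines = []
    for r in results:
        lines.append(f"{r.project} ({r.lens}): {len(r.findings)} findings")
        for f in r.findings:
            lines.append(f"  - {f.dimension}: {f.assessment} [{f.evidence}]")
    return "\n".join(lines)
```

Because every result has the same fields, `synthesize` never needs to know which reviewer produced which entry.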

Architecture

                      User: "Review this candidate's portfolio"
                                        │
                                        ▼
                          ┌─────────────────────────────┐
                          │  Claude Code (main agent)   │
                          │  loads portfolio-reviewer   │
                          │  skill, dispatches in       │
                          │  parallel ↓                 │
                          └─────────────────────────────┘
                                        │
                ┌───────────────────────┼───────────────────────┐
                ▼                       ▼                       ▼
       ┌────────────────┐     ┌────────────────────┐    ┌─────────────────┐
       │ rag-reviewer   │     │ opensupport-       │    │ glassbox-       │
       │  sub-agent     │     │  reviewer sub-agent│    │  reviewer       │
       │                │     │                    │    │  sub-agent      │
       │  reads cache:  │     │  reads cache:      │    │  reads cache:   │
       │  rag_ai.md     │     │  opensupport.md    │    │  glassbox.md    │
       │                │     │                    │    │                 │
       │  returns       │     │  returns           │    │  returns        │
       │  structured    │     │  structured        │    │  structured     │
       │  findings      │     │  findings          │    │  findings       │
       └────────────────┘     └────────────────────┘    └─────────────────┘
                │                       │                       │
                └───────────────────────┼───────────────────────┘
                                        ▼
                          ┌─────────────────────────────┐
                          │  Main Claude synthesizes:   │
                          │  3 findings → 1 assessment  │
                          └─────────────────────────────┘
                                        │
                                        ▼
                              Portfolio assessment

The pattern is fan-out / fan-in orchestration. One parent invokes N workers in parallel; each does scoped work in its own context window and returns a structured result, and the parent synthesizes them. Same shape as MapReduce. It's a classical distributed-computing pattern now applied to agents.

Why sub-agents and not one big Claude doing everything? Context isolation. Each reviewer can do deep work on its project without polluting the parent's context with 50KB of tool-call noise. The parent sees only the three structured returns. The architecture scales - adding a fourth project would mean adding a fourth reviewer; the parent's job stays the same.
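The fan-out / fan-in shape can be sketched in plain Python with `asyncio`. This is an analogy only: in the skill itself the workers are Claude Code sub-agents, not coroutines, and `run_reviewer` and `review_portfolio` are hypothetical stand-ins.

```python
import asyncio

# Reviewer-to-snapshot mapping, taken from the architecture diagram.
REVIEWERS = {
    "rag-reviewer": "rag_ai.md",
    "opensupport-reviewer": "opensupport.md",
    "glassbox-reviewer": "glassbox.md",
}


async def run_reviewer(name: str, cache_file: str) -> dict:
    """Stand-in for one sub-agent doing scoped work in its own context."""
    await asyncio.sleep(0)  # placeholder for the actual review
    return {"reviewer": name, "source": cache_file, "findings": []}


async def review_portfolio() -> dict:
    # Fan-out: start every reviewer concurrently.
    results = await asyncio.gather(
        *(run_reviewer(name, path) for name, path in REVIEWERS.items())
    )
    # Fan-in: the parent sees only the structured returns.
    return {"reviewers": [r["reviewer"] for r in results], "results": results}
```

Calling `asyncio.run(review_portfolio())` returns the three results in the order the reviewers were listed. Adding a fourth reviewer is one more entry in `REVIEWERS`; `review_portfolio` does not change.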

Repository layout

claude-skill-portfolio-reviewer/
├── .claude/
│   ├── agents/
│   │   ├── rag-reviewer.md           # Sub-agent: production AI lens
│   │   ├── opensupport-reviewer.md   # Sub-agent: web app lens
│   │   └── glassbox-reviewer.md      # Sub-agent: security lens
│   └── skills/
│       └── portfolio-reviewer/
│           └── SKILL.md              # The orchestrating skill
├── portfolio_cache/                  # Pre-fetched project snapshots
│   ├── rag_ai.md
│   ├── opensupport.md
│   └── glassbox.md
├── scripts/
│   └── fetch_portfolio.py            # Cache builder - talks to live MCP server
├── evals/
│   ├── prompts.yaml                  # 5 representative prompts with assertions
│   └── run_evals.py                  # Behavioral eval harness
├── examples/
│   └── sample-output.md              # Real output from a run, for reference
├── requirements.txt
├── .gitignore
└── README.md

Two execution paths

The repo has two paths to the same data, and that's deliberate.

Path 1 - Cached (default). The reviewer sub-agents read from portfolio_cache/*.md, which is committed to the repo. Zero setup. Deterministic. The cache is regenerated by running scripts/fetch_portfolio.py. This is the path the demo uses, and it's what makes a four-word prompt produce a working review for any hiring manager who clones the repo.

Path 2 - Live MCP. The cache is built by talking to a live MCP server I deployed at mcp.opensourcesecurity.net that exposes my Forgejo instance as nine LLM-callable tools (forgejo_get_repo, forgejo_get_file_contents, forgejo_list_commits, etc.). The MCP server is the architectural centerpiece. The cache is its output, not a replacement.

The demo currently uses the cached path because remote MCP calls from Claude Code sub-agents are unreliable (see Known issues below). The cache path is the floor; live MCP is the ceiling, and making it reliable is active work.
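One way to picture the relationship between the two paths is a loader that prefers the committed snapshot and only reaches for a live fetcher when no snapshot exists. This is a hypothetical sketch, not how the repo wires it: in the repo the sub-agents read the cache directly, and `load_snapshot` and `live_fetch` are names invented for illustration.

```python
from pathlib import Path
from typing import Callable, Optional

CACHE_DIR = Path("portfolio_cache")


def load_snapshot(
    project: str,
    live_fetch: Optional[Callable[[str], str]] = None,
    cache_dir: Path = CACHE_DIR,
) -> str:
    """Return one project's snapshot, preferring the committed cache."""
    cached = cache_dir / f"{project}.md"
    if cached.exists():
        # Path 1: deterministic, zero setup.
        return cached.read_text(encoding="utf-8")
    if live_fetch is None:
        raise FileNotFoundError(
            f"No cached snapshot for {project!r} and no live fetcher given"
        )
    # Path 2: hypothetical callable that would talk to the MCP server.
    return live_fetch(project)
```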

Regenerating the cache

To rebuild the cache from the live MCP server:

export MCP_BEARER_TOKEN='your-evaluator-token'   # request from skip.star@opensourcesecurity.net
python scripts/fetch_portfolio.py

Running the script shows that the MCP server is real and that the cache is faithful to it. It takes about 20 seconds and writes one structured markdown file per project. scripts/fetch_portfolio.py is a polished portfolio piece in its own right, and it documents the MCP Python SDK usage if you want to see what live MCP integration looks like.
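The two ends of that workflow, reading the token and writing one file per project, can be sketched with the standard library alone. This is not the contents of scripts/fetch_portfolio.py: `auth_headers` and `write_cache` are hypothetical helpers, and sending the token as an HTTP `Authorization: Bearer` header is an assumption based on the variable name. The MCP calls that would produce the snapshots are omitted.

```python
import os
from pathlib import Path

PROJECTS = ("rag_ai", "opensupport", "glassbox")


def auth_headers(env=os.environ) -> dict:
    """Build a bearer header from MCP_BEARER_TOKEN, failing early if it is unset."""
    token = env.get("MCP_BEARER_TOKEN", "").strip()
    if not token:
        raise RuntimeError("MCP_BEARER_TOKEN is not set")
    # Assumption: the server expects a standard HTTP bearer header.
    return {"Authorization": f"Bearer {token}"}


def write_cache(snapshots: dict, cache_dir: Path = Path("portfolio_cache")) -> list:
    """Write one markdown file per project and return the paths written."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for project in PROJECTS:
        path = cache_dir / f"{project}.md"
        path.write_text(snapshots[project], encoding="utf-8")
        written.append(path)
    return written
```

Failing before any network call when the token is missing keeps a misconfigured run from partially overwriting the committed cache.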

Running the evals

The skill has a behavioral eval harness covering five representative prompts (formal, casual, narrow). Each eval invokes Claude Code via claude --print, parses the response, and checks observable behaviors: did the right sub-agents fire, are the projects mentioned, does the synthesis section appear, are evidence citations present.

python evals/run_evals.py                    # run all five
python evals/run_evals.py --prompt NAME      # run one
python evals/run_evals.py --keep-output      # save raw responses to evals/results/

Each eval is one full Claude Code run with sub-agent fan-out, so a complete suite takes roughly 10 minutes and costs roughly $2-3 in API credits. Use sparingly - when modifying the skill or sub-agents, not on every commit.

The assertions are small Python predicates in evals/run_evals.py. Adding a new behavioral check is a one-function addition; new prompts get added to evals/prompts.yaml.
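As a rough illustration of the "small Python predicates" idea, the sketch below pairs a `claude --print` invocation with two checks. The function names and the exact checks are assumptions, not the contents of evals/run_evals.py.

```python
import subprocess


def run_prompt(prompt: str, timeout: int = 600) -> str:
    """Invoke Claude Code non-interactively and return its stdout."""
    proc = subprocess.run(
        ["claude", "--print", prompt],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return proc.stdout


# Hypothetical predicates: each takes the raw response and returns a bool.
def mentions_all_projects(response: str) -> bool:
    return all(name in response for name in ("rag_ai", "opensupport", "glassbox"))


def has_synthesis_section(response: str) -> bool:
    return "synthesis" in response.lower()


CHECKS = {
    "mentions_all_projects": mentions_all_projects,
    "has_synthesis_section": has_synthesis_section,
}


def evaluate(response: str) -> dict:
    """Run every registered check against one response."""
    return {name: check(response) for name, check in CHECKS.items()}
```

In this layout a new behavioral check is one function plus one entry in `CHECKS`, and `evaluate(run_prompt("Review this candidate's portfolio."))` would score a single live run.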

Known issues

Live MCP from sub-agents is unreliable on Claude Code 2.1.150

When sub-agents are configured to call a remote MCP server directly (via the mcpServers: field in their YAML frontmatter), tool calls hang silently - the TUI shows "Running…" indefinitely while the server has already returned. The transcript files in ~/.claude/projects/.../subagents/ reveal that tool results often arrive correctly but the sub-agent's orchestration loop stalls before composing the next assistant turn.

This is consistent with GitHub issue 60061 and a regression introduced around December 2025 (see issue 13898). I diagnosed it by running jq against the sub-agent JSONL transcripts to confirm whether tool results landed and whether subsequent assistant turns were ever attempted. The results landed; the assistant turns were never attempted.
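The same check can be sketched in Python for readers who prefer it to jq. The record layout used here (a top-level `type` field, and `message.content` holding blocks whose `type` is `tool_result`) is an assumption about the transcript format, so verify it against your own files before relying on this.

```python
import json


def is_tool_result(record: dict) -> bool:
    """True if the record carries at least one tool_result block (assumed schema)."""
    message = record.get("message")
    content = message.get("content") if isinstance(message, dict) else None
    return isinstance(content, list) and any(
        isinstance(block, dict) and block.get("type") == "tool_result"
        for block in content
    )


def stalled_after_tool_result(lines) -> bool:
    """True if the last tool result is never followed by an assistant turn."""
    records = [json.loads(line) for line in lines if line.strip()]
    last_result = max(
        (i for i, r in enumerate(records) if is_tool_result(r)), default=None
    )
    if last_result is None:
        return False  # no tool result landed, so this is a different failure
    return not any(r.get("type") == "assistant" for r in records[last_result + 1:])
```

Passing an open transcript file to `stalled_after_tool_result` returns `True` for the failure described above: the tool result is present, but no assistant turn follows it.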

The cache layer in this repo is the workaround. Sub-agents read pre-fetched files instead of calling MCP live, which avoids the broken code path entirely. The MCP server itself is healthy - scripts/fetch_portfolio.py calls it without issue using the Python SDK.

Making live MCP work reliably from sub-agents is the next thing to fix.

What's next

The reviewers themselves flagged real gaps in their findings. These are the actionable items:

Reliable live MCP from sub-agents. Investigate keepalive frames at the FastMCP layer, alternative transport configuration, and the GitHub 60061 fix landscape.
Source code in cache, not just READMEs. Right now the reviewers can credit architectural claims but can't verify them. Adding load-bearing source files (3-5 per project) to the cache would let the reviewers report findings instead of claims.
Testing discipline visible in repos. Both the opensupport and rag_ai reviewers flagged that test coverage isn't articulated in the project READMEs. The fix is one paragraph per repo - honest acknowledgment of what is and isn't tested.
Eval harness coverage expansion. Five prompts cover invocation styles. Adding evals for edge cases (ambiguous prompts, prompts naming wrong projects, prompts in different languages) would tighten regression coverage.

Why this exists

Most "portfolio" repos are dumps of projects with a README that says "here's what I built." This one tries to do something different: it builds a tool that reviews the portfolio, so the act of evaluating the candidate becomes self-demonstrating. Run the skill and you've already seen the candidate think about agent orchestration, sub-agent context isolation, structured-output contracts, evaluation discipline, and how to debug a working-but-flaky toolchain.

The audience is hiring managers and engineers at companies hiring Forward Deployed Engineers. The author of this repo is currently exploring such roles. Reach out via opensourcesecurity.net.

License

BSD-2-Clause.