Skills, Hooks, Subagents: how we set up a real Claude Code team

I'll start with a stat that surprised me.

Across the twelve client projects where we integrated Claude Code since February, ten had already created a Skill before we arrived. But none had written a Hook. And only two used Subagents — every time for the wrong reasons.

The problem isn't that Hooks are hard. It's that no one talks about when to use what, and the official docs, as good as they are, don't explicitly say "this is how you should structure your .claude/ for a production repo".

That's what this article tries to fix. Here's what we learned setting these up, debugging them, and explaining them to clients.

The three things to understand once and for all

Claude Code exposes three extensibility layers. They don't serve the same purpose at all, and that's where people get lost.

A Skill is a Markdown file with a frontmatter, invoked as a slash-command. You type /review-component instead of re-typing your 200-word prompt every time. It's a versioned prompt template. Anthropic calls them "Skills" rather than saying "prompt macros", but in practice that's exactly what they are — and that's perfectly fine.

A Hook is a script (shell, Python, whatever) that fires on a Claude Code event. PreToolUse, PostToolUse, Stop, etc. Unlike a Skill, it's not Claude that decides whether to run it — it's the Claude Code harness. If the script exits with code 2, the action is blocked. It's your deterministic safety net, which works even if Claude is poorly prompted or mid-hallucination.

A Subagent, finally, is a separate Claude instance with its own context window, system prompt, and tool set. When the main agent delegates, only the final summary bubbles back up into the parent conversation (official docs).

If you remember one rule, remember this: start with Skills (cheapest, most immediate value). Add Hooks when you need a deterministic guarantee that Claude can't bypass. Pull out Subagents when parallelism or context isolation becomes critical.

Skills are your shared prompt library

A Skill is the equivalent of a snippet or VS Code macro — except instead of pasting text, you ask Claude to run a workflow.

Here's a Skill we deploy at every client for React component reviews:

---
name: review-component
description: Full audit of a React component (readability, perf, a11y)
allowed-tools: Read, Grep, Bash
argument-hint: "[path/to/component.tsx]"
---

# Review Component

You are auditing a React component. Read the file at $ARGUMENTS, then:

1. **Readability**: naming, cyclomatic complexity, nesting depth
2. **Performance**: memoization, re-renders, bundle impact
3. **Accessibility**: ARIA, contrast, focus, semantics
4. **Patterns**: compliance with /docs/style-guide.md

Reply in English, structured by section. Output /10 score per axis + top 3 fixes.

You write it once, version it in the repo, and everyone on the team can type /review-component src/components/Button.tsx. No more "can you send me your review prompt" on Slack.

A Skill is the right tool when you repeat a long prompt more than three times, when the prompt evolves and you want it in git, or when several people need to invoke it the same way. In short: when you want consistency without the cognitive overhead.

Where it goes wrong: people putting safety-critical rules in Skills. Like "never deploy to prod without green tests". A Skill is a prompt — Claude can interpret it creatively, or even ignore it under contextual pressure. If it's critical, you need a Hook.

Hooks are your safety net

Where a Skill politely asks, a Hook enforces. The Claude Code harness intercepts the event, runs your script, and if the script fails, the action is blocked. Full stop.

The first hook we deploy at every client: block direct pushes to main. Here's what it looks like:

#!/usr/bin/env bash
# .claude/hooks/pre-bash.sh — Triggered by PreToolUse on Bash
INPUT=$(cat)
CMD=$(echo "$INPUT" | jq -r '.tool_input.command')

if echo "$CMD" | grep -qE "git push.*\bmain\b|git push.*\bmaster\b"; then
  echo "❌ Direct push to main/master is forbidden. Use a PR." >&2
  exit 2
fi
exit 0

And the binding in .claude/settings.json:

{
  "hooks": {
    "PreToolUse": [
      { "matcher": "Bash", "hooks": [{ "type": "command", "command": ".claude/hooks/pre-bash.sh" }] }
    ]
  }
}

The agent can be charmed, poorly prompted, jailbroken by a sloppy copy-paste from a PR — the hook still runs. It's your last line of defense.

The hooks we set up almost systematically:

A secret-guard that scans diffs before writes and blocks if it sees AWS_SECRET, API_KEY, or -----BEGIN PRIVATE KEY-----. Three out of twelve clients had already committed a secret to git history before we arrived. The hook saved two more during our first weeks.
A format-on-edit that runs prettier or ruff format after every edit. You'll never see a PR with random whitespace again.
A test-on-stop that runs fast tests at the end of each session. Not all tests — just those touching modified files. It's the poor man's test impact analysis, but it works.

Where Hooks get misused: trying to make them do subjective judgment. "Is the code well commented?" — that's not a Hook, that's a review. If you want to automate that, it's a Subagent.

Subagents give you parallelism and isolation

And now Subagents. Probably the most powerful layer, and the one where we see the most mistakes.

A Subagent is a Claude instance with its own context. When the main agent delegates a task, it's the equivalent of await subagent.run("do this") — and only the final summary comes back. The 50K tokens of logs the subagent parsed to find the bug? They stay in its context. The main agent just sees "found it: race condition on line 47, here's the proposed diff".

Three concrete benefits:

First, the main agent's context window stays clean. You hold longer sessions without getting compacted. That's huge on big projects where every token matters.

Second, you can give each subagent different permissions. Our security-scanner has Read and Grep access but no Write or Bash — it can audit but can't modify. Principle of least privilege, applied to agents.

Third, parallelism. Probably the most visible win. On client PRs, we run four subagents in parallel: code-reviewer, test-runner, a11y-auditor, security-scanner. Each does its thing, the main agent synthesizes the four summaries into one PR comment. Before: five minutes sequential. After: 90 seconds parallel.

                Main Claude Agent (orchestrator)
                          │
       ┌──────────────────┼──────────────────┬────────────────┐
       ▼                  ▼                  ▼                ▼
  code-reviewer    test-runner         a11y-auditor     security-scanner
  Read, Grep       Read, Bash          Read, Bash       Read, Grep
       │                  │                  │                 │
       └──── summary ─────┴──── summary ─────┴──── summary ────┘
                          │
                  Main agent synthesizes
                  → final PR comment

Where Subagents go bad: overuse. Spawning a subagent adds three to five seconds of overhead — initialization, system prompt, summary generation. For a two-second operation, you triple the latency for nothing. Our rule: if you can do the task in under five seconds inline, don't pull out a subagent.

The `.claude/` structure we deploy

Here's what it looks like in practice on a medium project. It's become our baseline template:

.claude/
├── settings.json          # config + hook bindings
├── CLAUDE.md              # project spec (auto-loaded)
├── hooks/
│   ├── pre-bash.sh        # blocks git push main, rm -rf /, etc.
│   ├── secret-guard.sh    # PreToolUse Write/Edit — secret scan
│   ├── format-on-edit.sh  # PostToolUse Edit — prettier/ruff
│   └── test-on-stop.sh    # Stop — fast tests on touched files
├── skills/
│   ├── review-component/SKILL.md
│   ├── new-route/SKILL.md
│   ├── debug-prod-incident/SKILL.md
│   └── deploy-staging/SKILL.md
└── agents/
    ├── code-reviewer.md
    ├── security-scanner.md
    ├── a11y-auditor.md
    └── test-runner.md

The rule we set: a new dev should be able to read this folder in 15 minutes and understand how to work on this repo. Not just what to do — but how. What hooks will yell at them, what Skills are available, what subagents exist.

The numbers after six months

A few aggregate metrics across twelve projects, because I know that's what you want to see.

Average PR review time went from 4 min 30 to 1 min 40 (yes, we measured). "I forgot to format" incidents went from ~12 per sprint to zero thanks to format-on-edit. Tokens consumed per typical session went from 180K to 65K — the win comes mostly from subagent isolation.

And the one managers love: zero committed secrets in six months across twelve repos. We'd had three incidents in two years before. The secret-guard hook runs silently and does its job.

The initial investment — setting up hooks, Skills, subagents — is about two days on a medium-sized repo. ROI hits in three to five sprints. If you count it in engineer-hours saved, it's one of the best tooling investments we've seen this year.