From Claude Code to Kiro CLI: Porting a Multi-Agent Skill Builder

The Port That Wasn’t a Port #

I thought porting Claude Code’s skill-creator to Kiro CLI would take an afternoon. Read the prompts, translate the Agent() calls to subagent configs, done.

It took a week. What looked like a translation exercise turned into an architecture redesign. The reason: Claude Code and Kiro CLI have fundamentally different models for what an “agent” is, and those differences cascade through every design decision — from how you run tests, to how you isolate a control group, to how the orchestrator decides who does what.

This post walks through the full port: what the original does, why Kiro CLI can’t replicate it directly, and the architecture we built instead. Every config snippet is from the working implementation in this blog’s repository.

The original skill-creator lives in Anthropic’s public skills repo, and the methodology is documented in The Complete Guide to Building Skills for Claude. Our port preserves the evaluation methodology while rebuilding the orchestration layer for Kiro CLI’s named-agent model.

TL;DR — What you’ll learn:
How Claude Code’s skill-creator uses ephemeral Agent() calls for evaluation
Why Kiro CLI’s named-agent model requires a different orchestration strategy
The 8-agent architecture we built (meta-builder, skill-executor, baseline, grader, comparator, analyzer, llm-worker, skill-trigger-tester)
Key design decisions: subagents for judgment, shell for execution, TODO templates for determinism
The idempotency problem — why neither the default agent NOR omitting --agent works for reproducible evals
Real benchmark results from the first evaluation run

How Skill-Creator Works in Claude Code #

Claude Code’s skill-creator is elegant in its simplicity. One long-running agent does everything — interviewing the user, drafting skills, running evaluations, grading results, and iterating. When it needs parallel work, it spawns ephemeral workers via the Agent() tool.

graph TD
    A[Claude Code Session] --> B[skill-creator SKILL.md loaded in context]
    B --> C{Parent agent orchestrates everything}
    C --> D["Agent(prompt='Execute with skill...')"]
    C --> E["Agent(prompt='Execute WITHOUT skill...')"]
    C --> F["Agent(prompt='Read grader.md, grade these...')"]
    C --> G["Bash('python run_eval.py ...')"]
    D --> H[Returns transcript text]
    E --> H
    F --> I[Returns grading JSON]
    G --> J[Returns benchmark data]

The key characteristics:

One agent does everything — no pre-configuration needed
Agent() creates disposable workers with any prompt the parent writes on the fly
Workers inherit parent’s tools — full filesystem, bash, everything
The agents/ folder contains .md files that are read at runtime and passed as prompts to ephemeral Agent() calls
Scripts use claude -p for headless LLM calls (description optimization, trigger testing)

The scripts handle the mechanical parts:

Script	Purpose	Uses LLM?
`run_eval.py`	Tests if a skill’s description triggers correctly	Yes (`claude -p` subprocess)
`run_loop.py`	Orchestrates eval → improve → re-eval cycles	Yes (via run_eval + improve_description)
`improve_description.py`	Rewrites description based on failures	Yes (`claude -p` subprocess)
`aggregate_benchmark.py`	Computes pass rates, mean/stddev, deltas	No
`generate_report.py`	Generates HTML report for description optimization results	No
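
To make the "pass rates, mean/stddev, deltas" row concrete, here is a minimal sketch of that kind of arithmetic. It is not the actual `aggregate_benchmark.py`: the grading shape (an `assertions` list with a boolean `passed` field) and the function names are assumptions made for illustration.

```python
from statistics import mean, stdev


def pass_rate(grading: dict) -> float:
    """Fraction of assertions marked as passed in one (hypothetical) grading result."""
    assertions = grading["assertions"]
    if not assertions:
        return 0.0
    return sum(1 for a in assertions if a["passed"]) / len(assertions)


def summarize(rates: list) -> dict:
    """Mean and sample standard deviation of per-run pass rates."""
    return {
        "mean": mean(rates),
        # stdev needs at least two data points
        "stddev": stdev(rates) if len(rates) > 1 else 0.0,
    }


def benchmark(with_skill: list, without_skill: list) -> dict:
    """Compare both variants; delta is with-skill mean minus without-skill mean."""
    ws = summarize([pass_rate(g) for g in with_skill])
    wo = summarize([pass_rate(g) for g in without_skill])
    return {"with_skill": ws, "without_skill": wo, "delta": ws["mean"] - wo["mean"]}
```

The delta here is a difference of two rates, so it reads as percentage points rather than a relative percentage.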

In the evaluation loop the scripts chain together: run_loop.py calls run_eval.py, feeds the failures to improve_description.py, and re-evaluates. The LLM-backed scripts use claude -p in the original and kiro-cli chat --no-interactive --agent llm-worker in our port; aggregate_benchmark.py and generate_report.py are pure Python with no LLM calls. Our port also adds validate_workspace.py (pure Python), which validates the workspace structure before aggregation and catches format errors that would otherwise cascade.

And the subagent prompts (read by the parent, passed to Agent()):

File	Role
`agents/grader.md`	Grade assertions pass/fail with evidence
`agents/comparator.md`	Blind A/B comparison of two outputs
`agents/analyzer.md`	Surface patterns that aggregate stats hide

This works beautifully in Claude Code because Agent() is infinitely flexible — you write any prompt, get any behavior, no registration required.

Why Kiro CLI Is Different #

Kiro CLI’s subagent tool looks similar on the surface but operates on fundamentally different principles:

Dimension	Claude Code `Agent()`	Kiro CLI `subagent`
Identity	Ephemeral, no name	Named, pre-configured JSON
Prompt	Written on the fly by parent	Fixed in config (or `file://` reference)
Tools	Inherits parent’s tools	Own tools, own MCP servers
Routing	Parent decides in-context	Description-driven (parent reads descriptions)
Lifecycle	Dies after returning	Persistent config, spawned per-call
Parallelism	Unlimited	Parallel (DAG-based task graphs)

The routing problem #

In Claude Code, the parent constructs the perfect prompt for each worker. In Kiro CLI, the parent must choose from a fixed menu of named agents based on their description fields. If descriptions are vague, routing fails silently — the parent picks the wrong agent or skips delegation entirely.

This makes the description field the most critical piece of the architecture. It’s not documentation — it’s the routing table.

What can’t be a subagent #

Test execution — running the skill under test — can’t use the subagent tool because:

The skill-executor agent needs its own tools, skills, and MCP servers
The baseline agent must not see any workspace skills (to be a valid control)
Subagents inherit the parent’s MCP paths but load their own agent config — you can’t get a truly clean environment
Both must be idempotent — the same config every time, regardless of user settings

The solution: run both sides as shell subprocesses. The with-skill side is kiro-cli chat --no-interactive --agent skill-executor --trust-all-tools, and the control is the same command with --agent baseline. Separate process, separate config, clean isolation, reproducible results.
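
As a rough sketch of what that looks like from a driver script: the flags below are the ones quoted above, while the function names, the `timeout` default, and the returned dictionary shape are assumptions for illustration.

```python
import subprocess


def eval_command(agent: str, query: str) -> list:
    """Build the headless kiro-cli invocation for one side of the A/B pair."""
    return [
        "kiro-cli", "chat",
        "--no-interactive",
        "--agent", agent,
        "--trust-all-tools",
        query,
    ]


def run_pair(query: str, cwd: str, timeout: int = 300) -> dict:
    """Run the same query against both pinned agents, each in its own process."""
    transcripts = {}
    for variant, agent in (("with_skill", "skill-executor"),
                           ("without_skill", "baseline")):
        result = subprocess.run(
            eval_command(agent, query),
            capture_output=True, text=True, cwd=cwd, timeout=timeout,
        )
        transcripts[variant] = result.stdout
    return transcripts
```

Passing the command as a list (rather than a shell string) avoids quoting problems when the test query contains spaces or quotes.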

The Kiro CLI Architecture #

Here’s what we built — eight agents, each with a specific role (seven for evaluation orchestration, plus a skill-trigger-tester used by the description optimization scripts):

The orchestrator: `meta-builder.json` #

The full evaluation flow — showing exactly which component handles each step:

sequenceDiagram
    participant U as User
    participant MB as meta-builder
    participant SH as Shell (kiro-cli)
    participant PY as Python scripts
    participant SG as skill-grader
    participant SC as skill-comparator
    participant SA as skill-analyzer
    U->>MB: "Evaluate my kiro-docs skill"
    MB->>MB: Read skill, draft test prompts
    MB->>U: Confirm test prompts?
    U->>MB: Approved
    par Test Execution (shell subprocesses, idempotent agents)
        MB->>SH: kiro-cli chat --agent skill-executor "test query"
        MB->>SH: kiro-cli chat --agent baseline "test query"
    end
    MB->>SG: Grade with_skill transcript
    SG-->>MB: grading.json (pass/fail + evidence)
    MB->>SG: Grade without_skill transcript
    SG-->>MB: grading.json
    MB->>SC: Blind A/B compare (doesn't know which is which)
    SC-->>MB: comparison.json (winner + reasoning)
    MB->>PY: python -m scripts.validate_workspace
    PY-->>MB: ✓ valid (or fix errors)
    MB->>PY: python -m scripts.aggregate_benchmark
    PY-->>MB: benchmark.json (pass rates, deltas)
    MB->>SA: Analyze patterns in benchmark
    SA-->>MB: analyst_notes.json
    MB->>PY: python generate_review.py --static
    PY-->>MB: report.html
    MB->>U: Review results (link to report)

{
  "name": "meta-builder",
  "description": "Kiro CLI skill and agent builder. Invoke when you need to create new skills, scaffold agent configurations, or design custom agents with proper security and tooling. Guides through structured interviews before generating artifacts.",
  "prompt": "file://../prompts/meta-builder.md",
  "tools": ["read", "write", "shell", "subagent", "todo_list"],
  "allowedTools": ["read", "write", "shell", "todo_list", "subagent"],
  "resources": [
    "skill://.kiro/skills/agent-creator/SKILL.md",
    "skill://.kiro/skills/skill-creator/SKILL.md"
  ],
  "hooks": {
    "agentSpawn": [
      { "command": "(ls .kiro/skills/ 2>/dev/null || echo 'No skills found') | head -10" },
      { "command": "(ls .kiro/agents/ 2>/dev/null || echo 'No agents found') | head -10" }
    ]
  },
  "toolsSettings": {
    "fs_write": {
      "allowedPaths": [".kiro/skills/**", ".kiro/agents/**", ".kiro/prompts/**", "/tmp/**", "/private/tmp/**"]
    },
    "execute_bash": {
      "allowedCommands": [
        "^kiro-cli agent validate.*$",
        "^kiro-cli agent list.*$",
        "^kiro-cli chat --no-interactive.*$",
        "^python -m scripts\\..*$",
        "^python .kiro/skills/skill-creator/.*$",
        "^mkdir -p /tmp/.*$",
        "^cp -r .* /tmp/.*$"
      ],
      "deniedCommands": ["^rm -rf.*$", "^git push.*$"],
      "autoAllowReadonly": true
    },
    "subagent": {
      "availableAgents": ["researcher", "skill-grader", "skill-comparator", "skill-analyzer"],
      "trustedAgents": ["researcher", "skill-grader", "skill-comparator", "skill-analyzer"]
    }
  },
  "welcomeMessage": "What skill or agent would you like to build? I'll guide you through the process."
}

Notice what’s happening: the meta-builder can shell out to kiro-cli chat --no-interactive (for test execution) and delegate to subagents (for judgment). It can’t rm -rf or git push. The trustedAgents list means it won’t prompt for permission before delegating — these are pre-approved. The hooks.agentSpawn lists workspace skills and agents on every invocation for context. And fs_write.allowedPaths scopes write access to .kiro/ config directories and /tmp/ — nothing else.
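
A config like this is easy to get subtly wrong, so a small lint can help before running anything. The sketch below is a hypothetical helper, not part of Kiro CLI: it only checks two conventions used in the config above (trusted agents are also available, shell patterns are anchored with ^ and $), and it assumes the patterns are ordinary regular expressions.

```python
import re


def lint_agent_config(config: dict) -> list:
    """Return human-readable problems; an empty list means no findings."""
    problems = []
    settings = config.get("toolsSettings", {})

    # Every trusted agent should also be listed as available.
    sub = settings.get("subagent", {})
    extra = set(sub.get("trustedAgents", [])) - set(sub.get("availableAgents", []))
    for name in sorted(extra):
        problems.append(f"trusted but not available: {name}")

    # Shell patterns should compile and be anchored at both ends.
    bash = settings.get("execute_bash", {})
    for key in ("allowedCommands", "deniedCommands"):
        for pattern in bash.get(key, []):
            try:
                re.compile(pattern)
            except re.error as exc:
                problems.append(f"{key}: invalid regex {pattern!r}: {exc}")
                continue
            if not (pattern.startswith("^") and pattern.endswith("$")):
                problems.append(f"{key}: unanchored pattern {pattern!r}")
    return problems
```

Run against the meta-builder config above (loaded with `json.load`), this should come back empty; the real validation is still `kiro-cli agent validate`.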

The execution pair: `skill-executor.json` and `baseline.json` #

These two agents form the A/B pair. Both get every tool ("tools": ["*"]); the difference that matters is skill discovery: skill-executor loads skill:// resources, baseline loads none:

{
  "name": "skill-executor",
  "description": "Full-capability agent with all workspace skills loaded. Used as the 'with_skill' executor in skill evaluations. Mirrors kiro_default behavior — same tools, same prompt style, same skill discovery — but is deterministic regardless of user's default agent settings.",
  "prompt": "You are the Kiro CLI agent, bringing the power of AI-assisted development directly to the user's terminal. You help with coding tasks, system operations, AWS management, and development workflows. When a skill's description matches the user's request, read and follow that skill's instructions.",
  "tools": ["*"],
  "allowedTools": ["*"],
  "resources": [
    "skill://.kiro/skills/*/SKILL.md",
    "skill://~/.kiro/skills/*/SKILL.md",
    "file://.kiro/steering/**/*.md"
  ]
}

{
  "name": "baseline",
  "description": "Clean agent with no workspace skills loaded. Used as a control in skill evaluation baselines — mirrors kiro_default (same tools, same steering, same prompt style) but without any skill:// resources. This isolates the skill's contribution from tool capability.",
  "prompt": "You are the default Kiro CLI agent, bringing the power of AI-assisted development directly to the user's terminal. You help with coding tasks, system operations, AWS management, and development workflows.",
  "tools": ["*"],
  "allowedTools": ["*"],
  "resources": [
    "file://AGENTS.md",
    "file://README.md",
    "file://.kiro/steering/**/*.md"
  ]
}

Both are pinned configs — they behave the same regardless of what the user has set as their default agent. This makes evaluations idempotent: run them on any machine, get reproducible A/B comparisons.
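
One way to keep the pair honest is to diff the two configs mechanically. This is a sketch with hypothetical helper names; the two dictionaries copy only the `tools` and `resources` fields from the configs above.

```python
def resource_diff(a: dict, b: dict) -> dict:
    """List resources that appear in only one of two agent configs."""
    ra, rb = set(a.get("resources", [])), set(b.get("resources", []))
    return {"only_in_a": sorted(ra - rb), "only_in_b": sorted(rb - ra)}


def is_valid_control(executor: dict, baseline: dict) -> bool:
    """True when tools match and the baseline loads no skill:// resources."""
    same_tools = executor.get("tools") == baseline.get("tools")
    no_skills = not any(
        r.startswith("skill://") for r in baseline.get("resources", [])
    )
    return same_tools and no_skills


skill_executor = {"tools": ["*"], "resources": [
    "skill://.kiro/skills/*/SKILL.md",
    "skill://~/.kiro/skills/*/SKILL.md",
    "file://.kiro/steering/**/*.md",
]}
baseline = {"tools": ["*"], "resources": [
    "file://AGENTS.md",
    "file://README.md",
    "file://.kiro/steering/**/*.md",
]}

print(is_valid_control(skill_executor, baseline))  # True
print(resource_diff(skill_executor, baseline))
```

For the configs as shown, the diff reports the two skill:// entries on the executor side and also `file://AGENTS.md` and `file://README.md` on the baseline side, so the pair differs in slightly more than skill discovery. Whether that matters depends on what those files contain in your project.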

The judgment agents #

All three share the same pattern — minimal tools, write scoped to /tmp/, prompts loaded from the skill-creator’s agents/ directory:

{
  "name": "skill-grader",
  "description": "Skill evaluation grader. Invoke to evaluate assertions/expectations against execution transcripts and output files. Grades each assertion as pass/fail with evidence.",
  "prompt": "file://../skills/skill-creator/agents/grader.md",
  "tools": ["read", "write"],
  "allowedTools": ["read", "write"],
  "toolsSettings": {
    "fs_write": {
      "allowedPaths": ["/tmp/**", "/private/tmp/**"]
    }
  }
}

The comparator adds isolation — it cannot read metadata that would reveal which output came from which variant:

{
  "name": "skill-comparator",
  "description": "Blind output comparator. Invoke to compare two outputs without knowing which skill produced them. Judges quality and task completion without bias.",
  "prompt": "file://../skills/skill-creator/agents/comparator.md",
  "tools": ["read", "write"],
  "allowedTools": ["read", "write"],
  "toolsSettings": {
    "fs_write": {
      "allowedPaths": ["/tmp/**", "/private/tmp/**"]
    },
    "fs_read": {
      "deniedPaths": ["**/eval_metadata.json", "**/skill-snapshot/**"]
    }
  }
}
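
To get a feel for what those two patterns cover, here is a rough approximation using Python's `fnmatch`. This is an assumption-laden sketch: Kiro's own glob matching may treat `**` and path separators differently, and the example paths are made up.

```python
from fnmatch import fnmatchcase

DENIED = ["**/eval_metadata.json", "**/skill-snapshot/**"]


def is_denied(path: str, patterns: list = DENIED) -> bool:
    """Approximate a deny-list check with shell-style wildcard matching.

    Note: in fnmatch, '*' also matches '/', which is looser than most
    glob implementations.
    """
    return any(fnmatchcase(path, p) for p in patterns)


workspace = "/tmp/youtube-trending-workspace"
print(is_denied(f"{workspace}/iteration-1/trending-ai/eval_metadata.json"))
print(is_denied(f"{workspace}/skill-snapshot/SKILL.md"))
print(is_denied(f"{workspace}/iteration-1/trending-ai/with_skill/transcript.md"))
```

The first two paths match a deny pattern and the transcript does not, which is the behavior the comparator needs: it can read outputs but not the files that would unblind it.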

The headless worker: `llm-worker.json` #

{
  "name": "llm-worker",
  "description": "Minimal LLM agent for single-turn text generation. No tools, no skills, no MCP. Used by scripts that need raw model output without side effects.",
  "prompt": "You are a text generation assistant. Respond only with what is asked. Do not use tools.",
  "tools": [],
  "allowedTools": [],
  "resources": []
}

This replaces claude -p in the original scripts. Zero tools, zero skills — pure text generation for description optimization loops.

Key Design Decisions #

Decision 1: Subagents for judgment, shell for execution #

Subagents are in-process — fast, return structured data directly to the parent’s context. But test execution needs a completely separate agent config with different skills loaded. Shell subprocess is the only way to get a clean execution environment.

The rule: if the task needs the target’s own config, use shell. If it needs the parent’s context, use subagent.

Decision 2: The idempotency problem #

This one almost shipped broken — twice.

First realization: the baseline. The default Kiro agent auto-discovers ALL workspace skills via skill://.kiro/skills/*/SKILL.md. You can’t use it as a “without skill” baseline because it sees the skill you’re testing. We needed baseline.json — "tools": ["*"] with zero skill:// resources.

Second realization: the with_skill side. We initially ran with_skill tests by omitting --agent (using whatever default the user has). But users can configure a custom default agent in kiro-cli settings — different prompt, different tools, maybe fewer skills. That makes evals non-reproducible across machines.

The fix: both sides are explicit named agents:

skill-executor.json — all tools + skill://.kiro/skills/*/SKILL.md + global skills. Always discovers and loads workspace skills, regardless of user settings.
baseline.json — all tools + zero skill:// resources. Never sees skills.

{
  "name": "skill-executor",
  "description": "Full-capability agent with all workspace skills loaded. Used as the 'with_skill' executor in skill evaluations. Mirrors kiro_default behavior — same tools, same prompt style, same skill discovery — but is deterministic regardless of user's default agent settings.",
  "prompt": "You are the Kiro CLI agent...",
  "tools": ["*"],
  "allowedTools": ["*"],
  "resources": [
    "skill://.kiro/skills/*/SKILL.md",
    "skill://~/.kiro/skills/*/SKILL.md",
    "file://.kiro/steering/**/*.md"
  ]
}

Now evaluations are idempotent: same agents, same configs, same results — no matter who runs them or what their personal defaults are.

Decision 3: TODO templates beat implicit planning #

Claude Code’s parent agent figures out what to do from the skill instructions — it reads the methodology and improvises an execution plan. This works because Agent() is infinitely flexible.

Kiro’s meta-builder needs explicit step-by-step TODO templates with executor annotations. Each step specifies WHO executes it:

1.  [self]              Read skill, understand purpose
2.  [shell]             Snapshot skill to /tmp/
3.  [self]              Draft test prompts
4.  [user]              Confirm
5.  [shell]             kiro-cli chat --no-interactive --agent skill-executor --trust-all-tools
6.  [shell]             kiro-cli chat --no-interactive --agent baseline --trust-all-tools
7.  [subagent:grader]   Grade outputs → grading.json
8.  [subagent:comparator] Blind A/B → comparison.json
9.  [shell]             python -m scripts.validate_workspace (catch errors early)
10. [shell]             python -m scripts.aggregate_benchmark
11. [subagent:analyzer] Surface patterns → analyst_notes.json
12. [shell]             python generate_review.py --static
13. [user]              Review, provide feedback
14. [goto:5]            Iterate
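
Because the annotations follow a fixed shape, they are also machine-readable. The parser below is a hypothetical sketch (the meta-builder itself reads the template as prompt text); it assumes each step looks like `N. [executor] action` or `N. [executor:target] action`.

```python
import re
from typing import NamedTuple, Optional


class Step(NamedTuple):
    number: int
    executor: str           # "self", "shell", "user", "subagent", or "goto"
    target: Optional[str]   # subagent name or goto step number, if present
    action: str


STEP_RE = re.compile(r"^\s*(\d+)\.\s*\[(\w+)(?::([\w-]+))?\]\s*(.*)$")


def parse_step(line: str) -> Step:
    """Split one annotated TODO line into its executor, target, and action."""
    m = STEP_RE.match(line)
    if m is None:
        raise ValueError(f"not an annotated TODO step: {line!r}")
    number, executor, target, action = m.groups()
    return Step(int(number), executor, target, action.strip())


print(parse_step("7.  [subagent:grader]   Grade outputs → grading.json"))
```

A check like this could reject a template in which a step has no executor, which is the failure mode that leads to misrouting.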

Without these annotations, the meta-builder would misroute — we saw test execution go to the researcher subagent because it was the only one with broad tools. The TODO template makes routing deterministic.

Decision 4: Blind comparator isolation #

The skill-comparator does blind A/B comparison — it must not know which output came from which variant. We enforce this with deniedPaths:

"fs_read": {
  "deniedPaths": ["**/eval_metadata.json", "**/skill-snapshot/**"]
}

The metadata file maps “Output A” → “with_skill” and “Output B” → “without_skill”. The comparator physically cannot read it. This is stronger than a prompt instruction — it’s a tool-level constraint.
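
A sketch of how the label assignment could be produced: the random shuffle, the `blind_mapping` key, and the function name are assumptions for illustration, not the shipped implementation. The point is that the mapping is written only to the file the comparator is denied from reading.

```python
import json
import random
from pathlib import Path
from typing import Optional


def blind_pair(eval_dir: Path, seed: Optional[int] = None) -> dict:
    """Assign the two variants to neutral labels and record the mapping."""
    variants = ["with_skill", "without_skill"]
    random.Random(seed).shuffle(variants)
    mapping = {"Output A": variants[0], "Output B": variants[1]}

    # Store the mapping in eval_metadata.json, keeping any existing fields.
    meta_path = eval_dir / "eval_metadata.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    meta["blind_mapping"] = mapping
    meta_path.write_text(json.dumps(meta, indent=2))
    return mapping
```

Shuffling per eval means the comparator cannot learn that "Output A" is always the with-skill side; passing a fixed `seed` keeps a given run reproducible.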

Decision 5: Write access scoped to `/tmp/` #

All evaluation work happens in /tmp/<skill-name>-workspace/. Subagents can write results there but cannot touch the actual skill files in .kiro/. The meta-builder writes to .kiro/ only with user confirmation.

/tmp/<skill-name>-workspace/
├── skill-snapshot/          (copy of skill before changes)
├── iteration-1/
│   ├── <eval-name>/         (descriptive: "trending-ai", "pdf-extraction")
│   │   ├── eval_metadata.json   (prompt + assertions)
│   │   ├── with_skill/
│   │   │   ├── transcript.md
│   │   │   ├── outputs/
│   │   │   └── grading.json
│   │   ├── without_skill/
│   │   │   ├── transcript.md
│   │   │   ├── outputs/
│   │   │   └── grading.json
│   │   └── comparison.json  (blind A/B from comparator)
│   ├── benchmark.json       (generated by aggregate_benchmark.py)
│   ├── analyst_notes.json   (generated by skill-analyzer)
│   └── report.html          (generated by generate_review.py)
└── iteration-2/
    └── ...
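
A structural check over this layout is straightforward. The following is a minimal sketch in the spirit of validate_workspace.py, not the script itself: the list of required files is read off the tree above, and the function name is hypothetical.

```python
import json
from pathlib import Path

REQUIRED = [
    "eval_metadata.json",
    "with_skill/transcript.md",
    "with_skill/grading.json",
    "without_skill/transcript.md",
    "without_skill/grading.json",
]


def validate_iteration(iteration_dir: Path) -> list:
    """Return problems found in one iteration directory (empty list = valid)."""
    errors = []
    # Each subdirectory is one eval; files like benchmark.json are skipped.
    eval_dirs = [d for d in sorted(iteration_dir.iterdir()) if d.is_dir()]
    if not eval_dirs:
        errors.append(f"{iteration_dir}: no eval directories")
    for eval_dir in eval_dirs:
        for rel in REQUIRED:
            path = eval_dir / rel
            if not path.is_file():
                errors.append(f"missing: {path}")
            elif path.suffix == ".json":
                try:
                    json.loads(path.read_text())
                except json.JSONDecodeError as exc:
                    errors.append(f"invalid JSON: {path}: {exc}")
    return errors
```

Returning a list of errors instead of raising on the first one lets the orchestrator see everything that needs fixing in a single pass.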

Decision 6: Scripts ported from `claude -p` to `kiro-cli chat --no-interactive` #

The original run_eval.py uses claude -p for headless LLM calls. Our port uses kiro-cli chat --no-interactive --trust-all-tools. The trigger detection creates a temporary skill in .kiro/skills/, runs a query, and checks if the output contains a marker string — proving the skill was auto-discovered and loaded:

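import shutil
import subprocess
import uuid
from pathlib import Path


def run_single_query(query, skill_name, skill_description,
                     timeout, project_root, model=None, agent=None):
    """Return True if a temporary skill with this description triggers on the query."""
    # A unique suffix keeps concurrent runs apart and makes the marker unguessable.
    unique_id = uuid.uuid4().hex[:8]
    clean_name = f"{skill_name}-eval-{unique_id}"
    skill_dir = Path(project_root) / ".kiro" / "skills" / clean_name
    skill_file = skill_dir / "SKILL.md"

    # Write a throwaway skill whose only instruction is to echo the marker.
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_content = (
        f"---\n"
        f"name: {clean_name}\n"
        f"description: {skill_description}\n"
        f"---\n\n"
        f"If this skill was loaded, respond with exactly: SKILL_TRIGGERED_{unique_id}\n"
    )
    skill_file.write_text(skill_content)

    cmd = [
        "kiro-cli", "chat",
        "--no-interactive",
        "--trust-all-tools",
        "--agent", agent or "skill-trigger-tester",
    ]
    if model:
        cmd.extend(["--model", model])
    cmd.append(query)

    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                cwd=project_root, timeout=timeout)
    finally:
        # Remove the temporary skill so it cannot leak into later runs.
        shutil.rmtree(skill_dir, ignore_errors=True)

    # The marker in the output shows the skill was discovered and loaded.
    output = result.stdout + result.stderr
    return f"SKILL_TRIGGERED_{unique_id}" in output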

The Mapping Table #

For anyone doing a similar port, here’s the complete translation:

Claude Code	Kiro CLI	Why different
`Agent(prompt="...")`	`subagent → named-agent`	Kiro requires pre-registered agents with JSON config
`Agent()` for test execution	`shell: kiro-cli chat --no-interactive --agent skill-executor`	Target needs its own config/tools/skills; must be idempotent
`agents/grader.md` read at runtime	`skill-grader.json` with `"prompt": "file://../skills/skill-creator/agents/grader.md"`	.md is shared, JSON is the identity layer
`TaskCreate/TaskUpdate`	`todo_list` tool	Same concept, different implementation
`claude -p` in scripts	`kiro-cli chat --no-interactive --agent llm-worker`	Zero-tool agent for pure text generation
Implicit routing (parent decides)	Description-driven + TODO annotations	Kiro needs descriptions; we augment with deterministic TODO
No baseline needed (`Agent()` with custom prompt = no skill)	Dedicated `baseline` + `skill-executor` agents	Both sides must be pinned for idempotent evaluation

Results #

First full evaluation run on a youtube-trending skill, 6 test runs in total:

3 with-skill runs via --agent skill-executor (default-like agent with skill discovery)
3 baseline runs via --agent baseline (same tools, zero skills)
Automated grading via skill-grader subagent
Benchmark: 100% pass rate with skill vs 13-20% without (a delta of +80 to +87 percentage points)
Full HTML report generated via generate_review.py --static

The pipeline produces a clear discriminating signal: with the skill, the agent runs the bundled Python script and produces structured data tables. Without it, the agent does web searches and returns general commentary — useful but not structured.

The first run exposed two pipeline bugs (documented in Lessons Learned below) that we fixed before getting clean end-to-end execution. The architecture works, but the orchestrator prompt required significant iteration to prevent shortcutting behavior.

Lessons Learned #

1. Description is the routing table. In Kiro CLI, if your subagent’s description doesn’t clearly say when to invoke it, the parent will misroute. We learned this when test execution went to researcher instead of shell. The fix: descriptions that state trigger conditions, not just capabilities.

Bad: "AWS helper agent"
Good: "AWS fact-checker. Invoke when blog content makes claims about AWS service features, limits, pricing, or configurations that need verification against official docs."
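
A crude lint can catch the worst offenders before they cause misrouting. This is a heuristic sketch only: the phrase list and the word-count threshold are assumptions drawn from the descriptions in this post, not a rule Kiro CLI applies.

```python
TRIGGER_PHRASES = ("invoke when", "invoke to", "used as", "used by", "use when")


def has_trigger_condition(description: str, min_words: int = 8) -> bool:
    """Heuristic: long enough and contains a phrase that says when to invoke."""
    text = description.lower()
    long_enough = len(text.split()) >= min_words
    return long_enough and any(phrase in text for phrase in TRIGGER_PHRASES)


print(has_trigger_condition("AWS helper agent"))
print(has_trigger_condition(
    "AWS fact-checker. Invoke when blog content makes claims about AWS "
    "service features, limits, pricing, or configurations that need "
    "verification against official docs."
))
```

The first description fails the check and the second passes. A passing description can still be a poor routing hint, so treat this as a floor, not a guarantee.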

2. TODO templates beat implicit planning. Claude Code’s parent improvises from instructions. Kiro’s meta-builder needs explicit executor annotations ([shell], [subagent:name], [self], [user]) to stay deterministic. The TODO list isn’t just progress tracking — it’s the execution plan.

3. Idempotency requires pinning both sides. We almost shipped with the default agent as baseline before discovering it auto-discovers all workspace skills. Then we almost shipped with --agent omitted for with_skill runs before realizing users can configure custom defaults. The fix: both skill-executor and baseline are explicit named agents — same config every time, regardless of user settings. Reproducibility comes from pinning the entire evaluation environment, not just the control group.

4. Subagents can’t write without explicit permission. First run, the grader returned grading data in its Summary but couldn’t write grading.json. Adding write tool + /tmp/** in allowedPaths fixed it. Default is deny-all.

5. The .md files are portable. The grader, comparator, and analyzer prompts from Claude Code’s skill-creator work unchanged in Kiro CLI. Only the identity layer (JSON config) needed to be created. The prompts are the reusable asset; the orchestration is what changes.

6. The orchestrator will shortcut if you let it. Our first real evaluation run failed because the meta-builder wrote an inline grading script instead of delegating to skill-grader. It “saved time” by skipping the subagent call, but produced grading.json in the wrong format, which broke aggregate_benchmark.py, which caused it to hand-write benchmark.json — cascading through the entire pipeline. The fix: hard rules in the prompt (“NEVER grade outputs yourself”) plus a validate_workspace.py script that catches structural errors before aggregation. The agent can’t shortcut past a failing validation check.

7. Shell permission parsing is sophisticated. Compound commands (cd X && kiro-cli chat ...) get split by tree-sitter and each sub-command checked independently. This matters for allowedCommands regex design — anchor with ^ and $.
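
The effect of per-sub-command checking can be illustrated with a simplified model. Everything here is an approximation: the split is a naive regex that ignores quoting (Kiro uses tree-sitter), the deny-before-allow order is an assumption, and `autoAllowReadonly` is not modeled.

```python
import re

ALLOWED = [r"^kiro-cli chat --no-interactive.*$", r"^mkdir -p /tmp/.*$"]
DENIED = [r"^rm -rf.*$", r"^git push.*$"]


def split_compound(command: str) -> list:
    """Naive split on &&, ||, ; and |. A real shell parser also handles quoting."""
    parts = re.split(r"&&|\|\||;|\|", command)
    return [part.strip() for part in parts if part.strip()]


def is_permitted(command: str) -> bool:
    """Every sub-command must match an allow pattern and no deny pattern."""
    for sub in split_compound(command):
        if any(re.match(p, sub) for p in DENIED):
            return False
        if not any(re.match(p, sub) for p in ALLOWED):
            return False
    return True


risky = "kiro-cli chat --no-interactive 'q' && rm -rf /"
print(bool(re.match(ALLOWED[0], risky)))  # whole-string check would pass
print(is_permitted(risky))                # per-sub-command check rejects it
```

Checked as one string, the risky command matches the first allow pattern because `.*` swallows the `&& rm -rf /` tail. Checked per sub-command, the `rm -rf /` part is seen on its own and denied, which is why patterns should be written for single commands and anchored at both ends.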

What’s Next #

The architecture is working but there’s more to build:

Parallel eval runs — Currently sequential; the shell tool blocks. A wrapper script that spawns background processes and polls for completion would cut eval time significantly.
Cross-skill benchmarking — Run the same eval suite across multiple skills to find interaction effects.
Description optimization at scale — The run_loop.py script iterates on descriptions automatically. We’ve ported it but haven’t stress-tested at scale yet.
Prompt hardening — The orchestrator still occasionally attempts shortcuts despite hard rules. We’re exploring whether a pre-tool-use hook that rejects inline grading attempts would be more reliable than prompt instructions alone.
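
For the first item, one possible shape of such a wrapper is sketched below. It uses a thread pool around `subprocess.run` rather than hand-rolled background processes and polling, and the function names, `max_workers`, and `timeout` defaults are assumptions, not something we have shipped.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd: list, cwd: str, timeout: int) -> tuple:
    """Run one command to completion and return (exit code, stdout)."""
    result = subprocess.run(cmd, capture_output=True, text=True,
                            cwd=cwd, timeout=timeout)
    return result.returncode, result.stdout


def run_parallel(commands, cwd=".", timeout=300, max_workers=4, runner=run_one):
    """Run commands concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(runner, cmd, cwd, timeout) for cmd in commands]
        return [future.result() for future in futures]
```

The meta-builder would then make a single blocking shell call to the wrapper, and the wrapper would fan out the six `kiro-cli chat --no-interactive` runs. The `runner` parameter exists so the fan-out logic can be tested without invoking kiro-cli.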

The broader lesson: porting between AI agent frameworks isn’t about syntax translation. It’s about understanding the execution model — who decides what, when, and with what information. Claude Code’s model is “one smart agent with infinite flexibility.” Kiro CLI’s model is “many focused agents with explicit contracts.” Neither is wrong, but they demand different architectures for the same problem. And in the named-agent world, idempotency must be designed in — you can’t assume the default agent is what you think it is.

Try It: Building a Skill with Meta-Builder #

If you want to use this architecture in your own project, here’s the quickstart:

Prerequisites #

# Enable experimental features needed by the pipeline
kiro-cli settings chat.enableTodoList true
kiro-cli settings chat.enableDelegate true

Step 1: Copy the agents and skill into your project #

# Clone or copy these into your .kiro/ directory:
.kiro/agents/meta-builder.json
.kiro/agents/skill-executor.json
.kiro/agents/baseline.json
.kiro/agents/skill-grader.json
.kiro/agents/skill-comparator.json
.kiro/agents/skill-analyzer.json
.kiro/agents/llm-worker.json
.kiro/agents/skill-trigger-tester.json
.kiro/prompts/meta-builder.md
.kiro/skills/skill-creator/          # The full skill-creator skill

Step 2: Start a session with meta-builder #

kiro-cli chat --agent meta-builder

Step 3: Tell it what you want #

> I want to build a skill that searches YouTube and returns a markdown table with video stats

The meta-builder will:

Interview you (max 3 questions per turn)
Draft the SKILL.md and test cases
Ask for confirmation
Run evaluations: --agent skill-executor (with skill) vs --agent baseline (without)
Grade via skill-grader subagent
Run blind comparison via skill-comparator
Aggregate benchmarks and generate an HTML report
Present results and iterate based on your feedback

What you’ll see #

The pipeline produces a /tmp/<skill-name>-workspace/ directory with:

Transcripts from both variants
Grading JSON with pass/fail + evidence per assertion
A benchmark showing the delta (e.g., +80 percentage points in pass rate with the skill)
A standalone HTML report for reviewing outputs qualitatively

Tips #

Be specific in your intent — “a skill that does X, outputs Y format, triggered when user says Z” gives better results than vague requests
Review the test prompts — the meta-builder drafts them, but you know your users better
The first iteration is rarely final — expect 2-3 rounds of eval → feedback → improve
Check the HTML report — it shows side-by-side outputs so you can judge quality, not just pass rates
Environment variables — if your skill needs API keys, export them before starting the session so eval runs can access them

References #

The Complete Guide to Building Skills for Claude (PDF) — Anthropic’s official methodology for skill design and evaluation
skill-creator source (GitHub) — The original Claude Code skill we ported from
Kiro CLI documentation — Agent configuration, subagents, and hooks reference

This post was co-authored by Mladen Trampic and Kiro, demonstrating the collaborative approach to technical content creation.
