Your Skill Files Are Written for You (Not the AI)

Most of what’s in your AI skill files isn’t for the AI.

That clicked for me sometime last year, and I’ve been kicking myself about it since. I’d built out a whole library of skills for my Claude agents — organized, well-documented, working great. Pop open any SKILL.md and you knew exactly what was going on: every step had a reason, every decision had a little “here’s why” paragraph. It read like something I might hand a new hire on their first day.

Which was sort of the problem.

Skills aren’t documentation. They’re instructions for an AI to execute on, and an AI doesn’t need the same hand-holding a human does. Transition phrases don’t help it. Explanatory asides don’t help it. The “in other words, what this means is…” paragraph that makes total sense to you? Pure overhead. Every one of those tokens costs money if you’re on an API plan, or eats into your context window if you’re on a subscription. Different accounting, same result.

Compressing My Skill Files

I grabbed one of my most-used skill files and did a proper before-and-after. Stripped the prose explanations. Killed the transitional language. Collapsed everything into terse, YAML-formatted steps — triggers, constraints, nothing else. Kept everything load-bearing; cut everything that was there for my own comfort.

Roughly 35 percent fewer input tokens. On something I run constantly.
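If you want to sanity-check your own before/after delta without a model-specific tokenizer, a rough sketch like this works; the four-characters-per-token heuristic and the sample strings are mine, not anything official:

```python
# Rough before/after check for a compressed skill file.
# Uses the common ~4 characters per token rule of thumb as a proxy,
# since exact counts vary by model and tokenizer. Illustrative only.

def approx_tokens(text: str) -> int:
    """Crude token estimate: about 4 characters per token."""
    return max(1, len(text) // 4)

def reduction_pct(before: str, after: str) -> float:
    """Percent fewer estimated tokens in the compressed version."""
    b, a = approx_tokens(before), approx_tokens(after)
    return round(100 * (b - a) / b, 1)

verbose = (
    "First, it's important to note that you should validate the input. "
    "In other words, what this means is that malformed requests get "
    "rejected before any downstream processing occurs."
)
terse = "steps:\n  - validate input; reject malformed"

print(f"reduction: {reduction_pct(verbose, terse)}%")
```

Run it against the real files, not toy strings, and treat the number as directional rather than exact.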

So naturally I went back in for round two. If 35 percent is good, 60 must be better.

DON’T DO IT.

(Or do it, test it, find out the hard way like I did. Either way.)

When you compress skill instructions past a certain point, they’re technically all present but practically ambiguous. And when a large language model — emphasis on the last two words, because they matter here — hits something it can’t cleanly parse, it doesn’t fail gracefully. It starts reasoning through what you probably meant. That process burns what are called runtime tokens: the tokens the model spends during inference working out the ambiguity you introduced. My sleek, optimized skill wound up costing 15-20 percent more on the output side. Which more than erased what I’d saved.
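The arithmetic is worth running for your own workload before trusting input-side savings. In the back-of-envelope sketch below, every number (token volumes, the 5x output price multiplier) is a made-up placeholder; only the two rates echo the percentages above:

```python
# Does a 60% input-token cut survive 17.5% output expansion?
# All volumes and prices below are hypothetical placeholders;
# only the two rates mirror the figures discussed in the text.

def cost_per_call(input_tokens, output_tokens, in_price, out_price):
    return input_tokens * in_price + output_tokens * out_price

def compressed_cost(input_tokens, output_tokens, in_price, out_price,
                    input_reduction, output_expansion):
    return cost_per_call(input_tokens * (1 - input_reduction),
                         output_tokens * (1 + output_expansion),
                         in_price, out_price)

# Hypothetical call: 2,000 input tokens, 3,000 output tokens,
# output priced at 5x input (a typical-shaped ratio, not a quote).
before = cost_per_call(2000, 3000, 1.0, 5.0)
after = compressed_cost(2000, 3000, 1.0, 5.0, 0.60, 0.175)
print(before, after)  # output growth more than erases the input savings
```

The shape of the result depends heavily on how output-heavy your calls are, which is exactly why the same compression can pay off in one workflow and backfire in another.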

There’s published research on exactly this tradeoff, and the findings are a little alarming. Aggressive compression — around 70 percent reduction — produced something like 2,000 percent output token expansion in one model tested. Not a typo. A different model held completely stable under the same conditions. Researchers have achieved 20x compression on certain task types with only 1.5 percent accuracy loss, but that math doesn’t hold for skill files, which fail differently. The tradeoff is real, and it swings hard depending on the model.

That’s why I stopped pushing past 30-45 percent. It’s where I stopped breaking things.

The things worth cutting:

  1. Prose explanations of why — the model infers rationale better than you’d think
  2. Transition language: “additionally,” “it’s important to note that,” “furthermore” (if “furthermore” is in your skill file, I have questions)
  3. Non-critical examples — one tight, pattern-specific example beats three illustrative ones
  4. Articles and passive voice where meaning holds without them

Constraints stay in plain sentences. That’s the one place you cannot afford ambiguity — it’s exactly where you don’t want the model filling in blanks on its own.
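For shape, here’s what that ends up looking like; the skill itself is invented for illustration, but notice the terse steps next to constraints left as full plain sentences:

```yaml
---
name: changelog-writer
description: "Drafts changelog entries from merged PRs. Triggers: 'write changelog', 'release notes'."
---
steps:
  - collect PRs merged since last tag
  - group: feat / fix / chore
  - draft one line per entry
constraints:
  - Never include reverted or unreleased changes.
  - Always link each entry to its PR number.
```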

When Skill Files Compress Well (And When They Don’t)

Not all skill files compress equally, and that’s the part most people find out too late. A skill that orchestrates multi-step workflows with branching logic is a bad candidate for aggressive compression — the more decision points there are, the more the model needs clear signposts to land on. Something linear like “take input, do X, return Y” can handle a lot more stripping. That’s where I’d start before touching anything complicated.
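One way to triage this before compressing anything is to count the decision points. The marker list below is my own guess at what signals branching in prose-style skills, not any standard metric:

```python
# Crude compressibility triage: the more branch points a skill has,
# the riskier aggressive compression gets. BRANCH_MARKERS is an
# assumption about my own writing habits, not a standard list.

BRANCH_MARKERS = ("if ", "unless ", "when ", "otherwise", "depending on")

def decision_points(skill_text: str) -> int:
    """Count rough signals of branching logic in a skill file."""
    text = skill_text.lower()
    return sum(text.count(m) for m in BRANCH_MARKERS)

def compression_risk(skill_text: str) -> str:
    points = decision_points(skill_text)
    if points == 0:
        return "linear: safe to compress hard"
    if points <= 3:
        return "some branching: compress conservatively"
    return "heavily branched: keep the signposts"

print(compression_risk("take input, do X, return Y"))  # linear: safe to compress hard
```

It’s a blunt instrument, but it sorts a skill library into a compression order that roughly matches where I actually got burned.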

Only the outputs humans actually see need to be human-readable. Everything under the hood is up for grabs.

The result is going to look strange. My optimized skill files are, as I once described them to a friend, dang near a foreign language. But the AI reads them fine — maybe better.

If you want to hand the compression off to your agent — which, yes, I know how that sounds — here’s a prompt that’s produced clean output on the first pass:

I have a Claude skill file written in natural language prose.
Compress it to 150–250 tokens while preserving:
1. Core role/identity (1 sentence max)
2. Critical constraints and safety rules
3. Workflow steps (3–5 steps, ultra-terse)
4. Trigger phrases (for the description field)

Use YAML frontmatter for metadata. Strip:
- Pleasantries, preamble, filler
- Examples unless pattern-critical
- Explanations of why
- Articles and passive voice where meaning holds

Format:
---
name: <name>
description: <what it does + trigger phrases, 1 sentence>
---

<workflow steps>
<constraints>

Show token count before and after.

The output is readable enough to tell what’s going on. Just barely. (heh)

One caveat: if you need to be able to open any file in your project and understand it cold, keep a human-readable version around and run the compressed one in production. A skill file that breaks and that you can’t debug isn’t an efficiency win, no matter how few tokens it uses.*

But if you’ve been running workflows written like onboarding manuals and wondering why your token budget keeps climbing — now you know.


*Three weeks after I compressed one of my most-used skills, I needed to update it and spent about 20 minutes reverse-engineering my own work. The original now lives next to it with “REFERENCE ONLY” in the filename. Learn from my mistake.