The content-ops skill that refuses to draft

The brief at the start of the hackathon was the kind that makes everyone in the room nod: build a skill that turns customer feedback into marketing posts. We had Slack channels full of call summaries, call recordings sitting in a meeting-transcript tool, a competitor-blog list, and a marketing site with thirty published posts in the house style. The shape of the demo was obvious — read the signals, pick the topic, write the draft, push it to the CMS. End-to-end.

We almost shipped that. About four hours in, with the pipeline drafted and the first generated post already on screen, somebody asked the question that mattered: if this runs on its own, what's the worst post it could publish under our name? The answer wasn't pretty. The post we were looking at was fluent and badly off-brand, technically about a real customer concern but landing on an angle the sales team was actively defending against. The model didn't know that. The model can't know that.

So we deliberately broke the pipeline in two places.

The result is a workflow that, on the surface, looks slower and less impressive than the end-to-end version. The skill cannot create a draft until a human approves the topic. It cannot start drafting an approved topic until a separate brand and voice check has passed. Two gates, both blocking, both wired into the contract of the skill itself rather than left as polite suggestions.

This post is about why those two gates ended up being the most interesting thing we built. Most AI-content tutorials I read after the hackathon were variations of "connect source, prompt model, publish". The interesting question isn't whether you can generate a post. It's where you put the friction so the post is worth publishing.

Three steps, two gates

The skill is organized as three explicit phases — Research, Proposal, Drafting — and the gates sit between them.

Research. Pull from the sources active for the cycle (Slack customer channels, call transcripts when available, competitor blogs, and the latest thirty published posts for style calibration). Cluster the signals. Rank by recurrence and commercial relevance. Write the output to a dated file.
Proposal. Present topics — both the ones the skill discovered and the ones the team suggested — with a one-line rationale for why each is timely. The team picks. Each topic gets a status: proposed, approved_for_draft, or rejected.
Drafting. Only for approved_for_draft topics, and only after the brand/voice check passes. Output goes to a local markdown file and, optionally, a draft in the CMS.

The first gate, at the end of Proposal, is a human decision: nothing moves forward until the team approves a topic. The second gate, directly in front of Drafting, is a checklist against two documents the team owns, branding.md and voice.md. Neither gate can be skipped by the skill, and there is no "trust me" escape hatch.
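The phases and gates above can be sketched as a small state machine. This is a minimal illustration, not the skill's actual code: the three status values come from the Proposal step, while `Topic`, `GateError`, `approve`, `start_draft`, and the `approved_by` and `brand_voice_passed` fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED_FOR_DRAFT = "approved_for_draft"
    REJECTED = "rejected"


class GateError(RuntimeError):
    """Raised when something tries to move past a gate that is still closed."""


@dataclass
class Topic:
    title: str
    why_now_one_liner: str
    status: Status = Status.PROPOSED
    approved_by: Optional[str] = None   # hypothetical: who cleared gate 1
    brand_voice_passed: bool = False    # hypothetical: outcome of gate 2


def approve(topic: Topic, reviewer: str) -> None:
    """Gate 1: only a named human can move a topic to approved_for_draft."""
    if not reviewer.strip():
        raise GateError("topic approval requires a named human reviewer")
    if topic.status is not Status.PROPOSED:
        raise GateError(f"cannot approve a topic in status {topic.status.value}")
    topic.status = Status.APPROVED_FOR_DRAFT
    topic.approved_by = reviewer


def start_draft(topic: Topic) -> str:
    """Refuse to draft unless both gates are cleared; there is no override flag."""
    if topic.status is not Status.APPROVED_FOR_DRAFT:
        raise GateError("gate 1 closed: topic is not approved_for_draft")
    if not topic.brand_voice_passed:
        raise GateError("gate 2 closed: brand/voice check has not passed")
    return f"drafting: {topic.title}"
```

The design choice worth noticing is that `start_draft` takes no parameter that could bypass either check. The only way to get a draft is to change the topic's recorded state first.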

This sounds like overhead. In practice, it is what makes the rest of the workflow safe to automate.

Why the gates matter more than the pipeline

If you let a model run end-to-end on customer signals, you get fluent posts very quickly. You also get the class of failures we'd been about to ship at hour four:

The post is technically about a real customer concern, but the angle the model picked is one your sales team would never use because it concedes an objection you're actively defending.
The tone is fine on a per-paragraph basis and badly off-brand on a per-page basis — too tentative, too marketing-flavored, two registers above where the brand book lives.
The evidence is real but ungeneralizable — a single call with a single prospect now presented as "customers are asking us".
The topic is on-brand but lands in the same week as a competitor's near-identical post, which is the kind of thing you notice only if someone on the team reads the competitor blogs every week.

None of these are model failures. They are judgement failures, and the judgement they require is the kind that lives with the team, not with the model. The cheapest way to keep that judgement in the loop is to make the skill physically unable to proceed without it. So we did.

The first gate — topic approval — is where commercial relevance is checked. A clustering algorithm and a relevance score can rank topics, but they cannot tell you that the deal you're trying to close this quarter would be helped or hurt by writing publicly about a specific objection. That call belongs to the team. So the skill stops, lists the topics with a why_now_one_liner and the source attribution, and waits.
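What the skill shows the team at this stop can be sketched as a plain-text listing. The `why_now_one_liner` field and the status values are from the workflow described here; `ProposedTopic`, `render_proposals`, and the exact layout of the listing are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProposedTopic:
    title: str
    why_now_one_liner: str
    sources: List[str] = field(default_factory=list)  # source attribution
    status: str = "proposed"


def render_proposals(topics: List[ProposedTopic]) -> str:
    """Build the listing a human reviews before any topic can be approved."""
    lines = []
    for number, topic in enumerate(topics, start=1):
        lines.append(f"{number}. {topic.title} [{topic.status}]")
        lines.append(f"   why now: {topic.why_now_one_liner}")
        # A topic with no attribution is shown as such rather than hidden.
        lines.append(f"   sources: {', '.join(topic.sources) or 'none recorded'}")
    return "\n".join(lines)
```

The function only formats. It deliberately has no way to change a status, so the decision stays with whoever reads the listing.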

The second gate — brand and voice — is where the house style is checked before any drafting starts. I learned the hard way (at hour four, looking at the off-brand draft we almost shipped) that catching a tone problem after the model has produced 800 polished words is more expensive than catching it before. The model anchors hard on its own draft; the team anchors hard on what the model already wrote. Both bias the review toward keeping bad copy.
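A rough sketch of the second gate as a checklist runner follows. It assumes, purely for illustration, that branding.md and voice.md keep their checks as markdown list items starting with "- ", and that someone records a yes/no answer per item; `load_checklist` and `brand_voice_failures` are hypothetical names, and the real files may be laid out differently.

```python
from typing import Dict, List


def load_checklist(markdown_text: str) -> List[str]:
    """Collect '- ' list items as checklist entries (assumed file layout)."""
    items = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            items.append(stripped[2:].strip())
    return items


def brand_voice_failures(
    checklists: Dict[str, List[str]], answers: Dict[str, bool]
) -> List[str]:
    """Return every failed or unanswered item; an empty list means the gate passes."""
    failures = []
    for document, items in checklists.items():
        if not items:
            # An empty checklist must not pass silently.
            failures.append(f"{document}: no checklist items found")
        for item in items:
            if answers.get(item) is not True:
                failures.append(f"{document}: {item}")
    return failures
```

Treating a missing answer the same as a "no" keeps the gate blocking by default, which matches the intent that drafting cannot start until the check has actually been run.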

Sources are not equal, and the skill knows it

The gates only work if the thing being gated is worth gating on. So the other small piece worth writing down is the source-tracker — a tiny markdown file that lists every signal source (Slack channels, the call-transcript API, competitor blogs, the company's own published-post feed) with an explicit type, and a priority rule attached:

priority_score = 3 when a topic shows up in both Slack customer channels and a competitor blog in the same window.
priority_score = 2 when it appears only in Slack.
priority_score = 1 when it appears only in competitor blogs.
The company's own published posts do not raise priority at all. They calibrate style, not topics.

The last bullet took the longest to argue for and matters the most. The natural instinct is to treat your own published archive as a topic source — "we wrote about X, let's write more about X". That's how you end up with five posts about your favorite topic and silence on the ones the market is asking about. Style and topic come from different places. The skill enforces the separation by giving them different source types and different roles in the prioritization rule.
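The priority rule is small enough to write down as a function. The sketch below assumes the set of source types passed in has already been limited to one time window; the `SourceType` names are illustrative labels, and returning 0 when a topic has neither a Slack nor a competitor signal is an assumption, since the rule itself only defines scores 1 to 3.

```python
from enum import Enum
from typing import Set


class SourceType(Enum):
    SLACK_CUSTOMER = "slack_customer"
    CALL_TRANSCRIPT = "call_transcript"
    COMPETITOR_BLOG = "competitor_blog"
    OWN_PUBLISHED = "own_published"  # style calibration only, never a topic signal


def priority_score(source_types: Set[SourceType]) -> int:
    """Score a topic from the source types it appeared in during one window."""
    in_slack = SourceType.SLACK_CUSTOMER in source_types
    in_competitor = SourceType.COMPETITOR_BLOG in source_types
    if in_slack and in_competitor:
        return 3
    if in_slack:
        return 2
    if in_competitor:
        return 1
    # Assumption: no Slack or competitor signal means no priority.
    return 0
```

Because the function never looks at `OWN_PUBLISHED`, the separation between style sources and topic sources is structural: adding more posts to the archive cannot move a score.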

Similarly, when call transcripts are available, they don't change the numeric priority score either — but the skill is told to prefer them as evidence over Slack summaries when both exist. A summary tells you a topic came up. A transcript tells you how the customer framed it, which is the difference between writing about "data quality" and writing about the specific objection a customer raised in their own words.
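The evidence preference can be sketched separately from scoring. This is one possible reading of "prefer transcripts when both exist": return transcript excerpts if there are any, and fall back to Slack summaries otherwise. `Signal`, `pick_evidence`, and the two source-type labels are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signal:
    source_type: str  # "call_transcript" or "slack_customer" (assumed labels)
    excerpt: str


def pick_evidence(signals: List[Signal]) -> List[Signal]:
    """Prefer transcript excerpts; use Slack summaries only when none exist."""
    transcripts = [s for s in signals if s.source_type == "call_transcript"]
    if transcripts:
        return transcripts
    return [s for s in signals if s.source_type == "slack_customer"]
```

Nothing here feeds back into the numeric priority, which keeps the two concerns apart: ranking decides what to write about, evidence selection decides what the draft is allowed to quote.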

What I still don't have good answers for

A hackathon ends with a contract, not with proof. A few things I'd want to see before claiming this works:

Are the gates as cheap as they look? Two gates per cycle is fine when there are five topics. At thirty, the approval step becomes a bottleneck. I don't have a good answer for where the right batching is.
What happens when the brand book changes? The voice check is a checklist run against a static file. When the brand evolves, every approved-but-unpublished post is silently out of date. There's no mechanism in the skill to detect that yet.
Is the priority formula right? A topic showing up in both Slack and a competitor blog scoring higher than one in Slack alone is intuitive, but I haven't tested whether it correlates with posts that actually move the pipeline. Closing that loop — what got published, what worked, how the next cycle should re-rank — is a different post, and the part of the workflow I'd bet against until it has survived a quarter.

If you take one thing

The interesting design decisions in an AI content workflow aren't which model you call. They're where you refuse to call one, and what you force the next call to read before it starts. The two gates we built sit on the front end — topic approval and a brand/voice check — and they cost almost nothing once written into the skill's contract. Without them, the workflow is a draft generator. With them, it's a content operation that happens to be partly automated, and one the team is willing to put its name on. The second thing is the one worth shipping.