The brief at the start of the hackathon was the kind that makes everyone in the room nod: build a skill that turns customer feedback into marketing posts. We had Slack channels full of call summaries, call recordings sitting in a meeting-transcript tool, a competitor-blog list, and a marketing site with thirty published posts in the house style. The shape of the demo was obvious — read the signals, pick the topic, write the draft, push it to the CMS. End-to-end.

We almost shipped that. Then we deliberately broke it in two places.

The result is a workflow that, on the surface, looks slower and less impressive than the end-to-end version. The skill cannot create a draft until a human approves the topic. It cannot mark a topic approved until a separate brand and voice check has passed. Two gates, both blocking, both wired into the contract of the skill itself rather than left as polite suggestions.

I want to write down why those gates ended up being the most interesting thing we built, because the field notes on this don't really exist yet. Most AI-content tutorials I read after the hackathon were variations of "connect source, prompt model, publish". The interesting question isn't whether you can generate a post. It's where you put the friction so the post is worth publishing.

Three steps, two gates

The skill is organized as three explicit phases — Research, Proposal, Drafting — and the gates sit between them.

  1. Research. Pull from active sources for a cycle (Slack customer channels, call transcripts when available, competitor blogs, latest thirty published posts for style calibration). Cluster the signals. Rank by recurrence and commercial relevance. Write the output to a dated file.
  2. Proposal. Present topics — both the ones the skill discovered and the ones the team suggested — with a one-line rationale for why each is timely. The team picks. Each topic gets a status: proposed, approved_for_draft, or rejected.
  3. Drafting. Only for approved_for_draft topics, and only after a brand/voice check pass. Output goes to a local markdown file and, optionally, a draft in the CMS.

The first gate, topic approval, is a human decision. The second, the brand and voice check, is a checklist against two documents that the team owns: branding.md and voice.md. Neither gate can be skipped by the skill, and there is no "trust me" escape hatch.
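
A minimal sketch of what "wired into the contract" can mean in practice: two hard preconditions that raise instead of warn. The statuses match the workflow above; everything else here (Topic, approve_for_draft, start_draft) is an illustrative stand-in, not the skill's real interface.

```python
from dataclasses import dataclass
from enum import Enum


class TopicStatus(Enum):
    PROPOSED = "proposed"
    APPROVED_FOR_DRAFT = "approved_for_draft"
    REJECTED = "rejected"


@dataclass
class Topic:
    slug: str
    status: TopicStatus = TopicStatus.PROPOSED
    brand_check_passed: bool = False  # outcome of the branding.md / voice.md checklist


def approve_for_draft(topic: Topic) -> Topic:
    # Second gate: a topic cannot be marked approved until the brand/voice check has passed.
    if not topic.brand_check_passed:
        raise PermissionError(f"{topic.slug}: brand/voice check has not passed")
    topic.status = TopicStatus.APPROVED_FOR_DRAFT
    return topic


def start_draft(topic: Topic) -> None:
    # First gate: no drafting unless a human has moved the topic to approved_for_draft.
    if topic.status is not TopicStatus.APPROVED_FOR_DRAFT:
        raise PermissionError(f"{topic.slug}: topic has not been approved for drafting")
    print(f"drafting {topic.slug} ...")
```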

This sounds like overhead. In practice, it is what makes the rest of the workflow safe to automate.

Why the gates matter more than the pipeline

If you let a model run end-to-end on customer signals, you get fluent posts very quickly. You also get a class of failures that are hard to catch in review:

  • The post is technically about a real customer concern, but the angle the model picked is one your sales team would never use because it concedes an objection you're actively defending.
  • The tone is fine on a per-paragraph basis and badly off-brand on a per-page basis — too tentative, too marketing-flavored, two registers above where the brand book lives.
  • The evidence is real but ungeneralizable — a single call with a single prospect now presented as "customers are asking us".
  • The topic is on-brand but lands in the same week as a competitor's near-identical post, which is the kind of thing you notice only if someone in the team reads the competitor blogs every week.

None of these are model failures. They are judgement failures, and the judgement they require is the kind that lives with the team, not with the model. The cheapest way to keep that judgement in the loop is to make the skill physically unable to proceed without it. So we did.

The first gate — topic approval — is where commercial relevance is checked. A clustering algorithm and a relevance score can rank topics, but they cannot tell you that the deal you're trying to close this quarter would be helped or hurt by writing publicly about a specific objection. That call belongs to the team. So the skill stops, lists the topics with a why_now_one_liner and the source attribution, and waits.
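
For concreteness, here is one shape such a proposal entry could take. The status values and why_now_one_liner come from the workflow itself; the class name, the source-ID scheme, and the sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TopicProposal:
    title: str
    why_now_one_liner: str
    source_ids: list[str]               # exact sources that must be cited when drafting
    status: str = "proposed"            # proposed | approved_for_draft | rejected
    quote_candidate: str | None = None  # optional anonymized customer quote


proposals = [
    TopicProposal(
        title="Answering the data-quality objection head-on",
        why_now_one_liner="Came up in three customer calls and one competitor post this cycle.",
        source_ids=["slack:customers/2024-05-12", "transcript:acme-call-2024-05-14"],
    ),
]
# The skill stops here. Statuses only change when the team changes them.
```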

The second gate — brand and voice — is where the house style is checked before any drafting starts. We learned the hard way that catching a tone problem after the model has produced 800 polished words is more expensive than catching it before. The model anchors hard on its own draft; the team anchors hard on what the model already wrote. Both bias the review toward keeping bad copy.
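
One way the second gate could run mechanically, assuming branding.md and voice.md each hold one checklist item per bullet line. The file format, function names, and the shape of the answers are assumptions; the point is only that the check happens before drafting and produces a pass or fail, not a suggestion.

```python
from pathlib import Path


def load_checklist(path: str) -> list[str]:
    # Assumes one checklist item per "- " bullet; the real documents may be structured differently.
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line[2:].strip() for line in lines if line.startswith("- ")]


def brand_voice_gate(answers: dict[str, bool],
                     docs: tuple[str, str] = ("branding.md", "voice.md")) -> bool:
    # Every item from both documents must be explicitly confirmed before drafting can start.
    checklist = [item for doc in docs for item in load_checklist(doc)]
    return all(answers.get(item, False) for item in checklist)
```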

Sources are not equal, and the skill knows it

The other small piece I keep coming back to is the source-tracker. It's a tiny markdown file that lists every signal source — Slack channels, the call-transcript API, competitor blogs, the company's own published-post feed — with an explicit type, and a priority rule attached:

  • priority_score = 3 when a topic shows up in both Slack customer channels and a competitor blog in the same window.
  • priority_score = 2 when it appears only in Slack.
  • priority_score = 1 when it appears only in competitor blogs.
  • The company's own published posts do not raise priority at all. They calibrate style, not topics.

The last bullet is the one that took the longest to argue for and matters the most. The natural instinct is to treat your own published archive as a topic source — "we wrote about X, let's write more about X". That's how you end up with five posts about your favorite topic and silence on the ones the market is asking about. Style and topic come from different places. The skill enforces the separation by giving them different source types and different roles in the prioritization rule.
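
Written down as code, the whole rule fits in one small function, which is part of why it is easy to defend in review. A minimal sketch, assuming source types are tagged with names like "slack" and "competitor_blog"; the real source-tracker may label them differently.

```python
def priority_score(source_types: set[str]) -> int:
    """Rank a topic cluster by the kinds of sources it appeared in.

    Own published posts and call transcripts are deliberately absent from the
    scoring: posts calibrate style, transcripts are preferred as evidence, and
    neither is allowed to move the number.
    """
    in_slack = "slack" in source_types
    in_competitor_blog = "competitor_blog" in source_types
    if in_slack and in_competitor_blog:
        return 3  # surfaced by customers and a competitor in the same window
    if in_slack:
        return 2
    if in_competitor_blog:
        return 1
    return 0      # style or evidence sources only: not a topic signal
```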

Similarly, when call transcripts are available, they don't change the numeric priority score either — but the skill is told to prefer them as evidence over Slack summaries when both exist. Transcripts and summaries are not interchangeable. A summary tells you a topic came up. A transcript tells you how the customer framed it, which is the difference between writing about "data quality" and writing about the specific objection a customer raised in their own words.

What the output looks like

After a cycle, the workflow leaves a trail that a new team member can read cold:

  • A dated research file with ranked topic clusters, source attribution per topic, and a one-paragraph style summary lifted from the last thirty published posts.
  • A topic proposal file with each topic's status, why_now_one_liner, the exact source IDs that must be cited when drafting, and an optional anonymized quote candidate.
  • For each approved_for_draft topic, a markdown article in posts/YYYY-MM-DD/ and, when requested, a draft in the CMS.
  • An impact-memory.md file that records the cycle's hypothesis and, later, the post's performance — so the next cycle can reason about what's working.

None of these files requires the skill to be running in order to be read. That's deliberate. The workflow is a paper trail before it is a tool.

The feedback loop that's supposed to close it

The gates decide what gets written. Something else has to decide whether what got written was worth writing. Otherwise the workflow drifts — the team's intuition about "timely topics" calcifies into whatever happened to feel timely six months ago, and nobody notices.

The piece of the skill that's supposed to push back on that is a file called impact-memory.md. It records, per published post, a small set of numbers: impressions, clicks, CTR, engagement rate, leads, meetings booked, opportunities influenced, and two human scores — an ICP fit score and a message clarity score, both one to five. Snapshots get taken at day three, day seven, and day thirty. The schema is deliberately narrow. Anything you can't fill in within five minutes of opening LinkedIn and the CRM doesn't belong in it.
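
As a sketch, one snapshot could be as small as the flat record below; the field names are assumptions, since the real file is markdown rather than code, but the columns are exactly the metrics listed above.

```python
from dataclasses import dataclass


@dataclass
class ImpactSnapshot:
    post_slug: str
    day: int                        # snapshot taken at day 3, 7, or 30
    impressions: int
    clicks: int
    ctr: float
    engagement_rate: float
    leads: int
    meetings_booked: int
    opportunities_influenced: int
    icp_fit_score: int              # human score, 1 to 5
    message_clarity_score: int      # human score, 1 to 5
```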

What makes the file load-bearing rather than ornamental is that the skill is required to consult it. The learning rules are written into the workflow contract:

  • If two posts in a topic cluster exceed median CTR and produce leads, prioritize that cluster in the next cycle's research.
  • If two posts in a row in one cluster sit below median CTR with no lead impact, de-prioritize that cluster for one cycle.
  • If one channel consistently outperforms for a given persona, prioritize that channel for that persona.
  • Every new research cycle must reference at least one learning from this file. Not optional, not implicit.

That last rule is the one that ties the loop shut. The Research step can't quietly ignore the previous cycle's results, because the output template demands a line that names what was learned and how it shaped the new ranking. If nothing was learned because no numbers were filled in, the skill has to say so — which makes the gap visible to the team instead of decaying silently.

The downstream effect on topic selection is straightforward. The priority score from the source-tracker (three for both-source overlap, two for Slack-only, one for blog-only) sets the initial ranking, but it can be overridden by the impact memory. A cluster that's been outperforming gets pulled up. A cluster that's been quietly missing gets pushed down for a cycle. The system has opinions and the opinions have to be earned with data, not borrowed from the loudest person in the room.
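
A sketch of how the first two learning rules could apply that override. The data shape, the thresholds, and the one-point nudges are assumptions; the rules themselves are the ones from the contract above.

```python
from statistics import median


def adjust_priority(cluster: str, base_score: int, history: list[dict]) -> int:
    # `history` is assumed to hold one dict per published post,
    # e.g. {"cluster": "...", "ctr": 0.021, "leads": 2}.
    if not history:
        return base_score
    overall_median_ctr = median(post["ctr"] for post in history)
    cluster_posts = [post for post in history if post["cluster"] == cluster]

    winners = [p for p in cluster_posts if p["ctr"] > overall_median_ctr and p["leads"] > 0]
    if len(winners) >= 2:
        return base_score + 1  # two posts above median CTR with leads: pull the cluster up

    last_two = cluster_posts[-2:]
    if len(last_two) == 2 and all(p["ctr"] < overall_median_ctr and p["leads"] == 0
                                  for p in last_two):
        return base_score - 1  # two quiet posts in a row: push the cluster down for one cycle

    return base_score
```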

I am keeping my expectations honest about this. Every content-ops setup I've seen has tried to wire this loop, and in most of them it's the first thing to break — usually within three weeks of launch, usually because the person responsible for pasting the LinkedIn numbers into the file gets sick once and the habit never recovers. The schema being narrow helps. The skill refusing to start a new cycle without a learning reference helps more. But this is the part of the workflow I'd bet against, not for, until it has survived a quarter.

What I still don't have good answers for

A hackathon ends with a contract, not with proof. A few things I'd want to see before claiming this works:

  • Are the gates as cheap as they look? Two gates per cycle is fine when there are five topics. At thirty, the approval step becomes a bottleneck. I don't have a good answer for where the right batching is.
  • What happens when the brand book changes? The voice check is a checklist run against a static file. When the brand evolves, every approved-but-unpublished post is silently out of date. There's no mechanism in the skill to detect that yet.
  • Is the priority formula right? A topic showing up in both Slack and a competitor blog scoring higher than one in Slack alone is intuitive, but I haven't tested whether it correlates with posts that actually move the pipeline. That's a question impact-memory should answer in a few cycles.

If you take one thing

The interesting design decisions in an AI content workflow aren't which model you call. They're where you refuse to call one, and what you force the next call to read before it starts. The two gates we built sit on the front end — topic approval and a brand/voice check — and the impact-memory loop sits on the back end, feeding the next cycle's ranking. None of these are expensive once they're written into the skill's contract. Without them, the workflow is a draft generator. With them, it's a content operation that happens to be partly automated, and that can argue with itself about what to write next. The second thing is the one worth shipping.