The hackathon brief: build a skill that turns customer feedback into marketing posts. We had Slack channels full of call summaries, call recordings in a transcript tool, a competitor-blog list, and thirty published posts in the house style. The obvious demo was end-to-end — read the signals, pick the topic, write the draft, push it to the CMS.

Four hours in, with the first generated post on screen, someone asked: what is the worst post this thing could publish under our name? The one we were looking at was a fair candidate: fluent, about a real customer concern, and built on an angle the sales team was actively defending against in an open deal. The model had no way to know that.

So we broke the pipeline in two places. The skill cannot draft until a human approves the topic, and even an approved topic cannot be drafted until a brand and voice check passes. Two blocking gates, written into the skill's contract rather than left as suggestions.

Three steps, two gates

  1. Research. Pull from the active sources for the cycle (Slack customer channels, call transcripts when available, competitor blogs, the latest thirty published posts for style calibration). Cluster the signals, rank by recurrence and commercial relevance, write the output to a dated file.
  2. Proposal. Present topics — discovered and team-suggested — each with a one-line rationale for why now. The team picks. Each topic gets a status: proposed, approved_for_draft, or rejected.
  3. Drafting. Only for approved_for_draft topics, and only after the brand/voice check. Output goes to a markdown file and optionally a CMS draft.

The first gate is a human decision. The second is a checklist against two documents the team owns, branding.md and voice.md. Neither can be skipped by the skill, and there is no "trust me" escape hatch.
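The contract is easiest to see as a small state machine. What follows is a minimal sketch, assuming the skill keeps each topic as a record carrying the status values above. `Topic`, `GateError`, `approve`, `record_brand_check`, and `draft` are hypothetical names, and the brand/voice check is reduced to a list of failed checklist items because the format of branding.md and voice.md isn't specified here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TopicStatus(Enum):
    PROPOSED = "proposed"
    APPROVED_FOR_DRAFT = "approved_for_draft"
    REJECTED = "rejected"


class GateError(RuntimeError):
    """Raised when the skill tries to move past a gate that has not been cleared."""


@dataclass
class Topic:
    title: str
    why_now_one_liner: str
    status: TopicStatus = TopicStatus.PROPOSED
    approved_by: Optional[str] = None
    brand_check_passed: bool = False


def approve(topic: Topic, reviewer: str) -> None:
    # Gate 1: a named human moves the topic out of "proposed".
    if not reviewer:
        raise GateError("approval needs a named reviewer")
    if topic.status is not TopicStatus.PROPOSED:
        raise GateError(f"cannot approve a topic that is {topic.status.value}")
    topic.status = TopicStatus.APPROVED_FOR_DRAFT
    topic.approved_by = reviewer


def reject(topic: Topic) -> None:
    topic.status = TopicStatus.REJECTED


def record_brand_check(topic: Topic, failed_items: list[str]) -> None:
    # Gate 2: runs only on approved topics and passes only if no item
    # from the (hypothetical) branding.md / voice.md checklist failed.
    if topic.status is not TopicStatus.APPROVED_FOR_DRAFT:
        raise GateError("brand/voice check runs only on approved_for_draft topics")
    topic.brand_check_passed = len(failed_items) == 0


def draft(topic: Topic) -> str:
    # Both gates are hard preconditions; there is deliberately no override flag.
    if topic.status is not TopicStatus.APPROVED_FOR_DRAFT:
        raise GateError("topic is not approved_for_draft")
    if not topic.brand_check_passed:
        raise GateError("brand/voice check has not passed")
    # Placeholder: the real skill would call the model here.
    return f"# {topic.title}\n\n(draft body goes here)\n"
```

Raising instead of warning is the point: there is no code path to `draft` that skips either gate, and an override parameter would be exactly the "trust me" escape hatch the contract rules out.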

Why the gates matter more than the pipeline

A model running end-to-end on customer signals produces fluent posts very quickly, along with the class of failures we'd been about to ship:

  • The angle concedes an objection the sales team is actively defending against.
  • The tone is fine per paragraph and off-brand per page: too tentative, two registers above the brand book.
  • The evidence is real but ungeneralizable — one call with one prospect, presented as "customers are asking us".
  • The topic lands the same week as a competitor's near-identical post, which you only catch if someone reads the competitor blogs weekly.

None of these are model failures. They're judgement calls, and the judgement lives with the team. The cheapest way to keep it in the loop is to make the skill unable to proceed without it.

The first gate checks commercial relevance. A clustering pass can rank topics; it can't tell you that the deal you're closing this quarter would be hurt by writing publicly about a specific objection. So the skill stops, lists topics with a why_now_one_liner and source attribution, and waits.
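What the skill hands the team at this gate can be sketched as a plain-text list. This assumes each proposed topic carries its `why_now_one_liner` and a tuple of source references; `ProposedTopic`, `render_proposal`, and the rule that a topic without a rationale can't be listed at all are illustrative choices, not the skill's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedTopic:
    title: str
    why_now_one_liner: str
    sources: tuple = ()          # e.g. Slack channel names, transcript IDs, blog URLs
    team_suggested: bool = False

    def __post_init__(self) -> None:
        # Hypothetical rule: no rationale, no slot on the list.
        if not self.why_now_one_liner.strip():
            raise ValueError(f"topic {self.title!r} has no why_now_one_liner")


def render_proposal(topics: list) -> str:
    """Format proposed topics for the human gate; nothing is drafted from this."""
    lines = []
    for number, topic in enumerate(topics, start=1):
        origin = "team-suggested" if topic.team_suggested else "discovered"
        sources = ", ".join(topic.sources) if topic.sources else "none"
        lines.append(f"{number}. {topic.title} [{origin}]")
        lines.append(f"   why now: {topic.why_now_one_liner}")
        lines.append(f"   sources: {sources}")
    lines.append("")
    lines.append("Mark each topic approved_for_draft or rejected; unmarked topics stay proposed.")
    return "\n".join(lines)
```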

The second gate checks house style before drafting starts, because catching a tone problem after the model has produced 800 polished words is expensive. The model anchors on its own draft, the team anchors on what's already written, and both biases push the review toward keeping bad copy.

Sources are not equal, and the skill knows it

The other piece worth writing down is the source tracker: a small markdown file listing every signal source with an explicit type and a priority rule attached:

  • priority_score = 3 when a topic shows up in both Slack customer channels and a competitor blog in the same window.
  • priority_score = 2 when it appears only in Slack.
  • priority_score = 1 when it appears only in competitor blogs.
  • The company's own published posts do not raise priority at all. They calibrate style, not topics.

The last rule took the longest to argue for. The natural instinct is to treat your own archive as a topic source — "we wrote about X, let's write more about X" — and that's how you end up with five posts about your favourite topic and silence on what the market is asking. Style and topics come from different places, so the skill gives them different source types and different roles in the ranking.
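The priority rules translate directly into code. A sketch, assuming each clustered topic records the set of source types it appeared in during the window. The `SourceType` names and the `rank` helper are hypothetical, and returning 0 for a topic seen in neither Slack nor a competitor blog is an assumption of this sketch, since the rules above don't cover that case.

```python
from enum import Enum


class SourceType(Enum):
    SLACK_CUSTOMER = "slack_customer"
    COMPETITOR_BLOG = "competitor_blog"
    CALL_TRANSCRIPT = "call_transcript"  # evidence quality, not score
    OWN_POST = "own_post"                # style calibration only


def priority_score(seen_in: set) -> int:
    """Score a topic by the source types it appeared in during one window."""
    in_slack = SourceType.SLACK_CUSTOMER in seen_in
    in_competitor = SourceType.COMPETITOR_BLOG in seen_in
    if in_slack and in_competitor:
        return 3
    if in_slack:
        return 2
    if in_competitor:
        return 1
    # Transcripts and own posts never add to the score. A topic seen in
    # neither Slack nor a competitor blog gets 0 (assumption of this sketch).
    return 0


def rank(topics: dict) -> list:
    """Order topic names by score, highest first; ties keep input order."""
    return sorted(topics, key=lambda name: priority_score(topics[name]), reverse=True)
```

Keeping own posts as a source type of their own, rather than leaving them out of the tracker, makes the style/topic split explicit: the archive is listed, it just never reaches the score.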

Call transcripts don't change the numeric score either, but the skill prefers them as evidence over Slack summaries when both exist. A summary tells you a topic came up; a transcript tells you how the customer framed it, which is the difference between writing about "data quality" and writing about the objection in the customer's own words.
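The evidence preference is an ordering, not a score. A minimal sketch, assuming evidence items are tagged with hypothetical `call_transcript` and `slack_summary` kinds: transcripts sort ahead of summaries when both exist, and the topic's priority score is left alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    kind: str  # "call_transcript" or "slack_summary" (hypothetical labels)
    text: str
    ref: str   # where it came from, kept for source attribution


# Lower number = preferred. Transcripts carry the customer's own framing.
EVIDENCE_PREFERENCE = {"call_transcript": 0, "slack_summary": 1}


def pick_evidence(items: list) -> list:
    """Return usable evidence with transcripts first; sorted() is stable,
    so items of the same kind keep their original order."""
    usable = [e for e in items if e.kind in EVIDENCE_PREFERENCE]
    return sorted(usable, key=lambda e: EVIDENCE_PREFERENCE[e.kind])
```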

What I still don't have good answers for

A hackathon ends with a contract, not with proof. Before I can claim this works, three questions need answers:

  • Are the gates as cheap as they look? Two gates per cycle is fine with five topics. At thirty, approval becomes a bottleneck, and I don't know what the right batch size is.
  • What happens when the brand book changes? The voice check runs against a static file. When the brand evolves, every approved-but-unpublished post is silently out of date, and nothing detects that yet.
  • Is the priority formula right? Slack-plus-competitor outranking Slack-alone is intuitive, but I haven't tested whether it correlates with posts that move the pipeline. Closing that loop is a different post.

The interesting decisions in an AI content workflow aren't which model you call; they're where you refuse to call one. The gates cost almost nothing once they're in the skill's contract, and they're the difference between a draft generator and a content operation the team is willing to put its name on.