Ship two
The second ship is always harder to explain than the first.
Obaron’s Docs Readiness Audit v5.0 went live today: nine categories, a methodology page that actually says something, a changelog, and a Docs Validation Study first run on the calendar for May 11. The first release built the rubric and got scoring working. This one I’ll actually remember.
Between the two ships, I had to write the methodology page.
Not documentation — an honest accounting of what the audit claims to measure, what’s provisional, and where the assumptions haven’t been tested yet. The page is the trust surface: if it’s wrong, the product is wrong. So the answers had to exist before the prose could.
Getting there meant twenty-five architect plans and five days on a single category change.
Cat 7b used to measure text-to-HTML ratio — how much of a page’s raw payload is extractable content versus JavaScript and chrome. It looked like the right question. Then I ran it against real sites: pages with genuinely good documentation, penalized because they were stylistically heavy (posthog.com/docs, I’m looking at you and your delightful yet oh so heavy CSS :)). The code was right. The question was wrong.
The new finding fires when visible text in raw HTML drops below an absolute floor — the actual signal that AI consumers can’t find what’s there, not a proxy for how much chrome surrounds it. Eight words of diff in a category description. Five days to get the question right.
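For the curious, here is the shape of that change in code. A minimal sketch, not the shipped implementation: the visible-text extractor is a bare stdlib parser, and the 10% ratio and 200-word floor are stand-in numbers, not the audit’s real thresholds.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript", "template"}

class _VisibleText(HTMLParser):
    """Collect the text a non-rendering consumer can see in raw HTML,
    skipping script/style blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def visible_text(raw_html: str) -> str:
    parser = _VisibleText()
    parser.feed(raw_html)
    return " ".join(parser.chunks)

# The old question (v4): how much of the payload is extractable content?
# A CSS-heavy page with excellent docs fails this through no fault of its content.
def fires_ratio(raw_html: str, min_ratio: float = 0.10) -> bool:
    return len(visible_text(raw_html)) / max(len(raw_html), 1) < min_ratio

# The new question (v5): is there enough visible text at all for a
# non-rendering consumer to find? The 200-word floor is a stand-in value.
def fires_floor(raw_html: str, min_words: int = 200) -> bool:
    return len(visible_text(raw_html).split()) < min_words
```

Same extractor, different question: the ratio check punishes chrome, the floor check asks whether the content is findable at all.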
Code is the easy part now. Claude writes it faster than I can review it. Figuring out what to measure — catching yourself answering the wrong question before it ships — that’s still the hard work, and it doesn’t compress.
The methodology page names what it doesn’t know. Cat 7a and 7b weights are explicitly provisional, pending Docs Validation Study results. The Score tells you what the rubric found; the Story prose says what to do about it. The Trust snapshot section leads with what the Score does not predict — citation frequency, traffic, per-engine variance. The hard cap conditions are inlined so you can see exactly when a score gets floored and why.
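The cap mechanics themselves are small enough to show in miniature. A hypothetical sketch: the condition names and cap values below are invented for illustration, and the real conditions are the ones inlined on the page.

```python
# Hypothetical sketch of score flooring. The condition names and cap
# values are invented for illustration; the real conditions are the
# ones inlined on the methodology page.
HARD_CAPS = {
    "fetcher_blocked": 20,
    "no_visible_text": 35,
}

def apply_hard_caps(score: int, fired: set[str]) -> int:
    """A fired cap condition pins the score at or below its cap,
    regardless of what the weighted categories add up to."""
    for condition, cap in HARD_CAPS.items():
        if condition in fired:
            score = min(score, cap)
    return score

# A weighted 78 that tripped "fetcher_blocked" reports as 20.
assert apply_hard_caps(78, {"fetcher_blocked"}) == 20
```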
I’m not building Obaron to look like a giant enterprise with all the answers. I’m building it one honest step at a time, in public, and the step I’m most interested in is finding out where the rubric is wrong.
Publishing the limits isn’t weakness. A measurement framework that never tests its assumptions is just positioning.
The Docs Validation Study first run is May 11. Frozen cohort of DevTools docs sites. Canonical questions authored before scoring, not after. Two site-side metrics — fetcher success rate and content-recovery rate — both reproducible, both independent of any AI engine’s hidden behavior. Protocol publishes before the run. Results publish after.
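In sketch form, the two rates reduce to counting. This is simplified; the full definitions land with the protocol, and the FetchResult fields here are shorthand for it, not its schema.

```python
from dataclasses import dataclass

@dataclass
class FetchResult:
    url: str
    fetched: bool    # the fetcher got a usable response back
    recovered: bool  # the expected content was present in that response

def site_rates(results: list[FetchResult]) -> tuple[float, float]:
    """Fetcher success rate and content-recovery rate for one site.
    Anyone can recompute both from the same cohort and the same
    fetches; no AI engine's hidden behavior is in the loop."""
    if not results:
        return 0.0, 0.0
    fetched = [r for r in results if r.fetched]
    fetch_rate = len(fetched) / len(results)
    recovery_rate = (
        sum(r.recovered for r in fetched) / len(fetched) if fetched else 0.0
    )
    return fetch_rate, recovery_rate
```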
The build is done.
Now we find out.