Voice-over roulette
I built a slot machine for one-take readings. You ask it for a poem and it gives you one — random pick, you don’t know which until it’s on the screen. Then you read it. One take, no edits. If you flub a line, you flub it, and that’s the take that ships.
It generates the rest of the video locally on a Mac mini. Whisper transcribes the audio. A patchwork of emojis pulses behind your voice, each tied to specific words you say. Colored wisps respond to the loudness of your read. A reactive shader renders out of a Remotion pipeline duct-taped to a Python audio loop. None of it touches the cloud.
The visual treatments come from somewhere specific. I came up in the Flash era, when the best motion graphics on the internet came from people like Joshua Davis and Yugo Nakamura: vectors driven by code, parametric beauty, math as the composition. Before that, I’d sit at a Windows desktop watching the Media Player visualizer go absolutely wild during MP3s. Reactive geometry tied to audio is a deeply set aesthetic for me. I just hadn’t built any of it in twenty years.
What’s funny is how little code I wrote this time. I directed it. I told the agent what I wanted — the chord scene needs more randomization, the wisps should pulse with more dynamic range, the emoji grid shouldn’t have empty cells — and the agent wrote it. I read the code occasionally, mostly to confirm a constraint was respected. The work I did was creative direction and review. The work the agent did was implementation. Both of those are real work; neither used to be possible at this speed.
I built it because everything else I’m shipping right now — Obaron, Cocoajam, client projects — has a roadmap and a customer. I wanted one thing that didn’t. One thing where the only constraint was you have to take the read it gives you.
The first set was modernist poetry, five pieces in cool palettes: Stevens, H.D., Millay, Teasdale, Yeats. Each one got its own color from a range of ice blue, mint, slate, sage, aqua, and indigo. The surprise was watching the same visual system render across the set: when the audio changed, everything changed, and the changes were the reads.
The second set is in progress — five readings from the machine age, 1909 to 1996. Marinetti, Kafka, Turing, the Apollo 11 landing transcript, Barlow’s cyberspace manifesto. A different visual scene: a vector chord network, electric cyan and hot magenta on deep blue. A different register: declarative, technical, the long bones of the twentieth century.
There’s no pitch here. This isn’t a product, and I’m not selling anything.
What I’m sharing is this: the constraint is the fun. Knowing you can’t edit the take makes you read better. Not knowing which poem you’ll get makes you live with it. Because the visuals are generated from the audio, there’s no auteur calling the shots. You make the audio, the system makes the rest, and what you see is what came out.
The readings live on YouTube and TikTok if you want to watch the system run.
The slot machine keeps pulling.