From a Tagesschau news item to an AI Deutschrap music video — starring Helmut, the friendly Yeti MC. Orchestrated end-to-end from Claude Code.
It started as a Schlager idea: "a German song about today's news." A tooling-research chat turned into a genre pivot, a casting round, and finally a whole pipeline.
The first question was pure tooling research: which AI tools for text → music → video? Answer stack: news & lyrics right in the chat, Suno for the song, Veo/Kling/Seedance for video, FFmpeg for the cut.
"I'm thinking about creating a german schlager song about today's news with ai tools, potentially including a video."Picked from the day's headlines: the Bundesrat blocks the promised tax-free €1000 rebate. A perfect Schlager arc — anticipation → disappointment → we dance anyway. "Everyone has an electricity bill."
The decisive creative leap. First Schlager-pop, then the voice trimmed to a "gentle-giant baritone" — then the full genre pivot to Fanta-4-/Fettes-Brot-/Beginner boom-bap, 90 BPM, rapped verses with a sung hook. Lyrics and Suno prompt rewritten from scratch.
"and now as a german rap style of late 90s early 2000s"Four mascots evaluated. The Yeti wins for two reasons: the cold metaphor (a Yeti freezing because of his electricity bill = peak Schlager self-irony) and the AI trick — white fur on any background = high contrast = models keep him consistent. The name "Helmut": a warm German everyman-uncle name that carries both framings — Schlager Heimat vibe and Deutschrap blue-collar.
The genre pivot swapped the wardrobe entirely: Trachtenjanker → oversized burgundy hoodie + DJ headphones. To keep it the same Helmut, a tiny edelweiss medallion travels along as a signature — "still Helmut, just remixed."
Final research step: how do you drive this from Claude Code? Result — an MCP gateway for the models, scenes.json as single source of truth, scripts for generate / lip-sync / compose. Exactly that structure is in the repo today.
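A minimal sketch of what "scenes.json as single source of truth" could look like in practice. The field names (`id`, `start`, `end`, `prompt`) and the example prompts are assumptions made for illustration, not the schema used in the repo; only the `scene_a01`-style IDs come from the text.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    scene_id: str   # e.g. "scene_a01" (ID style taken from the text)
    start: float    # seconds into the song (assumed field)
    end: float      # seconds into the song (assumed field)
    prompt: str     # scene-specific prompt text (assumed field)

    @property
    def duration(self) -> float:
        return self.end - self.start


def load_scenes(raw: str) -> list[Scene]:
    """Parse scenes.json content and reject inverted or overlapping scenes."""
    scenes = [
        Scene(item["id"], float(item["start"]), float(item["end"]), item["prompt"])
        for item in json.loads(raw)
    ]
    scenes.sort(key=lambda s: s.start)
    for scene in scenes:
        if scene.end <= scene.start:
            raise ValueError(f"{scene.scene_id}: end must be after start")
    for prev, cur in zip(scenes, scenes[1:]):
        if cur.start < prev.end:
            raise ValueError(f"{cur.scene_id} overlaps {prev.scene_id}")
    return scenes


# Made-up example content; the real file lives in the repo.
EXAMPLE = """[
  {"id": "scene_a02", "start": 12.5, "end": 25.0, "prompt": "Helmut reads the bill"},
  {"id": "scene_a01", "start": 0.0, "end": 12.5, "prompt": "Helmut at the turntables"}
]"""

if __name__ == "__main__":
    for scene in load_scenes(EXAMPLE):
        print(scene.scene_id, scene.duration)
```

If the generate, lip-sync and compose scripts all read scenes through one loader like this, a timing change only has to be made in one file.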
Cold metaphor + consistency hack. Instantly adoptable.
Maximally German — but proportions tricky for AI video.
Same cold logic, but the Knut trope is used up.
Very German — dogs drift harder in AI than blob creatures.
Topical, musical, linguistic — the influences behind every line.
Three versions of the same song, one playhead. Click a track or drag the crossfader and blend between them live — the position keeps running, like a DJ rig.
Every step driven from Claude Code — Atlas Cloud as the primary model gateway, fal.ai for lip-sync.
Story, rhyme, timing mapped to 90 BPM.
creative/lyrics.lrc
Deutschrap beat, 3:53, hand-curated.
tausend_euro.mp3
8 variants + 3 angles locked.
nano-banana 2
Ref-to-video, 1 take each.
Seedance 2.0 · Atlas
Experimentally tested only, not in the final cut.
Hedra · fal.ai
On the beat, VHS grade, 1080p.
preview.mp4
Consistency is non-negotiable. Four anchors are forced into every prompt — they travel through every generation.
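One way to force the anchors into every prompt is a small builder plus a check, sketched here. The four anchor strings are an assumption: the text says there are four but does not list them at this point, so they are borrowed from the character description earlier on. The negatives are likewise hypothetical, based on the drift described in the field log further down.

```python
# Assumed anchors: borrowed from the character description, not confirmed
# as the exact four used in the project.
ANCHORS = (
    "white-furred Yeti",
    "oversized burgundy hoodie",
    "DJ headphones",
    "tiny edelweiss medallion",
)

# Hypothetical hard negatives, based on the face drift the field log reports.
NEGATIVES = ("humanoid face", "baboon face")


def build_prompt(scene_text: str) -> str:
    """Prefix every scene prompt with the same anchors, suffix the negatives."""
    if not scene_text.strip():
        raise ValueError("scene_text must not be empty")
    anchors = ", ".join(ANCHORS)
    negatives = ", ".join(NEGATIVES)
    return f"Helmut: {anchors}. {scene_text.strip()} Avoid: {negatives}."


def missing_anchors(prompt: str) -> list[str]:
    """Return the anchors absent from a prompt (empty list means OK)."""
    return [anchor for anchor in ANCHORS if anchor not in prompt]
```

Running `missing_anchors` over every prompt before submission is a cheap guard: a hand-edited prompt that lost an anchor is caught before a paid generation.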
Each tile = a real frame from the generated clip, time-locked to the lyrics.
The concept doc was written before we touched the API. Almost every assumption was wrong on first contact. A field log:
Three models tried. Veo 3.1 drifted (the face went humanoid) and actually cost $0.20/s instead of the assumed $0.03, so it was abandoned. Kling o3 Pro held the character (16 clips, 3–4 takes/scene) but was pricey. Seedance 2.0 became the production model: the only one that renders legible German text on CRT screens, ID cards and bills.


Assumption: the Atlas catalog lives at /v1/models.
Reality: that is only the OpenAI-compatible text route (105 models). The real catalog (313 models: Veo, Kling, Seedance…) is at /api/v1/models.
Assumption: 3:00 minutes, with timings taken from the concept.
Reality: Suno delivered 3:53. All scenes were re-timed against the real LRC boundaries: v1 (scene_01–09) → v2 (scene_a01–a21).
Assumption: one reference is enough for consistency.
Reality: with just 1 ref the face drifts humanoid/baboon-ish. Fix: 1 anchor + 3 angles (face/profile/back) + compact bible + hard negatives.
Assumption: catalog prices are accurate.
Reality: Seedance bills token-metered, at ≈ 2.17× the catalog rate. Veo costs $0.20/s instead of $0.03. The budget was raised after the lesson.
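The re-timing against the real LRC boundaries can be sketched as follows. This assumes the common `[mm:ss.xx]lyric` LRC line format and a simple nearest-boundary snap; the project's actual re-timing rules are not described in the text, and the sample lyrics are placeholders.

```python
import re

# Matches "[mm:ss.xx]lyric"; metadata tags like "[ar:Artist]" do not match.
LRC_LINE = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\](.*)")


def parse_lrc(text: str) -> list[tuple[float, str]]:
    """Return (seconds, lyric) pairs for every timestamped line."""
    entries = []
    for line in text.splitlines():
        match = LRC_LINE.match(line.strip())
        if match:
            minutes, seconds, lyric = match.groups()
            entries.append((int(minutes) * 60 + float(seconds), lyric.strip()))
    return sorted(entries)


def snap(t: float, boundaries: list[float]) -> float:
    """Move a planned cut time to the nearest real lyric boundary."""
    return min(boundaries, key=lambda b: abs(b - t))


def retime(planned: list[float], lrc_text: str) -> list[float]:
    """Snap every planned cut to the closest LRC timestamp."""
    boundaries = [t for t, _ in parse_lrc(lrc_text)]
    return [snap(t, boundaries) for t in planned]


# Placeholder content, not the real creative/lyrics.lrc.
SAMPLE = """[00:00.00]Intro
[00:10.50]Verse one
[00:31.25]Hook
"""
```

Here `retime([9.0, 30.0], SAMPLE)` moves the two planned cuts to 10.5 and 31.25 seconds, the starts of the nearest lyric lines.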
No Atlas model auto-syncs to a provided audio track. That led to external trials, MuseTalk vs. sync v3 vs. Hedra Character-3 (fal.ai), plus custom audio analysis (MFCC template, voice and timbre charts) to verify alignment. Outcome: none of it went into production. The final cut uses Seedance's native rap-cadence mouth motion; true lip-sync stayed experimental.



Real sync outputs from the trials: one shows the classic MuseTalk problem (artifacts around the mouth and fur on stylized faces), while the v3-pass close-ups are usable. Sound on for the sync impression:
The hook has a female harmony behind Helmut's lead. So the screen wouldn't show Helmut miming someone else's voice, Helga was added — a second Yeti character (red beanie, corduroy bomber, the same edelweiss medallion as a continuity anchor). But the moment two characters sing in one scene, you get two simultaneous lip-sync targets plus voice attribution — too error-prone with single-character sync already unsolved.


Decision: Helga is removed for now. The final cut is Helmut only, which is good enough at this stage. Way forward: more thorough planning and shorter cuts, or simply different shots where Helmut is not on screen during the second-voice lines (environment / B-roll), which removes the two-character problem entirely.
Seedance 2.0 occasionally takes 15–25 minutes for a 13–15s clip. The original poller gave up after 900s (15 min), and the retry loop then submitted a new generation (~$3), up to 4×. Server-side, the original prediction kept rendering and would have finished. Atlas has no cancel endpoint. Effect: ~$15 burned per scene that would have worked anyway. This hit A13 and A14.
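A minimal sketch of a poller that avoids this failure mode: it waits longer than the slowest observed render and never resubmits on timeout. The status strings and the `get_status` callable are hypothetical stand-ins, not the Atlas API; `sleep` and `clock` are injectable only so the logic can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_result(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,   # 30 min, above the 15-25 min seen in the text
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll one existing prediction until it finishes; never resubmit."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()  # assumed to return a plain status string
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            # Still rendering server-side: report it so the caller can resume
            # polling the same prediction later instead of paying again.
            return "still_running"
        sleep(interval_s)
```

The design choice is that a timeout is not treated as a failure. Only an explicit "failed" status would justify a new paid submission; "still_running" means keep the prediction ID and check again later.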
Which AI model did what, what it cost, and why it won or got cut. External figures as of May 2026; "verdict" = what actually happened in this project.
Sources & detail: MODELS.md in the repo. Empirical figures (≈2.17×, $12.16, …) from cost_log.csv.
The original budget was $25. It was deliberately raised after we learned Seedance bills ~2.17× the catalog rate — and that too short a timeout is more expensive than patience.
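The budget arithmetic can be sketched in a few lines. The 2.17 factor, the ~$3 per generation and the $25 budget come from the text; the function names and the idea of applying the factor to a catalog estimate are illustrative assumptions, not the logic behind cost_log.csv.

```python
SEEDANCE_BILLING_FACTOR = 2.17  # empirical factor quoted in the text


def effective_cost(catalog_usd: float, factor: float = SEEDANCE_BILLING_FACTOR) -> float:
    """Scale a catalog-price estimate by the observed billing factor."""
    return round(catalog_usd * factor, 2)


def retry_waste(cost_per_generation_usd: float, resubmissions: int) -> float:
    """Money spent on resubmissions of a clip that would have finished anyway."""
    return round(cost_per_generation_usd * resubmissions, 2)


def clips_within_budget(budget_usd: float, cost_per_clip_usd: float) -> int:
    """How many clips fit into a budget at a given effective price."""
    return int(budget_usd // cost_per_clip_usd)
```

Under these assumptions, a $10.00 catalog estimate becomes $21.70 in real billing, four needless resubmissions at $3 each cost $12.00, and a $25 budget covers only 8 generations at $3 each.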