Introduction
Most Seedance 2.0 guides floating around right now give you the same three things. A formula. A table of camera moves. A long list of prompts. That is table stakes. It is also why most creators spend a week with the model, walk away with jittery output they cannot use, and decide Seedance is "overhyped."
This guide is different. I personally reviewed the full official Seedance 2.0 documentation, the EvoLinkAI community repo of 164 curated production prompts, weeks of testing threads on r/StableDiffusion and r/generativeAI, and the real patterns that show up when you watch what viral AI creators are actually shipping in 2026.
By the end, you will have the director's mental model that kills vague output, the physics-first vocabulary that separates cinematic from plastic, the shot-script format that the top 1% of users actually use, the @tag reference hierarchy nobody explains properly, and 12 production-ready prompts that earn their place on your desktop.
Let's go.
What Seedance 2.0 Actually Is (And Why Prompts Behave Differently)
Seedance 2.0 is ByteDance's flagship AI video model, built on a Dual-Branch Diffusion Transformer. One branch handles spatial information (what things look like). The other branch handles temporal information (how things move over time). It is the industry's first model supporting quad-modal input, meaning you can combine image, video, audio, and text in a single generation flow.
Three architectural facts change how you write prompts for it:
- It understands physics, not adjectives. The model was trained on physical interactions. Fabric drapes. Water splashes. Dust rises. Tires smoke when a car drifts. The prompt word "cinematic" means almost nothing to it. The phrase "tires smoke as the car pivots 90 degrees on wet asphalt" gives its physics engine something concrete to simulate.
- It processes prompts as instructions, not tag soup. Midjourney-era comma-separated incantations ("epic, cinematic, masterpiece, 8K, trending") do very little here. Seedance was trained on natural-language shot descriptions that read like a director briefing a cinematographer.
- Early tokens get prioritized attention. The model's information hierarchy front-loads the subject and the most important action. If you bury your key instruction at the end of a 200-word prompt, it quietly gets de-weighted.
Good. That is the technical mental model. Now the insight that actually matters.
The Problem Nobody Warns You About
Here is what every beginner runs into. You type "cinematic shot of a woman walking through a city" and get a clip where the camera floats randomly, the woman's legs bend in directions legs do not go, and the whole thing shimmers like you filmed it through a wet window. You add "cinematic, epic, beautiful." Same output. You try "4K, high quality, professional." Same output. You start doubting the model.
The problem is not the model. The problem is that you are describing vibes. Seedance 2.0 cannot render vibes. It renders physics.
A photographer on r/generativeAI ran a 50-prompt stress test a few months ago and landed on the single most useful reframe I have seen for this model: name the lighting physically, not emotionally. Compare these two prompts for the same shot.
Weak (plastic, floaty output): "Cinematic lighting, moody atmosphere, dramatic shadows."
Strong (actually renders): "Single focused spotlight descending from above casting a sharp circular pool of warm tungsten light, sharp falloff into deep shadow at the edges."
The second prompt works because Seedance can simulate light physics. It cannot simulate "moody." The same principle applies to every part of the prompt. You are not describing how the shot feels. You are describing what physically happens in front of the camera and what the lens records.
Once you internalize that shift, you stop fighting the model.
The Core Mindset: You Are a Director, Not a Describer
Every high-performing Seedance 2.0 prompt reads like shot directions, not scene descriptions. There is a real working pattern behind this.
A describer writes: "A woman stands in a dramatic hallway with beautiful lighting, feeling tense."
A director writes: "A woman in a charcoal wool coat stands mid-corridor of a flickering industrial hallway. Single overhead fluorescent strobes intermittently, casting her alternately in pale blue light and deep shadow. She exhales slowly, breath visible in the cold air, then turns her head 30 degrees toward a sound off-frame left. Camera: locked medium shot on a tripod, no movement. 35mm lens, shallow depth of field, grain visible."
Same scene. Completely different output. The second version gives the spatial branch concrete objects and textures, the temporal branch a specific physical action with a defined trigger, and the whole model a rhythm it can execute.
Three shifts get you there.
- Describe physical interactions instead of emotional outcomes. Not "she is afraid," but "her shoulders tense, her jaw tightens for half a second." Not "the scene is chaotic," but "papers scatter from the desk, a chair tips backward, dust rises from the impact."
- Give the camera one clear job. Not "dynamic cinematic camera movement throughout." Not "camera rotates and pans and zooms." One primary instruction, phrased as a specific action with a start and end point.
- Separate the subject's movement from the camera's movement. This is the most frequently broken rule in beginner prompts, and it is the number one cause of jittery output. More on this in the camera section.
The 6-Part Prompt Formula (With The Upgrades Nobody Teaches)
The officially recommended structure is the starting point. But the version that actually works in 2026, after a year of community testing, has a few key upgrades baked in.
[Subject] + [Action] + [Environment] + [Camera] + [Style Anchor] + [Constraints]
| Part | What Goes Here | Common Upgrade |
|---|---|---|
| Subject | Specific visual features. Age, clothing, materials, and at least one asymmetrical detail if human. | Name the non-idealized feature. "Slight scar above the left eyebrow" beats "ordinary face." |
| Action | Specific verbs with quantified intensity. | Use physical verbs. "Exhales slowly, shoulders dropping" beats "looks relaxed." |
| Environment | Where, when, and what light is doing. | Name the light physically. Direction, temperature, falloff. |
| Camera | One primary instruction with start and end points. | Add pacing words, not technical specs. "Slow, smooth, stable" beats "24fps, f/2.8." |
| Style Anchor | A specific tradition, stock, colorist, or director. | Replace every generic adjective with an industry-specific phrase. |
| Constraints | Positive framing only. What should be present, not absent. | "Maintain consistent facial features" beats "avoid face drift." |
Target length is 60 to 120 words for standard shots. Shot scripts (covered below) run longer, but each block stays tight.
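If you assemble prompts programmatically, for batch testing for example, the six-part formula maps neatly onto a small data structure. The sketch below is a hypothetical Python helper and is not part of any Seedance SDK. `ShotPrompt` and `check_length` are illustrative names, and the 60 to 120 word window comes from the guideline above. Fields are joined in declaration order, so subject and action always lead the prompt.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotPrompt:
    """Hypothetical container for the six-part formula (field names are illustrative)."""
    subject: str
    action: str
    environment: str
    camera: str
    style_anchor: str
    constraints: str

    def render(self) -> str:
        # Fields are joined in declaration order, so subject and action lead the prompt.
        parts = (getattr(self, f.name).strip() for f in fields(self))
        return " ".join(p for p in parts if p)


def check_length(prompt: str, lo: int = 60, hi: int = 120) -> str:
    """Report whether a prompt falls inside the suggested word-count window."""
    n = len(prompt.split())
    if n < lo:
        return f"too short ({n} words)"
    if n > hi:
        return f"too long ({n} words)"
    return f"ok ({n} words)"


if __name__ == "__main__":
    shot = ShotPrompt(
        subject="A matte black 1992 Nissan 240SX",
        action="initiates a 90-degree drift, tires erupting in thick white smoke.",
        environment="Rain-slicked Shinjuku intersection at 2 AM, magenta and cyan neon reflections.",
        camera="Camera locked on a low tripod at curb height.",
        style_anchor="Style: Michael Mann night photography, cold color grade.",
        constraints="Maintain consistent car geometry throughout.",
    )
    text = shot.render()
    print(text)
    print(check_length(text))
```

This trimmed example falls below 60 words, so `check_length` reports it as too short. Treat that as a cue to add lighting or texture detail rather than generic adjectives.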
Good vs. Bad Side by Side
Bad: "Cool cinematic shot of a guy drifting a car through a city at night, super dynamic camera, epic vibes."
Good: "A matte black 1992 Nissan 240SX initiates a 90-degree drift through a rain-slicked Shinjuku intersection at 2 AM. Tires erupt in thick white smoke as the rear wheels break traction. Neon reflections streak across the wet asphalt in magenta and cyan. Camera locked on a low tripod at curb height, slight handheld drift only, the car sweeps through the frame left to right. 35mm anamorphic lens, shallow depth of field, film grain. Style: Michael Mann night photography, cold color grade, practical lighting only."
The second prompt gives Seedance something to physically simulate in every sentence. The first prompt forces the model to guess.
The Shot-Script Format: Your Actual Secret Weapon
This is the biggest unlock in the entire guide, and it is the format that separates viral creators from everyone else. Most guides show it as "an advanced option." It should be your default whenever a shot runs longer than 5 seconds or needs narrative structure.
The Structure
[Style] Specific style anchor (director, film, era, or format)
[Duration] Total length
[00:00-00:04] Shot 1: Shot Name (Camera Type).
Physical scene description.
Character action with specific body language.
Audio cue.
[00:04-00:07] Shot 2: Shot Name (Camera Type).
...
[00:07-00:10] Shot 3: Shot Name (Camera Type).
...
Consistency constraints. Physics requirements. Palette notes.
Why It Works
Timecodes distribute action predictably. Without them, Seedance spreads the described action unevenly across your duration, often dumping the payoff in the first 2 seconds and then floating for 8 more. With timecodes, the model pins each beat to its exact window.
Named shots force a narrative arc. When you label a shot "The Discovery," "The Hesitation," or "The Impact," the model generates more emotionally coherent motion. It has a target to build toward.
Physical grounding per shot. Every block gets concrete simulation instructions, which the dual-branch architecture actually uses.
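The minimal Python sketch below shows how the timecode windows fit together. It lays shots end to end and prints the bracketed windows in the format shown above, so no two beats overlap and the total duration is always the sum of the shots. `Shot` and `render_shot_script` are hypothetical names. Seedance only receives the resulting text, and nothing here calls a Seedance API.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    name: str         # e.g. "The Scale"
    camera: str       # e.g. "Extreme Wide, Static"
    seconds: int      # length of this beat in whole seconds
    description: str  # physical scene, body language, audio cue


def timecode(total_seconds: int) -> str:
    """Format whole seconds as MM:SS, matching the [00:00-00:04] style."""
    return f"{total_seconds // 60:02d}:{total_seconds % 60:02d}"


def render_shot_script(style: str, shots: list[Shot], constraints: str) -> str:
    """Lay shots end to end so every beat gets an explicit, non-overlapping window."""
    if not shots or any(s.seconds <= 0 for s in shots):
        raise ValueError("every shot needs a positive duration")
    lines = [f"[Style] {style}", f"[Duration] {sum(s.seconds for s in shots)} seconds"]
    start = 0
    for index, shot in enumerate(shots, start=1):
        end = start + shot.seconds
        lines.append(
            f"[{timecode(start)}-{timecode(end)}] Shot {index}: "
            f"{shot.name} ({shot.camera}). {shot.description}"
        )
        start = end
    lines.append(constraints)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_shot_script(
        "Denis Villeneuve cinematic sci-fi, desaturated teal and amber palette.",
        [
            Shot("The Scale", "Extreme Wide, Static", 4, "A lone figure stands at the crater edge."),
            Shot("The Reflection", "Slow Push to Close-up on Visor", 3, "Earth appears in the visor."),
            Shot("The Decision", "Low Angle, Tripod Fixed", 3, "The figure steps off the edge."),
        ],
        "Consistent exposure suit design across all shots.",
    ))
```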
A Full Shot-Script Example
[Style] Denis Villeneuve cinematic sci-fi, IMAX 70mm look, desaturated teal and amber palette, grain visible.
[Duration] 10 seconds
[00:00-00:04] Shot 1: The Scale (Extreme Wide, Static). A lone figure in a white exposure suit stands at the edge of a kilometer-wide crater on a rust-red planet. Thin dust drifts horizontally across the frame in slow curls. The crater stretches to the horizon, dwarfing the figure completely. Deep low-frequency wind rumble.
[00:04-00:07] Shot 2: The Reflection (Slow Push to Close-up on Visor). Camera slowly pushes forward from the wide into a tight framing of the helmet visor. In the curved reflection of the glass, a small blue marble appears, Earth, impossibly distant. The figure's breathing is audible inside the suit. Anamorphic flare streaks across the visor edge.
[00:07-00:10] Shot 3: The Decision (Low Angle, Tripod Fixed). From below, the figure steps forward off the crater edge. Dust rises from the boot in slow, physically accurate plumes. Camera holds absolutely still as the figure descends out of frame. Deep orchestral swell. Cut to black.
Consistent exposure suit design across all shots. Realistic low-gravity dust physics. Grain and halation maintained throughout.
Notice the specifics. "Kilometer-wide crater" beats "huge crater." "Slow curls" beats "drifting." "Anamorphic flare streaks" beats "lens flare." Every adjective is a specific, nameable thing.
The @Tag Reference System (The Feature Most Guides Get Wrong)
Seedance 2.0 supports combining up to 9 images, 3 videos, and 3 audio files in a single generation, with a combined cap of 12 files. This is where the real power of the model lives, and it is also where most tutorials oversimplify it.
Here is the framing that actually helps. The three reference types are not interchangeable. Each one controls a different layer of the output, and using them for the wrong job is why your character drifts or your motion references get ignored.
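A quick pre-flight check can catch an over-budget prompt before you upload anything. This sketch makes two assumptions. First, it uses the limits quoted above: 9 images, 3 videos, 3 audio files, and 12 in total. Second, it treats each distinct @ImageN, @VideoN, or @AudioN tag in the prompt as one uploaded file. Confirm the current limits in the official documentation. `check_references` is a hypothetical helper, not a Seedance API.

```python
import re
from collections import Counter

# Limits as quoted in this guide; treat them as assumptions and re-check the official docs.
LIMITS = {"Image": 9, "Video": 3, "Audio": 3}
TOTAL_LIMIT = 12

# Assumption: each distinct tag in the prompt corresponds to one uploaded file.
TAG_RE = re.compile(r"@(Image|Video|Audio)(\d+)")


def referenced_tags(prompt: str) -> set[tuple[str, int]]:
    """Collect distinct (kind, number) pairs such as ("Image", 1) from a prompt."""
    return {(kind, int(num)) for kind, num in TAG_RE.findall(prompt)}


def check_references(prompt: str) -> list[str]:
    """Return human-readable problems; an empty list means the prompt fits the limits."""
    tags = referenced_tags(prompt)
    counts = Counter(kind for kind, _ in tags)
    problems = []
    for kind, limit in LIMITS.items():
        if counts[kind] > limit:
            problems.append(f"{counts[kind]} {kind.lower()} references exceed the limit of {limit}")
    if len(tags) > TOTAL_LIMIT:
        problems.append(f"{len(tags)} references exceed the combined limit of {TOTAL_LIMIT}")
    return problems
```

Checking per type and in total matters because a prompt can respect every per-type limit and still exceed the combined cap, for example with 9 images, 3 videos, and 1 audio file.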
| Reference | What It Anchors | Best Use |
|---|---|---|
| @Image | Visual identity. Face, wardrobe, product geometry, environment, style palette. | Lock character appearance across clips. Lock product design across shots. Set the color and lighting palette. |
| @Video | Motion trajectory and camera language. | Replicate a complex camera move. Transfer a choreography. Copy a transition or VFX pattern. |
| @Audio | Rhythm and tempo. | Beat-synced cuts. Lip-sync timing. Ambient mood. |
The Rule That Solves 80% of Reference Problems
Always state which element you want extracted from which file. The model can pull motion, camera, style, audio rhythm, character identity, or effects from a single reference. If you do not tell it which, it will blend everything and you will get a mushy average.
Wrong:
Use @Video1 for the scene.
Right:
Reference @Video1 for camera movement only.
Character appearance references @Image1.
Overall color grade references @Image2.
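If you script your prompts, you can enforce the one-file-one-role rule by refusing to emit a reference line without an explicit role. This is a minimal sketch. `reference_lines` is a hypothetical helper, and the `ROLES` vocabulary combines the roles named in the rule above with those in the "Right" example. It is not an official list of extraction modes.

```python
import re

# Illustrative role vocabulary drawn from this guide, not an official Seedance list.
ROLES = {
    "motion", "camera movement", "style", "audio rhythm",
    "character appearance", "effects", "color grade",
}
TAG_PATTERN = re.compile(r"@(Image|Video|Audio)\d+")


def reference_lines(assignments: dict[str, str]) -> list[str]:
    """Turn {tag: role} pairs into one explicit extraction sentence per file."""
    lines = []
    for tag, role in assignments.items():
        if not TAG_PATTERN.fullmatch(tag):
            raise ValueError(f"unrecognized reference tag: {tag}")
        if role not in ROLES:
            raise ValueError(f"no recognized extraction role for {tag}: {role!r}")
        lines.append(f"Reference {tag} for {role} only.")
    return lines


if __name__ == "__main__":
    for line in reference_lines({
        "@Video1": "camera movement",
        "@Image1": "character appearance",
        "@Image2": "color grade",
    }):
        print(line)
```

A vague assignment such as "the scene" raises an error instead of silently producing the blended "mushy average" described above.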
The Most Useful Reference Patterns
- Character consistency across multiple videos. Upload the exact same portrait image as @Image1 every time. Reinforce with explicit wardrobe and feature callouts in the prompt. "Character references @Image1. Same black wool coat, same short curly dark hair, same green eyes throughout."
- Camera move replication. If you see a viral clip with a camera movement you want, upload it as @Video1 and instruct: "Replicate the camera movement from @Video1 exactly. Character and scene are new."
- Style transfer from a still. Upload a hero frame or a reference painting as @Image1. "Color grade, lighting style, and atmospheric texture reference @Image1. Apply to the new scene described below."
- First-frame and last-frame control. "Use @Image1 as the first frame. Use @Image2 as the last frame. Fill the interpolation with [description]." This is the single most controllable way to get predictable openings and endings.
- Audio-driven visual pacing. Upload your music or voiceover as @Audio1. "Visual cuts and camera movement sync to the beat rhythm of @Audio1."
The 8 Camera Movements (And The 3 Rules That Actually Matter)
Camera movement is the single biggest quality lever in any Seedance prompt. It is also where beginners make the most damaging mistakes. Here are the eight moves the model recognizes natively.
| Move | What It Does | Best For |
|---|---|---|
| Push-in (dolly in) | Camera moves toward subject | Emotional emphasis, close-up reveal |
| Pull-out (dolly out) | Camera pulls back to reveal | Environmental context, scale reveal |
| Pan | Horizontal sweep | Scanning a scene, tracking lateral motion |
| Tracking shot | Camera follows alongside subject | Walking characters, action scenes |
| Orbit (arc) | Camera circles subject | Product showcase, character portraits |
| Aerial (drone) | High-altitude overhead | Landscapes, cities, scale reveals |
| Handheld | Natural slight shake | Documentary realism, first-person tension |
| Fixed (tripod) | Zero camera movement | Letting subject action carry the shot |
The Three Camera Rules
- Rule 1: One primary instruction per shot. Compound moves confuse the model. If you need a secondary motion, write it as "primary then secondary," not "primary and secondary simultaneously." Correct: "Camera slow push-in, then subtle rise at the end." Broken: "Camera pushes, pans, rotates, and zooms throughout."
- Rule 2: Describe rhythm, not specs. The model responds to "slow, smooth, gradual, gentle, stable." It does not respond to "24fps, f/2.8, ISO 800, 85mm." The official Seedance documentation puts it best: "Describe the rhythm as if you are talking to an editor."
- Rule 3: Separate camera movement from subject movement. This is the rule that saves more prompts than any other. Correct: "The dancer spins slowly in place. Camera holds a fixed medium shot." Broken: "Spinning camera around a dancing person." The second phrasing makes the model try to rotate the camera and the subject together, which almost always produces shaky, warped output.
The Physics-First Vocabulary Cheat Sheet
These are the phrases that actually move Seedance 2.0's output. Stacking two or three of them does more for your prompt than 50 generic adjectives.
Lighting (Highest Leverage)
If you only add one element to a prompt to improve quality, add a specific lighting description.
- "Single focused spotlight descending from above, sharp falloff at the edges"
- "Soft golden hour backlight, warm rim on subject, silhouetted against the sky"
- "Tungsten practical light from a single desk lamp, warm pool of light on the face, deep shadow elsewhere"
- "Cold overcast diffused daylight, no harsh shadows, even illumination"
- "Neon magenta fill from frame left, cyan rim from frame right, rain-slicked reflections double the highlights"
Surfaces That Double Your Visual Value
Wet and reflective surfaces force Seedance to render reflections, which doubles the complexity of your shot for free.
- "Rain-slicked asphalt, reflections of neon signs"
- "Polished obsidian pedestal, mirror reflection of the subject"
- "Morning dew on leaves, catching the sunrise"
- "Condensation on cold glass, slow droplet descent"
- "Wet pavement after a storm, streetlights pooling in puddles"
Color Anchors (Replace Generic Adjectives)
Give the model three to five explicit color anchors. This globally locks the palette.
- "Palette of cobalt blue, rust orange, charcoal, and warm cream"
- "Desaturated teal and amber grade, grain visible"
- "Dark navy velvet, deep gold, ivory, with silver highlights"
Note the "dark navy" trick. Pure black gives Seedance almost nothing to render. A slight color value inside the darkness produces a much richer, more textured result.
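Both palette rules are easy to enforce when prompts are built in code. This is a small sketch with hypothetical helper names. The 3-to-5 range and the caution about pure black come from the advice above. Mapping every plain black to "dark navy" is just one example substitute, so choose whichever tinted dark suits your scene.

```python
# Hypothetical helpers; "dark navy" is one example substitute, not the only valid choice.
DARK_SUBSTITUTES = {"black": "dark navy", "pure black": "dark navy"}


def palette_phrase(colors: list[str]) -> str:
    """Join 3 to 5 explicit color anchors into one palette sentence."""
    if not 3 <= len(colors) <= 5:
        raise ValueError(f"use 3 to 5 color anchors, got {len(colors)}")
    return "Palette of " + ", ".join(colors[:-1]) + ", and " + colors[-1]


def flag_pure_black(colors: list[str]) -> dict[str, str]:
    """Suggest a tinted dark for any anchor that is plain black."""
    return {
        c: DARK_SUBSTITUTES[c.lower().strip()]
        for c in colors
        if c.lower().strip() in DARK_SUBSTITUTES
    }


if __name__ == "__main__":
    colors = ["cobalt blue", "rust orange", "charcoal", "warm cream"]
    print(palette_phrase(colors))
    print(flag_pure_black(["Black", "deep gold", "ivory"]))
```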
Speed Phrases That Actually Work
The keyword "fast" is the single most likely word to degrade quality. If you need speed, use a specific reference.
- "240fps feel" or "half-speed"
- "Slow motion, liquid frame-by-frame detail"
- "Real-time pace, natural human speed"
- "Accelerating through the shot, starts slow, ends urgent"
Only one element in your shot should be "fast" at a time. Fast camera plus fast subject plus busy scene is a guaranteed jitter recipe.
Style Anchors: The Industry-Specific Phrases That Outperform "Cinematic"
Community-tested anchors that consistently produce stronger, more specific looks than generic quality words.
| Anchor | What It Triggers |
|---|---|
| Naturalistic film print emulation | Organic grain, realistic stock characteristics, accurate color science |
| DaVinci industrial-grade color grading | Controlled contrast, professional color science, precise saturation |
| 35mm handheld film camera, natural grain, subtle organic shake | Documentary realism, breathing camera, no digital sterility |
| Hollywood IMAX blockbuster quality | Large-format feel, deep dynamic range, epic scale |
| Cold documentary style, natural light on a cloudy day | Desaturated, even lighting, no dramatic shadows |
| Tsui Hark-style wuxia blockbuster | High contrast, cold jade-blue with amber flowing light |
| 100% real-life shooting texture | Suppresses CGI tells, pushes toward photographic believability |
| Anamorphic lens flare, 2.39:1 crop, teal-and-orange grade | Modern cinematic look, oval bokeh, widescreen composition |
Pick one or two of these per prompt. The pattern across all of them is the same: name a specific tradition (a director, a film stock, a format, a colorist tool) rather than describing the look in generic adjectives. Seedance has clearly learned these reference points.
Words That Kill Quality
Every prompt you write should avoid these, or they will quietly drag your output toward AI soup.
| Word | Why It Fails | Use Instead |
|---|---|---|
| Cinematic (alone) | Too generic, no visual anchor | Name the stock, director, or format |
| Epic | Not a visual specification | Describe the specific effect (scale, music, composition) |
| Amazing / beautiful | No practical instruction | Specific lighting plus specific composition |
| Fast (applied to everything) | Guaranteed jitter when stacked | Apply speed to one element only, use "240fps" or "half-speed" |
| Lots of movement | Causes chaotic output | Describe one specific motion with a clear arc |
| Masterpiece / 8K / trending | Midjourney-era incantations, almost no effect on Seedance | Replace with a specific camera, lens, or stock reference |
12 Production-Ready Seedance 2.0 Prompts
Each prompt below is engineered around the principles above. Copy, paste, adjust the specific details to your scene, and generate. These are designed to push the model harder than the prompts you will find in most listicles.
1. The Goodbye - Native Lip-Sync Dialogue + Auto Multi-Cam
Showcases: lip-sync dialogue, shot-reverse-shot coverage, emotional physical performance, ambient cafe audio design, golden-hour practical lighting. Total dialogue: two lines, 8 words.
【Style】Cinematic contemporary drama in the tradition of Celine Sciamma and Sean Baker, warm practical golden-hour lighting, 35mm spherical lens, shallow depth of field, fine film grain, desaturated warm palette. 2.39:1.
【Duration】15 seconds
【Native Audio】Full lip-synced English dialogue delivered quietly and naturally. Ambient outdoor cafe layer throughout: distant street traffic, a passing bicycle bell at 00:03, faint chatter from adjacent tables, the occasional clink of a ceramic cup on a saucer, birdsong in the background. No music. Dialogue sits clearly above ambient.
[00:00-00:05] Shot 1: Medium two-shot, static tripod, eye-level. Two women in their early 30s sit across a small round marble-topped outdoor cafe table in Paris at golden hour. LEFT: CLAIRE, warm brown skin, long curly dark hair, a soft cream linen shirt. RIGHT: MARIE, pale skin, straight auburn hair pulled back, a charcoal knit cardigan. Between them: two white espresso cups on saucers, a small glass of water, a folded newspaper. Background: blurred warm stone facade, string lights not yet lit, a passing pedestrian out of focus. Claire wraps both hands around her espresso cup, exhales softly, looks up at Marie and says quietly: "So Friday's really happening."
[00:05-00:10] Shot 2: Close-up reverse on Marie, over Claire's shoulder, slight handheld breathing. Tight framing on Marie's face. The golden hour side-light catches the left side of her face, soft shadow on the right. Her eyes glisten slightly but she holds composure. She looks down at her own espresso for a beat, then up at Claire and says gently: "I'll visit. I promise."
[00:10-00:15] Shot 3: Cut back to the two-shot, identical framing to Shot 1, static hold. A beat of silence. Claire nods once, slowly, a small sad smile crossing her face. She reaches across the table and covers Marie's hand with her own. Marie turns her hand palm-up and interlaces their fingers. Neither speaks. The golden hour light shifts imperceptibly warmer. The bicycle bell fades into distance. Hold on the quiet moment. Slow natural fade at 00:15.
Character consistency: Claire's hair, shirt, and features identical across cuts. Marie's hair, cardigan, and features identical across cuts. Espresso cups in identical positions on the table in both two-shots. Lip-sync frame-accurate to each spoken line. Dialogue delivery natural, quiet, emotionally grounded, never theatrical. English. Golden hour practical light as the only motivation, no studio feel.
2. The Rooftop Leap - Action Physics + Layered Sound Design
Showcases: high-energy motion with cloth and dust physics, multi-angle action coverage, rigorous layered soundscape with no dialogue, speed-ramp to slow motion, seamless timing across three camera positions.
【Style】Modern cinematic action in the tradition of Alex Garland and Emmanuel "Chivo" Lubezki, 35mm spherical, handheld operator energy but stable, desaturated cool palette with warm key sources, visible grain, anamorphic flares only on strongest highlights, 2.39:1.
【Duration】15 seconds
【Native Audio】Fully designed: pounding footsteps on metal and concrete varying with surface, labored breathing close in the foreground throughout, distant helicopter thump, ambient city wind, a rising string-driven score building from 00:04 peaking on the landing at 00:12, sudden audio collapse to wind-only from 00:08-00:12, impact thud and gravel skitter on landing. No dialogue.
[00:00-00:04] Shot 1: Tracking side-profile, fluid handheld alongside the runner. Dusk over a dense East Asian city skyline, deep indigo above bleeding to amber at the horizon. A woman in her late 20s in a dark navy tactical jacket, black combat pants, short platinum hair, sprints across the corrugated metal roof of a warehouse. Camera matches her pace, running parallel. Her breath visibly fogs in the cold air. Two male pursuers in dark tactical gear enter the frame behind her, closing. Metal roof panels flex and rumble under her footfalls. Distant helicopter thumps in the mix.
[00:04-00:08] Shot 2: First-person POV forward, chest-mounted, slight camera shake synchronized to running impact. Cut to her exact point of view as she sprints toward the edge of the roof. The gap to the next building is wide, roughly five meters, with a twenty-story drop between. The opposing rooftop is visible: concrete, air conditioning units, rooftop door. Her breath dominates the audio foreground. Wind rushes past. The roof edge approaches fast. At 00:07 her feet leave the surface.
[00:08-00:12] Shot 3: Wide profile slow-motion leap, low angle, locked dolly, speed ramp from 120fps to 240fps at the peak. She is suspended mid-air against the twilight sky, arms extended, jacket flapping in physically accurate cloth simulation, legs tucked. The two pursuers arrive at the edge of the original rooftop behind her, one skidding to a stop. Time stretches. Audio collapses to wind and a single sustained string note. At 00:11 her trajectory begins to descend.
[00:12-00:15] Shot 4: Same low angle, real-time resumes. Audio crash-cuts back in. She lands hard on the opposite roof, knees absorbing, a realistic plume of dust and loose gravel kicked up from the impact, pebbles skittering audibly. Score hits its full peak on impact then resolves. She holds the crouched landing pose, head down, breathing hard. Camera holds static on her silhouette against the fading sky for the final beat.
Physics requirements: cloth simulation on the jacket must flow naturally with body motion and wind, not float. Dust plume on landing is physically dispersed, not digital. Hair reacts to airflow during flight. Pebbles bounce with correct restitution on landing. No compositing tells. All lighting practical: last-of-sunset sky, distant neon from the city. Multi-angle action coverage maintains spatial continuity of the two rooftops and the gap between them across every cut.
3. The Fragrance Campaign - Image Reference + Commercial-Grade Audio Design
Showcases: image-anchored product consistency, silk cloth physics, 240fps liquid, multi-shot commercial structure, layered ambient score with voiceover, precision lighting continuity.
REFERENCE IMAGE - generate with Z-Image Turbo or Nano Banana 2, upload as @Image1:
Studio product photograph of a tall slender crystal perfume bottle with precision-cut faceted glass, filled with pale rose-gold liquid, capped with a polished brushed-gold cylindrical cap. The bottle rests center-frame on a circular black obsidian pedestal. Background is deep matte navy velvet fading to pure black at the edges. A single overhead warm tungsten key light creates a sharp specular highlight on the top bevel of the cap and a soft pool of light across the bottle body. A cool cyan rim light from back-left traces a thin luminous line along the right side of the bottle. Glass refracts light internally, fine caustic patterns visible through the faceted sides. Shot on Hasselblad H6D-100c with 120mm macro lens, f/8, editorial fashion product photography aesthetic, hyper-crisp micro-texture on cap grain, zero dust. Palette: dark navy velvet, rose-gold, warm tungsten, cool cyan. 1:1 aspect ratio. Frontal hero composition.
SEEDANCE 2.0 PROMPT:
@Image1 as the opening frame and product reference. Bottle geometry, color, and materials must stay identical to @Image1 across every shot.
【Style】Premium fragrance campaign in the tradition of Tom Ford and Chanel. Anamorphic 2.39:1, DaVinci industrial-grade color grade, Kodak Vision3 film stock feel, subtle grain, lens halation on the strongest highlights only.
【Duration】15 seconds
【Native Audio】Sparse and luxurious: a single swelling cello note beginning at 00:02, slow ambient synth pad entering at 00:06, a crystalline glass chime on the bottle reveal at 00:11, a whispered female voiceover in English at 00:13 saying only the word "Noir." A soft breath of wind audible during the silk transition. Professional mastering.
[00:00-00:03] Shot 1: Static macro identical framing to @Image1. Hold the hero composition for three full seconds, absolute stillness, only the faintest caustic shimmer inside the liquid as if just settled. Cello note enters at 00:02.
[00:03-00:07] Shot 2: Slow 30-degree orbit around the pedestal, introducing motion. At 00:03 a single strip of black silk ribbon enters frame from above, drifting down past the bottle in slow motion with weighted cloth physics, flowing around the neck, spiraling away off-frame left. Camera executes a smooth orbit during this, revealing the faceted side of the bottle catching the tungsten key at new angles. Synth pad joins the cello.
[00:07-00:11] Shot 3: Extreme macro close-up, static, on the cap and bottle neck. The brushed-gold cap slowly, hydraulically rises and separates from the bottle. Rose-gold liquid catches the key light, fine refraction visible through the faceted glass. A single suspended droplet forms at the lip of the bottle neck, hangs in ultra-slow-motion at 240fps feel, trembling. Score holds on a sustained chord.
[00:11-00:15] Shot 4: Sharp cut to pull-back wide. The bottle is now fully revealed on the pedestal, the black silk ribbon settled in a graceful curve at its base. Camera pulls back another two meters. The logo NOIR in thin hairline serif gold typography fades in underneath the pedestal at 00:13 as the whispered voiceover delivers the word. Crystalline glass chime hits precisely as the logo locks. Hold the final composition for the last two seconds.
All lighting motivated from the single overhead tungsten source and one cyan back-rim, identical direction across all shots. Silk physics realistic and weighted. Liquid obeys correct gravity and surface tension. No jitter, no flicker. Broadcast commercial quality.
4. The Ribbon Goodbye - Anime Key Visual
Showcases: stylized animation maintained across a continuous camera move from extreme close-up to epic wide, cloth physics in stylized form, emotional micro-expression, layered orchestral score with SFX, single unbroken arc rather than cut-driven.
【Style】Makoto Shinkai animation key visual quality, ultra-detailed cloud rendering, soft cel shading over painterly backgrounds, emotional atmospheric lighting, lens flares on sun highlights, summer wind particle effects (pollen, petals, dust motes). 16:9 widescreen.
【Duration】15 seconds
【Native Audio】Minimal piano score beginning at 00:01, distant train horn at 00:04, cicadas rising at 00:08, strings swelling from 00:10 peaking at 00:13, wind through grass as base layer throughout, no dialogue. Professional mix.
[00:00-00:05] Shot 1: Medium shot, slow tripod pan following her gaze left to right. A teenage girl in a white summer school uniform stands alone at the center of a rusted railway overpass at golden hour. Long black hair lifts in a slow summer breeze. A thin red ribbon is tied to her left wrist. Camera at eye-level begins a very slow horizontal pan matching her gaze direction. Below and behind her, a single empty passenger train cuts silently across endless golden rice fields that stretch to distant blue-grey mountains. The sky gradients from deep violet at top through rose to tangerine at the horizon, a single early star visible. Piano enters softly. Distant train horn echoes at 00:04.
[00:05-00:10] Shot 2: Slow continuous push-in toward her left wrist, ending in extreme close-up. The red ribbon, previously tied in a neat bow, begins to loosen. One loop slips, then another. In the final close-up we see the fine weave of the ribbon fibers, the faint warmth of her skin in the sun, a scattering of freckles. At 00:08 the ribbon comes completely free and lifts into the air. Cicadas rise in the audio mix. A single tear forms at the corner of her left eye but does not fall.
[00:10-00:15] Shot 3: Long continuous pull-back, extreme close-up to epic wide in one unbroken move. Camera pulls back from her wrist to a medium, then to a wide, then to an epic wide. The red ribbon spirals upward into the enormous sunset sky, diminishing as it catches the last rays of sun, eventually becoming a tiny red speck against gold. She closes her eyes and smiles faintly. Her hair and uniform skirt lift one last time. Strings swell to their peak at 00:13. At 00:14 the camera reaches its widest point, revealing her as tiny on the bridge against the vast landscape, the ribbon dissolving into the sun. Final frame: she opens her eyes at 00:15 as the score resolves. Half-beat hold, slow fade.
Painterly background texture maintained throughout the pull-back. Hair and ribbon cloth physics wind-driven and natural, not floaty. Lens flares streak across frame at the pull-back peak. Character design consistent across all three shots. Score, ambient, and SFX layered at broadcast levels. Official anime feature-film opening quality.
5. The Concert Peak - Audio Reference + Beat-Synced Multi-Cam
Showcases: @Audio reference driving cut timing and visual dynamics, lip-sync to vocal line, four-angle rapid montage, crowd and stage lighting physics, dynamics mapping (light intensity and camera speed tied to audio peaks).
REQUIRED REFERENCE:
@Audio1: upload a 15-second section of the musical track the visuals must sync to. Choose a piece with a clear tempo, an identifiable vocal line, and a dynamic peak around 12 seconds in.
PROMPT:
@Audio1 is the rhythmic and dynamic anchor. Every cut lands on a beat of @Audio1. Camera speed, light intensity, and subject energy rise and fall with the dynamics of @Audio1. Lip-sync is frame-accurate to the vocal line of @Audio1.
【Style】Music film crossover in the tradition of Hiro Murai and Emmanuel Lubezki, 35mm anamorphic, warm tungsten practicals mixed with deep saturated stage gels, high dynamic range, fine grain. 2.39:1.
【Duration】15 seconds
[00:00-00:04] Shot 1: Slow push-in, backlit silhouette. A solo female vocalist in her late 20s, long dark curls, wearing a black floor-length dress with delicate gold embroidery, stands alone on a smoke-filled dark stage. She is backlit in deep amber from a single source, silhouetted with a halo of light through her hair. She faces away from camera. Camera begins six meters away, slowly dollying forward as the intro of @Audio1 builds. At 00:03, on the first strong downbeat, she turns her head over her shoulder, catching a thin rim of warm light on her jawline.
[00:04-00:08] Shot 2: Cut precisely on the downbeat to a tight close-up on her face, now facing camera directly, static framing. Warm key from frame left, deep blue fill from frame right. She begins to sing, lip-sync exact to the vocal line of @Audio1. Her breath is visible in the cold stage air. Ambient smoke drifts through frame behind her. Emotional intensity builds with the track.
[00:08-00:12] Shot 3: Four-part rapid intercut montage, each sub-shot lasting one beat of @Audio1. A: wide shot of the full stage, smoke and amber lights, her silhouette at center. B: extreme close-up of her hand rising, palm open, as she sustains a note. C: low angle from the stage floor looking up, stage lights streaking like comet tails behind her head. D: over-the-shoulder from behind her, looking out at rows of empty seats dissolving into darkness. Each cut lands exactly on a beat.
[00:12-00:15] Shot 4: Slow pull-back to epic wide, light explosion. On the biggest beat of @Audio1 at 00:12, cut to the wide pull-back. All stage lights flare at once, briefly blowing the frame to white at the peak, then settling as the camera continues pulling back. She is revealed standing at the center of a vast concentric ring of stage lights that were invisible in the darkness, now fully illuminated. Her arms are raised. She sustains the final note. At 00:15 the audio resolves and she lowers her head. Hold.
Lip-sync frame-accurate to the vocal line. Cut timing locked to beats of @Audio1, not approximate. Character appearance identical across all cuts. Stage geometry spatially consistent across angles. Light dynamics map precisely to audio dynamics.
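This prompt asks every cut to land on a beat of @Audio1, so the timecodes you write should agree with the track's tempo. For a track with a steady, known BPM the beat grid is plain arithmetic: one beat every 60/BPM seconds. The sketch below uses hypothetical helper names and assumes a constant tempo plus a known first-beat time (read off your audio editor). At 60 BPM the 00:08-00:12 montage gets exactly four beats, one per sub-shot; at 120 BPM the same four one-beat sub-shots would fill only two seconds, so you would add sub-shots or narrow the window.

```python
def beat_times(bpm: float, start: float, end: float, first_beat: float = 0.0) -> list[float]:
    """Beat timestamps in seconds with start <= t < end, assuming a constant tempo."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    period = 60.0 / bpm  # seconds between consecutive beats
    times: list[float] = []
    k = 0
    while True:
        t = round(first_beat + k * period, 3)  # round to millisecond precision to avoid float drift
        if t >= end:
            break
        if t >= start:
            times.append(t)
        k += 1
    return times


def to_timecode(t: float) -> str:
    """Format seconds as MM:SS.ss, close to the 00:12-style stamps used in the prompts."""
    minutes = int(t // 60)
    seconds = t - 60 * minutes
    return f"{minutes:02d}:{seconds:05.2f}"


# Where do the beats fall inside the 00:08-00:12 montage window at two tempos?
for bpm in (60, 120):
    print(bpm, [to_timecode(t) for t in beat_times(bpm, 8, 12)])
```

Real recordings drift and change tempo, so treat the grid as a starting point and nudge individual timecodes to the transients you actually hear.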
6. Neon Rain Alley - Physics Showcase with Ambient Audio Mix
Showcases: dense rain simulation with reflection physics, neon light interaction on wet surfaces, steam and smoke interaction, Chinese typography rendering, ambient sound design without dialogue, three-shot location storytelling.
【Style】Wong Kar-wai meets Blade Runner 2049, anamorphic 2.39:1, heavy rain, practical lighting only (neon signage and bare tungsten bulbs), tungsten warmth mixed with cool magenta and cyan neon, CineStill 800T halation on every highlight, pronounced bokeh, visible grain, slow pacing.
【Duration】15 seconds
【Native Audio】Heavy ambient rain as the foundation throughout, distant thunder at 00:02 and 00:11, a solo muted trumpet melody carrying the emotional line across all three shots, footsteps splashing on wet pavement, a ceramic bowl set down at 00:07, a match striking with a clear audible flare at 00:09. No dialogue.
[00:00-00:05] Shot 1: Slow low-angle dolly-in through a puddle. A narrow Hong Kong back alley at 2 AM in heavy rain. Vertical neon signs stack on both sides, lettered in traditional Chinese characters and English: magenta, cyan, electric pink, acid green. Rain pours in sheets. The alley is empty except for a single figure in a long black trench coat standing with her back to camera at the far end, steam rising from her breath. Camera at ankle height, slowly dollying forward through a puddle, individual raindrops splashing up toward the lens, neon reflections shimmering on every wet surface. Thunder rolls at 00:02. Trumpet melody enters.
[00:05-00:10] Shot 2: Cut to a warm interior, static medium shot. A small noodle stall tucked into the alleyway, lit only by a single bare tungsten bulb. An elderly Chinese man in a white apron ladles broth into a ceramic bowl. Steam rises thickly in the cold air, backlit against the rain-washed street visible past him. He sets the bowl down on the counter with a soft ceramic clink at 00:07. The woman in the trench coat is now seated at the counter, face in profile, wet dark hair framing her cheek. She lights a cigarette at 00:09, the match strike clearly audible, the flame briefly bathing her face in warm orange. Her expression is unreadable, contemplative. Rain continues steadily in the background.
[00:10-00:15] Shot 3: Close-up on a puddle, slow tilt up. A street puddle at the edge of the stall, raindrops creating overlapping concentric ripple patterns in slow motion. Within the puddle is a perfect reflection of the neon signs overhead, the warm glow of the noodle stall, and, at the edge of frame, the woman's silhouette with a glowing cigarette tip. Thunder rolls again at 00:11. Camera slowly tilts up out of the puddle, following the neon signs vertically up the side of the building, revealing a tall pink neon sign at the top reading DREAM in English alongside the Chinese character 夢. Trumpet melody peaks at 00:14. Final frame holds on the neon sign against the rain-soaked black sky.
Rain must be physically accurate and dense: individual droplets visible in close-up, correct splash patterns, sheeting runoff on inclined surfaces, visible mist particulate in the air. In the alley, neon is the only light source, producing hard colored shadows and no ambient fill; inside the stall, the bare tungsten bulb and the match flame are the only sources. Cigarette smoke mixes correctly with steam and rain mist. All reflections in wet surfaces are physically accurate. The Chinese character 夢 must render crisply and correctly. No digital tells: every light source is motivated and practical.
7. Bamboo Duel - Wuxia Combat at 15s with Cloth and Impact Audio
Showcases: multi-angle combat choreography in a single generation, silk cloth physics under momentum, speed-ramp slow motion, impact SFX with percussion hits, spatial continuity across rapid cuts, mythic period-epic scale.
【Style】Tsui Hark new-style wuxia blockbuster in the tradition of Peter Pau cinematography, cold jade-blue moonlight with amber torch fill, silk-and-flesh physics, anamorphic 2.39:1, fine grain, mythic scale.
【Duration】15 seconds
【Native Audio】Traditional Chinese erhu melody weaving throughout, wind through bamboo as the ambient base, a metallic sword ring on each blade contact, silk snap on sleeve movement, bamboo creak and sway, a single deep temple drum beat on each major impact at 00:07, 00:11, and 00:14, a night insect layer underneath. No dialogue.
[00:00-00:05] Shot 1: Crane descent through bamboo canopy. Camera begins fifteen meters above an ancient bamboo forest under a full moon, looking straight down. Slow controlled descent through multiple layers of bamboo leaves, the moon directly above and behind the camera, its light filtering down through the leaves. At 00:03 the camera breaks through the lowest canopy layer to reveal two figures standing twenty meters apart on a moss-covered stone path. LEFT: a female warrior in flowing jade-green silk robes, long black hair in a warrior's topknot, a silver jian sword held point-down at her side. RIGHT: a masked assassin in black robes, a single curved dao held ready. Wind moves through the bamboo; the erhu enters. Both figures stand perfectly still.
[00:05-00:10] Shot 2: Multi-angle choreography, rapid cuts of 0.5 to 1 second each, real-time speed. At 00:05 the jade warrior launches forward. A: low angle from the path as she sprints, silk sleeves trailing like ribbons behind her. B: mid-air tracking as she leaps onto a single bamboo stalk that flexes with her weight. C: over-the-shoulder from behind the assassin as she descends toward him with her sword raised. D: extreme close-up on blades meeting in a shower of sparks at 00:07, temple drum hits. E: reverse angle as they separate, her hair whipping, his mask catching moonlight. F: wide shot of both mid-exchange, swords blurring in a flurry of strikes, cherry blossom petals from an off-screen tree scattering around them with each movement. Sword ring audible on every blade contact.
[00:10-00:15] Shot 3: Final strike, single angle, speed-ramping at 00:10 into extreme 240fps slow motion. The warrior executes a single upward strike, silk sleeve trailing in a perfect horizontal arc, the jian blade catching a shaft of moonlight piercing the canopy. The assassin's mask splits cleanly in two, the halves falling in slow motion. Temple drum at 00:11. Speed returns to near real-time at 00:13 as he falls backward onto the moss, his dao clattering audibly on stone. The warrior lands in a low stance, sword at her side, breathing. A single strand of hair falls across her cheek. She looks up. Final temple drum at 00:14. Hold on her moonlit face as the erhu resolves. Petals continue to drift.
Silk physics weighted and realistic, not floaty; sleeves flow with momentum and gravity. Sword metal catches and reflects moonlight accurately. Impact sparks physical-looking, not digital. Bamboo flexes and recoils realistically when the warrior lands and launches. Petal motion follows airflow from character movement. Multi-angle coverage maintains spatial continuity of both fighters and the stone path across every cut. Character appearance, hair, costume, and face continuity identical across all shots.
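The "240fps" in Shot 3 is a look rather than a camera setting: a generative model is not capturing frames, so treat the number as a slow-motion cue. The arithmetic behind the look still helps when you budget timecodes. Footage captured at 240fps and played back at 24fps (an assumed cinema playback rate) runs 10x slower than life, so the roughly three seconds between the ramp at 00:10 and the return to near real-time at 00:13 represent only about 0.3 seconds of real action. A minimal sketch with hypothetical function names:

```python
def slowdown_factor(capture_fps: float, playback_fps: float = 24.0) -> float:
    """How many times slower than real life the footage plays back."""
    if capture_fps <= 0 or playback_fps <= 0:
        raise ValueError("frame rates must be positive")
    return capture_fps / playback_fps


def real_seconds_shown(screen_seconds: float, capture_fps: float, playback_fps: float = 24.0) -> float:
    """Real-world action that fits into a stretch of screen time at full slow motion."""
    return screen_seconds / slowdown_factor(capture_fps, playback_fps)


# Shot 3 of the Bamboo Duel: slow motion from 00:10 to roughly 00:13.
print(slowdown_factor(240))        # 10x slowdown at 24fps playback
print(real_seconds_shown(3, 240))  # about 0.3 s of real action
```

Because a ramp eases in and out instead of switching instantly, the real figure is a little higher; the point is that a single strike and a falling mask are about all the action three seconds of 240fps can hold.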
8. The Survivor - Documentary Lip-Sync Monologue with B-Roll
Showcases: lip-sync monologue, B-roll cutaway with voiceover continuing over the cut, natural documentary interview aesthetic, warm practical daylight, minimal score entry timed to an emotional beat. Total dialogue: 21 words.
【Style】Errol Morris and Netflix documentary aesthetic, sit-down interview with one B-roll insert, soft naturalistic daylight, shallow depth of field on an 85mm equivalent, warm cream and muted olive palette, minimal grain, professional broadcast mix. 16:9.
【Duration】15 seconds
【Native Audio】Primary: first-person English monologue, calm and matter-of-fact in tone, lip-synced. Secondary: room tone throughout, subtle birdsong and garden ambience during the B-roll insert, a single soft piano chord entering at 00:12. No music under the spoken lines until that final chord.
[00:00-00:08] Shot 1: Static medium close-up, interview framing, off-axis gaze. A woman in her late 60s sits in a worn leather armchair in a home study. Short silver hair, weathered skin with genuine laugh lines, muted olive cashmere sweater. Bookshelves out of focus behind her. A window at frame right casts soft afternoon light across the left side of her face, leaving the right side in gentle shadow. She looks slightly off-axis, addressing an unseen interviewer just beside camera. She speaks plainly, with natural pauses: "They told me I had six months." A long breath. "That was seven years ago."
[00:08-00:11] Shot 2: B-roll insert, macro close-up, warm garden light. Cut to a close-up of the same woman's weathered hands, lit warmer now, gently pressing soil around the base of a small tomato seedling in a terracotta pot on a sunlit garden table. A pair of worn gardening gloves lies beside the pot. Her voiceover continues over the image, quieter: "Every morning feels stolen."
[00:11-00:15] Shot 3: Cut back to the interview setup, identical framing to Shot 1. She smiles faintly, a genuine warmth reaching her eyes. She delivers the final line: "And I use it well." A single soft piano chord enters at 00:12 underneath her words. She looks away from camera toward the window at frame right, the afternoon light catching the rim of her cheek. Hold on her in three-quarter profile as the light settles. Natural fade at 00:15.
Lip-sync frame-perfect to each spoken phrase. Delivery calm, grounded, and genuinely emotional rather than theatrical. Authentic breath and pauses between lines. Character appearance identical across Shot 1 and Shot 3: same silver hair, sweater, armchair, bookshelves, window position. B-roll hands in Shot 2 visually consistent with the woman's age and complexion. Audio mix documentary-professional: clear dialogue in the foreground, subtle room tone, garden ambience only during the B-roll, piano chord introduced only at the very end.
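Lip-sync prompts fail quietly when the lines do not fit their window. Counting the quoted words and dividing by the seconds available is a cheap sanity check. The helpers below are hypothetical; they assume dialogue is written in double quotes, as in the prompts in this guide, and use about 2.5 words per second (roughly 150 words per minute, a common rule of thumb for conversational English, not a Seedance limit) as a ceiling. Calm documentary delivery with pauses usually wants well under that: Shot 1's two lines come to 12 words over 8 seconds, or 1.5 words per second.

```python
import re

# Straight or curly double quotes around a spoken line.
QUOTED = re.compile(r'["\u201c]([^"\u201d]+)["\u201d]')
WORD = re.compile(r"[A-Za-z0-9']+")


def spoken_lines(prompt: str) -> list[str]:
    """Every double-quoted string in the prompt (dialogue, narration, radio chatter)."""
    return QUOTED.findall(prompt)


def word_count(text: str) -> int:
    return len(WORD.findall(text))


def pace(lines: list[str], window_seconds: float) -> float:
    """Words per second needed to deliver all lines inside the window."""
    return sum(word_count(line) for line in lines) / window_seconds


def fits(lines: list[str], window_seconds: float, max_wps: float = 2.5) -> bool:
    # 2.5 words/s is an assumed conversational ceiling; lower it for slow, emotional reads.
    return pace(lines, window_seconds) <= max_wps


shot1 = spoken_lines('She speaks: "They told me I had six months." A long breath. "That was seven years ago."')
print(word_count(" ".join(shot1)), pace(shot1, 8), fits(shot1, 8))
```

Run it over every timed window, not just the whole clip: a script can average out fine over 15 seconds while one shot still crams too many words into three.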
9. Medina Sunset POV - Continuous Unbroken Take with Spatial Audio
Showcases: a single unbroken 15-second take with no cuts, first-person spatial audio mixing that shifts with the environment, surface-accurate footstep audio, heartbeat and breath as the emotional foreground, market ambience with diegetic Arabic dialogue from background characters.
【Style】Adrenaline-charged first-person documentary realism in the tradition of the continuous takes in Children of Men, chest-mounted body-camera aesthetic with organic breathing shake, 28mm wide, natural grain, desaturated with amber practical sources. 16:9.
【Duration】15 seconds
【Native Audio】Immersive first-person mix: sharp, labored breathing close in the foreground throughout, a heartbeat rising in intensity over the clip, footsteps varying with surface (wood, stone, tile), fabric rustle of the runner's jacket, dense market ambience (vendors shouting in Arabic, sizzling food, haggling, distant music, a passing motorcycle), a sudden audio collapse on arrival, and a distant muezzin call to prayer rising in the final two seconds. No music score. No dialogue from the runner.
Single continuous unbroken shot, no cuts, camera rigidly chest-mounted on the unseen runner.
00:00: Inside a dim wooden-shuttered room with bare floorboards. The runner's hands enter the bottom of frame, shoving a heavy wooden door outward. Light floods in.
00:01-00:04: The runner bursts out into a narrow medina alley at sunset. Stone underfoot, footsteps clack audibly. They sprint forward between ochre mud-brick walls, past hanging rugs and brass lanterns catching the last orange light. A cat darts out of the way. They round a sharp right at 00:03, hands briefly visible pushing off the wall for balance. Jacket fabric rustles in the audio foreground.
00:04-00:09: The alley opens suddenly into a crowded spice market. Pace slows slightly as they weave through bodies: a vendor with a brass teapot, women in colorful hijabs, conical piles of saffron and turmeric, hanging lanterns overhead. A shopkeeper shouts something in Arabic, gesturing at the camera. The runner ducks under a low-hanging textile at 00:06, brushing it aside with an arm. A motorcycle horn beeps and the bike passes close, the camera flinching away from it. The market audio is dense and layered in 3D space.
00:09-00:12: The runner accelerates again, emerging into a narrower stone passage lit only by bare hanging bulbs. Echo audibly enters as the space narrows. Breathing is ragged. The heartbeat becomes dominant in the mix. They take a sharp left at 00:10, stumbling briefly, one hand catching the wall.
00:12-00:15: The passage opens onto an empty tiled rooftop overlooking the entire medina at sunset. The runner stops abruptly, takes two steps forward, and doubles over with hands on knees, gasping. Audio collapses: market noise cuts out, the heartbeat softens, only wind and exhausted breathing remain. They straighten slowly, hands on their head, looking out at hundreds of satellite dishes and minarets silhouetted against a vast orange-and-pink sky. A distant muezzin call to prayer begins to rise in the last two seconds. Hold on this moment of exhausted awe.
Single continuous unbroken take, no cuts, no speed ramps. Breathing audio synchronized to physical exertion. Footstep audio changes material with surface (wood thud, stone clack, tile scrape). Spatial audio: market sounds swell on entry and collapse on exit to the rooftop, with correct distance cues for the muezzin. Camera shake organic to the running rhythm, calming on arrival. Practical lighting only: last-of-sunset daylight outdoors, warm bare bulbs in the corridor.
10. The Bunker Arrival - Character Consistency Across Three Environments
Showcases: image-anchored character identity held across three radically different lighting environments, multi-cam interior coverage, ambient radio transmission dialogue, scale reveal via continuous pull-back, industrial sound design.
REFERENCE IMAGE - generate with Nano Banana 2 or Z-Image Turbo, upload as @Image1:
Medium shot editorial portrait of a 34-year-old woman with warm olive skin, sharp jawline, a small scar through her right eyebrow, deep brown eyes with flecks of amber, dark brown hair pulled back into a low practical bun with a few loose strands framing her face. She wears a weathered navy-blue field jacket over a cream henley shirt, a thin silver chain at her throat. Neutral expression, slight intensity in the eyes. Lit by soft overcast natural daylight, no harsh shadows. Plain desaturated grey-blue background. Shot on Fujifilm GFX 100 with 110mm lens, f/4, editorial documentary aesthetic, visible skin texture and pores, natural color grade, fine grain. Not glamour. Real. 3:4 crop.
SEEDANCE 2.0 PROMPT:
@Image1 is the character reference. The character in this video has identical facial features, hair, jacket, shirt, and necklace to @Image1 across every shot. Same small scar through the right eyebrow. Same hairline, same jaw, same eyes.
【Style】Contemporary cinematic thriller in the tradition of Denis Villeneuve's Sicario, 35mm spherical, desaturated teal-and-amber palette, overcast natural light, handheld operator energy but stable, fine grain. 2.39:1.
【Duration】15 seconds
【Native Audio】Layered: coastal wind throughout, distant gull cries, footsteps on gravel at 00:01-00:03, a heavy metal door closing at 00:05 with industrial weight, a low industrial hum inside the corridor, a crackling but clearly intelligible male radio transmission at 00:08 saying in English "clearance confirmed, you are go," an electronic lock beep at 00:10, and a low bass drone building from 00:11 to a peak at 00:15.
[00:00-00:05] Shot 1: Wide to medium, slow tracking push behind her. The character from @Image1 walks alone along a bleak, windswept gravel path toward an industrial concrete bunker set into a coastal cliff. Grey overcast sky, dark ocean on the horizon, scattered tough coastal grass. Camera tracks behind her at her pace, pushing forward, her back to us. Her navy jacket flaps in the wind. Her hair is tied back in its low bun, a few loose strands stirring in the wind. At 00:03 she reaches the heavy steel door. One beat of hesitation: she exhales, then pushes the door open and steps inside. The door closes behind her with a weighty metallic thud at 00:05.
[00:05-00:10] Shot 2: Multi-angle interior coverage, cold institutional light. Inside: a long concrete corridor lit by cold fluorescent panels, pale green walls, industrial piping along the ceiling. Sharp cut to her face in close-up at 00:05: the same scar, same eyes, same loose hair strands, exactly matching @Image1 despite the new cool, sterile lighting. She walks forward. Cut to a reverse medium, tracking her from in front, her expression focused and alert. Cut to an overhead angle as she passes underneath, footsteps echoing. At 00:08 a radio transmission crackles from a speaker mounted in the corridor: a male voice in English, clearly audible: "clearance confirmed, you are go." She reaches a final door with a keypad. She enters a code at 00:10; the electronic lock beeps and the keypad light turns green.
[00:10-00:15] Shot 3: Slow dolly through the doorway into a scale reveal, then a continuous pull-back. The door slides open. Camera dollies slowly forward through the doorway with her. The bass drone builds. Beyond the door: a vast, cavernous hangar-like space, impossibly large, lit by distant overhead work lights, filled with identical rows of towering black server monoliths stretching into the far distance. The camera then reverses into a slow pull-back and rise as she keeps walking forward, revealing the full scale of the space, her small silhouette dwarfed against the vastness. The bass drone reaches its peak. Final frame at 00:15 holds on her tiny figure stepping deeper into the enormous room.
Character facial features and wardrobe must remain absolutely identical to @Image1 across all three very different lighting environments: overcast exterior, cool fluorescent corridor, dim industrial hangar. Same scar through the right eyebrow visible in every shot where her face is visible. Same jacket. Same necklace chain visible in close-up. Hair bun consistent. The hangar in Shot 3 must feel architecturally vast and cinematic. Audio mix professional: ambient layers sit under the bass drone, key SFX (door thud, radio transmission, lock beep) clear and defined in the mix.
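Reference-driven prompts like this one lean entirely on @tags, so a typo (@Image2 when you uploaded a single image) or a forgotten upload silently weakens the anchor. A quick pre-flight check is to pull every @Image/@Video/@Audio tag out of the prompt and compare it with what you actually attached. This is a hypothetical sketch: it assumes the tag naming used throughout this guide (@Image1, @Video1, @Audio1), and how uploads map to tags depends on the platform you run Seedance on.

```python
import re

# Tags as written in this guide's prompts: @Image1, @Video2, @Audio1, ...
TAG = re.compile(r"@(Image|Video|Audio)(\d+)\b")


def referenced_tags(prompt: str) -> set[str]:
    """Distinct numbered reference tags mentioned anywhere in the prompt."""
    return {f"@{kind}{number}" for kind, number in TAG.findall(prompt)}


def reconcile(prompt: str, uploaded: list[str]) -> tuple[list[str], list[str]]:
    """Return (tags used but not uploaded, uploads never mentioned in the prompt)."""
    used = referenced_tags(prompt)
    attached = set(uploaded)
    return sorted(used - attached), sorted(attached - used)


prompt = "@Image1 is the character reference. ... exactly matching @Image1 ..."
missing, unused = reconcile(prompt, ["@Image1", "@Audio1"])
print("missing:", missing, "unused:", unused)
```

An unused upload is not necessarily an error, but if a file is attached for a reason, the prompt should say what role it plays.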
11. The Midnight Ramen - Food Cinematography + Commercial ASMR Audio
Showcases: macro food physics (steam, oil splatter, liquid pour, noodle lift), multi-angle bowl assembly with continuity, warm practical-source-only lighting, ASMR-grade native audio design (sizzle, clink, slurp), quiet character performance without dialogue, restaurant atmosphere.
【Style】Wong Kar-wai atmospheric restaurant cinematography crossed with Netflix's Chef's Table food documentary, warm tungsten practical lighting only, handheld but stable operator energy, 35mm macro for food inserts, 50mm for wides, fine grain, deeply saturated warm palette with deep blacks. 2.39:1.
【Duration】15 seconds
【Native Audio】Commercial-grade food ASMR design: a loud, dramatic sizzle of fat hitting hot oil at 00:02 with audible crackling, rolling boiling broth throughout the kitchen shots, the metallic clink of tongs on steel at 00:04, a soft meditative shakuhachi flute underscore at low level, faint rain audible through the windows, the scrape of wooden chopsticks, a satisfying noodle slurp at 00:12 that is enthusiastic but not exaggerated, ambient restaurant chatter kept very low. No dialogue.
[00:00-00:05] Shot 1: Extreme macro close-up on a blackened steel skillet, locked static. A dark, cramped Tokyo ramen shop at 1 AM, lit only by warm tungsten bulbs hanging low from the ceiling and the red-orange glow of a gas burner beneath the skillet. A weathered chef's hands enter frame and place three thick slices of chashu pork belly into shimmering hot oil. At 00:02 the pork hits the oil with a loud, dramatic sizzle, fat crackling audibly, individual oil droplets jumping up toward the lens, thick steam rising immediately and backlit golden by the warm bulbs. At 00:04 the chef's other hand enters with long metal tongs, flipping each slice with confident, practiced motion, the metallic clink of tongs on steel clearly defined in the audio.
[00:05-00:10] Shot 2: Three rapid macro cuts of bowl assembly, each approximately 1.5 seconds. A: overhead shot of a large, deep ceramic bowl on a worn wooden counter as a steel ladle pours thick golden tonkotsu broth in a slow circular swirl, heavy steam billowing upward, reflections shimmering on the surface as it fills. B: macro close-up on the chef's hands lifting a nest of fresh yellow curly noodles from boiling water with chopsticks, water steaming off them, and laying them into the broth in one confident motion. C: overhead macro as the chef places the seared chashu slices one by one, a soft-boiled egg cut in half revealing a perfectly jammy orange yolk, a handful of sliced green scallion rings, a crisp sheet of nori leaning against the rim, and a single twist of yellow yuzu peel at the center. Steam continues rising. Every placement lands in perfect composition.
[00:10-00:15] Shot 3: Medium two-shot, slow pull-back settling into a static frame, warm atmosphere. The camera pulls back to reveal a Japanese chef in his late 40s in a simple navy apron and white headband, sliding the finished steaming bowl across the wooden counter toward a customer. The customer is a salaryman in his late 30s in a rumpled grey suit with a loosened tie, visibly exhausted. He picks up wooden chopsticks, lifts a portion of noodles from the steam, and slurps them audibly at 00:12, his eyes closing briefly in visible satisfaction. The chef watches with a small proud smile, arms now folded. Camera holds as the customer opens his eyes and exhales slowly, warmth visible on his face along with a faint blush of heat from the broth. The shakuhachi note resolves. Hold on the moment.
Steam physics throughout must be thick, heavy, and correctly backlit by practical tungsten sources. Broth surface tension and reflections physically accurate. Chef's hands move with practiced speed and confidence, not tentative. Noodles hold a realistic shape when lifted. Bowl contents must stay continuous across the three macro assembly shots: if an egg is placed in shot 2C, the completed bowl in Shot 3 must contain the same egg, same chashu, same nori. Chef's face and apron identical in every shot in which he is visible. Rain on the windows is suggested subtly in the ambient mix, not emphasized. All lighting motivated by the practical ceiling bulbs and the gas burner. Food commercial broadcast quality.
12. The Ghost Cat - Nature Documentary with Voiceover Narration
Showcases: voiceover narration that is spatially disembodied (no visible speaker, a different use case from interview lip-sync), realistic animal rigging and fur physics in close-up, epic landscape scale, telephoto wildlife compression, a speed-ramped predatory leap, layered nature ambience with a restrained orchestral score sitting below the narration.
【Style】BBC Planet Earth blue-chip natural history cinematography, ARRI Alexa LF aesthetic, long-lens telephoto compression for wildlife, extreme macro for detail inserts, wide drone plates for environment, pristine color science, minimal grain, cold desaturated palette accented by the warm gold of low-angle sun, 16:9 widescreen, 4K reference feel.
【Duration】15 seconds
【Native Audio】Three distinct layers. First: a David Attenborough-style male British voiceover in English, calm and intimate in tone, sitting forward in the mix. Second: pristine natural ambience (howling high-altitude wind as the base, distant raven calls, the soft crunch of displaced snow under paws). Third: a restrained orchestral underscore building from 00:08 and peaking at the leap at 00:13, never overpowering the narration. No dialogue beyond narration.
[00:00-00:04] Shot 1: Epic wide drone plate, slow forward push at altitude. A vast Karakoram valley at dawn, jagged snow-covered peaks receding into the distance under a pale blue sky shifting to gold at the horizon. A solitary figure, small in frame, moves slowly along a high rocky spine: a snow leopard, her pale grey fur and black rosettes almost indistinguishable from the weathered stone around her. Camera pushes forward at a steady drone pace. Narrator speaks at 00:01, calm and British: "At fourteen thousand feet, in the heart of the Karakoram, the ghost of the mountains begins her hunt." Howling altitude wind is present throughout.
[00:04-00:08] Shot 2: Long-lens tracking close-up on her face, compressed shallow depth. Cut to a tight tracking close-up, camera panning with her at the pace of a slow walk. Her pale green-grey eyes are fixed on something off-frame to the right. Her breath fogs visibly in the cold. Delicate ice crystals cling to her whiskers and the fur around her muzzle. Individual guard hairs are visible in extreme detail. The background is a creamy out-of-focus wash of white snow and grey stone. Narrator continues at 00:05: "She has waited three days. Her cubs, somewhere above, have not eaten for two." A distant raven calls in the ambient layer.
[00:08-00:13] Shot 3: Wide reveal of her target, then a rapid three-cut hunt buildup. A wide shot reveals what she has been watching: a blue sheep, a bharal, grazing alone on a lichen-covered rock face roughly two hundred meters below her position. The orchestral underscore begins at 00:08. The leopard crouches low, body flattening against the snow. Three rapid cuts of approximately one second each. A: low-angle extreme close-up on her paws as they shift silently, individual pads pressing into soft powder, no audible step. B: the bharal's head rising abruptly, ears pricking, sensing something it cannot place. C: locked wide shot of both animals in the same frame, the tension held, the score rising. At 00:12 the leopard explodes into motion. Camera speed ramps to 240fps as she launches off the ridge, fully airborne, tail streaming behind her like a banner, front paws extended, silhouetted against the vast gold-lit landscape. Score peaks at 00:13.
[00:13-00:15] Shot 4: Cutaway to wide vista, strike not shown, real time resumed. The camera cuts away from the strike itself to a wide vista of the valley, the morning sun just cresting the highest peak, golden light flooding across the snowfield. Narrator delivers the closing line, softer: "In the mountains, survival belongs to the patient." Score resolves to a sustained tonic. A single distant raven call. Slow fade.
Animal rigging and movement must be anatomically correct for a snow leopard in every shot: correct gait, weight distribution, musculature under fur. Fur responds realistically to wind and motion, individual hairs visible in extreme close-up. Snow displaces physically under paw pressure. Cold-air breath effects visible on both animals in their respective shots. Landscape scale must feel genuinely vast and real, never a set or matte painting. Voiceover narration is disembodied (no speaker visible in any frame), delivered in a calm, intimate register matching the cadence and warmth of David Attenborough. Score mixed professionally below narration level. Multi-angle wildlife coverage maintains the correct spatial relationship between the ridge above and the bharal on the rock face below. Ice crystal detail on whiskers must be present in the close-up.
Common Mistakes That Wreck Your Output
-
Writing negative prompts as "don't" instructions. Phrase constraints as what you want to see. "Maintain consistent facial features and clothing throughout" beats "no face drift." "Clean composition with one clear subject" beats "not cluttered."
-
Loading 40 adjectives into one prompt. Past about 120 words for a single shot (not a multi-block script), attention drifts and instructions start to conflict. Keep each block tight and use shot scripts for longer pieces.
-
Mixing camera movement with subject movement in the same sentence. "Spinning camera around a dancing person" is the classic broken prompt. Split it. "The dancer spins slowly. Camera holds a fixed medium shot."
-
Using generic quality words as style anchors. "Cinematic" on its own does almost nothing. "Denis Villeneuve desaturated teal and amber palette, IMAX 70mm grain, anamorphic lens flares" gives the model a specific look to aim for.
-
Exceeding file limits. The maximum is 9 images, 3 videos, and 3 audio files, with no more than 12 files in total. Over that, the request fails silently or is truncated.
-
Describing emotions instead of physical tells. "She looks sad" does very little. "A single tear slides down her left cheek, her lower lip trembles for half a second" gives Seedance something to actually render.
-
Forgetting to re-anchor on long extension chains. Past 3 extensions, drift risk increases. Periodically re-upload your original character reference and reinforce wardrobe and lighting in every prompt.
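Three of these mistakes are mechanical enough to catch before you hit generate: reference counts over the limits, shot blocks that sprawl past roughly 120 words, and "don't"-style phrasing. The sketch below is a hypothetical linter, not an official tool. The upload limits are the ones quoted in this guide (confirm them against current documentation for your platform), the 120-word threshold is this guide's rule of thumb, and the negative-phrasing check is a crude keyword match.

```python
import re

# Upload limits as quoted in this guide; verify against current platform docs.
LIMITS = {"image": 9, "video": 3, "audio": 3}
MAX_TOTAL = 12

SHOT_HEADER = re.compile(r"\[\d{2}:\d{2}-\d{2}:\d{2}\]")
NEGATIVE = re.compile(r"\b(don[’']t|do not|avoid)\b", re.IGNORECASE)


def check_uploads(counts: dict[str, int]) -> list[str]:
    problems = []
    for kind, limit in LIMITS.items():
        n = counts.get(kind, 0)
        if n > limit:
            problems.append(f"{n} {kind} files exceed the limit of {limit}")
    total = sum(counts.values())
    if total > MAX_TOTAL:
        problems.append(f"{total} files in total exceed the limit of {MAX_TOTAL}")
    return problems


def overlong_shots(prompt: str, max_words: int = 120) -> list[tuple[int, int]]:
    """(shot number, word count) for every shot block longer than max_words."""
    parts = SHOT_HEADER.split(prompt)
    # Text before the first [MM:SS-MM:SS] header is the style/audio preamble, not a shot.
    # A prompt with no headers is treated as one single shot.
    shots = parts[1:] if len(parts) > 1 else parts
    counts = [(i, len(text.split())) for i, text in enumerate(shots, start=1)]
    return [(i, n) for i, n in counts if n > max_words]


def negative_phrasing(prompt: str) -> list[str]:
    return [m.lower() for m in NEGATIVE.findall(prompt)]


def lint(prompt: str, counts: dict[str, int]) -> list[str]:
    report = check_uploads(counts)
    report += [f"shot {i} has {n} words" for i, n in overlong_shots(prompt)]
    report += [f"rephrase '{w}' as a positive instruction" for w in negative_phrasing(prompt)]
    return report


print(lint("[00:00-00:05] Don't let the face drift.", {"image": 9, "video": 3, "audio": 1}))
```

Note that the per-type limits add up to 15, so the 12-file total is the cap you will usually hit first when mixing media types.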
Where to Actually Run Seedance 2.0 (And Why Fliki Is Useful Here)
You can access Seedance 2.0 through several platforms. The official ByteDance deployment on Volcengine is the source, with wider rollout through partners over 2026. But if you want to skip model-juggling entirely and run Seedance 2.0 alongside the rest of your production stack in a single tab, Fliki is built for exactly this workflow.
The Fliki playground gives you access to Seedance 2.0 alongside every other leading video and image model, plus a full AI video editor, 2,500+ AI voices across 80+ languages, voice cloning, and multilingual translation. The workflow I actually recommend: write your shot scripts using the patterns in this guide, paste them into Fliki's text-to-video flow, generate the clips, layer in a cloned voiceover, chain extensions where you need longer sequences, and export a finished piece, all in one tab.
For longer narrative pieces, Fliki's AI reel generator pairs well with Seedance scripts, and the built-in editor handles stitching and audio balancing without forcing you to export to a separate NLE. For one-off clips, the playground model selector lets you test the same prompt across Seedance 2.0 and other top models and iterate in minutes.
One subscription, one export, no model-jumping. If you are running commercial work, that single-workflow advantage compounds quickly.
The Bottom Line
Seedance 2.0 is not hard. It is just different. Once you stop prompting it like a Midjourney spell book and start prompting it like a director briefing a cinematographer, with specific cameras, specific lighting physics, specific style anchors, and named shots with timecodes, the model stops producing jittery soup and starts producing actual cinema.
Stack the shot-script format on top of that and you get narrative coherence. Layer in @tag references (image for identity, video for motion, audio for rhythm) and you get the multimodal precision that separates pro output from hobbyist output. Use the physics-first vocabulary (sharp falloff, rain-slicked, 240fps feel, dark navy not black) and your shots pick up a depth that no amount of "cinematic, epic, masterpiece" can buy.
You now have more working Seedance 2.0 knowledge than 95% of creators on the internet today. Bookmark the physics vocabulary. Save the shot-script template. Steal all 12 prompts. And go make something that does not look like everyone else's.



