video model · by ByteDance

OmniHuman 1.5 AI Talking-Head Generator

Animate a portrait photo with audio to produce a realistic talking-head video. OmniHuman 1.5 is ByteDance's lip-sync video model: upload a still portrait and an audio clip, and get back a fully animated talking head with synchronized mouth movement, expressions, and head motion.

Generated with OmniHuman 1.5

A handful of OmniHuman 1.5 clips generated inside Fliki. No edits, no post-production.

Audio script

Okay so update — I tried it for a full week and… honestly? I'm not even mad. I went in expecting absolutely nothing, and now I'm three notebooks deep. Like, if you've been on the fence about this — just try it. Trust me on this one.

Audio script

We just shipped something I've been wanting to build for two years. Two whole years. And honestly? I almost didn't. Three rewrites, two pivots, one really bad weekend where I nearly deleted the repo. But it's live now, and it actually works. Go check it out.

Audio script

We started this company with one stubborn belief — that quality is not a feature. It is the whole product. Twelve years in, we have not changed our minds. Not once. Same hands. Same standards. Same promise. That is who we are.

100M+ videos created
12M+ users worldwide
80+ languages supported

Trusted by 50,000+ companies worldwide

What makes OmniHuman 1.5 the right talking-head model

Photo + audio talking head

OmniHuman 1.5 takes a single portrait image and an audio clip and produces a realistic talking-head video. No video reference needed — just a photo and the audio you want spoken.

Tight lip-sync accuracy

Mouth shapes and timing follow the audio at the phoneme level, the kind of sync quality you would otherwise need a recorded performance to get.

Natural micro-expressions

Beyond mouth motion, OmniHuman 1.5 generates eye movement, blinks, and subtle head motion — the small cues that make talking-head video feel alive rather than uncanny.

Identity retention

OmniHuman 1.5 holds the source portrait’s identity across the full clip — likeness, styling, and lighting carry through without drift.

10-second clips at 720p

OmniHuman 1.5 outputs 10-second 720p talking-head clips. Stitch multiple generations together inside Fliki to build longer presenter sequences.

No video source required

Unlike face-swap or motion-transfer workflows, OmniHuman 1.5 doesn't need an existing video clip. A single photo and an audio file are enough — that's a much lower asset bar for production.

Built for presenter content

Founder-led explainers, multilingual versions of a single recording, personalized outreach, and presenter cuts for ads and social — all from a single source portrait.

Inside Fliki, no setup

Run OmniHuman 1.5 through Fliki's standard interface — no separate ByteDance account, no Runware setup. Upload, attach audio, generate.

Works across visual styles

OmniHuman 1.5 handles realistic photographs, anime characters, illustrated portraits, and stylized artwork — plus non-human subjects like animals and anthropomorphic figures. The same model carries a brand mascot or a real spokesperson.

How it works

How to make a talking-head video with OmniHuman 1.5

OmniHuman 1.5 runs inside Fliki's lip-sync workflow. Five steps.

Step 1

Upload a portrait image

Use a clear, front-facing portrait. A cleaner source gives the model more to work with: sharp lighting, an unobstructed face, and a neutral expression all help.

Step 2

Select OmniHuman 1.5

Pick OmniHuman 1.5 from Fliki's lip-sync model selector. It's designed specifically for image-plus-audio talking-head workflows.

Step 3

Upload your audio

Upload the dialogue or voiceover audio you want the portrait to speak. OmniHuman 1.5 syncs mouth movement, micro-expressions, and head motion to the audio.

Step 4

Generate at 720p

Hit Generate. OmniHuman 1.5 produces a 10-second 720p talking-head clip with the source identity and the new spoken content.

Step 5

Drop into your timeline

Use the output as a presenter clip, an explainer talking head, a personalized message, or a social hook. Inside Fliki you can stitch multiple OmniHuman clips together for longer presenter sequences.

OmniHuman 1.5 FAQ

Frequently asked questions

Everything you need to know about generating talking-head video with OmniHuman 1.5 inside Fliki.

What is OmniHuman 1.5?

OmniHuman 1.5 is ByteDance's lip-sync video model. It takes a single portrait image and an audio clip, and produces a realistic talking-head video with synchronized mouth movement, expressions, and natural head motion.

What inputs does OmniHuman 1.5 need?

A single portrait image and an audio file. No source video required, no text prompt — the audio drives the spoken content and timing.

How long are OmniHuman 1.5 outputs?

OmniHuman 1.5 generates 10-second talking-head clips. Stitch multiple generations together inside Fliki to build longer presenter sequences.
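Since each generation yields a 10-second clip, a longer voiceover has to be cut into pieces before you generate. The sketch below is a hypothetical planning helper, not part of Fliki's or ByteDance's tooling: `plan_segments` and `MAX_CLIP_SECONDS` are illustrative names, and it assumes each clip accepts at most 10 seconds of audio. It only computes the cut points; the actual audio splitting and stitching still happen in your editor or inside Fliki.

```python
import math

# Hypothetical constant: assumes one OmniHuman 1.5 generation covers
# at most 10 seconds of audio, matching the 10-second clip length.
MAX_CLIP_SECONDS = 10.0


def plan_segments(total_seconds: float, max_len: float = MAX_CLIP_SECONDS) -> list[tuple[float, float]]:
    """Return consecutive (start, end) pairs, in seconds, covering the whole track."""
    if total_seconds <= 0:
        return []
    # Number of clips needed so no segment exceeds max_len.
    count = math.ceil(total_seconds / max_len)
    # Every segment is max_len long except possibly the last one.
    return [(i * max_len, min((i + 1) * max_len, total_seconds)) for i in range(count)]


print(plan_segments(27.5))  # [(0.0, 10.0), (10.0, 20.0), (20.0, 27.5)]
```

For a 27.5-second voiceover this plans three clips. In practice you may want to nudge each cut to the nearest pause so that no word is split across two generations.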

What resolution does OmniHuman 1.5 output?

OmniHuman 1.5 generates at 720p, optimized for talking-head quality and lip-sync sharpness.

How is OmniHuman 1.5 different from a regular video model?

It’s lip-sync only — designed specifically for the photo-plus-audio talking-head workflow. Regular text-to-video and image-to-video models like Veo 3.1 Fast or Kling 3.0 Pro generate full scenes; OmniHuman 1.5 specializes in animated portraits driven by audio.

Can I use any portrait photo?

Most clear, front-facing portraits work well. Sharp lighting, unobstructed face, and neutral expression give the model the strongest source to work with.


Can OmniHuman 1.5 work in any language?

Yes. The model syncs to whatever audio you provide, so it is not tied to one language: English, Spanish, Mandarin, and others. Pair it with Fliki's multilingual voices to localize a presenter recording in minutes.

Can I use OmniHuman 1.5 for commercial content?

Yes, subject to Fliki's and ByteDance's terms, plus the rights you hold to the source portrait. Always make sure you have permission to use someone's likeness in commercial contexts.

OmniHuman 1.5 · Free forever plan

Turn a photo into a talking head.

Upload a portrait, attach audio, get a realistic 720p talking-head clip. Free to start, no credit card required.

Try OmniHuman 1.5 free

Free forever plan · No credit card required · Cancel anytime