OmniHuman 1.5 - AI Talking-Head Video from Photo + Audio

Generated with OmniHuman 1.5

A handful of OmniHuman 1.5 clips generated inside Fliki. No edits, no post-production.

Audio script

Okay so update — I tried it for a full week and… honestly? I'm not even mad. I went in expecting absolutely nothing, and now I'm three notebooks deep. Like, if you've been on the fence about this — just try it. Trust me on this one.

Audio script

We just shipped something I've been wanting to build for two years. Two whole years. And honestly? I almost didn't. Three rewrites, two pivots, one really bad weekend where I nearly deleted the repo. But it's live now, and it actually works. Go check it out.

Audio script

We started this company with one stubborn belief — that quality is not a feature. It is the whole product. Twelve years in, we have not changed our minds. Not once. Same hands. Same standards. Same promise. That is who we are.


100M+ videos created

12M+ users worldwide

80+ languages supported

What makes OmniHuman 1.5 the right talking-head model

Photo + audio talking head

OmniHuman 1.5 takes a single portrait image and an audio clip and produces a realistic talking-head video. No video reference needed — just a photo and the audio you want spoken.

Tight lip-sync accuracy

Mouth shapes and timing align with the audio at the phoneme level, the kind of sync you would otherwise need a recorded performance to achieve.

Natural micro-expressions

Beyond mouth motion, OmniHuman 1.5 generates eye movement, blinks, and subtle head motion — the small cues that make talking-head video feel alive rather than uncanny.

Identity retention

OmniHuman 1.5 holds the source portrait’s identity across the full clip — likeness, styling, and lighting carry through without drift.

10-second clips at 720p

OmniHuman 1.5 outputs 10-second 720p talking-head clips. Stitch multiple generations together inside Fliki to build longer presenter sequences.

No video source required

Unlike face-swap or motion-transfer workflows, OmniHuman 1.5 doesn't need an existing video clip. A single photo and an audio file are enough — that's a much lower asset bar for production.

Built for presenter content

Founder-led explainers, multilingual versions of a single recording, personalized outreach, and presenter cuts for ads and social — all from a single source portrait.

Inside Fliki, no setup

Run OmniHuman 1.5 through Fliki's standard interface — no separate ByteDance account, no Runware setup. Upload, attach audio, generate.

Works across visual styles

OmniHuman 1.5 handles realistic photographs, anime characters, illustrated portraits, and stylized artwork — plus non-human subjects like animals and anthropomorphic figures. The same model carries a brand mascot or a real spokesperson.

How it works

How to make a talking-head video with OmniHuman 1.5

OmniHuman 1.5 runs inside Fliki's lip-sync workflow. Five steps.

Step 1

Upload a portrait image

Use a clear, front-facing portrait. The clearer the source, the more the model has to work with: a sharp, well-lit image, an unobstructed face, and a neutral expression all help.

Step 2

Select OmniHuman 1.5

Pick OmniHuman 1.5 from Fliki's lip-sync model selector. It's designed specifically for image-plus-audio talking-head workflows.

Step 3

Upload your audio

Upload the dialogue or voiceover audio you want the portrait to speak. OmniHuman 1.5 syncs mouth movement, micro-expressions, and head motion to the audio.

Step 4

Generate at 720p

Hit Generate. OmniHuman 1.5 produces a 10-second 720p talking-head clip with the source identity and the new spoken content.

Step 5

Drop into your timeline

Use the output as a presenter clip, an explainer talking head, a personalized message, or a social hook. Inside Fliki you can stitch multiple OmniHuman clips together for longer presenter sequences.
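Because each generation is a 10-second clip, a longer recording has to be covered by several clips. The hypothetical `plan_segments` helper below is a minimal sketch of that arithmetic. It assumes that each OmniHuman 1.5 generation covers at most 10 seconds of audio and that you split the audio yourself before uploading each piece. It is not part of Fliki or OmniHuman, and it only computes the start and end time of each consecutive window.

```python
import math

# Assumption: one OmniHuman 1.5 generation covers at most 10 seconds of audio.
MAX_CLIP_SECONDS = 10.0


def plan_segments(audio_seconds: float,
                  max_clip: float = MAX_CLIP_SECONDS) -> list[tuple[float, float]]:
    """Split an audio track into consecutive (start, end) windows of at most max_clip seconds."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    # Number of clips needed to cover the whole track.
    count = math.ceil(audio_seconds / max_clip)
    segments = []
    for i in range(count):
        start = i * max_clip
        # The last window stops at the end of the audio instead of a full 10 seconds.
        end = min(start + max_clip, audio_seconds)
        segments.append((start, end))
    return segments


if __name__ == "__main__":
    # A 34.5-second voiceover needs four clips.
    for start, end in plan_segments(34.5):
        print(f"{start:5.1f}s -> {end:5.1f}s")
```

In practice you may prefer to cut at sentence pauses rather than at fixed 10-second marks, so that the seams between stitched clips land on silence. The fixed split is only the simplest version of that plan.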

OmniHuman 1.5 FAQ

Frequently asked questions

Everything you need to know about generating talking-head video with OmniHuman 1.5 inside Fliki.

What is OmniHuman 1.5?

OmniHuman 1.5 is ByteDance's lip-sync video model. It takes a single portrait image and an audio clip and produces a realistic talking-head video with synchronized mouth movement, facial expressions, and natural head motion.

What inputs does OmniHuman 1.5 need?

A single portrait image and an audio file. No source video and no text prompt are required; the audio drives the spoken content and timing.

How long are OmniHuman 1.5 outputs?

OmniHuman 1.5 generates 10-second talking-head clips. Stitch multiple generations together inside Fliki to build longer presenter sequences.

What resolution does OmniHuman 1.5 output?

OmniHuman 1.5 generates at 720p, optimized for talking-head quality and lip-sync sharpness.

How is OmniHuman 1.5 different from a regular video model?

It's lip-sync only, designed specifically for the photo-plus-audio talking-head workflow. Regular text-to-video and image-to-video models like Veo 3.1 Fast or Kling 3.0 Pro generate full scenes; OmniHuman 1.5 specializes in animating portraits driven by audio.

Can I use any portrait photo?

Most clear, front-facing portraits work well. A sharp, well-lit image, an unobstructed face, and a neutral expression give the model the strongest source to work with.

Can OmniHuman 1.5 work in any language?

Yes. The model syncs to the audio you provide, whether it's English, Spanish, Mandarin, or another language. Pair it with Fliki's multilingual voices to localize a presenter recording in minutes.

Can I use OmniHuman 1.5 for commercial content?

Yes, subject to Fliki's and ByteDance's terms and the rights you hold to the source portrait. Always make sure you have permission to use someone's likeness in commercial contexts.

Still curious?

Try Fliki free in your browser, no credit card required.

Start free

More from Fliki

Tutorial

How to Create AI Avatar Videos in 2 Minutes

Learn how to create AI avatar videos and give them a human touch with our step-by-step guide for businesses and content creators.

Tutorial

How to Create a Talking Avatar from a Photo (Step-by-Step Guide)

Learn how to create a talking avatar from a photo in under 15 minutes. Follow this simple step-by-step guide to turn any portrait into an AI-powered video avatar.

Guide

Digital Humans: The Game-Changer in Training (and the Risks No One Talks About)

Learn how digital humans can boost training and communication while tackling trust, privacy, and bias, plus tips for piloting them in your workplace.

OmniHuman 1.5 · Free forever plan

Turn a photo into a talking head.

Upload a portrait, attach audio, get a realistic 720p talking-head clip. Free to start, no credit card required.

Try OmniHuman 1.5 free

Free forever plan · No credit card required · Cancel anytime