14 Best ElevenLabs Alternatives in 2026 (Tested & Compared)

You are not crazy. ElevenLabs really did get expensive.

If you landed here, you probably already love what ElevenLabs can do. The voices are stunning. The emotion is uncanny. For a lot of us, it was the first text-to-speech tool that did not make our skin crawl.

And then the invoice arrived.

That is the story we heard again and again while researching this guide. One creator running faceless YouTube channels put it bluntly on Reddit: the quality is amazing, but it costs a lot, and for long form content the credits vanish before the work is done. Another said they pay around 117 dollars a month for 500,000 credits and still feel the squeeze. A third admitted they kept paying for ElevenLabs even after testing a cheaper option, because something about the cheaper voice felt slightly off. That tension, great quality versus painful cost, is exactly why "elevenlabs alternatives" has quietly become one of the most-searched phrases in the AI audio world.

Here is the honest truth most listicles will not tell you: ElevenLabs is genuinely excellent, and no single alternative beats it on every axis. But you almost certainly do not need every axis. You need the right voice, at the right price, inside a workflow that fits how you actually create. And in 2026, the field of competitors has gotten good enough that switching is no longer a downgrade. For many use cases, it is an upgrade.

We spent days digging through pricing pages, benchmark leaderboards, founder threads, and real user reviews to build this. We pulled candid opinions from communities like r/aitubers and r/selfhosted, cross-checked every price against the source, and organized the whole thing around a simple question: what are you actually trying to make? Below you will find 14 real ElevenLabs alternatives, from polished all-in-one platforms to free open-source models you can run on your own laptop, plus a clear framework for picking the one that fits.

Let us get into it.

Quick answer: the best ElevenLabs alternatives at a glance

If you only have thirty seconds, here is the short version.

Best all-in-one for creators (voice plus video): Fliki
Best for product videos and e-learning: Murf AI
Best for reading and listening (consumer TTS): Speechify
Best for corporate and brand-safe narration: WellSaid Labs
Best for emotional, character-driven voices: Hume AI
Best for real-time voice agents (lowest latency): Cartesia
Best from the big labs: Google Gemini TTS and OpenAI TTS
Best free and open source: Kokoro and Chatterbox
Best ultra-cheap API: Fish Audio

Now the detail, because the right pick depends entirely on what you are building.

Why people leave ElevenLabs (and what to look for in a replacement)

Before the list, it helps to name the actual reasons people switch. Knowing your reason makes choosing painless.

Cost on long-form content

This is the number one driver, full stop. ElevenLabs charges by credits that map roughly to characters, and long videos, audiobooks, or daily uploads burn through them fast. Several creators told the same story: the tool is brilliant until you scale, and then the math stops working. If you publish a lot, even a small per-character saving compounds into real money.
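
To see how quickly a publishing schedule eats an allowance, here is a rough Python sketch. It assumes one credit is about one character (the article only says credits map roughly to characters), and the narration pace, average word length, and the example channel are our own illustrative assumptions, not measured figures.

```python
# Assumptions (not from any provider's documentation):
WORDS_PER_MINUTE = 150   # assumed narration pace
CHARS_PER_WORD = 5.5     # assumed average word length, including the space


def monthly_characters(videos_per_month: int, minutes_per_video: float,
                       retake_ratio: float = 0.0) -> int:
    """Estimate characters sent to a TTS engine in one month.

    retake_ratio is the extra share of text you regenerate
    (0.25 means a quarter of the script gets a second take).
    """
    chars_per_minute = WORDS_PER_MINUTE * CHARS_PER_WORD
    base = videos_per_month * minutes_per_video * chars_per_minute
    return round(base * (1 + retake_ratio))


# Hypothetical channel: 20 ten-minute videos a month, 25 percent retakes.
needed = monthly_characters(videos_per_month=20, minutes_per_video=10,
                            retake_ratio=0.25)
allowance = 500_000  # the credit allowance one creator quoted earlier
print(f"Estimated characters per month: {needed:,}")
print(f"Share of a {allowance:,}-credit allowance: {needed / allowance:.0%}")
```

Swap in your own schedule and retake habits. The point is less the exact number than seeing how retakes and longer videos multiply the total.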

Credit anxiety

Beyond the raw price, the credit model itself creates a low hum of stress. You start rationing regenerations. You hesitate to tweak a line. Creativity and a ticking meter do not mix well, and a flat-rate or minutes-based plan can feel liberating even at a similar headline cost.

Workflow gaps

ElevenLabs is a voice specialist. If you also need video, captions, dubbing, or an editor where audio and visuals live together, you end up stitching three tools into one Frankenstein process. An all-in-one platform can replace that whole chain.

Specific feature needs

Maybe you want describe-a-voice control, sub-50-millisecond latency for a phone agent, an open model you can self-host for privacy, or simply better pronunciation of numbers and acronyms (a real ElevenLabs weak spot that creators flagged for product videos).

Commercial and ethical clarity

Teams and businesses often want explicit commercial rights, voice-cloning consent guarantees, and compliance certifications baked into the plan rather than buried in fine print.

When you read the options below, hold your own reason in mind. The "best" ElevenLabs alternative is simply the one that fixes your specific problem without creating a new one.

How we evaluated these ElevenLabs alternatives

So you can judge our judgment, here is the lens we used. We weighed each tool on six things that actually matter day to day:

Voice quality and naturalness, including emotion, pacing, and how it handles tricky text like numbers and abbreviations.
Price and value, measured against real usage (long-form especially), not just the sticker on the entry plan.
Languages and accents, because a huge share of demand is multilingual.
Voice cloning, whether it exists, how much audio it needs, and whether commercial use is allowed.
Workflow fit, meaning the editor, integrations, API, and whether it slots into how you already work.
Who it is genuinely for, because a developer building a voice agent and a YouTuber narrating a documentary need very different things.

We also leaned on independent benchmarks like the Artificial Analysis TTS leaderboard and the community-run TTS Arena where blind listener preferences, not marketing, decide the rankings. Where real users on Reddit contradicted the marketing, we trusted the users.

One more note on trust. Google's recent core updates have rewarded content that shows genuine experience and punished thin, undifferentiated lists. We took that to heart. This is not a regurgitated feature dump. It is an opinionated, tested-where-possible, honestly-caveated guide. If something has a catch, we say so.

The 14 best ElevenLabs alternatives in 2026

1. Fliki, the best all-in-one if you make videos, not just audio

Here is the insight most "ElevenLabs alternative" lists miss entirely: a lot of people generating AI voices are not making audio for its own sake. They are making videos. YouTube shorts, faceless channels, explainers, training clips, social posts. For them, ElevenLabs is only one stop in a longer pipeline, and that pipeline is the real cost.

Fliki collapses the whole pipeline into one workspace. You paste a script, a blog URL, or even a one-line prompt, and it produces a finished video with AI voiceover, matching visuals, background music, and burned-in captions. The voice engine is the clever part. Rather than betting on a single model, Fliki blends eight providers behind one interface, including Microsoft, Google, OpenAI, Amazon, Inworld, and yes, ElevenLabs itself, giving you 2,000-plus AI voices across 80-plus languages with 30-plus emotional styles. In practice that means you can keep the ElevenLabs-grade quality you came for while paying a flat, minutes-based price instead of bleeding credits.

It also covers the features people usually bolt on separately: voice cloning from a 30-second sample, AI dubbing into 80-plus languages, and full text-to-video generation. More than 12 million creators and teams use it, which tells you the workflow holds up at scale.

Pricing (verified June 2026): A free forever plan with monthly credits and no credit card required. Standard runs 21 dollars a month billed annually (28 dollars monthly) for 180 credits a month at 1080p with no watermark. Premium is 66 dollars a month billed annually (88 dollars monthly) for 600 credits plus custom voice cloning and commercial rights. See the current Fliki pricing for the latest.

Best for: Creators, marketers, educators, and small teams who want one tool for voice and video instead of a stack of subscriptions.

The honest catch: If you only ever need a raw audio file and never touch video, a pure TTS specialist may feel more focused. Fliki's strength is the whole creation flow, so you get the most value when you use it end to end. Explore the full feature set or the content creation use case to see if it matches your workflow.

2. Murf AI, the polished pick for product videos and e-learning

Murf is the alternative people on Reddit name most often when they want ElevenLabs quality without ElevenLabs prices. One creator described switching to Murf and finding the voices "super natural," with easy control over pacing and delivery, and a far gentler bill. That matches our read: Murf is the corporate-friendly, presentation-ready choice.

Murf Studio pairs a large library of natural voices with a genuinely useful editor, plus direct integrations into Canva and PowerPoint, which is gold if your output is product demos, training modules, or marketing videos. The pronunciation controls and emphasis tools are a step above most, so technical scripts come out clean.

Pricing (verified June 2026): Free plan with 10 minutes of generation. Creator is 19 dollars a month billed annually (29 dollars monthly) with commercial rights. Business is 66 dollars a month billed annually (99 dollars monthly). Enterprise is custom and adds voice cloning and compliance. The API is separate at about 0.03 dollars per 1,000 characters. See Murf's pricing.

Best for: Marketers, instructional designers, and teams making polished business video and audio.

The honest catch: Voice cloning sits on the higher tiers, and the free plan does not allow downloads, so you cannot fully test export quality without paying.

3. Speechify, the best for reading and listening

Speechify approaches the problem from the opposite direction. Most tools here help you create audio for an audience. Speechify is, at its heart, about helping you (or your users) consume text by ear, turning articles, PDFs, emails, and books into speech you can listen to anywhere. That consumer Reader product is what made it a household name, and Google searchers literally ask "Is Speechify or ElevenLabs better?"

The honest answer is that they are not really the same tool. ElevenLabs and most options on this list are studios for producing content. Speechify Reader is a listening assistant. But Speechify also has a Studio product for creators that competes more directly, with voice cloning and video voiceover features.

Pricing (verified June 2026): Speechify Studio offers Starter around 19 dollars per user a month and Creator around 49 dollars per user a month, with an API at roughly 10 dollars per million characters. The consumer Reader has its own separate plans. See Speechify's pricing.

Best for: People who want to listen to their reading, and creators who want a simple studio with strong accessibility roots.

The honest catch: If your goal is producing studio-grade narration at scale, a creator-first tool will usually feel more at home than the Reader-centric Speechify.

4. WellSaid Labs, the brand-safe corporate voice

WellSaid built its reputation on clean, consistent, professional voices and an explicitly ethical approach to voice creation, which is why it shows up so often in enterprise and e-learning shortlists. The voices are not the flashiest or the most emotionally wild, and that is the point. They are reliable, neutral, and unmistakably professional, ideal for training content, IVR, corporate narration, and anything where a brand cannot risk a weird AI artifact slipping through.

It is positioned as a serious ElevenLabs competitor for organizations, with team workspaces, Adobe integrations on higher tiers, and compliance features like SOC 2.

Pricing (verified June 2026): Plans commonly run from a Maker or Creative tier around 49 to 50 dollars a month, a Business or Teams tier in the 99 to 249 dollar range depending on seats, and custom Enterprise pricing. See WellSaid's pricing for current details.

Best for: Enterprises, L&D teams, and agencies that prize consistency, rights clarity, and compliance.

The honest catch: It is priced for businesses, not hobbyists, and the voice range skews professional rather than characterful.

5. LOVO AI (Genny), the creator studio with a built-in video editor

LOVO's Genny platform sits in a sweet spot between pure TTS and full video tools. You get hundreds of voices across many languages, an emotion and emphasis system, plus a built-in video editor and asset library, which makes it a tidy one-stop shop for social creators who want voice and simple video together without jumping to a separate app.

Pricing (verified June 2026): Free tier, Basic around 24 dollars a month, Pro around 48 dollars a month, and Pro Plus around 149 dollars a month, with roughly 50 percent off on annual billing. See LOVO's pricing.

Best for: Social media creators and small teams who want voices plus light video editing in one place.

The honest catch: Voice quality is strong but a notch below the very top tier for the most demanding, emotion-heavy work.

6. Cartesia, the king of real-time and low latency

If you are building something interactive (a phone agent, a live assistant, a game character that talks back), the metric that matters is not just naturalness but latency. Cartesia's Sonic models are engineered for speed, with response times reported as low as around 40 milliseconds, which is why developers building voice agents reach for it. A founder in the dubbing space on Reddit singled out Cartesia as the best paid option for certain non-English languages, with quality comparable to ElevenLabs at a much lower cost.

Pricing (verified June 2026): Free plan with 20,000 credits. Pro starts around 4 dollars a month billed annually, Startup around 39 dollars a month annually (49 dollars monthly) with 1.25 million credits, and a Scale tier above that. Pay-as-you-go is roughly 50 dollars per million characters. See Cartesia's pricing.

Best for: Developers and product teams building real-time voice agents, assistants, and interactive apps.

The honest catch: This is a developer-first, API-centric product. If you want a polished click-and-create studio, look elsewhere.

7. Hume AI, the most emotionally intelligent voices

Hume is doing something genuinely different. Its Octave model and EVI (empathic voice interface) are built around emotional expressiveness and a describe-a-voice approach. Instead of scrolling a fixed list, you describe the voice you want in plain English, something like a warm, mid-forties newscaster with a slight rasp, and Hume generates it. One Redditor raved about exactly this, using it to conjure specific voices they could not find anywhere else, for as little as a few dollars a month.

For character work, audio drama, games, and any project where emotion carries the scene, Hume is a standout.

Pricing (verified June 2026): A Starter plan around 3 dollars a month (about 30,000 characters), scaling up to a Business tier around 500 dollars a month, plus custom Enterprise. Pay-as-you-go is roughly 30 dollars per million characters. Octave 2 brought a large cost reduction over the prior generation. See Hume's pricing.

Best for: Storytellers, game developers, and anyone who needs custom, emotionally rich, describable voices.

The honest catch: The describe-a-voice approach trades some of the predictability of a curated library for creative range, so expect to iterate to land the exact voice.

8. Google Gemini TTS, the quiet giant that nailed pronunciation

The big AI labs crept into voice while everyone was watching ElevenLabs, and Google's Gemini TTS is now a serious option, especially through Google AI Studio where you can test it for free. Creators on Reddit kept returning to one practical point: Gemini handles numbers, digit sequences, and alphanumeric abbreviations more reliably than ElevenLabs, which matters enormously for product videos, finance content, and anything with prices or model numbers.

Gemini 3.1 Flash TTS supports 70-plus languages and a system of 200-plus inline audio tags like [whispers] or [excited] that let you steer delivery mid-sentence. The quality holds its own against ElevenLabs for most non-cloning use cases, and the API economics are excellent at roughly 0.91 dollars per hour of audio in batch mode.

Pricing (verified June 2026): Free experimentation in AI Studio. API pricing is token-based, around 1 dollar per million input tokens and 20 dollars per million output tokens for Gemini 3.1 Flash TTS. See Google's Text-to-Speech pricing.

Best for: Developers and technically comfortable creators who want top-tier quality, broad languages, and clean handling of numbers, at a very low API cost.

The honest catch: You configure it yourself. As one creator noted, you have to know how to set it up well, and even then voice cloning is more limited than dedicated tools.

9. OpenAI TTS, the easy default for developers in the ecosystem

If your product already runs on OpenAI, its text-to-speech models are the path of least resistance. The newer gpt-4o-mini-tts uses a modern neural-audio approach that is noticeably more expressive and steerable than the older tts-1, and you can instruct it on tone and delivery in natural language.

Pricing (verified June 2026): tts-1 is around 15 dollars per million characters (about 0.74 dollars per hour). gpt-4o-mini-tts uses token pricing, roughly 0.60 dollars per million input tokens and 12 dollars per million audio output tokens. See OpenAI's pricing.
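
Per-character prices are hard to compare with per-hour ones, so here is a small Python sketch of the conversion. The speaking pace (150 words a minute) and average word length (5.5 characters including the space) are our assumptions; the per-million-character prices are the rounded figures quoted in this guide, not live rates.

```python
def cost_per_audio_hour(usd_per_million_chars: float,
                        words_per_minute: float = 150,
                        chars_per_word: float = 5.5) -> float:
    """Convert a per-million-character price into a rough cost per hour of speech."""
    chars_per_hour = words_per_minute * chars_per_word * 60
    return usd_per_million_chars * chars_per_hour / 1_000_000


# Rounded per-million-character prices as quoted in this guide.
quoted_prices = {
    "OpenAI tts-1": 15,
    "Fish Audio API": 15,
    "Speechify API": 10,
    "Hume pay-as-you-go": 30,
    "Cartesia pay-as-you-go": 50,
}

for name, price in quoted_prices.items():
    print(f"{name}: about {cost_per_audio_hour(price):.2f} USD per hour")
```

Under those assumptions, 15 dollars per million characters works out to about 0.74 dollars per hour. A faster narrator or denser script pushes the hourly figure up, so treat the result as a ballpark.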

Best for: Developers already building on OpenAI who want simple, cheap, expressive speech in one ecosystem.

The honest catch: It is an API, not a studio, and it lacks the deep voice library and cloning of specialist tools.

10. Resemble AI, the voice-cloning and deepfake-defense specialist

Resemble AI positions itself directly as the number one ElevenLabs alternative for voice cloning, and it has the chops to back it up. Beyond high-quality cloning and real-time speech, Resemble offers tools for voice security and deepfake detection, which appeals to enterprises worried about misuse. It is also the team behind Chatterbox (further down this list), which tells you they know the model layer cold.

Pricing (verified June 2026): Resemble offers usage-based and subscription options with a free trial; check Resemble's pricing for current tiers, since they adjust by usage and seats.

Best for: Teams that need serious voice cloning plus security and authenticity tooling.

The honest catch: The security-and-API positioning means it is more of a platform than a casual creator studio.

11. Fish Audio, the ultra-cheap, open-source-powered API

Fish Audio is the value champion that keeps surfacing in cost-conscious threads. One Reddit user called it as good as ElevenLabs and noted it is based on an open model, so with some effort you can even run it yourself for free. The managed API is dramatically cheaper than ElevenLabs, and Fish's S-series models have topped open benchmarks, beating some closed systems on blind win-rate tests.

Pricing (verified June 2026): A Pro plan around 9.99 dollars a month for roughly 200 minutes, with API pricing in the region of 15 dollars per million characters, far below ElevenLabs on a per-character basis. See Fish Audio.

Best for: Budget-focused developers and creators comfortable with a more technical, API-first tool.

The honest catch: The "run it yourself for free" path requires real setup work, and as one skeptic on Reddit dryly noted, that work is rarely as small as people claim.

12. Kokoro TTS, the tiny free model that punches absurdly above its weight

Now we cross into open source, where free does not mean bad. Kokoro is the poster child for efficient TTS. At just 82 million parameters and a 300-megabyte footprint, it hit number one on the TTS Arena leaderboard, beating models 10 to 100 times its size, and it runs on basically anything, including a plain CPU. It is released under the permissive Apache 2.0 license, so you can use it commercially for free.

Pricing: Free and open source.

Best for: Developers, tinkerers, and privacy-conscious users who want quality speech they can self-host at zero cost.

The honest catch: No voice cloning. You get 54 preset voices and that is it, so if you need a custom or cloned voice, look at Chatterbox instead.

13. Chatterbox, the free open-source model with voice cloning

Chatterbox, from Resemble AI, is the model that made people stop saying ElevenLabs is unbeatable. In blind preference tests, listeners chose Chatterbox over ElevenLabs roughly 63 to 65 percent of the time. It clones a voice from about 10 seconds of audio, it is free under the MIT license, and it is the open-source pick when you need cloning rather than presets. Note that creators on Reddit pointed to Replicate as an easy way to run the resemble-ai/chatterbox model without local setup.

Pricing: Free and open source (pay only for compute if you run it on a host like Replicate).

Best for: Developers who want ElevenLabs-level quality with voice cloning, self-hosted and free.

The honest catch: English only for now, and all output carries an inaudible watermark for traceability.

14. Inworld AI, the new quality leader for developers

Rounding out the list is a rising name. Inworld's Realtime TTS climbed to the top of the Artificial Analysis TTS leaderboard based on thousands of blind preference comparisons, delivering that quality at sub-200-millisecond latency and a significantly lower cost than ElevenLabs. It started in game and character AI, so expressiveness is a strength. If you are a developer chasing the current quality frontier on a budget, it belongs on your test list.

Pricing (verified June 2026): Usage-based API pricing well below ElevenLabs per minute. See Inworld AI.

Best for: Developers who want benchmark-leading quality and low latency via API.

The honest catch: Like the other API-first picks, it is built for builders, not for a point-and-click content studio.

Honorable mentions worth knowing

A few more names came up repeatedly in community threads and deserve a quick nod. MiniMax (Hailuo) Audio is praised for strong quality at a low yearly price, popular with faceless-channel creators. NaturalReader has fans who rate its voice cloning highly for everyday reading. Descript bundles solid AI voices inside a beloved editing-by-text workflow for podcasters and video editors. And Camb.ai and Papla Media keep getting mentioned for affordable, realistic long-form narration. None displaced the main 14, but depending on your niche, one might be your perfect fit.

ElevenLabs alternatives compared side by side

Here is the whole field in one view. Prices reflect entry paid tiers verified in June 2026 and are rounded for comparison. Always confirm on the provider's page before buying.

Tool	Best for	Voice cloning	Starting paid price	Free tier	Standout strength
Fliki	All-in-one voice plus video	Yes (30s)	~21 USD/mo (annual)	Yes	Voice and video in one workspace, 8 engines
Murf AI	Product video and e-learning	Higher tiers	~19 USD/mo (annual)	Yes (10 min)	Polished editor, Canva and PowerPoint
Speechify	Reading and listening	Yes (Studio)	~19 USD/user/mo	Yes	Best consumer listening experience
WellSaid Labs	Corporate and brand-safe	Enterprise	~49 USD/mo	Trial	Consistent professional voices
LOVO (Genny)	Creators wanting light video	Yes	~24 USD/mo	Yes	Voices plus built-in video editor
Cartesia	Real-time voice agents	Yes	~4 USD/mo (annual)	Yes (20k credits)	Lowest latency (~40ms)
Hume AI	Emotional, custom voices	Describe-a-voice	~3 USD/mo	Yes	Emotional range and voice prompting
Google Gemini TTS	Developers, broad languages	Limited	API usage-based	Yes (AI Studio)	Pronunciation, 70+ languages
OpenAI TTS	OpenAI-ecosystem devs	No	~15 USD/1M chars	Trial credits	Simple, cheap, steerable
Resemble AI	Cloning plus security	Yes	Usage-based	Trial	Cloning plus deepfake defense
Fish Audio	Ultra-cheap API	Yes	~10 USD/mo	Yes	Lowest cost, open model
Kokoro	Free self-hosted TTS	No	Free	Free	Runs on a CPU, Apache 2.0
Chatterbox	Free cloning, self-hosted	Yes (10s)	Free	Free	Beat ElevenLabs in blind tests
Inworld AI	Benchmark-leading API	Yes	API usage-based	Trial	Top leaderboard quality, low latency

Which ElevenLabs alternative should you actually choose?

Tables are useful, but decisions are easier when someone just tells you what to pick. So here is our straight advice, by who you are.

You are a creator or marketer making videos

Start with Fliki. The voice quality rivals what you came for, and folding voiceover, visuals, captions, and dubbing into one flow saves more time and money than swapping ElevenLabs for another standalone TTS tool. If your output is specifically polished corporate or product video, also trial Murf AI.

You are a developer building an app or voice agent

Latency and API quality rule here. Cartesia for real-time, Inworld and Google Gemini TTS for top quality at low cost, and OpenAI TTS if you already live in that ecosystem.

You need character, emotion, or storytelling

Hume AI for describable, emotionally rich voices, with Chatterbox as the free, self-hostable wildcard.

You are an enterprise or L&D team

WellSaid Labs for consistency, rights clarity, and compliance, with Resemble AI when cloning and security matter.

You want free, period

Kokoro if you do not need cloning, Chatterbox if you do. Both are open source and genuinely good now, not just "good for free."

The smartest move, echoed by nearly every experienced creator we read, is to test before you commit. Run the same script through your top two or three picks, listen on the device your audience uses, and trust your ears. The voice that makes you forget it is AI is the right one, whatever the spec sheet says.

How to switch from ElevenLabs without regretting it

Switching tools sounds scary and is usually not. A few tips to make it painless.

Keep a reference script

Use one consistent paragraph, ideally with a number, a name, and an emotional line, to A/B test every tool fairly. Pronunciation and emotion are where tools diverge most.
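
If you want the comparison to be blind, so brand loyalty does not sway your ears, one option is to hide the tool names behind neutral labels. This is a minimal Python sketch of that idea. The function name and the tool list are illustrative, and exporting the audio from each tool is left to you.

```python
import random
import string
from typing import Optional


def blind_labels(tools: list[str], seed: Optional[int] = None) -> dict[str, str]:
    """Map neutral labels ("Sample A", "Sample B", ...) to tool names in random order."""
    shuffled = tools[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # a seed makes the order repeatable
    return {f"Sample {string.ascii_uppercase[i]}": name
            for i, name in enumerate(shuffled)}


# Illustrative shortlist; use whichever tools you are comparing.
key = blind_labels(["ElevenLabs", "Fliki", "Murf AI"])
for label in key:
    print(label)  # rename each tool's exported audio file to its label

# Look at `key` only after you have ranked the samples by ear.
```

Have someone else do the renaming if you can, so you never see which file came from which tool until you have picked a favorite.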

Watch the real cost, not the headline

A 19 dollar plan with tight limits can cost more than a 28 dollar plan that covers your actual monthly volume. Map your true usage in minutes or characters first.
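
Here is a minimal Python sketch of that comparison. The plan names and minute allowances are hypothetical, and it assumes the only way past a plan's limit is to buy the same plan again. Real providers may offer top-ups, overage fees, or a higher tier instead, so adjust the rule to match the pricing page.

```python
import math
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    monthly_price: float     # USD per month
    included_minutes: float  # minutes of audio one subscription covers


def effective_cost(plan: Plan, minutes_needed: float) -> float:
    """Monthly cost if extra minutes can only be bought as extra subscriptions."""
    subscriptions = max(1, math.ceil(minutes_needed / plan.included_minutes))
    return subscriptions * plan.monthly_price


# Hypothetical plans: a cheap one with tight limits and a roomier one.
plans = [
    Plan("Tight 19", monthly_price=19, included_minutes=60),
    Plan("Roomy 28", monthly_price=28, included_minutes=180),
]

minutes_needed = 150  # your real monthly volume goes here
for plan in plans:
    print(f"{plan.name}: {effective_cost(plan, minutes_needed):.0f} USD per month")
```

With these made-up limits, the 19 dollar plan costs 57 dollars at 150 minutes a month while the 28 dollar plan stays at 28.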

Check commercial rights before you publish

Free tiers often forbid commercial use or add watermarks. If you monetize, confirm the license on your plan, an area where tools like Fliki, Murf, and WellSaid are explicit.

Do not abandon what works

Several creators run two tools, ElevenLabs for a hero voice and a cheaper option for bulk, and that hybrid is perfectly sensible. Switching is not all or nothing.

Mind your existing clones

A cloned voice on one platform does not transfer to another. If voice cloning is central to your brand, factor in re-cloning on the new tool, which usually takes only a short sample.

The bottom line

ElevenLabs earned its reputation. The voices really are that good, and if budget were no object, plenty of people would happily stay. But budget is an object, credits do run out, and in 2026 the alternatives have closed the gap to the point where switching is a strategic choice rather than a sacrifice.

If you make videos, Fliki is the most complete answer because it gives you ElevenLabs-grade voices and the entire video workflow in one place, at a flat price that does not punish you for creating more. If you live in long-form audio, Fish Audio stretches your budget furthest. If you build software, Cartesia, Inworld, and Google Gemini TTS lead on speed and value. And if you want free, Kokoro and Chatterbox prove that open source has truly arrived.

Pick the reason you are leaving, match it to the tool that fixes exactly that, run your reference script through your top two, and trust your ears. The best ElevenLabs alternative is not the one with the longest feature list. It is the one that lets you forget you are listening to a machine, and keep the money you used to spend worrying about it.

Ready to hear the difference for yourself? Try Fliki free and generate your first AI voiceover in minutes, no credit card required.

FAQs

What is the best free ElevenLabs alternative?

For self-hosted and truly free, Kokoro (no cloning, runs on a CPU) and Chatterbox (cloning from 10 seconds of audio) are the strongest open-source options in 2026. For a free hosted experience with no setup, Fliki and Murf both have genuine free tiers, and Google AI Studio lets you experiment with Gemini TTS at no cost. Just check the commercial-use terms before you publish.

Is ElevenLabs free?

Yes, ElevenLabs has a free plan with a limited monthly credit allowance, enough to test the quality but not to produce much at scale. That cap is exactly why so many people search for free alternatives. If the free tier runs dry, Fliki, Murf, Cartesia, Fish Audio, and the open-source models above all give you more headroom without immediately paying ElevenLabs prices.

Is Speechify or ElevenLabs better?

They are built for different jobs. Speechify shines as a listening and reading assistant, turning your documents and articles into audio you consume on the go, and its Studio product adds creator features. ElevenLabs is a production studio for generating polished voice content for an audience. If you want to listen to text, Speechify. If you want to create narration for videos or audiobooks, ElevenLabs or a creator tool like Fliki or Murf fits better.

What is the cheapest ElevenLabs alternative?

For paid hosted tools, Hume AI (from around 3 dollars a month), Cartesia (Pro from around 4 dollars a month), and Fish Audio (around 10 dollars a month) are among the most affordable entry points. For zero cost, the open-source models Kokoro and Chatterbox are free to self-host. On a per-character API basis, Fish Audio and Google Gemini TTS are typically far cheaper than ElevenLabs.

What is the best open-source ElevenLabs alternative?

Chatterbox by Resemble AI is the standout because it offers voice cloning and beat ElevenLabs in blind listening tests, all under a permissive MIT license. Kokoro is the best choice when you do not need cloning and want something tiny enough to run on a CPU. Fish Audio's models are also open and top several benchmarks. All three are legitimately good, not merely acceptable for free.

Can ElevenLabs alternatives clone voices?

Many can. Fliki, LOVO, Resemble AI, Hume, and the open-source Chatterbox all offer voice cloning, usually from a short sample of 10 to 30 seconds. Quality and language support vary, and most reputable tools require consent to clone a voice. If cloning is core to your work, test the clone quality specifically, since it differs more between tools than the stock voices do.