This New AI Image Generator Is Giving Tough Competition To Midjourney, DALL-E 3!

Exploring Flux.1

If you've been keeping up with the AI image generation scene, you're probably familiar with the major players dominating the space: Midjourney, DALL-E 3, and Stable Diffusion. But now there's a new contender in the arena, and it goes by the name Flux.

Today we release the FLUX.1 suite of models that push the frontiers of text-to-image synthesis. read more at https://t.co/49zTUK8Q5V pic.twitter.com/hmcKRIlizn
— Black Forest Labs (@bfl_ml) August 1, 2024

It's developed by Black Forest Labs, and there's been a lot of buzz around it. Some even claim it's on par with Midjourney or, in some cases, better. So, naturally, I had to dive in and see what all the fuss was about. Before we get into the nitty-gritty of testing Flux.1, let me give you a quick overview of what makes this tool special.

The Team Behind Flux.1

First off, it's worth noting that the brains behind Flux.1 are no strangers to the world of AI and image generation. The team includes some of the key figures who were instrumental in developing Stable Diffusion—yes, the same folks who brought us VQGAN, Latent Diffusion, and models like Stable Diffusion XL and Stable Video Diffusion. This pedigree gives Flux.1 some serious credibility right out of the gate. If you're familiar with these models, you know we're dealing with a group of experts who know how to push the boundaries of what's possible with AI-generated imagery.

The Three Faces of Flux.1: Schnell, Dev, and Pro

Flux.1 comes in three different models, each with its own strengths and intended use cases. Let's break them down:

1. Flux.1 [Schnell]

This is the entry-level model designed for local development and personal use. It's the fastest of the three and is openly available under the Apache 2.0 license, which means it's open-source. You can create tools using Flux.1 Schnell and even sell them or use the images it generates for commercial or non-commercial purposes. If you're planning to run Flux on your home computer, this is probably the model you'll be using.

2. Flux.1 [Dev]

A step up from Schnell, this open-weight model is distilled from Pro and delivers better image quality and prompt adherence than Schnell, though it generates more slowly. However, it's limited to non-commercial applications. So, while you can't sell tools that use this model, it's well suited to more advanced, non-commercial projects.

3. Flux.1 [Pro]

This is the top-of-the-line model, offering state-of-the-art performance. It's designed for enterprise solutions, and it's where you'll see the best results in terms of realism and overall image quality.
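To make the trade-offs between the three models easier to compare, here is a minimal Python sketch that records them as a small lookup table. The class name `FluxVariant`, its fields, and the helper `pick_variant` are made up for illustration. Treating Pro as hosted-only and commercially usable is an assumption drawn from its "enterprise" positioning, so check the current license terms before relying on any of these values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FluxVariant:
    """Hypothetical summary record; not part of any official Flux tooling."""
    name: str
    open_weights: bool    # can the weights be downloaded and run locally?
    commercial_use: bool  # is commercial use allowed, per the descriptions above?


# Values transcribed from the article's descriptions. The Pro entry is an
# assumption (hosted-only, commercial use allowed for enterprise customers).
VARIANTS = {
    "schnell": FluxVariant("FLUX.1 [schnell]", open_weights=True, commercial_use=True),
    "dev": FluxVariant("FLUX.1 [dev]", open_weights=True, commercial_use=False),
    "pro": FluxVariant("FLUX.1 [pro]", open_weights=False, commercial_use=True),
}


def pick_variant(run_locally: bool, commercial: bool) -> list[str]:
    """Return the keys of the variants that satisfy both constraints."""
    return [
        key
        for key, variant in VARIANTS.items()
        if (variant.open_weights or not run_locally)
        and (variant.commercial_use or not commercial)
    ]
```

Calling `pick_variant(run_locally=True, commercial=True)` leaves only Schnell, which matches the point above that it is the model most home users will end up with.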

Hands-On with Flux.1

So, what's it like to actually use Flux.1? The simplest way to get started is by heading over to Black Forest Labs on Hugging Face. They've made it super easy to test out the Schnell and Dev models right there on the platform.

Screenshot of Black Forest Labs on Hugging Face

You can input a prompt, tweak a few settings, and watch as your image materializes. For those of you wanting to dive deeper, the Pro model is available on Glif, a cool AI workflow builder that lets you experiment even more.
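If you'd rather skip the browser demo and run Schnell on your own machine, the sketch below shows roughly what that looks like with the Hugging Face `diffusers` library. It assumes `torch` and a recent `diffusers` release with `FluxPipeline` are installed, and that you have enough disk space and memory for the weights. The function name `generate`, the output filename, and the settings dictionary are my own choices for illustration; the step count and guidance values follow the Schnell model card rather than anything I tuned myself.

```python
# Settings taken from the FLUX.1 [schnell] model card: the model is distilled
# for few-step generation, so it runs with 4 steps and guidance disabled.
SCHNELL_SETTINGS = {
    "num_inference_steps": 4,
    "guidance_scale": 0.0,
    "max_sequence_length": 256,
}


def generate(prompt: str, seed: int = 0, out_path: str = "flux-schnell.png") -> str:
    """Generate one image with FLUX.1 [schnell] and save it to out_path."""
    # Imported lazily so this file loads even without the heavy dependencies.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    # Offloads idle submodules to the CPU: slower, but needs less GPU memory.
    pipe.enable_model_cpu_offload()

    image = pipe(
        prompt,
        generator=torch.Generator("cpu").manual_seed(seed),  # reproducible output
        **SCHNELL_SETTINGS,
    ).images[0]
    image.save(out_path)
    return out_path
```

A call such as `generate("a man with a beard on a city sidewalk eating ice cream")` downloads the weights on first use, which is a large download, and then writes the image to `flux-schnell.png`.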

For my tests, I used Fliki, an all-in-one content creation suite that helps create videos, designs, and audio files with the help of AI. Interestingly, Fliki utilizes Flux.1's API to produce its AI images, which makes it a convenient choice for evaluating the model's capabilities.

Where Flux.1 Shines: Realism and Text Handling

Flux.1 excels at creating realistic images, particularly when you're working with well-crafted prompts. The team behind Flux puts a lot of effort into aesthetic training, ensuring that the images it produces are visually pleasing right out of the box. In my tests, it delivered some impressive results, especially with photorealistic prompts. For instance, when I asked it to generate an image of a man with a beard eating ice cream on a city sidewalk, the result was pretty spot-on, though the guy looked like he might be sniffing the ice cream more than eating it—but hey, close enough!

Image of a man with a beard on a city sidewalk eating ice cream generated by Flux.1

Prompt: a man with a beard on a city sidewalk eating ice cream

I also tested a prompt for a woman taking a selfie on a tropical island, and the result was equally impressive, though it had a slightly artificial feel—something that could probably be tweaked with better prompt optimization.

Image of a woman taking a selfie on a tropical island generated by Flux.1

Prompt: a woman taking a selfie on a tropical island

What really blew me away was Flux.1's ability to handle text. Whether you're creating logos, memes, or anything involving text, Flux is ahead of the game.

Image of an old, rusty television set with Fliki text on a workbench in a dimly lit room generated by Flux.1

Prompt: Generate an image of an old, rusty television set on a workbench in a dimly lit room. The TV screen displays the word "Fliki" in a designer font in pink text. Tools and mechanical parts are scattered around, giving a steampunk vibe.

It's one of the few models that can generate text within images with a high degree of accuracy, something that's been a weak spot for many AI models, including Stable Diffusion.

Where Flux.1 Struggles: Artistic Styles and Prompt Adherence

However, Flux.1 isn't perfect. When it comes to generating illustrations or paintings—be it oil, watercolor, or hand-drawn—Flux.1 doesn't quite hit the mark. I tried generating a hand-drawn illustration of an angry penguin, an oil painting of the same, and even a watercolor version.

Image of a hand-drawn illustration of an angry penguin generated by Flux.1

Prompt: A hand-drawn illustration of an angry penguin. The penguin should have a fierce expression, with furrowed brows and a slight frown.

Image of a watercolor painting of an angry penguin generated by Flux.1

Prompt: A watercolor painting of an angry penguin. The penguin should have a visibly upset expression, with softer, more fluid lines typical of watercolor art.

While the images were decent, they didn't quite capture the essence of the specified art styles. Midjourney, for example, still has the upper hand in this department, producing images that more accurately reflect traditional art mediums.

Prompt adherence is another area where Flux.1 isn't quite there yet. While it does a decent job of incorporating multiple elements from complex prompts, it occasionally misses some details. For example, when I asked it to create a three-headed dragon watching TV while eating nachos and wearing cowboy boots, it got most of the elements right but stumbled on the dragon's heads—sometimes producing three different dragons instead of one with three heads.

Image of three dragons generated by Flux.1

Prompt: A three-headed dragon lounging in a cozy living room, watching TV while eating nachos. The dragon is wearing oversized cowboy boots on its feet. Each head shows a different expression: one amused, one curious, and one bored. The scene is playful, with the dragon comfortably sprawled on a couch, surrounded by snacks and a remote control on a coffee table.

DALL-E 3, on the other hand, nails this kind of prompt adherence, but it falls short in realism, where Flux.1 shines. Here's the DALL-E 3 result:

Image of three-headed dragon generated by DALL-E 3

Future: Text-to-Video and Beyond

One of the most exciting things about Flux.1 is that Black Forest Labs isn't stopping at text-to-image. The company has already announced plans to roll out a text-to-video model built on the same foundation. This means that in the near future, we could have an open-source alternative to tools like Fliki or Runway Gen-3, which currently lead the market in AI-generated video clips. Imagine being able to create high-quality, AI-generated videos right from your home setup. Flux.1 could be the key to unlocking that potential.

So, Is Flux.1 a Midjourney Killer?

Not yet. Midjourney still reigns supreme in terms of overall image quality and realism, and DALL-E 3 has the edge when it comes to capturing every detail of a complex prompt. But Flux.1 is a formidable contender, especially considering that the Schnell and Dev weights are openly available and development is moving quickly. It's already better than what we've seen from the latest Stable Diffusion models, and with the right prompts, it can produce images that are nearly on par with Midjourney.

What's truly exciting is the potential for Flux.1 to become a hybrid of the best features of Midjourney, DALL-E 3, and Stable Diffusion. Imagine a tool that combines the realism of Midjourney, the prompt adherence of DALL-E 3, and the open, uncensored creativity of Stable Diffusion. That's where Flux.1 is headed. And since its Schnell and Dev weights are openly available, we're likely to see a community-driven evolution that could push it even further.

Final Thoughts

Flux.1 is definitely one to watch. It's already impressive, and with continued development, it could soon become the go-to AI image generator. Whether you're a casual creator or a developer looking to build something new, Flux.1 offers exciting possibilities.

I highly recommend giving Flux.1 a try—especially since you can use it for free right now. And who knows? It might just become your go-to tool in the near future.