My name is Al the Graphics Guy, and I’ve been creating and manipulating digital images since the very first Mac computer.
Artificial Intelligence (AI) programs in the fields of art, photography, and design have advanced significantly over the past year. AI image generators like DALL-E, Midjourney, Stable Diffusion, and Adobe Firefly can generate incredible imagery from nothing more than text in a prompt box describing what you would like to see.
There really is no historic parallel for me to draw on (pun intended) for how liberating this is, nor a way to overstate how important it is — not only commercially for content creators, but for literally anyone who has ever uttered the phrase “I wish I could show you what I see in my head.”
AI image generation is still an emerging technology that’s far from perfect. But with some slight adjustments to your expectations, the results you can achieve open up entirely new ways of thinking about not only the images you choose for your site but all of your content creation.
Having set the stage, let’s take a look at the tools themselves.
An Intentionally Brief Comparison of DALL-E, Midjourney, Stable Diffusion and Adobe Firefly AI Image Generators
There are many (many!) in-depth write-ups about the history of AI image generators, data models used for each, prompt engineering, etc. That’s not the focus of this particular article.
Image-generating solutions are being created and updated at a breakneck pace, so there’s a decent chance that, by the time you read this article, features and capabilities will have already changed.
Here are the basic pros, cons, and impressions I’ve had using the four most popular platforms.
DALL-E

Pros:
- User-friendly, simple, and familiar web interface, also available (via API integration) within Microsoft Edge and other tools
- Can add, remove, edit, and replace elements in existing images
- Typically provides what you expect to see from your prompt

Cons:
- Images limited to 1024×1024 pixels
- Not particularly artistic or imaginative
- The web interface sometimes lags and/or times out
Midjourney

Pros:
- Provides stunning image quality across all media
- Offers multiple resolutions and aspect ratios
- Renders incredibly fine details
- Provides pro-level control of lighting, framing, scene angles, film type, etc.
- Understands the essence of styling, brands, artists, and photographers
- Uses image seeding (providing an existing image) consistently
- Advanced prompting commands (blending, weights, image prompts, etc.) enable greater control over output

Cons:
- Uses the Discord app as its user interface, which is a little clunky
- Has no direct image editing functions
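Those advanced prompting commands are plain text appended to the prompt, so they’re easy to script if you generate many variations. Here’s a minimal sketch: `build_prompt` is a hypothetical helper of my own, but the `::` weight separator and the `--ar` (aspect ratio) and `--seed` parameters are real Midjourney prompt conventions.

```python
def build_prompt(parts, aspect_ratio=None, seed=None):
    """Join (text, weight) pairs with Midjourney's '::' multi-prompt
    separator, then append optional --ar and --seed parameters."""
    prompt = " ".join(f"{text}::{weight}" for text, weight in parts)
    if aspect_ratio:
        prompt += f" --ar {aspect_ratio}"
    if seed is not None:
        prompt += f" --seed {seed}"
    return prompt

# Weight the lighthouse twice as heavily as the storm clouds:
print(build_prompt([("lighthouse at dusk", 2), ("storm clouds", 1)],
                   aspect_ratio="16:9", seed=42))
# lighthouse at dusk::2 storm clouds::1 --ar 16:9 --seed 42
```

Fixing the seed is what makes experiments repeatable: keep the seed constant, change one word, and you can see exactly what that word did to the image.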
Stable Diffusion

Pros:
- Available via a web interface, or you can download and run it on your own computer
- Many different web-based offerings provide customized prompt assistance
- Less restrictive about commercially sensitive subjects, such as famous people
- Offers flexible editing (editing a portion of the image)

Cons:
- The average output tends to look artificial
- Not nearly as intuitive with natural language prompts as DALL-E or Midjourney
- Photorealism is hard to achieve
Firefly is a different type of AI image program. It’s much closer to an AI-assisted image editing application than to the three above, which generate images conceived by AI based on your prompts. It’s still in beta, and I found the web interface a bit buggy.
Pros:
- You start from a base image selection, so you can match a style
- Good range of style choices and filters
- Offers flexible editing
- Offers multiple output sizes

Cons:
- Base image selection is confined to Adobe stock photos
- Interpretation when adding items to scenes isn’t there yet
- Photorealism is hard to achieve
My Impressions About These Four AI Image Generators
Content creators can achieve good, useful output from all four tools.
It’s no secret within the industry that Midjourney consistently produces the highest quality ready-to-use images. In my opinion, its competitors don’t even come close.
This is especially true with portraits of people. I’ve found Midjourney’s image composition (how well it puts things together) to be far more believable and more consistent.
I have a hard time getting the results I’m after with Stable Diffusion. It does well with illustrations and simple objects, but for photography, the images (especially scenes) look far too artificial and stitched together for my taste.
DALL-E falls between the two. For images of single objects (or non-complex scenes), it produces good results with generally solid image quality. Being able to erase parts of the image and recreate them (flexible editing) is a useful feature.
As mentioned above, Firefly just came out, and I found its prompt interpretation very limited. The pick-and-go prompt model will be useful to people who are used to selecting from stock photo banks, but beyond that, this first release doesn’t offer much.
Yes, that’s it. Well, almost.
In my evaluation, I placed significant weight on two other factors: discovery and what I call “accuracy-on-initial-prompt.” Simply put: how close does the tool get with my first few natural language prompts, and then, where does it take me?
This leads us to a more important part of the conversation…
How Are You Thinking About the Imagery for Your Content Work?
This might vary depending on your niche, but the most common way people think about images for their content pages is very literal and descriptive.
Your site is about chickens, so you naturally have pictures of chickens. Chickens feeding, chickens in the coop, chickens that escaped to the neighbor’s yard and are chasing the horrified kids, chickens, chickens and more chickens. After all, that’s why people have come to your site in the first place… right?
But that’s not why they’ll come back, subscribe to your newsletter, or make purchases. They’ll do those things because of the attachment they form with you based on your writing and imagery… your voice, or style.
If you’re a Solo Build It! member, you’ve noticed how much emphasis we’ve put on not only how to write effectively, but on your writing “voice” — what distinguishes your site from the thousands of other sites presenting similar information.
Your voice (or style) is you — an accumulation of a lifetime of experiences. Your family dynamic as you were growing up, every vacation, every class, every job, everything you’ve seen and read, and every person you’ve met. That’s the box that you draw from when you create.
When the internet went live, this box grew exponentially, and limitations like location and financial constraints went away. AI has just added another exponent to this.
The End of Many Limitations, Leaving Just One
Creating and/or selecting imagery for your content used to be limited either to your personal skills (drawing, painting, photography, etc.) or to what you could find to use: copyright-free images, purchased stock images, or hired graphic designers.
You were confined to your own abilities, or to the limitations of already-produced work: other people’s interpretations of the subject matter. That changed this year with the release of AI text-to-image generators. These applications are like having history’s greatest visual creators work for and with you.
You can now ask Monet (OK, maybe someone who has studied his work!) to paint a skyscraper, da Vinci to draw a car, or NASA to photograph a squirrel on the moon, simply by typing those words. Instead of being limited to one individual’s interpretation of the subject matter, you have millions and millions of possible variations and mash-ups to play with.
The limitation that remains is a good one to have. Choice. How to get what’s in your head onto your page. Or select from an infinite set of images. And this is something that you can learn, control, and play with.
There are many different techniques to do this, and in Tai, SiteSell’s upcoming release, I’ll dive deeper into what I’ve found to be useful, productive, and enjoyable.
Conclusion of Part 1
For this article, I wanted to give you the broad strokes so you can start playing with these AI image generation tools. Only then will you begin to get an understanding of what they can do and what they’re currently having a hard time producing.
On that note, before you go and try them: don’t expect to be able to control specific details, or you might end up very frustrated…
These programs produce interpretations, and the interpretations are often quite random. They are also different from AI text generators in that adding more (or specific) details to your prompts – sometimes even a single word – will completely change the image interpretation.
Subscribe to our newsletter to get a reminder for next week’s installment, where I’ll walk through the approach and techniques I’ve been using to generate the images in our AI blog posts.
Al the Graphics Guy
Images by Midjourney. You can find the prompts I used to generate the images in this article in the title tag of each (hover your mouse over the image).