AI image and text generation tools like OpenAI’s DALL-E 2 and GPT-3 are exactly what Arthur C. Clarke was talking about when he said “any sufficiently advanced technology is indistinguishable from magic”.
We all knew that AI was coming to take our jobs; we just thought that creativity was something uniquely human. Instead of being the last thing to go before we reach the singularity, it looks like white-collar design and writing jobs will be among the first to go. You might think this is a niche interest, but some AI art communities are already approaching the size of popular video game communities. This is bigger than anyone realizes.
The output you can get with a few minutes of work writing simple text prompts is shockingly good. Early adopters are regularly blowing minds as they share their creations on social media. Already a nascent community has appeared around “prompt engineering”: sharing tips and tricks for getting the AI to do what you want. Turning words into art.
In this post I’ll start by sharing value up front: a prompt engineering template that you can use right away to get started. I’ll walk you through how the template works, so you can begin experimenting for yourself. If you want to learn more about how these innovative tools will change the way we do creative work, keep reading. I’ll finish with a case study based on my own experience using AI art to illustrate my book.
***I have since released a free tool called "Visual Prompt Builder" that helps with prompt engineering by showing you what all the styles look like.***
Prompt Engineering Template
The best way to learn DALL-E and GPT-3 is trial and error. You’ll immediately see the power of AI generative tools, and will be hooked instantly. If you have access of course. If not, keep reading this post to learn more while you wait.
Even if you do have access, it can seriously help improve your prompts to learn from what other people have figured out. I pulled this guide together while I was learning, and I’m sharing it with you so you don’t have to learn the hard way.
This is a living document – so please tweet at me (@hammer_mt) with any suggestions, tips & tricks, and I’ll keep adding to it as best practices continue to emerge.
The meat of the template is the workflow for building up a prompt in DALL-E. With prompts you should start simple with the subject term, whatever it is that you’re trying to create (e.g. a “space whale”). Then increase the complexity of the prompt using optional additional modifiers to change the style, format, or perspective of the image. There are certain magic words or phrases that have been found to help boost the quality of the image (e.g. “trending on artstation”) or conjure up an interesting vibe (e.g. “control the soul”), which feature heavily in publicly shared examples.
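To make the workflow concrete, here is a minimal sketch in Python of building a prompt up from a subject plus optional modifiers. The function name `build_prompt`, its parameters, and the comma-separated layout are my own assumptions for illustration, not part of the template or of any tool's API.

```python
def build_prompt(subject, image_format=None, style=None, perspective=None, boosters=()):
    """Join a subject with optional modifiers into one comma-separated prompt.

    All names here are hypothetical; the only idea taken from the workflow is
    "start with the subject, then layer on modifiers and magic words".
    """
    parts = [subject.strip()]
    # Optional modifiers are appended only when they are provided.
    for modifier in (image_format, style, perspective):
        if modifier:
            parts.append(modifier.strip())
    # "Magic words" such as "trending on artstation" go last.
    parts.extend(b.strip() for b in boosters if b and b.strip())
    return ", ".join(parts)


# Start simple with just the subject...
simple = build_prompt("space whale")

# ...then increase the complexity with modifiers.
detailed = build_prompt(
    "space whale",
    image_format="oil painting on canvas",
    style="Hudson River School style",
    perspective="wide angle",
    boosters=["trending on artstation"],
)
print(simple)
print(detailed)
```

Keeping each modifier as a separate argument makes it easy to change one thing at a time and see what effect it has on the image.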
Most of the use of the template will come from the list of hundreds of words that get interesting results when you use them in prompts. You probably aren’t aware of all the different art styles, or artists, and it’s helpful to know different modifiers that improve the results of your prompts. I gathered these from my own experience and research in looking at what everyone else is finding works. Keep trying them and seeing what resonates with you. Keep your own prompt library for each project you work on.
Finally I’ve kept track of various useful articles, links and tools in my journey of learning prompt engineering. As I’ve learned from my own experience and others, I’ve added hints and tips as well as various terminologies I’ve encountered, so you don’t have to learn the hard way. My best advice would be to follow some of these links and immerse yourself in what others are doing in this space, as well as reading the rest of this article for more insight into prompt engineering.
UPDATE: Thanks to @mostlynotworkin, who took my prompt template and made it randomly generate a prompt every time you refresh the page!
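I have not seen how that generator is implemented, but the idea can be sketched in a few lines of Python. The word lists below are short, made-up stand-ins for the hundreds of terms in the template, and `random_prompt` is a hypothetical name.

```python
import random

# Hypothetical, abbreviated word lists; a real prompt library would hold many more.
SUBJECTS = ["space whale", "samurai warrior", "lighthouse"]
STYLES = ["oil painting", "pixel art", "watercolor"]
PERSPECTIVES = ["wide angle", "close up", "from behind"]
BOOSTERS = ["trending on artstation", "highly detailed"]


def random_prompt(rng=None):
    """Pick one entry from each list and join them into a prompt."""
    if rng is None:
        rng = random.Random()  # unseeded: a different prompt on each call
    choices = [rng.choice(words) for words in (SUBJECTS, STYLES, PERSPECTIVES, BOOSTERS)]
    return ", ".join(choices)


print(random_prompt())
```

Passing in a seeded `random.Random` makes the output repeatable, which helps if you want to come back to a combination that worked.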
Tips and Tricks
This is a new section and I'll add more to this section as we go. Tweet at me (@hammer_mt) if you have any prompt engineering hacks and I'll put them here.
To make sure DALL-E 2, Midjourney, or other AI art tools really nail important characters when generating images, simple repetition works surprisingly well, e.g. the prompt: "homer simpson, from the simpsons, eating a donut, homer simpson, homer simpson, homer simpson"
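If you are generating prompts from a script, the repetition trick is easy to automate. This is a small sketch; the helper name `emphasize` and the default of three repeats are assumptions chosen to match the example above.

```python
def emphasize(prompt, subject, repeats=3):
    """Append `subject` to the end of `prompt` `repeats` times."""
    if repeats < 0:
        raise ValueError("repeats must be zero or more")
    return ", ".join([prompt] + [subject] * repeats)


result = emphasize("homer simpson, from the simpsons, eating a donut", "homer simpson")
print(result)
```

How many repeats help before they start to hurt is something to find by trial and error for each tool.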
Invent Fictional Authors/Artists
One ethical concern of tools like GPT-3 and DALL-E is that we're copying the styles of famous authors and artists without attribution. Because of how the models work it's impossible to tell how much of the style we've copied, and where the limit should be. One surprising and yet elegant trick that works is to invent fictional authors and artists. The AI will imagine what style that person would have, and generate consistent results.
The rate of innovation in this space is tough to keep up with, and time will tell what will prove to be important. However, one immediately obvious leap forward was textual inversion. This is a new feature currently only available for Stable Diffusion (the open source competitor to DALL-E) that lets you train the AI model on a specific concept, giving it only a handful (3-5) of sample images to work with, then download that concept from a concepts library to use later in your prompts. What that means is that you can now introduce your own object, character, or style and get back consistent results that match. For example if you trained Stable Diffusion on the concept of Pikachu, you could reference it again later in your prompts as <Pikachu>.
It processes the images and finds the corresponding latent space where that representation would live, and essentially puts a marker there, in the form of a token. This opens up a world of creative uses for these AI models, because now they can move from fun toys to consistent, reliable tools. For example you can imagine a branding agency working to train Stable Diffusion on the concept of their client's brand, which when included would always produce art that followed brand style guides.
Video game designers or filmmakers could train Stable Diffusion on a character, and then use that character in various scenes. Product designers could train Stable Diffusion to recognize their product, and then easily show that product in different styles, scenarios, and perspectives. This is big.
To test it out, I trained Stable Diffusion on some concept art from a Reddit user who enjoys photoshopping things from Star Wars into old paintings. I admire this art style, but I would never in a million years have the talent to create something similar. So just for fun I tried to see if I could make the AI generate me a painting of Jar Jar Binks in the Roman Senate! The AI definitely didn't know the style beforehand, as you can see below, but it picked it up from 4 hours of training (on a Google Colab GPU) with just 6 sample images. I could of course keep going and train Stable Diffusion to understand the concept of Jar Jar Binks, but I don't want Disney coming after me!
If you want to try out textual inversion, probably the easiest way is to check out the Hugging Face concepts library and either contribute to it or follow the instructions to create your own private concepts. They provide links to Google Colab notebooks which let you run the full code for free, thanks to the GPU access Google offers at no cost and the fact that Stable Diffusion is completely open source.
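To give a feel for the "marker in latent space" idea, here is a deliberately tiny toy in plain Python. It is not how Stable Diffusion is trained: real textual inversion optimizes the new token's embedding against the frozen model's denoising loss, whereas this sketch just pulls a 2-D vector toward some made-up sample features. Every name and number in it is an assumption for illustration.

```python
# Toy illustration only: the "model" (existing embeddings) stays frozen,
# and the single thing we learn is one new vector for one new token.

# Frozen vocabulary with made-up 2-D embeddings.
FROZEN_EMBEDDINGS = {"cat": (1.0, 0.0), "dog": (0.0, 1.0)}


def learn_concept(sample_features, steps=200, lr=0.1):
    """Learn one embedding by gradient descent on the mean squared distance
    to the sample features. Nothing else is updated."""
    dims = len(sample_features[0])
    n = len(sample_features)
    vector = [0.0] * dims
    for _ in range(steps):
        # Gradient of the mean squared distance with respect to the vector.
        grad = [2.0 * sum(vector[d] - s[d] for s in sample_features) / n for d in range(dims)]
        vector = [v - lr * g for v, g in zip(vector, grad)]
    return tuple(vector)


# Stand-ins for the features of three sample images of the new concept.
samples = [(0.9, 0.8), (1.1, 1.0), (1.0, 1.2)]

embeddings = dict(FROZEN_EMBEDDINGS)
embeddings["<my-concept>"] = learn_concept(samples)  # the new "marker"
print(embeddings["<my-concept>"])
```

The point of the toy is the shape of the process: a handful of samples, a frozen model, and a single new token that you can then drop into any prompt.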
The Future of Creative Work
What are all the artists and authors going to do for work when the full impact of this breakthrough technology is realized? Will they all be out of a job?
The answer is no, of course not. As remarkable as DALL-E and GPT are, they’re not magic. They’re tools like any other. After a tumultuous period of change following the introduction of a new technology, people retrain or switch jobs, and we continue along with higher productivity.
Disney no longer illustrates cartoons by hand, but they own Pixar where higher quality renderings can be done via computer. Many illustrators learned computers, a lot will have retired, some progressed to management, and others still do things manually.
Over a long enough time period, even technologies that obliterate whole categories of industry eventually become commonplace, and those people find other things to do.
In 1908, before the car dominated transport, New York alone had a population of 120,000 horses that had to be fed, groomed, and cared for. They produced 2.5 million pounds of manure every day on the city’s streets. Do you know any out-of-work horse crap shovelers today?
The truth is that most of the time creative professionals spend is on the equivalent of shoveling horse crap. Generative AI tools have the potential to handle that for them.
To capitalize on this opportunity, creatives just need to reframe the value they provide. Let DALL-E be the artist; you can be the curator. Let GPT-3 be the writer; you can be the editor.
Here are a few tangible examples of how AI promises to make creative work better:
- No more bad briefs: clients can use AI to show you exactly what they want, so there’s no time wasted second-guessing
- Unlimited variations: rather than charging for or limiting the number of variants, you can generate 10+ new versions with the click of a button
- Consistent brand guidelines: once you’ve designed a stable prompt that works, it’ll almost always replicate the right style and tone of voice for approval
- Self-service small jobs: rather than handle unprofitable commissions manually, you can encode your ‘style’ into a prompt library and sell that for passive income
- Unexpected inspiration: the tight feedback loop between prompt and results lets you take weird and wonderful routes to ideas you would never have thought of
Prompt Engineering as a Job
Much of the work of prompt engineering is persistence: these tools are still in beta, and working with AI takes a lot of trial and error. What makes a great prompt engineer is that they’re capable of communicating clearly. When you can create anything you want, the bottleneck becomes your ability to express exactly what that is.
There are some industry tips and tricks that you can adopt to immediately improve your results. For example, telling GPT-3 to be “helpful” increases the truthfulness of its answers. Adding “trending on artstation” tends to boost the quality of images for DALL-E 2.
However the vast majority of the work of prompt engineering is just being a student of history and literature. Since getting access to DALL-E 2 I’ve become somewhat of an amateur art historian. I’ve been learning the names of different art movements, experimenting with lighting and photography terms, and familiarizing myself with the work of artists I had never heard of before.
Now simply knowing the name of a style makes you instantly able to replicate it. Every style has its own ‘memes’ – units of cultural information – that taken together are what distinguishes one category or style from another. GPT-3 and DALL-E have seen enough images and descriptions of Van Gogh to know that the prompt “in the style of starry night” means something like:
“The painting is composed of thick brushstrokes of paint that are applied in a swirl-like pattern. The colors are mostly blues and greens, with some yellow and red. The stars in the night sky are rendered with white dots. The painting has a dreamlike quality, and the overall effect is one of intense emotion.”
Painting. Thick brushstrokes. Swirl-like pattern. Blues and greens. White dot stars. Dreamlike. Intense emotion. These are the memes associated with Van Gogh; together they constitute his style. If you asked any artist to paint you something “in the style of starry night”, their brain would conjure up these exact same associations from memory. This is precisely what DALL-E 2 is doing, except its brain is artificial, and its memory is the entire corpus of the internet. That gives it the power to imagine almost anything, like for example what the Mona Lisa would look like if you zoomed out to see the rest of the landscape.
So training for work as a prompt engineer is very similar to training to work as an artist or copywriter. You need to read great novels, learn about famous art movements, understand human nature, and what resonates with your intended audience. Map out all the memes of different categories. Mix them together to achieve something unique. None of that has changed: what has is the ability to translate at blinding speed from your imagination to a computer screen.
Prompt Engineering Case Study
For a tangible example of how this works, take the artwork for the book I’m writing on Marketing Memetics. One of the main themes of the book is how our brains haven’t evolved in the past 200,000 years: we’re essentially cave men trying to make sense of the modern world and all of its fantastic technology. So I had the idea of taking ancient historical people, and dumping them into a futuristic city.
How this would have worked before AI art generators:
- I would have to find an artist whose aesthetic I liked
- I’d have to brief them on what I wanted, despite having no art knowledge or background
- I might have to wait until they’re finished with their current commission
- I would have to pay them thousands, maybe tens of thousands of dollars
- It might take days, weeks, or months for me to see the final version
- Once done, there’s nothing I can do to change the painting
- If I wanted more than one painting, multiply the time and costs accordingly
I’m writing my book as a passion project. I don’t have the backing of a publisher, I plan to just put this on Gumroad and Amazon. I have no idea if anyone will buy it. So realistically I wouldn’t do any of the above: I’d have to download some free vector art online and keep daydreaming about the custom oil painting I had in my head.
One consideration is that OpenAI, the creators of DALL-E, currently retain copyright over all of the images you produce on their platform, and don’t allow you to generate images for commercial use. Note that just the image itself is copyrighted, not the prompt or style, so you should be ok to use DALL-E to ideate and then commission a final version based on the result (I’m not a lawyer, and this isn’t legal advice). I used Midjourney (a DALL-E competitor) in part because they allow commercial use of the content so long as you have an enterprise plan.
Here’s how it actually worked with Midjourney:
I started with the image in mind of Neal Stephenson’s “Snow Crash”, a novel about mind viruses that was a great inspiration for my own book.
Looking back, my first prompt for Midjourney was “a man with a samurai sword standing in front of an ancient babylonian gate, staring through to a futuristic cityscape”, which got me the following result:
I actually found this kind of discouraging, so I played around for a while and looked at what others were doing in Midjourney. Because it’s all run through a Discord server (similar to Slack) you can see what prompts everyone else was using and the results they’re getting.
So after some experimentation and learning the secrets of prompt engineering, I started getting far better results. The prompt “a hyper realistic photo of an ancient samurai warrior in modern day manhattan” is what got me onto the right path.
The top left image was actually pretty compelling, but it wasn’t quite right. I decided I wanted an oil painting, in part to hide the imperfections of the image. So I started researching different artistic styles and found something I was immediately drawn to: “The Fall of Rome” by Thomas Cole, part of the “Course of Empire” series.
Thanks to Wikipedia I learned that this style of art was called the “Hudson River School”, of which Thomas Cole was considered the founder. I had no idea! I thought this was painted by an old Italian artist, not someone in North America in the 1800s. Now that I had my aesthetic, the results dramatically improved. My prompt became “Faded oil painting on canvas in the Hudson River School style of an ancient samurai soldier arriving in a futuristic utopian shanghai city. From behind. Wide angle.”
Now that I had the right look, it was just a case of repeatedly iterating by generating more variations of the ones I liked. Within just a few generations of evolving the images in this way, I got the following painting, which is making it into the final book when I self-publish at the end of this year (December 2022 - sign up for updates).
I later replicated it in DALL-E when I got off the waitlist, just to see how it compared. Even though the DALL-E version looks a lot cleaner, I actually prefer the style of Midjourney. Just for fun I decided to try ‘outpainting’ in DALL-E, an undocumented technique people have figured out. It works by taking your original image and editing it in Photoshop, Photopea or Figma to add whitespace around it. Then you upload it to DALL-E and use the edit feature to erase and fill the extra space. The effect is that you can ‘zoom out’ from an image, with the AI filling in the gaps. The result was the picture that adorns the header of this blog post (I've highlighted the original image with a white border so you can see where it fits in).
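I did the padding by hand in an image editor, but the geometry is simple enough to sketch in Python if you would rather script it. The function names and the 512 and 1024 pixel sizes below are assumptions for illustration; an imaging library would do the actual pasting and saving.

```python
def outpaint_layout(image_size, canvas_size):
    """Return the (left, top, right, bottom) box that centers the image on the canvas."""
    img_w, img_h = image_size
    canvas_w, canvas_h = canvas_size
    if img_w > canvas_w or img_h > canvas_h:
        raise ValueError("canvas must be at least as large as the image")
    left = (canvas_w - img_w) // 2
    top = (canvas_h - img_h) // 2
    return (left, top, left + img_w, top + img_h)


def fill_mask(image_size, canvas_size):
    """Build a grid of booleans: True where the AI should fill in new content,
    False where the original image sits."""
    left, top, right, bottom = outpaint_layout(image_size, canvas_size)
    canvas_w, canvas_h = canvas_size
    return [
        [not (left <= x < right and top <= y < bottom) for x in range(canvas_w)]
        for y in range(canvas_h)
    ]


# Hypothetical sizes: a 512px original centered on a 1024px canvas.
box = outpaint_layout((512, 512), (1024, 1024))
print(box)
```

The smaller the original is relative to the canvas, the further the result appears to 'zoom out'.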
It’s already remarkable how far we’ve come, and what we’re able to do. I literally know nothing about design, creativity, or art. I have spent most of my career in Excel spreadsheets and Python scripts. Yet here I am illustrating my own book, in the space of a day for $30.
That alone is wildly disruptive, but it’s what I was able to do next that really changes the game. Now that I had a visual style, I could treat my prompt as ‘brand guidelines’ for my book, and generate as many new images as I liked! All I had to do was change the prompt from “...of an ancient samurai soldier arriving in…” to whatever other ancient people I liked. Romans. Greeks. Babylonians. Egyptians. Mongols. I could also change the city to see what pairings looked best. London. New York. Los Angeles. Miami. Delhi. In the end I created over 30 combinations in my spare time over the course of the next week, all with the same concept and a consistent aesthetic, so I could pick one per chapter or even generate one per blog post if I liked, for a completely negligible cost.
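Treating the prompt as a template is easy to express in code. Here is a minimal Python sketch that fills my final prompt with every pairing of people and city; the variable names are assumptions, and I actually swapped the words by hand rather than with a script.

```python
from itertools import product

# The final prompt, with the two parts I varied turned into placeholders.
TEMPLATE = (
    "Faded oil painting on canvas in the Hudson River School style of an ancient "
    "{people} soldier arriving in a futuristic utopian {city} city. "
    "From behind. Wide angle."
)

PEOPLES = ["samurai", "Roman", "Greek", "Babylonian", "Egyptian", "Mongol"]
CITIES = ["shanghai", "London", "New York", "Los Angeles", "Miami", "Delhi"]

# One prompt for every (people, city) pairing.
prompts = [TEMPLATE.format(people=p, city=c) for p, c in product(PEOPLES, CITIES)]

print(len(prompts))
print(prompts[0])
```

Because only the two placeholders change, every image shares the same style, medium, and framing, which is what keeps the set consistent.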
It’s not just me. I know ad agencies experimenting with Midjourney (better licensing than OpenAI) to create ads for their clients. People are creating children’s books by combining GPT-3 and DALL-E. Vogue magazine created one of their latest covers with DALL-E 2. A friend of mine even used Midjourney to illustrate his short sci-fi novel to inspire more people to care about climate change. The future of creative work is coming, and it’s best to get ahead of it. Get some experience with prompt engineering.