Prompt Engineering: From Words to Art and Copy

April 4, 2023
Michael Taylor

AI image and text generation tools like OpenAI’s DALL-E 2 and GPT-3 are exactly what Arthur C. Clarke was talking about when he said “any sufficiently advanced technology is indistinguishable from magic.”

https://openai.com/blog/dall-e/ 

We all knew that AI was coming to take our jobs; we just thought that creativity was uniquely human. Instead of being the last thing to go before we reach the singularity, it looks like white-collar design and writing jobs will be among the first. You might think this is a niche interest, but some AI art communities are already approaching the size of popular video game communities. This is bigger than anyone realizes.

http://midjourney.com/

The output you can get with a few minutes of work writing simple text prompts is shockingly good. Early adopters are regularly blowing minds as they share their creations on social media. Already a nascent community has appeared around “prompt engineering”: sharing tips and tricks for getting the AI to do what you want. Turning words into art.

This is a topic I know well, and have created an online course on Udemy called "The Complete Prompt Engineering for AI Bootcamp (2023)" where you can learn over 4 hours of principles for working with AI. Or you can continue to read on to decide if I know what I'm talking about. :-)

https://twitter.com/karpathy/status/1273788774422441984?s=20 

In this post I’ll start by sharing value up front: a prompt engineering template that you can use right away. I’ll walk you through how the template works, so you can start experimenting immediately. If you want to learn more about how these innovative tools will change the way we do creative work, keep reading. I’ll finish with a case study based on my own experience using AI art to illustrate my book.

If you're primarily interested in templates for generating text with GPT-3 and ChatGPT, you can skip ahead to the section "The Five Pillars of Prompting", as that covers my main advice on how to work with Large-Language Models (LLMs). If you're looking to learn prompt engineering, you can check out my prompt engineering courses on Vexpower.

---

***I have since released a free tool called "Visual Prompt Builder" that helps with prompt engineering by showing you what all the styles look like.***

> Visual Prompt Builder

https://tools.saxifrage.xyz/prompt

Prompt Engineering Template

The best way to learn DALL-E and GPT-3 is trial and error. You’ll immediately see the power of AI generative tools and be hooked instantly (if you have access, of course). If not, keep reading this post to learn more while you wait.

Even if you do have access, it can seriously help improve your prompts to learn from what other people have figured out. I pulled this guide together while I was learning, and I’m sharing it with you so you don’t have to learn the hard way.

> PROMPT ENGINEERING TEMPLATE

This is a living document – so please tweet at me (@hammer_mt) with any suggestions, tips & tricks, and I’ll keep adding to it as best practices continue to emerge.

The meat of the template is the workflow for building up a prompt in DALL-E. With prompts you should start simple with the subject term, whatever it is that you’re trying to create (i.e. a “space whale”). Then increase the complexity of the prompt using optional additional modifiers to change the style, format, or perspective of the image. There are certain magic words or phrases that have been found to help boost the quality of the image (i.e. “trending on artstation”) or conjure up an interesting vibe (i.e. “control the soul”), which feature heavily in publicly shared examples.
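That build-up workflow can be sketched as a small helper that starts with the subject and layers on optional modifiers. This is just an illustration of the pattern; the function and its parameter names are my own invention, not part of any tool:

```python
def build_prompt(subject, style=None, format_=None, perspective=None, boosters=None):
    """Compose an image prompt: subject first, then optional modifiers."""
    parts = [subject]
    if style:
        parts.append(f"in the style of {style}")
    if format_:
        parts.append(format_)
    if perspective:
        parts.append(perspective)
    if boosters:
        parts.extend(boosters)  # magic phrases like "trending on artstation"
    return ", ".join(parts)

# Start simple, then add complexity one modifier at a time:
simple = build_prompt("a space whale")
detailed = build_prompt("a space whale", style="studio ghibli",
                        format_="digital art", perspective="wide angle",
                        boosters=["trending on artstation"])
```

Keeping the subject first and appending modifiers one at a time makes it easy to see which addition improved (or ruined) the result.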

Most of the use of the template will come from the list of hundreds of words that get interesting results when you use them in prompts. You probably aren’t aware of all the different art styles, or artists, and it’s helpful to know different modifiers that improve the results of your prompts. I gathered these from my own experience and research in looking at what everyone else is finding works. Keep trying them and seeing what resonates with you. Keep your own prompt library for each project you work on.

Finally I’ve kept track of various useful articles, links and tools in my journey of learning prompt engineering. As I’ve learned from my own experience and others, I’ve added hints and tips as well as various terminologies I’ve encountered, so you don’t have to learn the hard way. My best advice would be to follow some of these links and immerse yourself in what others are doing in this space, as well as reading the rest of this article for more insight into prompt engineering.

UPDATE: Thanks to @mostlynotworkin, who took my prompt template and made it randomly generate a prompt every time you refresh the page!

> Randomized Prompt Engineering Template

The Five Pillars of Prompting

The pace of progress in AI is blindingly fast. It feels like something new and innovative drops every single week. According to analysis of the arXiv database where most AI papers are hosted, the number of papers on AI published every month is doubling every 24 months: it's going exponential.

https://www.reddit.com/r/singularity/comments/xwdzr5/the_number_of_ai_papers_on_arxiv_per_month_grows/

It's impossible to keep up, and that means the tips & tricks that work today aren't likely to work for long. Sam Altman, CEO of OpenAI, said, “I don’t think we’ll still be doing prompt engineering in five years,” by which he meant “...figuring out how to hack the prompt by adding one magic word to the end that changes everything else”. The good news is that he followed up with this: "What will always matter is the quality of ideas and the understanding of what you want." What Sam is saying is that no matter what happens with AI, being able to have good ideas and communicate them effectively will continue to be important.

Instead of learning hacks, we should focus on ways of working with AI that are timeless: they've been useful in the past and are likely to stay useful far into the future. I call them the Five Pillars of Prompting.

1. Examples: Provide examples of how to answer your prompt

2. Direction: Give guidance on what kind of answer you want 

3. Params: Change what answers you get by adjusting settings

4. Format: Make it clear how you want to receive the answer

5. Chaining: Link multiple AI calls together to complete the task

To start with, let's look at a typical beginner's prompt for GPT-3.

How can we improve our communication? The AI is having to do a lot of guesswork, and that won't always give us the results we want. How can this prompt be engineered to reliably yield useful results?

The example above has had the 5 Pillars of Prompting applied to it. It provides an example of how to answer your prompt (a home milkshake maker), gives guidance on what kind of answer we want (seed keywords), adjusts the model parameters to change the answers we get (temperature), makes it clear what format we expect (product names), and uses chaining to string multiple AI responses together. Let's dig into more detail:

1. Examples: Provide examples of how to answer your prompt

One of the things that makes GPT-3 so good is that it's capable of zero-shot reasoning, meaning it can give you an answer without any examples. However, that doesn't mean that giving examples can't radically improve the quality of the response. Giving examples is something we do regularly when briefing humans, so it stands to reason that even when AI gets superhuman, this will still help. Be careful, however: AI has a tendency to learn too much from the examples, and providing too many similar ones can make the AI less creative in its answer. You can think of providing examples as 'fine-tuning' the AI so that it produces consistent results. Prompt length is limited, so if you can't fit enough context, using GPT Index with Pinecone DB can help inject context into each API call. If you want consistently reliable results, it might make sense to actually fine-tune the model by training it on lots of examples, which is available at a higher cost.
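A few-shot prompt is just worked examples stacked on top of the new request. A minimal sketch (the product and names below are invented for illustration):

```python
def few_shot_prompt(examples, new_product):
    """Prepend worked examples so the model imitates their pattern."""
    blocks = [f"Product: {product}\nNames: {names}"
              for product, names in examples]
    # End on an open "Names:" so the model completes it the same way.
    blocks.append(f"Product: {new_product}\nNames:")
    return "\n\n".join(blocks)

examples = [("home milkshake maker", "MixMate, ShakeStation, FrothHome")]
prompt = few_shot_prompt(examples, "adjustable dog harness")
```

Two or three varied examples are usually enough to lock in the format without making the answers derivative.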

2. Direction: Give guidance on what kind of answer you want 

Giving examples is useful, but sometimes you need to direct the AI towards the sort of answer you want. In this prompt we do it with seed keywords, telling the AI what part of latent space to search in for an idea. In our original prompt we seeded with "adaptable", "fit", and "omni-fit"; in this case we changed it to "adjustable", "bigfoot", and "universal", so we got very different results. Providing direction strongly biases the AI's results: like any good intelligence (artificial or otherwise), it wants to give you exactly what you asked for, and sometimes it takes you too literally. Leave some ambiguity in your direction, just as you would with a human copywriter or designer: they can't surprise you if you tell them exactly how to do their job.
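Swapping seed keywords in and out is easy to script. A sketch of the idea, using the same seeds as above (the template itself is a hypothetical example, not a canonical prompt):

```python
def directed_prompt(task, seed_keywords):
    """Name the region of idea-space the model should search in."""
    return f"{task}\nSeed words: {', '.join(seed_keywords)}."

task = "Suggest product names for an adjustable dog harness."
p1 = directed_prompt(task, ["adaptable", "fit", "omni-fit"])
p2 = directed_prompt(task, ["adjustable", "bigfoot", "universal"])
# Same task, different seeds: the completions land in very different places.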

3. Params: Change what answers you get by adjusting settings

Most AI models have some degree of flexibility baked into them. In GPT-3 the main parameter worth adjusting is "temperature", a measure of how much "randomness" the model will use to formulate a response. LLMs work by picking the next word likely to appear in a sentence, but they don't always choose the most likely one: sometimes they pick words that still fit but are relatively rare. Temperature is how you control that: a higher temperature means a more 'creative' choice of words, while a lower temperature will give you a boring answer (which is sometimes what you want). If a model doesn't offer the flexibility you need, it might even make sense to explore a different model, such as one of the open-source models by EleutherAI (GPT-J and GPT-NeoX), or something custom or fine-tuned.
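You can see what temperature does without calling any API: it rescales the model's word scores before they're turned into probabilities. A toy sketch with made-up scores for three candidate words:

```python
import math

def apply_temperature(logits, temperature):
    """Rescale logits then softmax: low T sharpens, high T flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate words
cold = apply_temperature(logits, 0.2)  # near-deterministic: top word dominates
hot = apply_temperature(logits, 2.0)   # flatter: rarer words get a real chance
```

In the GPT-3 API this corresponds to the `temperature` parameter on the completions endpoint: values near 0 give near-deterministic answers, values near 1 give more adventurous ones.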

4. Format: Make it clear how you want to receive the answer

When we're just playing around with AI, especially with something like ChatGPT, the structure of the responses doesn't really matter. However, pretty soon you'll start wanting to plug AI into production tools, and that's when structure gets important. If the AI doesn't give you consistently formatted results every time, it's impossible to depend on it for any serious work. How you prompt the AI really matters: everything from the examples you give to how you finish your prompt guides how it responds. GPT-3 is a data-formatting savant, and it's not just limited to comma-separated or numbered responses: it can even give you back JSON or other structured data.
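In practice that means asking for a machine-readable format and validating it before anything downstream touches it. A sketch, where the sample response is made up and stands in for a real API call:

```python
import json

PROMPT = (
    "Suggest 3 product names for a home milkshake maker. "
    'Respond ONLY with JSON in the form {"names": ["...", "...", "..."]}'
)

def parse_names(response_text):
    """Fail loudly if the model drifts from the agreed format."""
    data = json.loads(response_text)
    names = data["names"]
    if not (isinstance(names, list) and all(isinstance(n, str) for n in names)):
        raise ValueError("unexpected response shape")
    return names

# A made-up response, standing in for what the model returns:
sample = '{"names": ["MixMate", "ShakeStation", "FrothHome"]}'
names = parse_names(sample)
```

Failing fast on a malformed response is far better than letting a stray sentence of chit-chat flow into a production database.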

5. Chaining: Link multiple AI calls together to complete the task

Once you start using AI to complete real work, you'll find you often need multiple calls to complete a single task. Prompts are limited to around 4k tokens (roughly 3,000 words), so it works well to break tasks up into multiple prompts. The example given here is coming up with multiple product ideas in one API call, then feeding each product idea into a product description prompt to get it fully fleshed out. You might then want to push those responses through another prompt to check for issues and maintain quality, before reviewing them manually at the end. Tools such as LangChain can string multiple actions together and keep prompts organized.
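The shape of that ideas-then-descriptions chain can be sketched with a stubbed model call; the stub stands in for a real API so the structure is visible and runnable offline:

```python
def call_model(prompt):
    """Stand-in for a real LLM API call (stubbed so the sketch runs offline)."""
    if prompt.startswith("List product ideas"):
        return "adjustable dog harness; collapsible water bowl"
    return f"A compelling description of: {prompt.split(': ', 1)[1]}"

def chain(topic):
    # Call 1: generate several ideas in one go.
    ideas = call_model(f"List product ideas for: {topic}").split("; ")
    # Calls 2..n: flesh out each idea with its own dedicated prompt.
    return {idea: call_model(f"Write a product description for: {idea}")
            for idea in ideas}

results = chain("dog owners")
```

With a real model you would swap `call_model` for an API call and could add a third link in the chain that reviews each description for errors before a human sees it.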

Tips and Tricks

Of course in the short term, there are some tips, tricks, and hacks that are working and can help you paper over the cracks in how the AI works today. Tweet at me (@hammer_mt) if you have any prompt engineering hacks and I'll put them here.

Simple Repetition

To make sure DALL-E 2, Midjourney, or other AI art tools really nail important characters when generating images, simple repetition works surprisingly well, e.g. the prompt: "homer simpson, from the simpsons, eating a donut, homer simpson, homer simpson, homer simpson"

Ok not exactly what I wanted
Better results, but still not there
All four of the results are passable now
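The repetition trick is trivial to script if you're generating prompts programmatically (this helper is a hypothetical example, not part of any tool):

```python
def emphasize(subject, scene, repeats=3):
    """Repeat the key subject so the model weights it more heavily."""
    return ", ".join([subject, scene] + [subject] * repeats)

prompt = emphasize("homer simpson", "from the simpsons, eating a donut")
# "homer simpson, from the simpsons, eating a donut, homer simpson, homer simpson, homer simpson"
```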

Invent Fictional Authors/Artists

One ethical concern of tools like GPT-3 and DALL-E is that we're copying the styles of famous authors and artists without attribution. Because of how the models work it's impossible to tell how much of the style we've copied, and where the limit should be. One surprising and yet elegant trick that works is to invent fictional authors and artists. The AI will imagine what style that person would have, and generate consistent results.

https://twitter.com/fabianstelzer/status/1554229352556109825/photo/1

Dreambooth and Textual Inversion

The rate of innovation in this space is tough to keep up with, and time will tell what proves to be important. However, one immediately obvious leap forward was textual inversion and Dreambooth. These techniques are currently only available for Stable Diffusion (the open-source competitor to DALL-E). They let you train the model on a specific concept from just a handful (3-5) of sample images, then reuse that concept later in your prompts, either downloaded from a concepts library or, in the case of Dreambooth, as a package. What that means is that you can now introduce your own object, character, or style and get back consistent results that match. For example, if you trained Stable Diffusion on the concept of Pikachu, you could reference it later in your prompts as <Pikachu>.

https://twitter.com/TomLikesRobots/status/1568916040599363586?t=Bmyz1UrXmna_Ds15E1GfCg&s=03

Textual inversion processes the sample images and finds the point in latent space where that representation lives, essentially putting a marker there in the form of a token. Dreambooth is more resource-intensive, but it actually trains the model rather than just finding a point in latent space, so it can be more accurate and reliable. This opens up a world of creative uses for these AI models, because they can now move from fun toys to consistent, reliable tools. For example, you can imagine a branding agency training Stable Diffusion on the concept of a client's brand, which, when included in a prompt, would always produce art that follows the brand's style guides.

https://textual-inversion.github.io/

Video game designers or filmmakers could train Stable Diffusion on a character, and then use that character in various scenes. Product designers could train Stable Diffusion to recognize their product, and then easily show that product in different styles, scenarios, and perspectives. This is big.

Image
https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb

To test it out, I trained Stable Diffusion on some concept art from a Reddit user who enjoys photoshopping things from Star Wars into old paintings. I admire this art style, but I would never in a million years have the talent to create something similar. So just for fun I tried to see if I could make the AI generate a painting of Jar Jar Binks in the Roman Senate! The AI definitely didn't know the style beforehand, as you can see below, but it picked it up from 4 hours of training (on a Google Colab GPU) with just 6 sample images. I could of course keep going and train Stable Diffusion to understand the concept of Jar Jar Binks, but I don't want Disney coming after me!

If you want to try out textual inversion, probably the easiest way is to check out the Hugging Face concepts library and either contribute to it or follow the instructions to create your own private concepts. They provide links to Google Colab notebooks which let you run the full code for free, thanks to the GPU access Google offers at no cost and the fact that Stable Diffusion is completely open source. Many AI tools such as AvatarAI use this functionality under the hood to place your own face in pictures.

https://twitter.com/hammer_mt/status/1580198284186222592

Recursive Reprompting to Catch Mistakes

One common issue with LLMs is that they make mistakes. Sometimes they hallucinate wrong answers, get the context wrong, or output something in a bad format. You might think the answer is to kick the problem to a human, but you'd be wrong: give the AI a chance to correct its mistake! Often, with the right recursive feedback loop, it can get to the right answer so long as your prompts are good. You can follow up each response with a prompt that checks for common mistakes, and you can even feed errors back into GPT-3 and ask it to fix them.

https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/
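A sketch of such a feedback loop, with a stubbed model and a JSON validator standing in for real calls (the stub "fixes itself" on the second attempt so the retry path is visible):

```python
import json

def generate_with_retries(prompt, call_model, validate, max_attempts=3):
    """Let the model correct its own mistakes by feeding errors back in."""
    attempt_prompt = prompt
    for _ in range(max_attempts):
        response = call_model(attempt_prompt)
        error = validate(response)
        if error is None:
            return response
        # Recursive reprompt: show the model its output and the error.
        attempt_prompt = (f"{prompt}\n\nYour previous answer was:\n{response}\n"
                          f"It failed validation: {error}\nPlease fix it.")
    raise RuntimeError("model never produced a valid answer")

# Offline demo: a stub model that returns bad output, then good output.
attempts = iter(["not json", '{"ok": true}'])
def fake_model(prompt):
    return next(attempts)

def check_json(text):
    try:
        json.loads(text)
        return None
    except ValueError as e:
        return str(e)

result = generate_with_retries("Return valid JSON.", fake_model, check_json)
```

The validator can be as simple as a JSON parse or as elaborate as a second LLM call that critiques the first one; either way, capping the attempts keeps a stubborn failure from looping forever.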

The Future of Creative Work

What are all the artists and authors going to do for work when the full impact of this breakthrough technology is realized? Will they all be out of a job?

The answer is no, of course not. As remarkable as DALL-E and GPT are, they’re not magic. They’re tools like any other. After a tumultuous period of change from the introduction of new technology, people retrain or switch jobs, and we continue along with higher productivity.

Disney no longer illustrates cartoons by hand, but it owns Pixar, where higher-quality renderings are done by computer. Many illustrators learned digital tools, many retired, some moved into management, and others still work by hand.

Over a long enough time period, even technologies that obliterate whole categories of industry eventually become commonplace, and those people find other things to do.

In 1908, before the car dominated transport, New York alone had a population of 120,000 horses that had to be fed, groomed, and cared for. They produced 2.5 million pounds of manure every day on the city’s streets. Do you know any out of work horse crap shovelers today?

The truth is that most of the time creative professionals spend is on the equivalent of shoveling horse crap. Generative AI tools have the potential to handle that for them.

To capitalize on this opportunity, creatives just need to reframe the value they provide. Let DALL-E be the artist; you can be the curator. Let GPT-3 be the writer; you can be the editor.

Here are a few tangible examples of how AI promises to make creative work better:

  • No more bad briefs: clients can use AI to show you exactly what they want, so there’s no time wasted second-guessing
  • Unlimited variations: rather than charging for or limiting the number of variants, you can generate 10+ new versions with the click of a button
  • Consistent brand guidelines: once you’ve designed a stable prompt that works, it’ll almost always replicate the right style and tone of voice for approval
  • Self-service small jobs: rather than handle unprofitable commissions manually, you can encode your ‘style’ into a prompt library and sell that for passive income
  • Unexpected inspiration: the tight feedback loop between prompt and results lets you take weird and wonderful routes to ideas you would never have thought of

Prompt Engineering as a Job

Much of the work of prompt engineering is persistence: these tools are still in beta, and working with AI takes a lot of trial and error, as well as some technical knowledge. Perhaps that's why some companies are paying $250k - $335k per year for prompt engineers!

https://jobs.lever.co/Anthropic/e3cde481-d446-460f-b576-93cab67bd1ed

What makes a great prompt engineer is that they’re capable of communicating clearly. When you can create anything you want, the bottleneck becomes your ability to express exactly what that is. 

https://twitter.com/DynamicWebPaige/status/1512851930837843970 

There are some industry hints and tricks that you can adopt to immediately improve your results. For example, telling GPT-3 to be “helpful” increases the truthfulness of its answers. Adding “trending on artstation” tends to boost the quality of images for DALL-E 2. 

However the vast majority of the work of prompt engineering is just being a student of history and literature. Since getting access to DALL-E 2 I’ve become somewhat of an amateur art historian. I’ve been learning the names of different art movements, experimenting with lighting and photography terms, and familiarizing myself with the work of artists I had never heard of before. 

Now simply knowing the name of a style makes you instantly able to replicate it. Every style has its own ‘memes’ – units of cultural information – that taken together are what distinguishes one category or style from another. GPT-3 and DALL-E have seen enough images and descriptions of Van Gogh to know that the prompt “in the style of starry night” means something like

“The painting is composed of thick brushstrokes of paint that are applied in a swirl-like pattern. The colors are mostly blues and greens, with some yellow and red. The stars in the night sky are rendered with white dots. The painting has a dreamlike quality, and the overall effect is one of intense emotion.” 

Painting. Thick brushstrokes. Swirl-like pattern. Blues and greens. White dot stars. Dreamlike. Intense emotion. These are the memes associated with Van Gogh, together they constitute his style. If you asked any artist to paint you something “in the style of starry night”, their brain would conjure up these exact same associations from memory. This is precisely what DALL-E 2 is doing, except its brain is artificial, and its memory is the entire corpus of the internet. That gives it the power to imagine almost anything, like for example what the Mona Lisa would look like if you zoomed out to see the rest of the landscape.

https://mixed-news.com/en/what-would-mona-lisa-look-like-with-a-body-dall-e-2-has-an-answer/

So training for work as a prompt engineer is very similar to training to work as an artist or copywriter. You need to read great novels, learn about famous art movements, understand human nature, and what resonates with your intended audience. Map out all the memes of different categories. Mix them together to achieve something unique. None of that has changed: what has is the ability to translate at blinding speed from your imagination to a computer screen.

Prompt Engineering Case Study

For a tangible example of how this works, take the artwork for the book I’m writing on Marketing Memetics. One of the main themes of the book is how our brains haven’t evolved in the past 200,000 years: we’re essentially cave men trying to make sense of the modern world and all of its fantastic technology. So I had the idea of taking ancient historical people, and dumping them into a futuristic city.

How this would have worked before AI art generators:

  • I would have to find an artist whose aesthetic I liked
  • I’d have to brief them on what I wanted, despite having no art knowledge or background
  • I might have to wait until they’re finished with their current commission
  • I would have to pay them thousands, maybe tens of thousands of dollars
  • It might take days, weeks, or months for me to see the final version
  • Once done, there’s nothing I can do to change the painting
  • If I wanted more than one painting, multiply the time and costs accordingly

I’m writing my book as a passion project. I don’t have the backing of a publisher, I plan to just put this on Gumroad and Amazon. I have no idea if anyone will buy it. So realistically I wouldn’t do any of the above: I’d have to download some free vector art online and keep daydreaming about the custom oil painting I had in my head.

One consideration is that OpenAI, the creators of DALL-E, currently retain copyright over all of the images you produce on their platform, and don’t allow you to generate images for commercial use. Note that just the image itself is copyrighted, not the prompt or style, so you should be OK using DALL-E to ideate and then commissioning a final version based on the result (I’m not a lawyer, and this isn’t legal advice). I used Midjourney (a DALL-E competitor) in part because they allow commercial use of the content so long as you have an enterprise plan.

Here’s how it actually worked with Midjourney:

I started with the image in mind of Neal Stephenson’s “Snow Crash”, a novel about mind viruses that was a great inspiration for my own book.

Looking back, my first prompt for Midjourney was “a man with a samurai sword standing in front of an ancient babylonian gate, staring through to a futuristic cityscape”, which got me the following result:

I actually found this kind of discouraging, so I played around for a while and looked at what others were doing in Midjourney. Because it’s all run through a Discord server (similar to Slack), you can see what prompts everyone else is using and the results they’re getting.

So after some experimentation and learning the secrets of prompt engineering, I started getting far better results. The prompt “a hyper realistic photo of an ancient samurai warrior in modern day manhattan” is what got me onto the right path.

The top left image was actually pretty compelling, but it wasn’t quite right. I decided I wanted an oil painting, in part to hide the imperfections of the image. So I started researching different artistic styles and found something I was immediately drawn to: “The Fall of Rome” by Thomas Cole, part of the “Course of Empire” series.

https://en.wikipedia.org/wiki/The_Course_of_Empire_(paintings)#/media/File:Cole_Thomas_The_Course_of_Empire_Destruction_1836.jpg 

Thanks to Wikipedia I learned that this style of art was called the “Hudson River School”, of which Thomas Cole was considered the founder. I had no idea! I thought this was painted by an old Italian artist, not someone in North America in the 1800s. Now that I had my aesthetic, the results dramatically improved. My prompt became “Faded oil painting on canvas in the Hudson River School style of an ancient samurai soldier arriving in a futuristic utopian shanghai city. From behind. Wide angle.”

Now that I had the right look, it was just a case of iterating by generating more variations of the ones I liked. Within just a few generations of evolving the images in this way, I got the following painting, which is making it into the final book when I self-publish at the end of this year (December 2022; sign up for updates).

I later replicated it in DALL-E when I got off the waitlist, just to see how it compares. Even though DALL-E’s version looks a lot cleaner, I actually prefer the style of Midjourney. Just for fun I decided to try ‘outpainting’ in DALL-E, a unique undocumented feature people have figured out. It works by taking your original image and editing it in Photoshop, Photopea, or Figma to have whitespace around it. Then you upload it to DALL-E and use their edit feature to erase and fill the extra space. The effect is that you can ‘zoom out’ from an image, with the AI filling in the gaps. The result is the picture that adorns the header of this blog post (I’ve highlighted the original image with a white border so you can see where it fits in).

It’s already remarkable how far we’ve come, and what we’re able to do. I literally know nothing about design, creativity, or art. I have spent most of my career in Excel spreadsheets and Python scripts. Yet here I am illustrating my own book, in the space of a day for $30.

That alone is wildly disruptive, but it’s what I was able to do next that really changes the game. Now that I had a visual style, I could treat my prompt as ‘brand guidelines’ for my book and generate as many new images as I liked! All I had to do was change the prompt from “...of an ancient samurai soldier arriving in…” to whatever other ancient people I liked. Romans. Greeks. Babylonians. Egyptians. Mongols. I could also change the city to see which pairings looked best. London. New York. Los Angeles. Miami. Delhi. In the end I created over 30 combinations in my spare time over the course of the next week, all with the same concept and a consistent aesthetic, so I can pick one per chapter, or even generate one per blog post if I liked, for a completely negligible cost.

It’s not just me. I know ad agencies experimenting with Midjourney (better licensing than OpenAI) to create ads for their clients. People are creating children’s books by combining GPT-3 and DALL-E. Vogue magazine created one of their latest covers with DALL-E 2. A friend of mine even used Midjourney to illustrate his short sci-fi novel to inspire more people to care about climate change. The future of creative work is coming, and it’s best to get ahead of it. Get some experience with prompt engineering.
