With the rise of search engines came a corresponding rise in SEO (Search Engine Optimization), the practice of making your content rank higher on Google and other search engines. Now that AI chatbots like ChatGPT are growing past 100m users and emerging as potential Google competitors, we should expect to see an opportunity in the practice of making your content get mentioned by AI, which I’m calling AIO: Artificial Intelligence Optimization.

This field is a long way off existing, which is why I made the above joke. Bear with me here: this is actually semi-serious. Having thought about it more, I think this is going to be a thing, and it’s worth exploring this area today to establish a few common-sense best practices. In this article I’ll cover what makes AIO potentially valuable, and list some ways you can already curry favor with the current batch of AI tools.
I’m interested in this area as a self-proclaimed prompt engineer, the author of an O’Reilly course on Prompt Engineering, and because AI is unexpectedly making the book I’m writing on reverse-engineering creativity a whole lot more interesting.
I plan to keep this page updated as the field develops, but everything’s happening so fast I’ll need you to message me with anything I miss: @hammer_mt
What’s the Prize?
First off, what can AIO do for you? At the time of writing, only Microsoft Bing / Sydney has the potential to drive any serious traffic to your website, as most AIs don’t yet provide citations. There are early trials with You.com and Perplexity AI, but they’re unlikely to have the volume yet to really make a difference. However, people are already reporting increased traffic from Bing.

That means that, so far, AIO is more of a brand marketing play. For example, how much would Converse, Vans, or Reebok pay to be in that list? What’s the benefit to Nike of having both their main brand and a sub-brand (Jordans) listed? Being included for these types of consumer queries will be golden, especially as ChatGPT gets a mobile app, and AI gets incorporated into more and more products.

This is an informational query, which is surely the top prize as it’ll drive the most awareness. However, there will also be a long tail value in simply being known by the AI, when someone looks you up. Just like having your own Wikipedia page, or owning the top slot on Google for your name, being in the AI’s training data will be a status symbol and a form of brand protection. If the AI doesn’t know about you, it will surely hallucinate something terrible about you. Perhaps tools will develop to update the index and fact-check errors, but we all see how well that goes on Wikipedia and Google.

AI stands to be a valuable research tool. Yes, AIs are biased, but they’re really a reflection of our own biases. All they have done is learn our existing associations (those available on the web, anyway). I know from attending a recent conference that social scientists are actively dreaming up ways to do primary research using models like GPT: understanding how the AI sees the world might tell us something about what our society thinks too. It also stands to be a useful tool for brands that want to research the zeitgeist, understand which associations with their brand are reflected in the AI, and see how similar different brands are.

Software developers will likely have a harder time than other creators getting value out of being used for AI training data. While tools like GitHub Copilot can already handle a significant amount of the work, there’s usually no attribution back to source, and there have even been cases of leaked passwords and API keys. However, AI code generators do stand to benefit the authors of popular libraries and platforms, which are now more accessible than ever, as even inexperienced developers can just tab tab tab to autocomplete what used to be a complicated implementation.

This isn’t limited to Large Language Models (LLMs): Diffusion Models like DALL-E, Midjourney, and Stable Diffusion will also be key. Companies pay hundreds of millions of dollars every year for brand advertising campaigns to reinforce their distinctive brand assets. Now one of those avenues is going to be AI. For example, Coke’s brand is so distinctive that when you ask DALL-E for a can of cola, it returns images that are quite clearly of the famous brand. If your visuals become synonymous with your category, expect to get a whole lot of free branding.

Individual artists and designers stand to go viral from AI art, like the suddenly famous Greg Rutkowski, whose magical and otherworldly fantasy style became incredibly popular amongst early Midjourney and Stable Diffusion users. Because these communities are surprisingly massive (Midjourney has 13m users at the time of writing, up from 1m in just under 7 months), this can be as impactful as going viral on TikTok is for a musician.

Brands and artists will need to figure out their strategy with respect to AI. For some with valuable IP to protect, the answer might be litigation, or for those that can't afford a lawyer, data poisoning. I can’t imagine Disney wanting to allow fan-made Star Wars or Marvel movies, and they have sued for less. We’ll have our Metallica vs Napster moment I’m sure, before platforms are forced to mature and the AI equivalent of Spotify emerges, with a business model that aligns incentives for both sides.
However smaller, more enterprising start-ups and creators might be delighted for the exposure, and embrace fan communities that build on and extend their brand. We might even see some brands release open-source fine-tuned models for faithfully replicating the brand’s tone of voice, embedding characters and products, or following the brand style guide. Expect innovation in this space, with early innovators quickly skyrocketing to household names.
How do you get in the training set?
Both search engines and chatbots have the same goal: to help you get the information you’re looking for. That means most of the things you’re already doing to optimize for SEO will also be beneficial for AIO.
- Use natural language and avoid keyword stuffing - AI platforms are becoming more sophisticated in their ability to understand natural language, so focus on creating content that uses natural language instead of stuffing it with keywords.
- Leverage structured data to help AI platforms better understand the content on your website - Structured data such as schema markup can also help search engines better understand your content and improve your rankings.
- Optimize for featured snippets by answering common questions in a clear and concise manner - Featured snippets appear at the top of search results pages and can improve your visibility and click-through rates.
- Use descriptive and engaging meta titles and descriptions to attract clicks and improve click-through rates - This helps improve your rankings and visibility on search engines.
- Focus on user intent and creating content that meets the needs of your audience - This can help improve engagement, click-through rates, and ultimately, your rankings on search engines.
- Get backlinks from reputable websites to improve your authority - Backlinks are a key factor in improving your rankings on search engines, and being featured in lots of places online maximizes your chances of being picked up in training the AI.
- Monitor your website's performance metrics, such as bounce rates and click-through rates, to identify areas for improvement and optimize your content - This helps ensure that your content is meeting the needs of your audience and providing value.
- Optimize your website's loading speed to improve user experience and reduce bounce rates - A faster loading website can improve engagement and reduce bounce rates, which can in turn improve your rankings.
- Use header tags (H1, H2, H3, etc.) to organize your content and improve readability - This helps AI platforms and search engines better understand the structure of your content and can improve your rankings.
- Use multimedia content such as images and videos to make your content more engaging and shareable - This can improve engagement and click-through rates, which can improve your rankings and visibility on search engines, and if you have a unique style you might get featured in AI art communities.
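As a rough illustration of the structured data point in the list above, here is a minimal Python sketch that generates schema.org Article markup as a JSON-LD script tag. The function name `article_jsonld`, the author name, and the date are placeholders made up for this example, and which properties any given search engine or AI crawler actually uses is an assumption rather than something I can confirm.

```python
import json


def article_jsonld(headline, author, date_published, description):
    """Return a <script> tag holding schema.org Article markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "description": description,
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values for illustration only.
snippet = article_jsonld(
    headline="AIO: Artificial Intelligence Optimization",
    author="Example Author",
    date_published="2023-03-01",
    description="How to make your content more likely to be mentioned by AI tools.",
)
print(snippet)
```

The printed tag would go in the head of the page it describes, alongside the meta title and description.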
It's unclear whether Bing can actually "read" the articles it finds when it searches on SEO, or if it's using some cached version, or a limited portion of the cache. I have seen instances where it seems to have provided answers based on content within the article, for example based on the heading sections. However, when you ask, Bing says that it can only read the actual title and meta description from search results, and is unable to visit a specific URL.
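If an AI tool really does see only the title, meta description, and perhaps the headings, it is worth checking what those say on their own. The sketch below uses Python's standard `html.parser` to pull out just those parts of a page. The `PageSummary` class and the sample HTML are invented for this example, and it makes no claim about how Bing actually processes pages.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the title, meta description, and H1-H3 headings of a page."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.headings = []
        self._current = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "title" or tag in self.HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "title":
                self.title = text
            else:
                self.headings.append((tag, text))
            self._current = None


# Made-up page for illustration.
SAMPLE = """<html><head>
<title>What is AIO?</title>
<meta name="description" content="A short guide to Artificial Intelligence Optimization.">
</head><body>
<h1>What is AIO?</h1>
<p>Body text a title-only reader would never see.</p>
<h2>What's the Prize?</h2>
</body></html>"""

summary = PageSummary()
summary.feed(SAMPLE)
print(summary.title)
print(summary.description)
print(summary.headings)
```

If the three printed values don't convey the main point of the article without the body text, that is a hint the title, description, or headings need work.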

Search engine datasets such as Google and Bing’s indexes, and the open-source CommonCrawl, already form the bulk of what these AI models are trained on. Google and Microsoft are the biggest players in both search and AI (Microsoft reportedly holds a 49% stake in OpenAI), and anyone else training a model will start with CommonCrawl. For example, CommonCrawl made up 67% of the training data for Meta’s recently leaked model LLaMA.
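One way to check whether your site already appears in CommonCrawl is to query its public index server. The sketch below only builds the query URL and leaves fetching it to you. The helper name `commoncrawl_index_url` is made up for this example, and the crawl ID shown is just one example crawl; treat both it and the exact query format as assumptions to verify against CommonCrawl's own documentation.

```python
from urllib.parse import urlencode


def commoncrawl_index_url(domain, crawl_id):
    """Build a CommonCrawl index query for pages captured under a domain."""
    # "domain/*" asks for every captured URL under that domain.
    query = urlencode({"url": f"{domain}/*", "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{query}"


# Example crawl ID only; swap in whichever crawl you want to check.
url = commoncrawl_index_url("example.com", "CC-MAIN-2023-06")
print(url)
```

Opening the resulting URL in a browser should list one JSON record per captured page, if any exist for that crawl.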

The other obvious place to get yourself inserted is Wikipedia. The open-source encyclopedia is almost always going to be included in anybody’s dataset, because it’s not just high quality and well structured data, it’s also freely available and easy to access. As is the case with SEO, becoming notable in real life is a great way to build authority online too. Coining new terms, publishing books, and speaking at events will all be ways to build your reputation not just with the public, but with AIs too.
I think PR is going to see a real boom with this shift in technology. Suddenly being featured in a major publication isn’t just worth it for the exposure: it potentially gets you into the corpus of data they use to train the next model. OpenAI’s ChatGPT is only trained on data up to September 2021 as it stands today, but Bing can look things up on the internet, and it’s likely Google’s Bard will have the same functionality.

For developers, making sure your code is on GitHub is likely enough, though answering questions on Stack Overflow should also be a big help. There has never been a better time to write that package or library, because if it gets widely used, it will create a feedback loop where more and more people will get it in autocomplete via Copilot and ChatGPT. Perhaps we’ll also see libraries and products developed that are optimized for AI, for example using syntax that makes autocomplete easier. I’ve already found, for example, that Copilot struggles with React’s nested components.
For artists who actually want to appear in open-source datasets, rather than sue the AI companies, having your work available in the public domain, and featured on websites such as ArtStation, is a good way to get noticed: adding “trending on Artstation” to a prompt is a common way to enhance the quality of a generated image. There are also stock photo sites like Unsplash: particularly if your work is open source, it's likely to be featured. The communities for AI Art are highly mimetic (everyone copies what everyone else is doing), so seeding each Discord and Reddit group with a few stunning examples of your work, replicated or enhanced by AI, can help spread the word. You can then check whether you made it into the index on Lexica Art.
One non-obvious place to insert yourself is in the prompts you use when working with AI. It’s widely expected that the words we input to tools like ChatGPT and DALL-E will be recorded and used to fine-tune the model. They even have a thumbs up / thumbs down button on the side to get feedback on whether the generation met expectations. Simply using your own brand name and correcting the AI might be enough to make it into the next version.