What does 'multimodal' mean?
We unpack the shift from the “classic” text-only experience to a multimodal system that handles diverse inputs and outputs.
Hello, and welcome back. Jessie here, back from a rainy weekend in Toronto! Those kinds of days call for museum visits and working on a fibre arts craft. This month, it’s embroidery — but also sewing: sewing a project bag for embroidery projects. The fibre arts converge!
This week: Multimodal search! How people search for information is changing. We unpack the shift from the “classic” text-only experience to one where people use many inputs to get a diversity of outputs.
🚨 Happening soon! Our spring community call is Wednesday, May 20 at 11 a.m. ET/4 p.m. GMT! In partnership with Trisolute News Dashboard, Steven Wilson-Beales will join us to discuss all things podcast SEO! Register to receive a reminder and ask a question.

Let’s get it.
THE 101
Over the last five years, search behaviour has fundamentally changed. The rise of AI tools like ChatGPT changed the game — and forced the biggest player in town (Google) to adapt.
Now, with Gemini powering AI Overviews and AI Mode, Google is more conversational, pulling from more sources to better reflect the full breadth of a user’s needs. Meanwhile, the explosion in popularity of TikTok, Instagram and Reddit has provided more visual, snackable and diverse ways to search. The concept of “multimodal” reflects this shift in content discovery.
What is multimodal? How does it relate to search?
Multimodal refers to a system that uses several distinct inputs at once to process, understand and communicate information. That information is ingested across multiple content types, including text, images, video and audio. In other words, a system can process something with “senses” (by looking and hearing) to produce a more “relevant” answer.
Multimodal search is the act of using a variety of inputs to look for information. AI Mode, for example, is multimodal: users can type, use voice commands or upload an image to seek information. Sometimes, a person will combine several of these approaches in one search and receive multiple outputs, such as data tables or links to maps. That is multimodal search.
Multimodal is the ability to understand something through multiple formats, while multimodal search is the ability to find information about a topic in many ways. Modern AI usually does both.
How does multimodal relate to news SEO?
For audience and SEO editors, there’s both multimodal search and multimodal content optimization. One is how people look for information, and the other is how we prime content for people to find it.
Both require us to change and evolve.
In classic SEO, we focused on matching a text-based input (i.e., keywords) to a text-based output (i.e., written content in search results). With multimodal, we look at many different types of inputs and outputs.
Now, discovery is so much richer. It’s blue links on Google search results, yes — but it’s also short-form content on TikTok, forum threads on Reddit and long-form videos on YouTube. It’s all the formats on all the platforms people use every day to understand the world around them.
The shift in how and where people look for information requires a change in how we optimize content for discovery across the web.
Redefining our concept of search also means redefining optimization. Remember that search is a behaviour. SEO is a discipline that puts people first. The goal of SEO is to connect users with information they need in the format(s) they want.
As audience and SEO editors, we need to ensure our content is as diverse as the platforms it lives on. The aim is simple: meet our audience wherever — and however — they choose to look for us.
SEO is evolving — and multimodal is an example of that shift.
What the experts are saying
Multimodal search is a relatively new concept. Here’s how some SEO experts define it.
Search Engine Journal: “Multimodal search isn’t just about the difference between typing a query, speaking it out loud or snapping a photo. It’s about the way users fluidly engage across different platforms and media types to explore, evaluate and decide.”
Lily Ray: “Multimodal content optimization is the notion of producing content in different formats because large language models (LLMs) can indeed ingest different types of content like video, audio and images.”
Louisa Frahm: Multimodal means “you need to be anywhere and everywhere that your readers are consuming content.”
Steve Wilson-Beales: “Multimodal SEO optimizes for discovery across multiple inputs and outputs which can include text, images, video, audio, voice, AI-generated queries and more. This reflects the rapid changes to both the algorithms over the last three years and the evolution of user expectations. In short, the ways in which people consume information have exploded and it’s our job as SEOs to understand how the information is being generated and how it’s being consumed.”
Semrush: “A multimodal content strategy is the process of turning one high-quality asset into multiple formats — text, video, audio and visuals — so your message connects with audiences in the ways they best absorb information.”
Tell us: For a future newsletter, we’re looking for expert takes on how to implement a multimodal approach to content discovery. In 200 words or fewer, tell us how your newsroom is making its journalism more broadly discoverable.
The bottom line: How people search has changed. As SEO and audience editors, how we optimize content for maximum visibility across the entire web ecosystem must also change.
#SPONSORED - The Classifieds
Get your company in front of more than 14,000 writers, editors and digital marketers working in news and publishing. Sponsor the WTF is SEO? newsletter!
THE JOBS LIST
Audience or SEO jobs in journalism. Want to include a position for promotion? Email us.
The Economist is hiring an Audience Editor (London, U.K.).
RECOMMENDED READING
Google news and updates
🤖 Google: A new way to explore the web with AI Mode in Chrome.
Even more recommended reading
🖱️ Luca Tagliaferro: Does Google use clicks as a ranking signal? Here is the definitive answer.
🔽 Vince Nero & Barry Adams: Is the news industry’s decline really Google’s fault?
Are publishers that block AI crawlers less cited in AI?
5️⃣ Cyrus Shepard: Five data-backed features of websites winning Google.
👏 Marie-Paule Kenmogne: AI agents make SEOs more valuable — not less.
🌍 Mike Blumenthal: Why is multi-location SEO so hard?
💻 C.J. Robinson: The identity crisis coming for news SEO [Jessie & Shelby are quoted!].
🤖 Reporters Without Borders: Google is claiming an editorial right it does not have by rewriting news headlines in its search results.
Catch up: Last week’s newsletter
Have something you’d like us to discuss? Send us a note on Twitter (Jessie or Shelby) or to our email: seoforjournalism@gmail.com.
Written by Jessie Willms and Shelby Blackley






