What is a sitemap?
This week: Jessie explains what a sitemap is and how it can help SEO efforts for publishers.
Hello, and welcome back. Jessie here, back from an SEO-filled week in New York City. Shelby and I were so thrilled to speak at New York University as part of their Summer Publishing Institute. We talked about keyword research, headline writing and why creating great content that resonates with the audience requires understanding search intent.
Special thanks to someone in midtown Manhattan for throwing out a huge stack of pre-1990 National Geographic magazines – you made my abstract surrealist collage-ing heart full and my luggage heavy (Shelby: don’t let her lie to you – she put them in my carry-on)!
This week: In a bid to level up my tech SEO knowledge, I (content SEO-loving Jessie) am taking the wheel to explain sitemaps. (Don’t worry: this edition is still fact-checked by tech ace Shelby. Shout out ~ collaboration ~.) We look at what sitemaps are and how they can best be used for news SEO.
Note: It’s a long weekend here in Canada and the U.S. July 4th holiday is Monday – what better excuse than statutory holidays to skip a new issue? We’ll be off Monday, but check our archives: How to audit your site for SEO or SEO for planned news.
Join our Slack community to chat SEO any time
Let’s get it.
In this issue:
What’s a sitemap?
Why is a sitemap useful for news SEO?
What should be in a sitemap?
THE 101
What is a sitemap?
A sitemap is a file, stored on the server for your website, that provides a list of your site’s URLs. It tells search engines where to find the important content on your site, making it easier for them to crawl, index and rank the pages.
Here’s how Google defines it: “A sitemap is a file where you provide information about the pages, videos, and other files on your site, and the relationships between them.”
Full documentation for sitemaps is available at sitemaps.org.
Sitemaps can also contain additional information such as the last time a page was updated, how often the page is changed and the importance of the page.
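To make that concrete, here is a minimal sketch of what a standard sitemap file could look like when generated with Python's built-in XML library. The tag names (loc, lastmod, changefreq, priority) come from the sitemaps.org protocol; the build_sitemap helper, the yourwebsite.com URLs and the values are placeholders for illustration, not how any particular CMS does it.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of entry dicts.

    build_sitemap is a hypothetical helper written for this example.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child; the others are optional hints.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")

# Placeholder URLs and values, for illustration only.
sitemap_xml = build_sitemap([
    {"loc": "https://yourwebsite.com/", "changefreq": "hourly", "priority": 1.0},
    {"loc": "https://yourwebsite.com/news/example-story", "lastmod": "2022-06-30"},
])
print(sitemap_xml)
```

Each url element describes one page. Only loc is required, so a sitemap can be as simple as a list of addresses.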
Types of sitemaps
Websites can have multiple sitemaps. There are four types: a standard sitemap (links to pages on your site); video and image sitemaps (just video and image content hosted on your site, respectively); and the news sitemap (to help Google find content for Google News-approved sites).
Google says: “Google supports extended sitemap syntax for the following media types (Video, Images, Google News). Use these extensions to describe video files, images, and other hard-to-parse content on your site to improve indexing.”
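As a rough illustration of that extended syntax, the sketch below builds a one-story news sitemap. The news: tags (publication, name, language, publication_date, title) follow Google's news sitemap extension; the build_news_sitemap helper, the publication name, the URL and the headline are all made up for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"

def build_news_sitemap(articles, publication_name, language="en"):
    """Return news sitemap XML for a list of article dicts (hypothetical helper)."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:news": NEWS_NS})
    for article in articles:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = article["loc"]
        # The news: block carries the story's publication details.
        news = ET.SubElement(url, "news:news")
        publication = ET.SubElement(news, "news:publication")
        ET.SubElement(publication, "news:name").text = publication_name
        ET.SubElement(publication, "news:language").text = language
        ET.SubElement(news, "news:publication_date").text = article["published"]
        ET.SubElement(news, "news:title").text = article["title"]
    return ET.tostring(urlset, encoding="unicode")

# Placeholder story and publication, for illustration only.
news_xml = build_news_sitemap(
    [{
        "loc": "https://yourwebsite.com/news/example-story",
        "published": "2022-06-30T09:00:00-04:00",
        "title": "Example headline",
    }],
    publication_name="Your Publication",
)
print(news_xml)
```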
A single sitemap can hold up to 50,000 URLs and must be no larger than 50MB uncompressed. If your site exceeds either limit, split your URLs across multiple sitemaps housed under a main sitemap index file (sitemap-index.xml). Larger sites can split URLs into content type-specific sitemaps, too.
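Here is a small sketch of that splitting step, assuming you already have a full list of URLs. It only enforces the 50,000-URL limit (not the 50MB file size limit), and the chunk_urls and build_sitemap_index helpers and the sitemap-1.xml style file names are invented for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML that points at each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")

# 120,000 placeholder story URLs need three child sitemaps.
urls = [f"https://yourwebsite.com/story-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)

# Hypothetical file names for the child sitemaps.
child_sitemaps = [
    f"https://yourwebsite.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_sitemaps)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

The index file lists only the child sitemaps. Each child would then be written out as its own standard sitemap.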
For more information on sitemaps, check Google’s FAQ.
Submitting a sitemap
Google’s documentation says to make a sitemap available to Google by adding it to your robots.txt file or submitting it directly in Search Console. Submit your sitemap once it’s created, and submit any new sitemaps you create in the future.
Ahrefs provides instructions for submitting a sitemap to Google.
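For the robots.txt route, the sitemap is referenced with a Sitemap: line. The sketch below uses a made-up robots.txt (the file contents and the news-sitemap.xml name are assumptions) and reads the sitemap addresses back with Python's standard robots.txt parser, which is one quick way to check what a crawler would see. It needs Python 3.8 or newer for site_maps().

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt contents; the URLs are placeholders.
robots_txt = """\
User-agent: *
Allow: /

Sitemap: https://yourwebsite.com/sitemap.xml
Sitemap: https://yourwebsite.com/news-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```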
There are plugins available for WordPress or third-party tools (such as XML-Sitemaps.com) that can be used to create a sitemap. For larger sites, check with your engineers – a custom CMS is likely to have this done in another way.
Once your sitemap is published, a content management system like WordPress or Wix will take care of keeping it updated.
Find your publication’s sitemap at either of these web addresses: yourwebsite.com/sitemap.xml or yourwebsite.com/sitemap_index.xml.
Google’s recommendation is to include the sitemap in the root directory (top-most folder) of your website. This makes it one of the primary files crawled, and makes it easier for spiders to find the URLs on your site.
Why are sitemaps important?
Sitemaps are a helpful tool that makes it easier to parse content and understand its relationship to the rest of your site. They are useful for several reasons, but are not explicitly necessary for a website to perform well in search. As Google explains: “If your site’s pages are properly linked, our web crawlers can usually discover most of your site.”
Three reasons a sitemap can be useful:
They can help get more content indexed – and faster;
They can help search engines better understand the structure of your website;
They can help search engines interpret your priority pages.
It’s worth noting that, even when a sitemap is available, there is no guarantee it will directly correlate to better rankings or increased visibility in search. Sitemaps can be an effective tool to organize the structure and content of your site, especially as your news publication grows.
Use the four types of sitemaps to ensure search engines understand and parse each type of content, while paying special attention to your most important pages.
As with many things in SEO, sitemaps can help – but they are no magic bullet. You still need to ensure all your other SEO efforts are in place: a strong backlink profile, a well-executed internal linking strategy, a well-optimized homepage and quality, relevant content.
In short: Be sure that all other ways Google finds your content are also in tip-top shape.
Index more content – faster
A sitemap can convey the pages and files your publication considers most important to be crawled and includes relevant information about those assets. The bottom line: it can make content discovery faster.
Remember: Google can only rank content in search engine results pages (SERPs) if it can crawl and index those pages.
For larger sites, sitemaps can be helpful in providing a wider crawl range. However, for smaller news or niche sites, they might not be necessary because Google can crawl your homepage or backlinks (links it finds from other sites).
If you want to rank ahead of your competitors, make sure your news story is in the sitemap as quickly as possible (so Google can crawl and index that content before theirs).
Read more: Barry Adams explains the three most important technical SEO concepts to know and how Google works (both provide an overview of crawling, indexing and ranking).
As noted above, Google could – in theory – discover most of the content on your website without a sitemap if you have excellent internal linking in place.
Proper linking means that all pages that you consider to be important can be reached from some form of navigation: the site's menu or links within each page.
Sitemaps can help make Google’s job easier, and improve how effectively it crawls your website.
Optimizing sitemaps
To optimize your sitemaps, Search Engine Journal recommends only including SEO-relevant pages and excluding the less-important pages like non-canonical, duplicate or parameter-based URLs. (See SEJ for their full list of pages to exclude.)
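A rough sketch of that kind of clean-up is below. It assumes you can export each page's URL alongside its canonical URL (the pairs here are invented), and it drops parameter-based, non-canonical and duplicate addresses. The sitemap_candidates helper is hypothetical, and real sites will likely need more rules than these three.

```python
from urllib.parse import urlsplit

def sitemap_candidates(pages):
    """Filter (url, canonical_url) pairs down to sitemap-worthy URLs.

    `pages` stands in for a hypothetical CMS export.
    """
    seen = set()
    keep = []
    for url, canonical in pages:
        if urlsplit(url).query:   # parameter-based URL, e.g. ?utm_source=...
            continue
        if url != canonical:      # non-canonical version of a page
            continue
        if url in seen:           # duplicate entry
            continue
        seen.add(url)
        keep.append(url)
    return keep

# Placeholder pages, for illustration only.
pages = [
    ("https://yourwebsite.com/news/story", "https://yourwebsite.com/news/story"),
    ("https://yourwebsite.com/news/story?utm_source=newsletter", "https://yourwebsite.com/news/story"),
    ("https://yourwebsite.com/amp/news/story", "https://yourwebsite.com/news/story"),
    ("https://yourwebsite.com/news/story", "https://yourwebsite.com/news/story"),
    ("https://yourwebsite.com/about", "https://yourwebsite.com/about"),
]
print(sitemap_candidates(pages))
```

Only the canonical story URL and the about page survive the filter in this example.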
Sitemaps help Google find important pages faster – something that is especially important if your publication adds new stories consistently or updates existing content. News publications tend to publish a higher volume of new pages, so sitemaps can help Google understand when an existing page is updated with new information or new pages are published.
Optimizing your sitemaps also maximizes your crawl budget (or the number of pages Google will visit in a single crawl). Speed and news value matter here. You want Google to maximize its time on your site by visiting URLs for stories you care about most (so that journalism can readily show up in search results).
Read more: How to use XML sitemaps to boost SEO
The bottom line: Sitemaps can help Google find and understand the most important content on your website more quickly.
RECOMMENDED READING
Phew, this week was good for reading. Bookmark some of your favourites and cozy up this summer with all of the best SEO insights.
Barry Adams on Twitter: The line-up for NESS 2022 is out and it is excellent! Early bird tickets are now available for the news SEO conference happening virtually Oct. 4 and 5.
Barry Adams on Twitter: Here are the most common technical SEO problems Barry has encountered while performing 50+ audits for news publishing websites. Bug him (@badams) to publish his next newsletter edition soon!
Search Engine Land: Googlebot will crawl and index the first 15MB of content per page.
Moz: How we increased revenue with speed optimization.
Search Engine Journal: Seven methods to research and analyze your audience for SEO.
Ahrefs: Almost half of Google Search Console clicks go to hidden terms (!).
Wix: A complete SEO guide – what you need to know.
Jono Alderson on Twitter: “Is X a ranking factor?” Here’s why that’s not a useful frame when asking an SEO-focused question.
SEO Day: Catch up with the best of #SEOday, the world’s biggest SEO conference. Find talks by Aleyda Solis, John Mueller, Barry Schwartz, Mordy Oberstein, Claudio Cabrera, Lily Ray and more!
Have something you’d like us to discuss? Send us a note on Twitter (Jessie or Shelby) or to our email: seoforjournalism@gmail.com.
(Don’t forget to bookmark our glossary.)
Written by Jessie Willms and Shelby Blackley