5 HTML tags every news SEO should know

We’re giving one of our earliest issues a glow up and providing everything you need to know to succeed with HTML and your code structure

Jun 20, 2023

NewzDash: European Search Awards 2023 winner for Best SEO Suite

🚀 Revolutionize your workflow with personalized SEO tips, AI recommendations, ranking alerts, content gap analysis and more. Elevate your content game now!

Get a FREE trial!

Hello, and welcome back. Shelby here, currently somewhere in Ontario camping with Jessie and Iain, friend of the newsletter, to celebrate our favourite chaos muppet’s birthday. If you receive an issue to your inboxes next week, you know we survived.

This week: What is HTML and why is it relevant to news SEO. We’re giving one of our earliest issues a glow up and providing everything you need to know to succeed with HTML and your code structure.

Join our Slack community to chat SEO any time

macbook pro beside white ceramic mug on brown wooden table — Did you also briefly think the candles were lattes? Photo by Nathan da Silva on Unsplash

Let’s get it.

In this issue:

What is HTML?
Why is it important to news SEO?
Five tags that are important for news SEO.

THE 101

What is HTML?

Let’s go back to basics. HTML stands for hypertext markup language. Think about a website like a house: HTML is the structure (walls, floors, ceiling). CSS — cascading style sheets — is the interior design and decorating of the house.

HTML tells the browser what to display and search engines what the page is about.

Why does HTML matter for news SEO?

SEO is the process of optimizing your site for search engines. HTML is the foundation of your site. Proper HTML coding is the first stepping stone in that process. If the foundation is solid, you can build upon it, and Google will know exactly what you’re displaying. Proper code — using the correct HTML elements that tell Google exactly how your site is set up — is the simplest form of optimizing your site for search.

“Bad” or complicated HTML code makes it harder for search engines to crawl, index and rank your pages. Bad code includes using exclusively <div> and <span> tags instead of semantic markup like <h3> for headings, <li> and <ul> for lists or navigation, etc.

There are likely hygiene fixes you can make to better describe your site’s structure to search engines — whether that is through optimizing your homepage or implementing effective structured data.

🔗 Read more: HTML code & search engine success factors

THE HOW TO

Five HTML tags that are important for SEO

There are a ton of technical components that can help a site rank well in search. Here, we will focus on five of the most important HTML tags to consider when auditing a site for proper SEO. (Other technical considerations, like structured data or meta descriptions, are covered in previous editions.)

1. Title tag/Headlines

The title tag shows up in the browser tab and (usually) as the headline of your page on search results. It is contained in the <head> tag of the website’s code, which informs crawlers what the site is about. A title tag does not change the visual look of the website.

For example, this is The Globe and Mail’s title tag (fun fact: I wrote this title tag as the Newsroom SEO Specialist for The Globe in 2017. It is my claim to fame). Below is how it shows up for readers in search.

On an article page, the title tag should be the headline of the page followed by the brand information.

In SERPs, it shows up like this.

Note: Over the past few years, Google has been increasingly using different elements to determine what surfaces in SERPs as the title of the webpage. This makes it increasingly difficult to optimize your pages.

There are also five (5!) headline elements you can use on an article:

Title: Shows up in the browser tab wrapped in a <title> tag, as well as (usually) on search engine results pages;
Visible headline: Usually wrapped in an <h1> heading tag and found visually at the top of the article page;
Structured data headline: The value of the ‘headline’ attribute in the article’s structured data markup. Used commonly as the headline for Top Stories;
Open Graph title: Shows for social media (Facebook, Twitter), but seems to be pulled for stories on Google Discover, too;
Internal link headline: Shown when an article is linked on a homepage or section page (sometimes also called “reader title”).

Friend of the newsletter Barry Adams wrote a great guide on taking advantage of these many fields for your headline to ensure Google picks at least one that is effectively optimized for the target keywords. But sometimes, search engines will still pick something random — you can’t stop it.

Other best practices for your HTML/title tag:

Give it life. Your homepage title tag should never just say “Brand Name: Home.” Use this area to tell the audience what you cover often and what they should expect from your publication.
Be descriptive, always. If it’s an article headline, describe the story — otherwise, how will people know what it’s about?
Use every field you can. Is there a way to differentiate the visible headline from the open graph title from the homepage link? Take advantage.
Focus on the page’s purpose. The focus of the title tag should reflect the focus of the page, followed by the brand name. For example, The About Us page should reflect the page's contents ("About Us | The Guardian").
Brand recognition. Keep the brand name at the end of all title tags (except the homepage) for consistency. You never know if a page will rank for something based on your brand.
Write for a human. Even if your page ranks, it’s still up to the person on the other side of the computer to click your site. Make those title tags/headlines worth their time.

🔗 Read more: Headline tips for publishers and other tips for writing effective headlines.

2. Robots tags

The robots tag tells the search engines what they should do with the page. This is where website owners can outline a set of rules for robots to follow while indexing pages.

The robots tag works at the page level. Search engines will have already crawled the page, but the robots tag will direct spiders on how to treat the page in relation to the index.

Help! My story isn’t showing up on search! If a story is not showing up on search — even if you directly looked up the URL with the site operator — check the robots tag. Does it say ‘noindex’? If so, you’re telling crawlers not to surface the page in search results.

There are several directives the robots tag can provide crawlers.

noindex: Page should not be indexed;
follow: Links on this page should be followed, even if the page is not indexed;
nofollow: Links on the page should not be followed;
noimageindex: Images on this page should not be indexed;
noarchive: Search results should not show a cached version of the page;
unavailable_after: Page should be indexed after a certain date;
noodp/noydir: Old tags mostly used by Bing and Yahoo! that tell a search crawler not to use metadata from the Open Directory Project.

This is how the robots meta tag can look on your site.

Best practices for the robots tag:

Match the instructions to your intent. Ensure the directive on all pages are correct and explicit. Review pages that need a 'noindex’ robots tag, such as login pages, a story that’s under contract from a wire service, a staging site, duplicate, thin or internal search pages.
Know the different names. Robots can be used for all crawlers, but if you want something to show up on Google, but not Yahoo!, you can use Googlebot for the name instead.
It doesn’t need to be everywhere. The robots tag is helpful, but it does not need to be on every page. If you don’t care if the page is indexed, don’t overthink it. Generally, only very, very large publications need to worry about this.

3. Canonical URL

The canonical URL tells Google which version of a page is the master copy. This is used to distinguish which URL is the true version of something, and is a saving grace for duplicate content. When multiple versions of a webpage exist, you can tell search crawlers which is the main one and which URL should be indexed.

This is used for a variety of reasons, including (but not limited to):

Migrating from http to https;
Migrating from beta or staging to a new version of the site;
Migrating or hosting multiple domains (i.e., both .com and .ca);
A wire that is published automatically to an RSS feed but then republished elsewhere;
Any pages that have customized parameters or products;
Page available by different URLs (multiple versions of the same story);
Pages with similar content/stories (two versions of the same story — or an updated version of an earlier story — published at two different URLs, but not wanting to redirect one);
Dynamic pages that create their own URL parameters (merch or subscription pages);
AMP pages;
Many more.

Oh no! Five versions of my story are in SERPs! You recently published an investigation, but you notice in search that all of the versions with parameters (e.g. your AMP version, a tracking parameter and a dummy version). Review the duplicates and pick the true version (likely the one without any parameters) and specific this as the canonical through your code (done with web development or Yoast plugin help). Add a self-referencing canonical tag on all true versions of stories.

Best practices:

Make this as automated as possible. You don’t want to have to worry about canonical URLs. Try to ensure your CMS picks up the true, original version of a page every time the content is shown.
Keep it simple. You know the proper version of the URL. Don't overcomplicate it.

🔗 Read more: Ahrefs has a great guide on how to fix canonicalization errors that go beyond just duplication.

4. Heading tags

Heading tags are found on individual webpages. They are semantic HTML that helps crawlers — and humans! — understand the hierarchy of information on the page. Headings run from <h1> to <h6>, with Heading 1 being the most important, and Heading 6 being the least.

Heading tags are also useful for breaking up large chunks of text. This is important for readers (scanning an explainer or story looking for something very specific), and for Google to understand the content — and potentially rank those in search.

This is how Google sees a site. It will know which is the headline and which headings follow in hierarchical order.

This is how humans see a site. The difference in emphasis on each heading tells the reader the order of importance, and breaks up the text for easy scannability.

Best practices:

Do not use more than one H1 per page. The <h1> tag is the most important heading tag. It is the title of the page and tells Google the purpose.
Cascade your structure. Start with <h2> and move down from there with the second-most important subhead(s) being in a Heading 2, the third-most important being in an <h3> tag, and so on.
Keep the structure simple. Try not to go below <h3> or <h4> in an article unless absolutely necessary. H1 is the title. H2 is your section headings. H3 is your subsection headings.
Optimize your headings and subheadings. Each heading has the opportunity to rank if you’re answering a reader’s query. Keep this in mind while creating headings – what is the purpose or the heading? It is very useful (and in fact, encouraged!) to use commonly searched questions or key phrases as your subheadings.

5. Time element

The time element tells browsers, web crawlers and other smart devices that a web page — or a specific set of content, like a blog post update — was modified at a time on a 24-hour clock or a specific calendar date. The <time> element represents a specific time, and may include the ‘datetime’ attribute to translate dates into readable formats.

This is especially important for news SEO, as Top Stories — one of the most coveted spots for our stories — considers published and updated time as ranking factors.

The element can be used for the published or updated time of the article or a specific date or time within the story — such as a live blog post update.

The <time> element should be used once in your HTML per page — the time you want Google to care about. Any other date or time references can be in <div> or <span> tags.

If you do not use the <time> element — even if you specify when the article was published another way — search engines may choose a different time altogether.

Above are three examples of how to use the <time> element to identify the date on an article.

The top identifies the time on the universal clock, but then gives a relative response of updated 6 minutes ago for the reader.
The second example pulls the published time into the timestamp using the universal clock, and includes a relative response of 5 hours ago for the reader.
The third example includes the date the piece was published as well as the timestamp using the universal clock, but returns the time in P.M. for the reader.

All three examples are enclosed in the <time> element.

In news SEO, we want the most up-to-date information captured in the <time> element. That enables search engines to distinguish exactly when a piece was updated and serve it in results for the appropriate keywords. Reflect on your site’s usage of this tag and how it can be better incorporated, especially on your live blog files, as these require higher levels of updating.

Read more: The (Date) time element

✔ ️ Action item: Using Chrome, open a page on your site. Right-click, then choose "View page source." From there, search for an HTML tag — the title tag, the meta description, the robots directive, the time element or some structured data. Can you optimize any to better fit reader needs?

The bottom line: Strong HTML is an important technical component of having solid SEO and ranking well in search. Understanding the basic tags and how they work, as well as how to optimize them, will help you better reach your search audience.