
Lesson 2A: Show a search engine the direction with an XML sitemap

Linda van den Brink edited this page Jun 6, 2017 · 16 revisions

Finding something is much easier if you know where to go. Use XML sitemaps to direct search engines to the pages or data objects that are available for crawling.

Why?

XML sitemaps are an easy way to inform search engines about the pages or data objects you want to be crawled. An XML sitemap lists URLs and additional metadata about each URL.

Intended outcome

A short indexation period and periodic re-crawling (based on the change frequency indicated in the sitemap).

Possible approach

Create an XML sitemap with an entry (<url> » <loc>) for each URL you want to be crawled. Add metadata to each URL:

  • <lastmod>: the date the data was last updated (for spatial data, use the date it was collected or measured).
  • <changefreq>: how often the data is expected to change.
  • <priority>: the priority of this URL relative to the other URLs in the sitemap, from 0.0 to 1.0.

Example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://geo4web.apiwise.nl/gemeente/GM0307</loc>
    <lastmod>2016-01-01</lastmod>
    <changefreq>yearly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
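
For datasets with many objects, a sitemap like the one above is usually generated rather than written by hand. The following is a minimal sketch of how that could be done with Python's standard library. The function name `build_sitemap` and the dictionary layout of the entries are assumptions made for illustration, not part of the testbed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes for an iterable of entry dicts.

    Hypothetical input format: each entry needs a "loc" key;
    "lastmod", "changefreq" and "priority" are optional.
    """
    # Serialize the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # Emit the child elements in the same order as the example above.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                child = ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}")
                child.text = str(entry[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    sitemap = build_sitemap([
        {
            "loc": "https://geo4web.apiwise.nl/gemeente/GM0307",
            "lastmod": "2016-01-01",
            "changefreq": "yearly",
            "priority": 0.8,
        }
    ])
    print(sitemap.decode("utf-8"))
```

Building the document with an XML library instead of string concatenation means that special characters in URLs, such as `&`, are escaped automatically.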

See also: sitemaps.org

However, XML sitemaps can become very large for large (spatial) datasets.

A single sitemap file can list at most 50,000 URLs. For larger sets, the sitemap specification supports a form of pagination: the URLs are split over multiple sitemap files, which are listed in a sitemap index file. An earlier version of the specification allowed a sitemap to contain a spatial extent for a resource, but this functionality is no longer supported.
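
The pagination mentioned here could be sketched as follows: split the list of URLs into chunks, write one sitemap file per chunk, and list those files in a sitemap index. The helper names `chunk_urls` and `build_sitemap_index` and the `example.org` URLs are hypothetical. The chunk size is a parameter, and the default used here is the 50,000-URL limit per file given on sitemaps.org.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed default: the per-file URL limit stated in the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (bytes) that points at the given sitemap files."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical dataset of 120,000 object URLs.
    urls = [f"https://example.org/object/{i}" for i in range(120000)]
    chunks = chunk_urls(urls)
    # One sitemap file per chunk; the file naming scheme is an assumption.
    files = [
        f"https://example.org/sitemap-{n}.xml"
        for n in range(1, len(chunks) + 1)
    ]
    print(build_sitemap_index(files).decode("utf-8"))
```

Only the sitemap index then needs to be registered with the search engine, which follows it to the individual sitemap files.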

The second phase of the testbed suggests that, even though a sitemap is important, a human-readable "datamap" is at least as important, so that developers understand which data is available when they reach the site.

How to test

After you register an XML sitemap with Google or Bing, you can monitor its crawling and indexation status in Google Search Console and Bing Webmaster Tools.

Google Search Console
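
Before registering a sitemap, it may help to run a quick local sanity check. The sketch below is not a full schema validation and not part of the testbed. It only checks that every entry has a `<loc>`, that `<changefreq>` uses one of the values defined on sitemaps.org, and that `<priority>` lies between 0.0 and 1.0. The function name `check_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_sitemap(xml_data):
    """Return a list of human-readable problems found in a sitemap document."""
    root = ET.fromstring(xml_data)
    if root.tag != NS + "urlset":
        return [f"unexpected root element: {root.tag}"]

    problems = []
    for i, url in enumerate(root.findall(NS + "url"), start=1):
        loc = url.findtext(NS + "loc")
        if not loc or not loc.strip():
            problems.append(f"entry {i}: missing <loc>")

        changefreq = url.findtext(NS + "changefreq")
        if changefreq is not None and changefreq.strip() not in CHANGEFREQ_VALUES:
            problems.append(f"entry {i}: invalid <changefreq> {changefreq!r}")

        priority = url.findtext(NS + "priority")
        if priority is not None:
            try:
                value = float(priority)
            except ValueError:
                value = None
            # Priority must be a number in the range 0.0 to 1.0.
            if value is None or not 0.0 <= value <= 1.0:
                problems.append(f"entry {i}: invalid <priority> {priority!r}")
    return problems


if __name__ == "__main__":
    # Assumes a local file named sitemap.xml; adjust the path as needed.
    with open("sitemap.xml", "rb") as f:
        for problem in check_sitemap(f.read()):
            print(problem)
```

An empty result only means these basic checks passed. Whether the URLs are actually crawled and indexed still has to be monitored in the webmaster tools.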

Evidence

Some examples of XML sitemaps for spatial datasets can be found here:
