Migrate Your URLs Gracefully

When URLs on your website become invalid, it’s good practice to deprecate rather than remove them, so as to give your visitors and the search engines time to adapt to the new links. Personally, I have had three versions of my website (Version 1, Version 2, and Version 3, which you’re looking at now), and I’ve been changing links like crazy.

When you rename a URL, redirect the old URL to the new one. The code samples here assume Apache as the server and PHP as the scripting language. First, create a redirect script like this one. Note that the script issues a client-side redirect and reports that the document has moved permanently, which lets well-behaved browsers and crawlers update their links.

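<?php
// The rewrite rule passes the new location in the "path" query parameter.
// Strip leading slashes and backslashes, then add a single slash, so the
// target is always an absolute path on this site.
$path = '/' . ltrim(isset($_GET['path']) ? $_GET['path'] : '', '/\\');
// Report a permanent move and tell the client where to go.
header('HTTP/1.1 301 Moved Permanently');
header('Location: ' . $path);
?>
...
The location of the resource has changed. Please update your links.
...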

Next, edit your .htaccess file so that old links point to this script, using the following rewrite rules. Assuming you’ve just changed your post paths from /year/month/day/name/ to /year/name/, here’s your .htaccess file. When processing a URL of the form /2007/10/21/a-blog-post/, Apache translates it to /redirect.php?path=2007/a-blog-post/, yielding control to the redirect script.

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(200[0-9])/[0-9][0-9]/[0-9][0-9]/(.*)$ redirect.php?path=$1/$2 [L,QSA]
</IfModule>
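To check the mapping before deploying it, the same pattern can be exercised offline. The following is a minimal sketch, not part of the site's code: map_old_path is a hypothetical helper name, and it only mirrors the RewriteRule pattern (the two RewriteCond file and directory checks are not reproduced).

```php
<?php
// Hypothetical helper: applies the RewriteRule pattern to a request path
// and returns the value that would be passed as the "path" parameter.
function map_old_path(string $requestPath): ?string
{
    // Same pattern as the RewriteRule. In .htaccess context the pattern
    // is matched against the path without its leading slash.
    $pattern = '#^(200[0-9])/[0-9][0-9]/[0-9][0-9]/(.*)$#';
    if (preg_match($pattern, $requestPath, $m) !== 1) {
        return null; // not an old-style URL, so the rule would not fire
    }
    return $m[1] . '/' . $m[2];
}

echo map_old_path('2007/10/21/a-blog-post/'), "\n"; // 2007/a-blog-post/
var_dump(map_old_path('2007/a-blog-post/'));        // NULL
```

New-style URLs fall through untouched because the second path segment must be a two-digit month.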

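Now that your links are safe, edit your robots.txt file to help search engines “forget” about the old entries. Keep in mind that a crawler that obeys these rules stops requesting the old URLs and therefore never sees the redirect, so you may want to add them only once the new URLs have been picked up. My robots.txt file looks like this.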

User-agent: *
# Year/Month/Day style URLs generated by WordPress.
# These have been replaced with Year-only style URLs.
Disallow: /2007/01
Disallow: /2007/02
Disallow: /2007/03
Disallow: /2007/04
Disallow: /2007/05
Disallow: /2007/06
Disallow: /2007/07
Disallow: /2007/08
Disallow: /2007/09
Disallow: /2007/10
Disallow: /2007/11
Disallow: /2007/12
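If you have several years of old URLs, the month entries can be generated rather than typed. This is a small sketch under that assumption; disallow_lines is a hypothetical helper, and the output still needs to be pasted under a User-agent line in robots.txt.

```php
<?php
// Hypothetical helper: builds one Disallow line per month of the given year.
function disallow_lines(int $year): array
{
    $lines = [];
    for ($month = 1; $month <= 12; $month++) {
        // %02d zero-pads the month so it matches /2007/01 style paths.
        $lines[] = sprintf('Disallow: /%d/%02d', $year, $month);
    }
    return $lines;
}

// Print the twelve entries for 2007, one per line.
echo implode("\n", disallow_lines(2007)), "\n";
```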

When removing a URL without providing an alternative, you should at least deprecate it for search engines, so that they get a chance to remove the entry from their cache. You can do so by adding a Disallow entry to your /robots.txt file, just like before. It is also nice to let your visitors know that the page is about to be removed by adding a notice on the page itself. You should also monitor the websites that link to the page and inform them of the change. Monitoring tools are readily available, and your 404 page can record the HTTP Referer header for every missing page.
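As an illustration of the last point, a custom 404 page written in PHP could log where each broken link came from. This is a rough sketch that assumes your 404 page is a PHP script; missing_page_entry and report_missing_page are hypothetical names, and the log line format is arbitrary.

```php
<?php
// Hypothetical helper: builds one log line from a $_SERVER-style array.
function missing_page_entry(array $server): string
{
    $uri = isset($server['REQUEST_URI']) ? $server['REQUEST_URI'] : '(unknown)';
    // Browsers may omit the Referer header, so fall back to a placeholder.
    $referer = isset($server['HTTP_REFERER']) ? $server['HTTP_REFERER'] : '(none)';
    return sprintf('404 %s referred by %s', $uri, $referer);
}

// Hypothetical helper: sends the 404 status and writes the entry to
// PHP's configured error log.
function report_missing_page(array $server): void
{
    http_response_code(404);
    error_log(missing_page_entry($server));
}
```

Calling report_missing_page($_SERVER) at the top of the 404 page, before any output, would then leave one line per missing page in the log, which you can scan for sites that still link to removed URLs.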

Now you’ll be respecting your visitors and you’ll avoid those 404 Not Found annoyances.
