Today I migrated one of my sites, which has around 150k pages in the Google index, and thought I’d share a relatively quick and easy way to check that the migration went smoothly.
Migrating to a new platform or server is always a risky time for any site that relies on organic traffic. Pages can go missing and redirects can stop working properly.
The new site featured a new design plus a different CMS platform but essentially had an identical URL structure, so ensuring existing URLs still worked was the primary goal.
I wanted to do the following on both the staging site and post-migration on the live site:
- Get the URLs indexed by search engines
- Batch test the redirects
Indexed URLs
On a database driven site with thousands of pages, it’s not always possible to get a complete list of possible URLs, so we need to prioritise the URLs that search engines are aware of.
For smaller sites (under 1,000 pages), GSiteCrawler does a reasonable job. The downsides are that it puts unnecessary load on your web server and, in my experience, it crashes on larger sites.
My preferred method is to get the list from a search engine index. Grabbing index data from the major engines can be a hassle, though: scraping them is cumbersome, and sooner or later you get thrown a captcha.
I prefer to use Majestic SEO, which provides data from a smaller search engine they run. It uses similar crawl algorithms to Google, so it should give you a very similar dataset, and best of all it’s free to use on your own site.
Once you’ve validated your site, go to Domain URLs > Download All and all the URLs you’ll need to redirect will be in the first column.
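If you would rather pull that first column out with a script than by hand, a minimal Python sketch could look like the following. It assumes the download is a comma-separated file with a single header row, which may not match the actual export layout, and the function name `read_first_column` is made up for illustration.

```python
import csv


def read_first_column(lines, has_header=True):
    """Return the non-empty values from the first column of a CSV source.

    `lines` can be an open file or any iterable of text lines.
    Assumption: comma-separated data with one header row.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row if there is one
    # Ignore blank rows and rows whose first cell is empty.
    return [row[0].strip() for row in reader if row and row[0].strip()]
```

To use it on a real file, open the download with `open(path, newline="", encoding="utf-8")` and pass the file object in. Adjust the encoding or the header flag if the export differs.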
Note: I recommend against using the sitemap XML as it’s likely to be an incomplete picture.
Batch testing URLs
When migrating a site, the errors you don’t want to see from previously working URLs are 404 Not Found, 401 Unauthorized and 500 Internal Server Error.
I was using a sub-domain for the staging site, so once I had my list of URLs, all I needed to do was search and replace “http://www.” with “http://dev.” in Excel. I then picked a good cross-section and ran it through an HTTP header checker.
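The same search and replace can be scripted instead of done in Excel. Here is a rough Python sketch of that step plus picking a random cross-section. The function names `to_staging` and `sample_urls` are hypothetical, and random sampling is only one reasonable way to choose a cross-section.

```python
import random


def to_staging(url, live_prefix="http://www.", staging_prefix="http://dev."):
    """Swap the live prefix for the staging one; leave other URLs untouched."""
    if url.startswith(live_prefix):
        return staging_prefix + url[len(live_prefix):]
    return url


def sample_urls(urls, size, seed=None):
    """Return a random cross-section of at most `size` URLs.

    Passing a seed makes the selection repeatable between runs.
    """
    urls = list(urls)
    rng = random.Random(seed)
    return rng.sample(urls, min(size, len(urls)))
```

Only the prefix at the very start of each URL is replaced, so a URL that happens to contain “http://www.” later on (for example in a query string) is not mangled the way a blanket search and replace could mangle it.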
I put 500 URLs at a time through my own batch HTTP header checker and fixed up any pesky 404s I found.
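If you don’t have a header checker to hand, the idea can be sketched in a few lines of Python using only the standard library. This is not the checker mentioned above, just an illustration. It assumes HEAD requests are answered the same way as GET on your server, follows redirects to their final destination, and treats any status of 400 or above (which includes 401, 404 and 500) as a problem. The names `fetch_status`, `batches` and `find_problems` are made up for the example.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the final HTTP status for `url`, or None if the request failed.

    urlopen follows redirects, so a working 301 shows up as the status
    of the page it points to.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses are raised as HTTPError
    except OSError:
        return None  # DNS failure, refused connection, timeout, etc.


def batches(items, size=500):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_problems(urls, fetch=fetch_status):
    """Return (url, status) pairs for URLs that failed or returned >= 400."""
    problems = []
    for url in urls:
        status = fetch(url)
        if status is None or status >= 400:
            problems.append((url, status))
    return problems
```

A typical run would loop over `batches(url_list)`, call `find_problems` on each batch, and print or save the results before moving on. The `fetch` parameter exists so the logic can be tried out with a stub instead of real requests.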
Post-migration, I picked another set of URLs to test and again got positive results. To be 100% sure, I will be logging into Google Webmaster Tools tomorrow morning to check for 404s.
Good luck with your site migration!
