Quantcast
Viewing all articles
Browse latest Browse all 78

Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit

For a little while now I’ve been wrangling with an interesting problem in site architecture diagnostics. Do I have any orphaned pages on my site, and how do I find them?

Frequently, web developers will write code to generate an XML sitemap that will include all URLs in the database, linked to or no. How can we use this fact to our advantage?

I thought I’d share a strategy using Excel to extract all the URLs on your site, comparing them to the raw export from the amazing IIS Toolkit. (PS – word is that Daniel’s writing a “how to” for IIS, including the installation…)

Import your XML sitemap URLs into Excel

Did you know you can import an XML sitemap into Microsoft Excel? It can be a little crash-tastic, but oh-so rewarding when it works. Let’s use SEOgadget’s XML sitemap as the example:

Image may be NSFW.
Clik here to view.
Import an XML feed into Excel

Then, pop your XML URL into the dialogue box:

Image may be NSFW.
Clik here to view.
new source

I’d recommend you only do this if you have reasonably powerful machines, we rock i7′s with 4gbs of ram as a system minimum. Big sitemaps take a while!

The end result is something like this:

Image may be NSFW.
Clik here to view.
excel data

By the way, you could import Twitter search results via the RSS feed into Excel using the same method, if you’re into that kind of thing.

Get your IIS site crawl data

Dan’s working on his next post, so I’m not going to steal any thunder, I promise. Assuming you already have IIS Toolkit, go ahead and export the URL listfrom your site crawl:

Image may be NSFW.
Clik here to view.
IIS Rocks

Import the CSV export into Excel, and turn the data into a table called IIS. If you haven’t played with exporting data from IIS, you should – it’s amazing.

Image may be NSFW.
Clik here to view.
IIS Data Rocks

There’s a ton of brilliant data in the IIS export, including the contents of the server header reponse, internal links counts and the directory the page is hosted in. We’ll talk more about that later on, use for now we’re interested in orphaned pages.

Use VLOOKUP to find pages in your sitemap that have no internal links

At the end of the day, a page in your website with no internal links is very unlikely to rank. Many pages with no internal links indicates some problems with your site architecture that you’re going to need to fix. Alas, how to find them? With a trusty VLOOKUP, that’s how.

We’re most interested in the “linked by” column, which contains the total number of internal links a page has. All you have to do is pull this data into your imported sitemap XML list. Assuming you name your IIS table “IIS”, the query will look like this:

=IFERROR(VLOOKUP(XML[[#This Row],[ns1:loc]],IIS,18,0),”Not crawled”)

Why is this good?

The end result will be a list of URLs in your XML sitemap with corresponding internal link numbers. That’s cool for doing the following:

- Finding URLs in your XML sitemap with no internal links, and are as a result, “Not crawled”
- Finding URLs in your sitemap with very low numbers of internal links

Have fun, and have a great Bank Holiday weekend Image may be NSFW.
Clik here to view.
:-)

=IFERROR(VLOOKUP(XML[[#This Row],[ns1:loc]],IIS,18,0),”Not crawled”)

Image credits:
Nick Johnson

The post Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit appeared first on SEOgadget.


Viewing all articles
Browse latest Browse all 78

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>