SEO Friendly URL Encoding for Search Results

This series was originally authored as part of an introduction to Apache SOLR – an open source faceted search engine. However, the theories presented in this article should be applicable to most site search appliances.

First things

Before delving deeper into the types of encodings and the impact that they have on search engine optimisation, it is probably a good idea to get the three letter acronyms (TLAs) out of the way.

  • SEP – Search Engine Provider – a provider of world wide web based search such as Google, Yahoo! and MSN [1].
  • SEO – Search Engine Optimisation – constructing URLs and page content [2] to provide maximum spidering ease for SEPs and rank improvement for a particular web site. I will use the term ‘SEO friendly’ in the context of ‘more SEO friendly’ or ‘less SEO friendly’ to indicate possibly better or worse practices respectively for the URL encoding.
  • SSA – Site Search Appliance – a specific search engine for a site. In this case SOLR would be considered the SSA for the site.

URL and its encoding

Any URL is made up of the following parts [3]:

<scheme name> : <hierarchical part> <hostname> <path> [ ? <query> ]

The scheme name <scheme name> (or protocol, as it is commonly known) generally [4] consists of a combination of letters terminated by a colon (‘:’). In our investigations the two protocols that we will see almost exclusively are http: and https:.

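The <hierarchical part> starts with a double forward slash (‘//’). Strictly speaking, the hierarchical part also encompasses the hostname and path described next; they are listed separately here for clarity.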

The <hostname> determines the address of the host, either in IP address format (four dot-separated groups of numbers – e.g. 127.0.0.1) or as a domain name (a more human readable dot-separated address – e.g. www.example.com).

The <path> is a sequence of segments separated by forward slashes (‘/’), conceptually similar to a directory structure on a computer.

The [ ? <query> ] part starts with a question mark (‘?’) followed by a query key (normally a well-chosen name), an equals sign (‘=’) and a query value. Multiple key/value pairs are separated by an ampersand (‘&’).

This document will focus mainly on the <path> and [ ? <query> ] parts of the URL and on various methods of rearranging them to make the URL more SEO friendly.

The following URL has been marked up to show the break-up of the various parts:

 http://www.example.com/some/path/segments/?key1=value1&key2=value2
 \__/  \_______________/\_________________/\______________________/
  |        hostname        path segments           query
  |
scheme
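
As a rough illustration of the break-up above, the following Python sketch uses the standard library's urllib.parse module to pull the same parts out of the example URL. The variable names are my own and are only there for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# The example URL from the diagram above.
url = "http://www.example.com/some/path/segments/?key1=value1&key2=value2"
parts = urlsplit(url)

scheme = parts.scheme      # the part before the colon
hostname = parts.hostname  # IP address or domain name

# Split the path on '/' and drop the empty strings produced by the
# leading and trailing slashes.
path_segments = [segment for segment in parts.path.split("/") if segment]

# Break the query into (key, value) pairs, preserving their order.
query_pairs = parse_qsl(parts.query)

print(scheme)         # http
print(hostname)       # www.example.com
print(path_segments)  # ['some', 'path', 'segments']
print(query_pairs)    # [('key1', 'value1'), ('key2', 'value2')]
```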

Now that we have this covered, we will start looking at how the URL is parsed from an SEO perspective.

On SEO

Search Engine Optimisation (SEO) is having a greater impact on how content is authored and on the URL at which it can be referenced. In order for search engines such as Google, Yahoo and MSN to index the maximum number of relevant pages, it has become prudent to encode search page URLs in a certain way. This should increase the content that is accessible to search engines when spidering a site.

Whether or not you agree with the practice of SEO, it is a commercial reality that is here to stay. Furthermore, there are other gains to be had from providing an SEO friendly site, especially in the realm of accessibility. At the other end of the spectrum, some sites attempt to manipulate the system by providing both URL rich and keyword rich results with the sole purpose of improving their rankings, without any regard for the users of the system.

In my view, a balance should be maintained, with the focus firmly on the users of the system rather than purely on greater search engine rankings. Keeping the users in mind whilst creating a search appliance will not only provide a happier customer experience, but should also produce more SEO friendly URLs.

Although SEO can be considered a dark art, as all search engine providers (SEPs) keep their algorithms a closely guarded secret, some information is available and generally agreed upon. Even so-called ‘experts’ of SEO cannot always agree on an approach that will gain the best rankings for a particular URL. Furthermore, the SEPs continually refine their algorithms in response to those who attempt to exploit them with no extra benefit to the sites’ users. Still, SEO can be seen as a helper for both the users of the site and the search engines in surfacing new and more relevant content easily.

There is more to SEO than simply the URL encoding method. In fact, entire industries have been spawned to deal with this challenge. Some other optimisations which need to be taken into account for a more SEO friendly site include:

  • Content layout
  • Semantic layout
  • Keyword and description relevancy.

The actual mechanics of the optimisation process are far too complex, change too rapidly and are not always publicly available, so they cannot be covered here; coupled with the continual shifting of algorithmic interpretations, any such advice would date rather quickly. In fact, over the years I have had many discussions with various SEO providers whose advice contradicts that of other SEO providers. It is as though SEO practice depends on the time of day and the phases of the moon. Moreover, the ‘experts’ continually change and update their ‘best practice’ standards in the pursuit of greater rankings for specific sites. Apart from the URL encoding practices touched on here, other facets of SEO are left as an investigation exercise for the reader.

However, distilling the combined knowledge, SEO ‘experts’ mainly agree that encoding information in the URL path is preferable to request parameters, as not all SEPs will parse and recognise key/value request parameters [5].

Whilst I have heard arguments that an SEO friendly URL is more memorable to users, I find that this is at best a spurious argument; if anything, it is done more for the readability of URLs than for their memorability. As an example, take the following URL from a typical blog:

http://blog.example.com/2008/07/dos-and-donts-for-SEO-friendly-URLs

From a user’s perspective, reading the above URL provides a lot more information than a URL with a query parameter:

http://blog.example.com/post.php?idPost=892

The first URL can be read and information gleaned from it – it is no great stretch of the imagination that the post was made in the year 2008, in the month of July (07), and that the title would be similar to “Dos and Don’ts for SEO Friendly URLs”. However, it is a stretch to consider this memorable, as the user would need to remember that the post was made in July 2008, that apostrophes are removed, spaces replaced with hyphens, and all characters lower-cased apart from the ‘SEO’ and ‘URLs’ parts of the title.
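
To make those transformation rules concrete, here is a minimal Python sketch that rebuilds the blog URL above from a date and a title. The function names (slugify, post_url) and the KEEP_UPPER set are my own inventions for illustration, and stripping all punctuation rather than just apostrophes is an assumption; a real blog engine may well apply different rules.

```python
import re

# Hypothetical set of words whose capitalisation is preserved, chosen to
# match the example URL above.
KEEP_UPPER = {"SEO", "URLs"}

def slugify(title, keep=KEEP_UPPER):
    """Turn a post title into a hyphen-separated URL segment."""
    words = []
    for word in title.split():
        # Assumption: strip apostrophes (straight or typographic) and any
        # other character that is not an ASCII letter or digit.
        cleaned = re.sub(r"[^A-Za-z0-9]", "", word)
        if not cleaned:
            continue
        # Lower-case everything except the words we chose to preserve.
        words.append(cleaned if cleaned in keep else cleaned.lower())
    # Spaces between words become hyphens.
    return "-".join(words)

def post_url(year, month, title):
    """Build a /year/month/slug style URL on the example blog host."""
    return f"http://blog.example.com/{year:04d}/{month:02d}/{slugify(title)}"

print(post_url(2008, 7, "Dos and Don’ts for SEO Friendly URLs"))
# http://blog.example.com/2008/07/dos-and-donts-for-SEO-friendly-URLs
```

Going from title to URL is mechanical; going the other way from memory is the hard part, which is why readability rather than memorability is the real gain.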

This then leads to search engines. When told the title of a post on a particular blog, it is easier to find the post by searching for the title in a search engine, restricting the search to the particular site if possible, than by hitting the front page of the blog and browsing through the many posts (which becomes more difficult if the post is old and buried deep within the site). Of course, the site’s own search functionality (if it exists) would also allow a search on the title. However, this route becomes difficult if the address of the original blog cannot be remembered, or is spelt incorrectly.

If anything, when emailing a link to others, a quick scan of the more SEO friendly URL will provide hints as to whether it would be worth reading, especially in our time-limited lives. Whether or not the article will contain useful information is another question entirely.

Investigation

In order to investigate various URL encoding strategies, I will be using an example of a DVD site search appliance (SSA).

The examples will all be based on a search for DVDs with the following criteria:

  • The title and/or body text contains the term ‘super’ – i.e. the query string is ‘super’
  • The DVD must be in the categories ‘action’ and ‘drama’
  • We want to sort on price
  • The DVD edition must be a special edition
  • We want 25 results per page
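
To keep these criteria in view as a single unit, the sketch below expresses them as plain key/value pairs and serialises them into a conventional query string. Every parameter name (q, category, sort, edition, rows) and the /search path on www.example.com is a placeholder of my own choosing, not something prescribed by SOLR or by the rest of this series.

```python
from urllib.parse import urlencode

# Hypothetical parameter names; only the values come from the criteria above.
criteria = [
    ("q", "super"),          # title and/or body text contains 'super'
    ("category", "action"),  # first category
    ("category", "drama"),   # second category (repeated key)
    ("sort", "price"),       # sort on price
    ("edition", "special"),  # special edition only
    ("rows", 25),            # 25 results per page
]

# urlencode accepts a sequence of pairs, so a key may appear more than once.
search_url = "http://www.example.com/search?" + urlencode(criteria)

print(search_url)
# http://www.example.com/search?q=super&category=action&category=drama&sort=price&edition=special&rows=25
```

A list of pairs is used rather than a dictionary so that the repeated category key survives.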

Up next

We will look at the site search appliance and how it works, and how it all starts to fit together. Using the most basic of URLs we will investigate the thought processes that are needed to implement an SSA and extensions to make it SEO friendly.

Navigation:

The series is made up of the following pages:

  1. The Site Search Appliance
  2. Type 0 – Request Parameters
  3. Segue into URL binding
  4. Type 1 – Throwaway URLs with Request Parameters
  5. Type 2 – Parsing Hint Positional URLs
  6. Type 3 – Hard-coded Positional Parameters
  7. Type 4 – Positional Parameters with Encoded Parsing Hints
  8. Type 5 – Extra Information Positional Parameters with Encoded Parsing Hints
  9. Encoding Type Showdown
  10. Final Note

Footnotes:

  1. Naturally, there are many other search engine providers, far too many to be listed here. The major ones that seem to come up in conversations about SEO are listed here in order of precedence. This is valid for the Australian market; other markets may not give the same weighting to the list. However, the principles behind SEO should be transferable from one SEP to another.
  2. I will not delve into the concepts behind SEO friendly page structure, as advice and implementation differ greatly depending on whom you converse with. Suffice it to say, a well-structured page separating content and presentation (as is the case with HTML and CSS) is good practice and goes far beyond SEO into accessibility – see the Web Accessibility Initiative for further details (http://www.w3.org/WAI/).
  3. This is not an entirely complete representation, as there is also an optional port number and fragment part of the URL.
  4. I use the term ‘generally’ as this is not entirely correct. It can also contain digits, full stops (‘.’) (or periods in some countries), pluses (‘+’) and hyphens (‘-’).
  5. There is speculation that some search engines will begin to post data through forms in an attempt to dig deeper within the site to discover more information from the site. It can be seen as a natural progression that SEPs will continue to attempt to refine their spiders so that more of the site is surfaced to users.
