I have been asked a few times lately about session IDs: what they are, why they’re important and whether they have any bearing on search engine optimization. A lot of the time, these more technical or code-specific questions are asked too late, i.e., the site has been built, tested and launched, and the owner is pretty much stuck with session IDs unless there is budget left to rewrite whole chunks of the site code.
In simple terms, a session ID is an identifier that is assigned to you. Much like your own name, a session ID is given to you on first sight, in this case when you first arrive at a website, and is used to track your journey from page to page for the duration of your visit. A session ID is typically generated as a string of numbers (often mixed with letters) and is assigned to you, the visitor, by the website’s servers. The factors used to determine a session ID depend on the site itself and the preferences or needs of the administrator. An ID could be generated to include the date and precise time of the visit, for example, should such data be important to a particular website.
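To make that a little more concrete, here is a minimal Python sketch of how a server might hand out a session ID to a new visitor. The function name `new_session_id` and the choice to prefix the visit time are assumptions made purely for illustration; real sites and frameworks each have their own scheme.

```python
import secrets
import time


def new_session_id(include_timestamp: bool = False) -> str:
    """Return a fresh, hard-to-guess identifier for a new visitor."""
    token = secrets.token_hex(16)  # 32 random hexadecimal characters
    if include_timestamp:
        # Illustrative only: prefix the visit time in whole seconds since the epoch
        return f"{int(time.time())}-{token}"
    return token


first_visit = new_session_id()
second_visit = new_session_id()  # a later visit gets a different ID
```

Each call produces a different value, which is exactly why the same visitor ends up with a new ID on every visit.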
If your site is an ecommerce one, you’re likely most familiar with seeing session IDs when a shopping cart is used. As visitors add products to their cart, they can continue to browse the site and add new products from time to time until their shopping is complete. A session ID is used here to keep track of the correct contents of each visitor’s cart.
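As a rough sketch of the idea, a cart can be thought of as a list of products filed under the visitor's session ID. The in-memory `carts` store and the `add_to_cart` helper below are hypothetical; a real shop would keep this in a database or session store.

```python
from collections import defaultdict

# Hypothetical in-memory store: session ID -> list of product names
carts = defaultdict(list)


def add_to_cart(session_id: str, product: str) -> list:
    """Add a product to the cart belonging to this session and return that cart."""
    carts[session_id].append(product)
    return carts[session_id]


# Two visitors shopping at the same time keep separate carts
add_to_cart("11111112", "milk")
add_to_cart("898989789", "bread")
add_to_cart("11111112", "eggs")
```

Because every lookup goes through the session ID, one visitor's milk and eggs never end up in another visitor's cart.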
Typically, a session ID will only last for the length of the visit. So, if you open a page on a site and are assigned a session ID, then accidentally close your browser window, when you navigate back to that same page you’ll see a different session ID tacked on to the end of the URL. While this won’t have any impact on your user experience, it can create a quagmire of problems for search engine spiders and for anyone wishing to link back to a particular page on the site, and therefore, ultimately, for the site owner.
Imagine you visit the products page of your local supermarket, which uses session IDs as part of its online shopping cart. When you log in on Monday evening to do your weekly shop, a session ID is assigned to keep track of you, creating the following unique URL: www.yourlocalsupermarket.com/products/milk_sessionid=11111112. You then accidentally close your browser by clicking on the cross, so you have to relaunch it and navigate back to the site to start your shopping all over again. Now that you’re back on the site, a new session ID is assigned to keep track of you and the following unique URL is created: www.yourlocalsupermarket.com/products/milk_sessionid=898989789. Although you are looking at exactly the same page as before, a totally new URL has been created for you.
Herein lies the first SEO issue with session IDs: duplicate content. Although the two pages are the same, the two different URLs mean that search engine spiders can interpret them as two separate pages, which they later find to be identical. This causes a problem for the search engines because a new session ID is also created each time their spiders visit, causing them to mistakenly add the same pages to the index again on every visit. When these pages are investigated more thoroughly, the search engine discovers that what it thought were 40 different pages are actually 10 unique pages, each indexed four times over.
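The duplication is easy to see in a short Python sketch. It assumes the session ID always appears as a trailing "_sessionid=" followed by digits, as in the two supermarket URLs above; the helper name `strip_session_id` is made up for this illustration.

```python
import re

# Assumed format, taken from the example URLs: "_sessionid=" plus digits at the end
SESSION_SUFFIX = re.compile(r"_sessionid=\d+$")


def strip_session_id(url: str) -> str:
    """Return the URL with any trailing session ID removed."""
    return SESSION_SUFFIX.sub("", url)


crawled = [
    "www.yourlocalsupermarket.com/products/milk_sessionid=11111112",
    "www.yourlocalsupermarket.com/products/milk_sessionid=898989789",
]

urls_seen = set(crawled)                               # what the spider records
unique_pages = {strip_session_id(u) for u in crawled}  # what actually exists
```

Here `urls_seen` holds two entries while `unique_pages` holds only one, which is the mismatch a search engine eventually uncovers across its whole index.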
The same problem arises when users start to link back to the pages on your site. When they first type in your web address, they are assigned a session ID. Most website users won’t know to remove this identifier when they post a link back to your site on a blog, bookmarking site or similar. The next user who comes along arrives at the site via a link that contains a session ID. This can make the link appear to be broken, when in reality the session ID has simply expired. Additionally, that same user may be assigned their own session ID to replace the outdated one, creating yet more duplicate URLs for the same content.
Large sites that use session IDs to track visitor progress suffer particularly badly in the search engine rankings because of the sheer volume of duplication generated through regular activity. Often, the search engines will simply stop spidering new pages or will start to remove pages from their index. So, what to do?
Google does offer a parameter control tool through its suite of webmaster tools, which allows you to tell the search engine to ignore the session ID parameter in your URLs. This helps with the duplicate content issue but not with inbound linking. Some content management systems offer a similar capability and will simply withhold the session ID portion of the address from the search engines. Again though, this doesn’t totally solve the inbound link problem.
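For a sense of what withholding the session ID from search engines could look like, here is a hedged Python sketch. It is not how any particular CMS does it: the `CRAWLER_MARKERS` list is a deliberately incomplete assumption, and `page_url` is a hypothetical helper that reuses the "_sessionid=" format from the supermarket example.

```python
# Hypothetical, incomplete list of substrings found in crawler user-agent strings
CRAWLER_MARKERS = ("googlebot", "bingbot")


def is_crawler(user_agent: str) -> bool:
    """Guess whether a request comes from a search engine spider."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def page_url(base_url: str, session_id: str, user_agent: str) -> str:
    """Build the address shown to a visitor, leaving the session ID off for spiders."""
    if is_crawler(user_agent):
        return base_url
    return f"{base_url}_sessionid={session_id}"


spider_view = page_url(
    "www.yourlocalsupermarket.com/products/milk",
    "11111112",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
)
```

The spider sees one stable address per page, but ordinary visitors still receive, and can still share, URLs that carry a session ID, so the inbound link problem remains.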
A far better, longer-term solution is to stop using session IDs on your website. While it may be too late to invest in coding to resolve the issue on this incarnation of your website, it is something to keep at the forefront of your development wish list next time around.