As you’re browsing the web, you’ll likely find that many product catalogs, press clippings, manuals and handbooks are presented in PDF rather than HTML format. Particularly if you started your site with little SEO knowledge, you may have uploaded many PDF documents to your domain and be reluctant or unable to translate them into HTML pages. For product catalogs and flyers, PDF is often an easy file format to work with, and it’s very simple to export a print document created in an application such as Adobe InDesign as a PDF for web upload.
While it’s not advisable to inundate your website with PDF documents, the good news is that you can use them sparingly, safe in the knowledge that Google is better than it’s ever been at extracting their meaning and indexing them as it would an HTML file. There are just a few things to keep in mind…
1. Duplicate Content Rules Still Apply
The rules relating to duplicate content still apply to PDF files, so it’s important to make a choice about your content and then stick to it: is it going to be presented as a PDF file or an HTML page? Don’t be tempted to increase your page count by creating an HTML version of every PDF document in the hope that both will be indexed. PDF is best suited to design-led documents that may lose their impact when translated to HTML (for example, your new Fall catalog), so there shouldn’t be a need for two versions.
If you do want to offer an HTML version of a PDF document for usability purposes, be sure to indicate which version (either the PDF or the HTML) is your preferred one for search engine purposes. You can do this at sitemap level, by listing only the preferred URL in your XML sitemap, and at document level, with a rel="canonical" link element in the head of an HTML page or a rel="canonical" Link HTTP header sent with the PDF.
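As a rough sketch of those two signals, the Python below builds a canonical `Link` header value to send when serving the PDF, plus a minimal XML sitemap that lists only the preferred URLs. The function names and the `www.example.com` URLs are hypothetical; how you attach the header depends on your web server or framework, which is assumed here rather than shown.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def canonical_link_header(preferred_url: str) -> tuple:
    """Return an HTTP header (name, value) pair naming the preferred URL.

    Sent alongside a PDF response, this tells search engines which
    version of the document is the canonical one.
    """
    return ("Link", f'<{preferred_url}>; rel="canonical"')


def build_sitemap(preferred_urls: list) -> str:
    """Build a minimal XML sitemap that lists only the preferred versions."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in preferred_urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
    body = ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical case: the HTML page is preferred, the PDF is the alternate.
    print(canonical_link_header("https://www.example.com/fall-catalog.html"))
    print(build_sitemap(["https://www.example.com/fall-catalog.html"]))
```

The alternate PDF is deliberately left out of the sitemap, so the sitemap and the header point at the same preferred URL rather than sending mixed signals.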
2. Titles Are Still All-Powerful
Anyone who has spent any time consulting with a search engine optimization consultant or tackling their own on-page optimization tasks will know that the title is the single most important part of the page. This emphasis still holds true for a PDF file residing on your domain. Google advises that it pulls information from two sources when determining the title of a PDF: the first is the Title field in the file’s metadata, and the second is the anchor text of any links pointing to the PDF.
While you can’t control inbound links from other sites, you can take the time to specify an optimized, descriptive and unique title when first creating the PDF document, and follow this through with a considered approach to internal linking. Even if you only link to the PDF from your sitemap, be sure to use relevant, keyword-rich anchor text, applying the same rules as you would when creating any other internal link on your site.
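In practice you would normally set the title in your authoring tool’s document properties before exporting. To show where that value actually ends up, here is a minimal sketch, using only the Python standard library, that writes a blank one-page PDF whose document information dictionary carries a /Title entry. The function name is hypothetical, and this is only one of the places a PDF can store its title (newer files may also carry it in an XMP metadata stream).

```python
def minimal_pdf_with_title(title: str) -> bytes:
    """Build a blank one-page PDF whose /Info dictionary sets /Title."""
    # PDF text strings may be UTF-16BE with a byte-order mark (FEFF),
    # written as a hex string; this avoids escaping special characters.
    title_hex = "FEFF" + title.encode("utf-16-be").hex().upper()
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
        f"<< /Title <{title_hex}> >>".encode("ascii"),  # document information dictionary
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed for the xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"
    xref_start = len(out)
    size = len(objects) + 1  # object 0 is the mandatory head of the free list
    out += f"xref\n0 {size}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (
        f"trailer\n<< /Size {size} /Root 1 0 R /Info 4 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)


if __name__ == "__main__":
    with open("fall-catalog.pdf", "wb") as fh:
        fh.write(minimal_pdf_with_title("Fall Catalog: Outdoor Furniture"))
```

Opening the resulting file and checking its document properties is a quick way to confirm the title you intended is the one search engines will see.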
3. Links Can Still Pass PageRank
You’ll likely know that it’s possible to include links in PDF files, but did you know that it isn’t possible to mark a link in a PDF as ‘nofollow’? This quirk is worth keeping in mind, because any link you include in a PDF file will be capable of passing on PageRank.
While you may not mind sharing PageRank among your own pages, this could be a problem if you link to external sources or third-party sites. Publishing a white paper as a PDF with links in its reading list or bibliography is one such case. Another is including a product review along with a link to the review’s original publisher. In both cases the links add credibility to your work, but they also pass on PageRank.
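Because these links can’t be nofollowed individually, it can help to know exactly which external sites a PDF links to before you publish it. The sketch below scans a PDF’s raw bytes for URI actions (the `/S /URI /URI (...)` entries that link annotations typically use) and returns the ones pointing outside your own domain. The function name and domains are hypothetical, and it is a rough heuristic: it assumes the links are stored uncompressed as literal strings, so links inside compressed object streams or written as hex strings will be missed.

```python
import re
from urllib.parse import urlparse

# Matches a URI action's literal string, allowing backslash-escaped characters.
URI_ACTION = re.compile(rb"/URI\s*\(((?:[^()\\]|\\.)*)\)")


def external_pdf_links(pdf_bytes: bytes, own_domain: str) -> list:
    """Return link targets in an (uncompressed) PDF that point off-site."""
    found = []
    for match in URI_ACTION.finditer(pdf_bytes):
        # Undo the simple literal-string escapes \( \) and \\ .
        raw = re.sub(rb"\\([()\\])", rb"\1", match.group(1))
        uri = raw.decode("latin-1")
        host = (urlparse(uri).hostname or "").lower()
        # Treat the domain itself and any of its subdomains as internal.
        if host and host != own_domain and not host.endswith("." + own_domain):
            found.append(uri)
    return found


if __name__ == "__main__":
    with open("white-paper.pdf", "rb") as fh:
        for link in external_pdf_links(fh.read(), "example.com"):
            print("passes PageRank to:", link)
```

Any link this reports is one you’ve decided to vouch for, so it’s worth checking that each one genuinely belongs in the document.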
Keep the above tips in mind when using PDFs on your site and you’ll be well on the way to making the most of their presence.