
Avoid Theme Bleeding - How to
In the last tutorial you learned about a phenomenon
called theme bleeding.
In this lesson you will learn how to avoid theme bleeding
by using a method that keeps the search engines from
following links that may cause this.
User navigation is one of the main ways that theme
bleeding occurs.
Say that you have a pet website and you have written a
page of content about cat toys. Because you want your
visitors to be able to quickly find other information on
your site you will have a user navigation menu.
In the user navigation you have a link that leads to dog
beds, which are not related to cat toys.
The search engine spiders will follow this link from cat
toys to dog beds, and now your theme of cat toys is bleeding
into dog beds. From the perspective of a search engine
robot, this could hurt your rankings.
You may not rank as high as you could if theme bleeding
were eliminated.
How can you eliminate theme bleeding?
In The Master Plan I discussed using the nofollow
attribute on links that you don't want followed.
This is a normal link:
<a href="http://www.yoursite.com/dogbeds.html">Dog Beds</a>
This is how the link would look after the nofollow attribute
has been added.
<a href="http://www.yoursite.com/dogbeds.html" rel="nofollow">Dog Beds</a>
The nofollow attribute is the rel="nofollow" part of the
link above.
The only problem with nofollow is that search engine
spiders can still follow these links.
I went to work looking for a solution to this issue and
after collaborating with several Master Plan members we
concluded that the Robots.txt method is the best method to
use to keep the robots at bay and protect your themes.
I will discuss this method shortly but first I want you
to understand a point.
Many people ask me why you would not want a robot to
follow all the links in your site.
My answer usually goes something like this...
You do want the robots to follow all the links in your
site so they become indexed and ranked in the search
engines. However, you must control when and where the robot
follows a link to pages in your site.
The robots.txt method gives you power over how the robots
spider your site.
You are simply saying, "hey robot, I don't want you to
see a link from my cat toys page to dog beds within the
context of cat toys".
You will show the robot the dog bed information only when
the robot is spidering your dog bed information... not while
it is spidering cat toy information.
Controlling links this way is the basis of avoiding theme
bleeding, and it can help you gain a boost in your search
engine rankings.
The Robots.txt Method
Robots.txt is a simple text file that sits on your server
in the same directory as your home page (the root of your
site).
It is easy to see examples of robots.txt all over the
Internet. To see one, simply go to any website's domain and
add "/robots.txt" to the end of the address.
For example:
To see the robots.txt file for seo2020.com go to:
http://www.seo2020.com/robots.txt
Here is the robots.txt file for seo2020:
User-agent: *
Disallow: /seo-articles/tag-and-ping-primer/
Disallow: /cgi-bin/
Disallow: /img/
Disallow: /web-samples/
Disallow: /directory-submissions/
Disallow: /download/
Disallow: /d/
Disallow: /co-op/
Disallow: /seo-video-tutorials/
Disallow: /tmp/
We are basically telling the search engine robots never,
under any circumstances, to spider any URL in any of the
directories listed.
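If you want to check which addresses a rule set like this
blocks, a short Python sketch can help. It uses the
urllib.robotparser module from the Python standard library
with a shortened copy of the rules above. The helper name
is_allowed and the /img/logo.gif address are made up for this
illustration, and real search engine robots may read
robots.txt differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules from the robots.txt listing above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /img/
Disallow: /tmp/
"""


def is_allowed(url, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Return True if a robot that obeys robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.seo2020.com/",
        "http://www.seo2020.com/cgi-bin/redirect2.cgi?http://www.fanciers.com",
        "http://www.seo2020.com/img/logo.gif",  # hypothetical file
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Any address that starts with a disallowed directory is
reported as blocked, which is the behavior the rest of this
lesson relies on.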
Now on to the next part of this tutorial...
Look at the line that reads "Disallow: /cgi-bin/" above.
I just happen to have a URL redirect script inside my /cgi-bin/
directory.
Say I want to link to a totally irrelevant site from this
site without being penalized in any kind of way.
Say I want to link to a cat website that has nothing to
do with the theme of this tutorial: fanciers.com
The HTML would look like this:
<a href="http://www.fanciers.com">Cat Fanciers</a>
I would surely get penalized for linking to unrelated
information if I posted the link this way.
So instead I will use my robots.txt and link to a script
inside my /cgi-bin/ directory which will redirect the click
to fanciers.com.
The HTML looks like this:
<a href="http://www.seo2020.com/cgi-bin/redirect2.cgi?http://www.fanciers.com">Cat Fanciers</a>
To prove my confidence in this method I will post a link
to the Cat Fanciers website right here:
Cat Fanciers
As you can see, the URL in the link points to the /cgi-bin/
directory which the robots are not allowed to follow so no
theme bleeding will occur.
You can use this same strategy to control when a robot
follows an internal link within your website. Use it for
your user navigation links, or for any other link in your
site that is not directly related to the subject of the page
where the link appears, to avoid theme bleeding.
Here is the CGI code for a simple redirect script. To use
it, upload the script to the /cgi-bin/ directory on your
server. After that, it is easy to link to any page you want
to point to but don't want the robots to follow.
Above I used the following URL:
http://www.seo2020.com/cgi-bin/redirect2.cgi?http://www.fanciers.com
The CGI script will redirect to any URL you place after
the question mark (?).
Here is the code:
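#!/usr/bin/perl
# Simple redirect script: sends the visitor to the URL given after the "?".
use strict;
use warnings;

# Read the query string, e.g. "http://www.fanciers.com" or "url=http://...".
my $query = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';

# Decode "+" signs first, then %XX sequences (upper or lower case hex).
$query =~ tr/+/ /;
$query =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# Use only the first parameter and strip a leading "name=" if one is present.
my ($url) = split /&/, $query;
$url = '' unless defined $url;
$url =~ s{^[^=:/]*=}{};

# Redirect only to http://, https:// or ftp:// URLs that contain no spaces or
# control characters (a line break would let someone inject extra headers).
if ($url =~ m{\A(?:https?|ftp)://[^\x00-\x20]+\z}) {
    print "Location: $url\n\n";
} else {
    error("Your URL should begin with http://, https:// or ftp://\n");
}

# Print a plain error page and stop.
sub error {
    my ($error_text) = @_;
    print "Content-type: text/html\n\n";
    print "Error: " . $error_text;
    exit;
}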
Just copy the code above into Notepad, save the new
document as "redirect.cgi", and then upload it to your
/cgi-bin/ directory. Your links must use the same file name
you saved the script under.
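If you have many links to route through the redirect
script, you may prefer to generate the HTML instead of typing
it by hand. Here is a minimal Python sketch of that idea. The
function name build_redirect_link is my own invention for
this example, and the script address is only an assumption
based on the URL used earlier in this lesson, so change it to
match your own site. The sketch is meant for plain target
URLs; targets that contain "&" or "=" may not survive the way
the redirect script splits the query string.

```python
from html import escape
from urllib.parse import quote

# Assumption: the redirect script lives at this address on your server.
REDIRECT_SCRIPT = "http://www.seo2020.com/cgi-bin/redirect2.cgi"


def build_redirect_link(target_url, link_text, script_url=REDIRECT_SCRIPT):
    """Return an <a> tag that reaches target_url through the redirect script."""
    # Percent-encode unusual characters (spaces, quotes) but keep ":" and "/"
    # readable; the redirect script decodes %XX sequences before redirecting.
    encoded_target = quote(target_url, safe=":/")
    href = f"{script_url}?{encoded_target}"
    return f'<a href="{escape(href, quote=True)}">{escape(link_text)}</a>'


if __name__ == "__main__":
    print(build_redirect_link("http://www.fanciers.com", "Cat Fanciers"))
```

Run as a script, this prints the same kind of Cat Fanciers
link that was written by hand earlier in the lesson.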
Another Strategy
Those of you who don't run cgi scripts on your server
will want to use a simple meta redirect.
In this case you could create a new directory on your
server and call it whatever you want. For this example I
will call it "r" for redirect.
Then disallow this new directory in your robots.txt file.
Just add the line:
Disallow: /r/
Now for each link that you want to redirect you simply
create a simple HTML page with a meta refresh or JavaScript
redirect.
Here is an example of the meta refresh:
<html>
<head>
<meta http-equiv="refresh" content="0;url=http://www.url.com">
</head>
<body>
</body>
</html>
Place the web page above in your new /r/ directory and
replace the http://www.url.com with the site you want to
link to.
The only drawback to using this method is that you have
to create a new redirect page for every link that you don't
want the robots to follow.
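One way to soften this drawback is to let a small script
write the redirect pages for you. The Python sketch below is
only an illustration of that idea: the function name
write_redirect_pages and the file name cat-fanciers.html are
made up for the example, and the "r" directory is the one
assumed earlier in this section.

```python
from html import escape
from pathlib import Path

# Same meta refresh page as shown above, with a slot for the target URL.
PAGE_TEMPLATE = """<html>
<head>
<meta http-equiv="refresh" content="0;url={target}">
</head>
<body>
</body>
</html>
"""


def write_redirect_pages(links, directory="r"):
    """Write one meta refresh page per entry in links.

    links maps a page file name to the URL that page should redirect to.
    Returns the list of files that were written.
    """
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for file_name, target in links.items():
        page = out_dir / file_name
        # Escape the URL so quotes or ampersands cannot break the attribute.
        html_text = PAGE_TEMPLATE.format(target=escape(target, quote=True))
        page.write_text(html_text, encoding="utf-8")
        written.append(page)
    return written
```

For example, calling
write_redirect_pages({"cat-fanciers.html": "http://www.fanciers.com"})
would create r/cat-fanciers.html, and you would then link to
/r/cat-fanciers.html instead of linking to the other site
directly.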
In the next lesson I will be discussing very basic web
site silo design and the concept behind it.
Until next time,
Charles Heflin
Professional SEO Advisor / Consultant
SEO 20/20