Co-occurrence, Predictive Phrases and LSI
Co-occurrence is the percentage of web pages containing the
main theme keyword (or keyphrase) that also contain a
secondary keyword (synonym).
Does this sound a little complicated?
Well, it can be if you want to dig through mountains of
technical jargon, Google patent applications and geeky
mathematical calculations.
Here is a simple explanation:

Theme of Page - The main keyword or theme of any given
web page on your website.
Keyword (synonym of the theme of page keyword) - Under
the LSI concept, you use not only the main theme keyword in
your pages but also synonyms of that keyword in your
content.
Co-occurrence - The percentage (%) of web pages that
contain both the Theme of the page (keyword) AND the keyword
(synonym).
For example...
Let's say you have a page whose main keyword (theme) is
"cats". You have also included a synonym for this keyword in
your content to establish "theme density" and take advantage
of the LSI theory. The synonym you chose is "kittens".
Here is a snippet of what the content of your page may
look like:
Cats
Cats are wonderful pets that have been co-existing with
humans for thousands of years. Cats were first
domesticated in ancient Egypt where they were raised
from kittens to co-exist with their owners...
As you can see in the sample above, the main theme of the
page is "cats" and in the body of the content a synonym
"kittens" is added to give more theme density
to the article.

Co-occurrence Explained
Watch this video:
co-occurrence
In order to "roughly" figure out the co-occurrence you
can use Google.
- Find the number of competing pages for the word
"cats"
- Find the number of competing pages for the word
"kittens"
- Find the number of competing pages that contain both
"cats" and "kittens"
- Divide the number of pages that contain both "cats"
and "kittens" by the number of pages that come up for
the word "cats"
Pages with "cats" & "kittens" DIVIDED BY pages with
"cats" = co-occurrence
At the time of this writing I did a search on Google to
reveal:
"cats" has 102,000,000 pages
"kittens" has 12,200,000 pages
"cats" and "kittens" has 1,860,000 pages
I then divide 1,860,000 (number of pages that contain
cats and kittens in the content) by 102,000,000 (number of
pages that have the word cats). This gives me a
co-occurrence percentage of about 1.8%.
This means that, of all the pages indexed by Google that
are themed for the word "cats", about 1.8% also have the
word "kittens" within their content.
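If you would rather not do the division by hand, here is a
minimal Python sketch of the same calculation. The function
name co_occurrence and the variable names are my own
choices for illustration; the page counts are the ones from
my Google search above, and yours will differ.

```python
def co_occurrence(pages_with_both: int, pages_with_theme: int) -> float:
    """Percentage of theme pages that also contain the synonym."""
    if pages_with_theme <= 0:
        raise ValueError("pages_with_theme must be a positive count")
    return 100.0 * pages_with_both / pages_with_theme


# Page counts quoted in the example above (rough search estimates).
cats_pages = 102_000_000
cats_and_kittens_pages = 1_860_000

rate = co_occurrence(cats_and_kittens_pages, cats_pages)
print(f"co-occurrence: {rate:.1f}%")  # prints "co-occurrence: 1.8%"
```

Search engine result counts are estimates, so treat the
output as a rough figure for comparing candidate synonyms,
not as an exact measurement.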
The higher the co-occurrence... The better and more
relevant your secondary keywords (synonyms) will be.
Watch this video:
Figuring Co-occurrence
To dig a bit deeper into the theory of co-occurrence and
the Google Patent Applications behind them please visit:
Phrase Based Information Retrieval
Carefully choose the theme synonyms for your content
Be sure that the co-occurrence is as high as possible, and
avoid using the same core keyword over and over again as
this may trip Google's spam flag.
Remember... This is all theory but the theory is based on
research of Google's patent applications. So in order to
protect yourself against future or present algorithm changes
it is best to adhere to this theory.
This theory makes your content more relevant which has
been proven to produce higher rankings so what do you have
to lose?
The only reason we call all of this "theory" is because
Google has not publicly stated that these algorithms are in
effect. However, careful testing suggests that they are in
action and that the theory stands up to real-world results.
Predictive Phrases
Phrases (or keywords) that have a co-occurrence of other
words (synonyms) may be a "predictive phrase". A predictive
phrase is a phrase (or keyword) that "predicts" the
occurrence of other words or phrases.
As in the "cats" example above, the word "cats" may
predict that the word "kittens" will also appear on about
1.8% of the pages that are themed for the word "cats".
William Slawski from SEO By The Sea says:
An example of the predictive ability of good
phrases: The phrase “President of the United
States” predicts other phrases such as “George Bush”
and “Bill Clinton.”
Other phrases may not be predictive, such as
“fell down the stairs” or “top of the morning,” “out
of the blue.” Idioms and colloquisms like these are
widely used, and often appear with many other
different and unrelated phrases. Looking at how
frequently phrases co-occur on individual pages,
within the whole collection of indexed pages, can
tell us whether or not the appearance of one phrase
might be used to predict the appearance of another.
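
To make the idea of counting co-occurrence "on individual
pages, within the whole collection" concrete, here is a toy
Python sketch. It is only an illustration under my own
assumptions: the function name cooccurrence_in_pages and
the four sample pages are made up, and the simple substring
matching is far cruder than anything described in the
patent applications.

```python
def cooccurrence_in_pages(pages, phrase, other):
    """Percentage of pages containing `phrase` that also contain `other`.

    Uses case-insensitive substring matching, which is an assumption
    made for brevity (it would also match "cats" inside "scats").
    """
    phrase, other = phrase.lower(), other.lower()
    with_phrase = [p for p in pages if phrase in p.lower()]
    if not with_phrase:
        return 0.0
    with_both = [p for p in with_phrase if other in p.lower()]
    return 100.0 * len(with_both) / len(with_phrase)


# Hypothetical mini "index" of four pages.
sample_pages = [
    "Cats are wonderful pets, and kittens are playful.",
    "Cats sleep most of the day.",
    "Dogs fell down the stairs chasing a ball.",
    "Cats were raised from kittens in ancient Egypt.",
]

rate = cooccurrence_in_pages(sample_pages, "cats", "kittens")
print(f"{rate:.1f}% of 'cats' pages also mention 'kittens'")
```

In this toy collection, two of the three pages that mention
"cats" also mention "kittens", so "cats" is a reasonably
good predictor of "kittens" here.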
I will not go into any more discussion about this
publicly. We will be teaching this within a structured
environment at University 20/20.
Theme Density
As you may have already learned in The Master Plan, the
game of high rankings is no longer about Keyword Density
(the number of times a specific keyword appears on a page to
make the page "relevant" to that keyword). It IS
about Theme Density, which is the number of
synonyms related to the "core theme" that appear on a page.
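To see the difference between the two measures on a real
snippet, here is a rough Python sketch. Treat it as an
assumption-laden illustration: the helper count_terms is my
own, the synonym list (including "felines") is hypothetical,
and counting distinct synonyms is just one simple way to
put a number on theme density, not an official formula.

```python
import re


def count_terms(text, terms):
    """Count whole-word occurrences of each term in text (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {term: words.count(term.lower()) for term in terms}


page = (
    "Cats are wonderful pets. Cats were first domesticated in ancient "
    "Egypt where they were raised from kittens."
)
theme = "cats"
synonyms = ["kittens", "felines"]  # hypothetical synonym list

# Keyword density view: how often the core keyword itself appears.
keyword_hits = count_terms(page, [theme])[theme]

# Theme density view: how many related synonyms appear at all.
synonym_hits = count_terms(page, synonyms)
distinct_synonyms_used = sum(1 for n in synonym_hits.values() if n > 0)

print(f"'{theme}' appears {keyword_hits} times")
print(f"{distinct_synonyms_used} of {len(synonyms)} synonyms appear")
```

On the sample snippet, the core keyword shows up twice and
one of the two synonyms is used, so there is room to work
another related term into the content.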
A new plan for 2007 has been created. It discusses the
combining of Pay-Per-Click Siloing with SEO Siloing. This is
a strategy that has been working successfully and it is
revealed here:
The Plan 2007
To see all lessons in this series (so far) please visit:
LSI
Until next time,
Charles Heflin
Professional SEO Advisor / Consultant
SEO 20/20 / University 20/20