Detecting Link Spam


These days there are many attempts to artificially manipulate "PageRank," which I'm loosely defining as a rough guide to how many incoming links there are to a website (and how important the linking pages are), and which can correlate strongly with search engine ranking for keywords. But search engines are by no means stupid. My friend Mike Grehan pointed me to the following article:

Zoltán Gyöngyi, Pavel Berkhin, Hector Garcia-Molina, and Jan Pedersen, "Link Spam Detection Based on Mass Estimation," Technical Report, Stanford University, October 31, 2005. PDF document. This is based on a Stanford student's summer internship at Yahoo! http://dbpubs.stanford.edu:8090/pub/2005-33

Essentially, Gyöngyi and his coauthors conclude that sites with an artificially high PageRank can be easily detected by estimating how much of their PageRank comes from "spam mass," that is, from links originating in spam pages.
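A minimal Python sketch may make "spam mass" more concrete. It follows the general idea as I understand it from the paper: compute ordinary PageRank, compute it again with the random jumps restricted to a hand-picked "core" of known-good sites, and treat the share of a page's PageRank that the good core cannot explain as its estimated spam mass. The toy graph, the function names, the 0.85 damping factor and the choice to give core pages a 1/n jump weight are illustrative assumptions, not Yahoo!'s actual code.

```python
def pagerank(out_links, jump, damping=0.85, iterations=100):
    """Linear PageRank by power iteration: p = damping * T^T p + (1 - damping) * jump.

    out_links: dict mapping every node to the list of nodes it links to
    jump: dict mapping node -> teleport weight (nodes not listed get 0)
    """
    scores = {node: jump.get(node, 0.0) for node in out_links}
    for _ in range(iterations):
        # Each round starts with the teleport share, then adds link contributions.
        new = {node: (1 - damping) * jump.get(node, 0.0) for node in out_links}
        for src, targets in out_links.items():
            if targets:  # pages with no out-links simply leak rank in this linear form
                share = damping * scores[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        scores = new
    return scores


def relative_spam_mass(out_links, good_core, damping=0.85):
    """Estimate the fraction of each node's PageRank NOT explained by the good core."""
    n = len(out_links)
    uniform = {node: 1.0 / n for node in out_links}
    # Assumption: core pages keep the same 1/n jump weight; every other page gets 0.
    core_jump = {node: 1.0 / n for node in good_core}
    p = pagerank(out_links, uniform, damping)
    p_core = pagerank(out_links, core_jump, damping)
    return {node: (p[node] - p_core[node]) / p[node]
            for node in out_links if p[node] > 0}


# Hypothetical toy web: three good pages and a small link farm boosting "target".
web = {
    "good1": ["good2", "good3"],
    "good2": ["good1", "good3"],
    "good3": ["good1", "target"],
    "target": ["spam1", "spam2", "spam3"],
    "spam1": ["target"],
    "spam2": ["target"],
    "spam3": ["target"],
}
mass = relative_spam_mass(web, good_core={"good1", "good2"})
for node, m in sorted(mass.items(), key=lambda kv: -kv[1]):
    print(f"{node:7s} {m:.2f}")
```

On this toy graph the farm pages and "target" come out with a relative spam mass above 0.8, the two core pages stay near 0.1 to 0.2, and good3, which is legitimate but not in the core, lands in between. That middle case hints at why a real system would need a large, well-chosen core and a carefully tuned threshold rather than a single cutoff.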

While this article does not report what search engines currently do, its very existence tells us in which direction search engine scientists are thinking, and they are many steps beyond today's spammers. BTW, my calculus was at a C-minus level in college many years ago and is by no means up to the math in this article. But I thought that scanning the article would give you a greater appreciation of the sophistication with which search engines are working to improve search results and discount spamming attempts.

The lesson here: Make sure the incoming links to your site are from legitimate directories and complementary sites, not from a network of sites that only exist to improve PageRank. If you're careful not to take shortcuts, you'll be on firmer ground when the next round of search engine algorithm changes shakes up rankings yet again.



Published: Feb 8th, 2007 / 06:27pm
Source: http://www.wilsonweb.com/linking/link-spam.htm

