A Comprehensive Look at Google PageRank: Past and Present
PageRank is an algorithm that revolutionized the way search engines evaluate and rank web pages. Developed by Larry Page and Sergey Brin, the co-founders of Google, it played a pivotal role in Google’s early success and continues to be an essential component of their search engine’s ranking system.
At its core, PageRank is designed to improve the quality of search results by measuring the importance of a web page based on the links it receives from other pages. The underlying concept is akin to a popularity contest – the more incoming links a page has, the more “votes” it receives, indicating its significance and relevance on the web.
Similar to the “impact factor” used for journals, PageRank treats the number of links pointing to a web page as a measure of its importance. Unlike a simple citation count, however, PageRank weights each link according to the importance of the page it comes from. By combining link analysis with content evaluation, Google gained a competitive edge in delivering relevant search results and turned links into the prime currency of the web.
Want to know more about how PageRank works and where it stands today? Let’s dive in.
PageRank remains a part of Google's algorithms.
When it comes to modern SEO, PageRank is one of the signals Google uses to assess Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T).
PageRank has also been shown to affect crawl budget. It stands to reason that Google would crawl more important pages more frequently.
PageRank serves as a crucial signal for canonicalization. Pages with higher PageRank are prioritized as the definitive versions that are indexed and presented to users.
Dissecting the Errors in the PageRank Formula
Unbelievable but true: The formula stated in the original PageRank paper turned out to be incorrect. Let’s explore the reasons behind this.
The founding paper defines PageRank as a probability distribution: it describes how likely a user is to end up on any specific web page. The PageRank values of all web pages should therefore add up to 1.
The complete PageRank formula, as presented in the original 1997 paper, is as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
Here, T1 … Tn are the pages that link to page A, C(T) is the number of outbound links on page T, and d is the damping factor. Assuming d = 0.85, as in Google’s paper (I will explain the damping factor shortly), the formula simplifies to:
PR(A) = 0.15 + 0.85 (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
In other words, a page’s PageRank is 0.15 plus 0.85 times the sum of the PageRank passed to it by each linking page.
The paper states that the PageRank of all pages should sum to 1. With the formula above, however, that is impossible. Every page has a minimum PageRank of 0.15 (1-d), so with as few as seven pages the total already exceeds 1 (7 × 0.15 = 1.05). A probability cannot be greater than 100%. Clearly, something is amiss here!
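To see the problem concretely, here is a minimal Python sketch that applies the formula exactly as printed to a hypothetical three-page cycle (the graph and the fixed iteration count are assumptions for illustration, not anything from the paper). Each page settles at a PageRank of 1.0, so the “probabilities” add up to 3.

```python
# A sketch: iterate the formula exactly as printed in the original paper,
#   PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)),
# on a hypothetical three-page cycle A -> B -> C -> A.

D = 0.85  # damping factor from the paper

# Hypothetical example graph: page -> pages it links to.
links = {"A": ["B"], "B": ["C"], "C": ["A"]}


def original_pagerank(links, d=D, iterations=100):
    pr = {page: 1.0 / len(links) for page in links}  # uniform starting guess
    for _ in range(iterations):
        new = {page: 1.0 - d for page in links}  # every page gets at least 1 - d
        for page, outlinks in links.items():
            share = pr[page] / len(outlinks)  # PR(T) / C(T)
            for target in outlinks:
                new[target] += d * share
        pr = new
    return pr


scores = original_pagerank(links)
print({page: round(score, 4) for page, score in scores.items()})  # each page ends up at 1.0
print(round(sum(scores.values()), 4))  # the total is 3.0, not 1
```

With N pages linked this way the total converges to N, which is exactly why the (1-d) term has to be divided by the number of pages.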
For the formula to work as intended, the (1-d) term must be divided by the total number of web pages. Here is the revised version:
PR(A) = (1-d)/N + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
Here N is the total number of pages. In words, a page’s PageRank is the sum of two components: the first is 0.15 divided by the total number of pages on the web; the second is 0.85 times the PageRank of each linking page, divided by that page’s number of outbound links.
Let’s look at an example.
Every page starts with an equal share of PageRank, 1/N. For instance, with five pages and no links between them, each page has a PageRank of 1/5, or 0.2.
That score is then passed to other pages through the links on each page. Adding some links between the five pages and recalculating shifts the distribution: pages that receive more links, or links from stronger pages, end up with a larger share, while a page with no incoming links keeps only the (1-d)/N baseline.
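The recalculation can be sketched in Python using the corrected formula, PR(A) = (1-d)/N + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)). The links between pages A to E are hypothetical, chosen only for illustration, and every page has at least one outbound link so no PageRank leaks out through dead ends (handling pages without outbound links is a detail this sketch skips). Each page starts at 1/5 = 0.2, and the loop repeats until the scores stop changing.

```python
# Corrected formula: PR(A) = (1 - d) / N + d * sum(PR(T) / C(T)) over pages T linking to A
D = 0.85

# Hypothetical link graph for the five pages; every page has at least one
# outbound link, so this sketch does not need to handle dead-end pages.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
    "E": ["A", "D"],
}


def pagerank(links, d=D, tol=1e-10, max_iter=1000):
    n = len(links)
    pr = {page: 1.0 / n for page in links}  # five pages -> 0.2 each
    for _ in range(max_iter):
        new = {page: (1.0 - d) / n for page in links}  # baseline share
        for page, outlinks in links.items():
            share = pr[page] / len(outlinks)  # PR(T) / C(T)
            for target in outlinks:
                new[target] += d * share
        converged = max(abs(new[p] - pr[p]) for p in links) < tol
        pr = new
        if converged:
            break
    return pr


scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
print("total:", round(sum(scores.values()), 6))  # stays 1.0
```

In this made-up graph, C ends up with the highest score because three pages link to it, while E, which nothing links to, keeps only the baseline (1-d)/N = 0.03. The total stays at 1 throughout.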
The PageRank formula includes a “damping factor,” denoted d, which models the probability that a random user keeps clicking links while browsing. With probability 1-d, the user stops following links and jumps to a random page instead.
When you land on a first page, the probability that you click a link on it is high. However, the probability that you are still clicking after several pages shrinks with every step, because the damping factor applies again on each page you visit.
Distance therefore matters for how much value a link passes. A page one click away receives a substantial share, but because the damping factor is applied at every hop, a page four clicks away receives far less: only about 0.85^4 ≈ 0.52 of the value survives damping alone, before it is further split among each page’s outbound links.
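A minimal sketch of that decay, assuming the paper’s d = 0.85 and ignoring how value is split across outbound links: the chance that the surfer is still following links after k clicks is d^k.

```python
D = 0.85  # damping factor from the original paper


def continuation_probability(clicks, d=D):
    """Probability that a random surfer is still following links after `clicks` clicks."""
    return d ** clicks


for clicks in range(1, 5):
    print(clicks, f"{continuation_probability(clicks):.2f}")  # 0.85, 0.72, 0.61, 0.52
```

Four clicks out, only about half of the value survives damping alone.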
Exploring The History Of PageRank
On January 9, 1998, the first patent for PageRank, named “Method for node ranking in a linked database,” was filed. The patent expired on January 9, 2018.
Google publicly introduced PageRank when they rolled out the Google Directory on March 15, 2000. This directory, which utilized PageRank for categorization, was based on the Open Directory Project. Unfortunately, it was terminated on July 25, 2011.
The Google toolbar, featuring PageRank, was unveiled on December 11, 2000, and this version of PageRank garnered keen attention from SEO enthusiasts.
PageRank data within the toolbar underwent its final update on December 6, 2013, and it was eventually phased out, disappearing from the toolbar on March 7, 2016.
The PageRank shown in the toolbar was a little different: it used a simplified 0-10 scale. That scale was logarithmic, so each higher value was progressively harder to reach than the one before it.
On November 17, 2005, PageRank was added to Google Sitemaps (now Google Search Console), shown as high, medium, low, or N/A. This feature was removed on October 15, 2009.
Over the years, SEO experts have used a range of techniques to game the system, trying to increase PageRank and achieve higher rankings. Google maintains an extensive list of link schemes that covers many of these tactics. These include:
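Google never published how raw PageRank was mapped onto the toolbar’s 0-10 scale, so the following Python sketch is purely illustrative: the base of 8 and the minimum score of 1 are hypothetical values chosen only to show how a logarithmic scale behaves. Under such a scale, every additional toolbar point requires multiplying the underlying score by the base.

```python
# Hypothetical mapping from a raw score to a 0-10 toolbar-style value.
# Google never published its formula; BASE and MIN_SCORE are assumptions
# chosen only to show how a logarithmic scale behaves.
BASE = 8.0
MIN_SCORE = 1.0


def toolbar_score(raw, base=BASE, min_score=MIN_SCORE):
    """Count how many times min_score can be multiplied by base while staying <= raw (capped at 10)."""
    score = 0
    threshold = min_score * base
    while score < 10 and raw >= threshold:
        score += 1
        threshold *= base
    return score


for raw in (1, 8, 64, 512, 4096):
    print(raw, toolbar_score(raw))  # each extra point needs 8x the raw score
```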
- Buying or selling links, that is, exchanging links for money, goods, products, or services.
- Excessive link exchanges.
- Using automated software to create links.
- Requiring links as part of terms of service, contracts, or other formal agreements.
- Text advertisements that do not use the nofollow or sponsored attributes.
- Advertorials or native advertising that include links passing ranking value.
- Articles, guest posts, or blog posts containing keyword-rich anchor text links.
- Links from low-quality directories or social bookmarking sites.
- Keyword-rich, hidden, or low-quality links embedded in widgets that are distributed across various external websites.
- Widely distributed links in footers or templates.
- Forum posts with optimized links, either in the post itself or in the user’s signature.
Let’s examine the major developments in the ongoing evolution of link spam prevention systems. These updates have played a crucial role in combating the persistence of spammy links on the internet.
1. No Follow
In a significant development on January 18, 2005, Google, together with other major search engines, introduced the rel="nofollow" attribute. Website owners and blogging platforms were encouraged to apply it to links in blog comments, trackbacks, and referrer lists as a countermeasure against spam.
Below is a snippet from Google’s formal statement about the introduction of the nofollow attribute.
Today, nearly all modern blogging systems apply nofollow to links in blog comments.
SEO experts were quick to exploit the nofollow attribute, using it for a technique called PageRank sculpting, wherein they selectively applied nofollow to specific links on their pages to enhance the importance of others. In response, Google implemented changes to the system to deter this kind of manipulation.
Matt Cutts of Google announced in 2009 that this approach would no longer work. PageRank would be divided across all links on a page, including nofollowed ones, but only the followed links would actually pass it on.
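The arithmetic behind this change can be sketched in Python. This is a simplified model: it looks only at the share a page passes through its links (d × PR divided by the number of links) and ignores the rest of the formula. The page’s PageRank of 1.0 and its link counts are hypothetical.

```python
def pr_per_followed_link(page_pr, total_links, nofollow_links, after_2009=True, d=0.85):
    """PageRank passed through each followed link under a simplified model.

    Before the 2009 change, sculpting assumed nofollowed links were removed
    from the count, so the followed links split the page's PageRank between them.
    After the change, the share is divided among all links, and the portion
    assigned to nofollowed links is not passed on.
    """
    followed = total_links - nofollow_links
    if followed <= 0:
        return 0.0
    divisor = total_links if after_2009 else followed
    return d * page_pr / divisor


# Hypothetical page: PageRank of 1.0, 10 links, 5 of them nofollowed.
before = pr_per_followed_link(1.0, 10, 5, after_2009=False)
after = pr_per_followed_link(1.0, 10, 5, after_2009=True)
print(f"before 2009: {before:.3f} per followed link")  # 0.170
print(f"after 2009:  {after:.3f} per followed link")   # 0.085
```

Under the older assumption, nofollowing half the links doubled what each remaining link passed; after 2009 it no longer helps, and the nofollowed share is simply lost.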
On September 10, 2019, Google unveiled a pair of new link attributes, namely “ugc” and “sponsored,” as an extension of the traditional nofollow attribute. “ugc” is used to identify user-generated content, while “sponsored” is employed to signify links that involve payments or affiliate associations.
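To check which of these attributes a page’s links carry, a small audit script can read the rel values. This sketch uses Python’s standard-library html.parser; the HTML snippet and URLs are hypothetical. Note that rel can hold several space-separated values, such as “sponsored nofollow”.

```python
from html.parser import HTMLParser

# rel values that mark a link as nofollowed, user-generated, or paid.
HINT_VALUES = {"nofollow", "ugc", "sponsored"}


class LinkRelParser(HTMLParser):
    """Collect (href, set of rel tokens) for every <a> tag in an HTML snippet."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()  # rel may hold several values
        self.links.append((attrs.get("href"), set(rel)))


# Hypothetical page fragment.
html = """
<a href="https://example.com/guide">Guide</a>
<a href="https://example.com/forum-post" rel="ugc">Forum post</a>
<a href="https://example.com/partner" rel="sponsored nofollow">Partner</a>
"""

parser = LinkRelParser()
parser.feed(html)
for href, rel in parser.links:
    hinted = sorted(rel & HINT_VALUES)
    print(href, "->", ", ".join(hinted) if hinted else "followed")
```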
Strategies to Combat Link Spam
As the SEO community continued to explore innovative link gaming techniques, Google remained proactive in devising new algorithms designed to identify and combat link spam effectively.
When the original Penguin algorithm launched on April 24, 2012, it significantly hurt many websites and their owners. In October of the same year, Google introduced the disavow tool, giving website owners a way to recover.
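The disavow tool accepts a plain text file listing one URL or one domain per line, with “domain:” in front of entries that should cover an entire domain and “#” marking comment lines. A small Python helper can assemble such a file from the results of a link audit; the domains and URL below are hypothetical.

```python
def build_disavow_file(domains, urls, note=None):
    """Return disavow-file text: optional comment, then domain: entries, then individual URLs."""
    lines = []
    if note:
        lines.append(f"# {note}")  # lines starting with # are comments
    lines += [f"domain:{d}" for d in sorted(set(domains))]  # disavow whole domains
    lines += sorted(set(urls))  # disavow individual pages
    return "\n".join(lines) + "\n"


text = build_disavow_file(
    domains=["spammy-directory.example", "link-farm.example"],
    urls=["https://blog.example/paid-post.html"],
    note="Links identified during a hypothetical audit",
)
print(text)
```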
With the launch of Penguin 4.0 on September 23, 2016, Google implemented a beneficial transformation in its approach to combating link spam. Rather than negatively impacting websites, this update focused on devaluing spammy links, resulting in reduced reliance on the disavow tool for most websites.
Google made significant advances against link spam with its first Link Spam Update on July 26, 2021. A second Link Spam Update followed on December 14, 2022, using SpamBrain, Google’s AI-based spam detection system, to neutralize the impact of unnatural links.
Changes in PageRank Over Time
A former employee at Google has stated that the original version of PageRank was retired in 2006. Instead, Google adopted a different algorithm that demanded fewer computational resources, marking a significant shift in their approach.
According to that account, the replacement gives roughly similar results but is significantly faster to compute, and it is the number that was reported in the toolbar under a similar name. Both algorithms have a complexity of O(N log N), but the replacement has a much smaller constant on the log N factor because it does not need to iterate until the values converge. That efficiency was vital as the web grew from roughly 1-10 million pages to more than 150 billion.
Looking back, the original PageRank was highly iterative and its values shifted with every recalculation; Google appears to have since made the system simpler and less variable.
What other developments have occurred?
Not all links carry the same value.
Google moved away from distributing PageRank equally across all links toward a system in which some links carry more weight than others. Patent filings suggest a shift from the random surfer model to a reasonable surfer model, which factors in how likely each specific link is to be clicked.
Some links are ignored or not counted.
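The difference between the two models can be sketched in Python. The click weights below are hypothetical; Google has not published the actual weights it uses.

```python
D = 0.85


def distribute(page_pr, click_weights, d=D):
    """Split the PageRank a page passes on in proportion to each link's click weight."""
    total = sum(click_weights.values())
    return {link: d * page_pr * w / total for link, w in click_weights.items()}


# Random surfer: every link is equally likely to be clicked.
random_surfer = distribute(1.0, {"nav": 1, "main_content": 1, "footer": 1})

# Reasonable surfer: hypothetical click probabilities favoring the in-content link.
reasonable_surfer = distribute(1.0, {"nav": 0.2, "main_content": 0.7, "footer": 0.1})

for link in random_surfer:
    print(f"{link:13} random={random_surfer[link]:.3f} reasonable={reasonable_surfer[link]:.3f}")
```

The total passed on is the same in both cases; only its split between the links changes.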
Numerous mechanisms have been implemented to devalue specific links, and some of these have been previously discussed, encompassing:
- The use of attributes like nofollow, ugc, and sponsored.
- Google’s Penguin algorithm.
- The availability of the disavow tool.
- Updates targeting link spam.
Furthermore, Google does not count links on pages that are blocked by robots.txt. Because Google cannot crawl those pages, it cannot see the links on them. This is presumed to have been the case since the early days of Google.
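Python’s standard library includes a robots.txt parser that can show which URLs a crawler would skip, and therefore whose links would go uncounted. The rules and URLs below are hypothetical, and urllib.robotparser does not implement every detail of how Google interprets robots.txt (wildcard patterns, for example), so treat it as a rough check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules.
robots_txt = """
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

for url in ("https://example.com/blog/post", "https://example.com/private/links.html"):
    # Links on a blocked page are never seen by the crawler, so they are not counted.
    print(url, "crawlable" if rp.can_fetch("Googlebot", url) else "blocked")
```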
Links are sometimes aggregated.
Google utilizes a canonicalization mechanism to identify the preferred version of a page for indexing and to amalgamate signals from duplicate pages into that primary version.
Canonical link elements were introduced on February 12, 2009, allowing site owners to designate their preferred version.
Originally, Google said that a redirect lost the same amount of PageRank as a standard link. This later changed, and today no PageRank is lost through redirects.
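Signal consolidation can be illustrated with a simplified Python sketch that counts inbound links rather than computing PageRank. The URL variants, their mapping to a preferred version (which could come from canonical link elements or redirects), and the link counts are all hypothetical; the sketch assumes consolidation is lossless, in line with the current position that redirects lose no PageRank.

```python
# Hypothetical duplicate URLs mapped to their preferred (canonical) version,
# whether declared with a canonical link element or a redirect.
canonical_of = {
    "http://example.com/shoes": "https://example.com/shoes",
    "https://example.com/shoes?ref=nav": "https://example.com/shoes",
    "https://www.example.com/shoes": "https://example.com/shoes",
}

# Hypothetical count of external links pointing at each URL variant.
inbound_links = {
    "https://example.com/shoes": 40,
    "http://example.com/shoes": 12,
    "https://example.com/shoes?ref=nav": 3,
    "https://www.example.com/shoes": 5,
}


def resolve(url, mapping, max_hops=10):
    """Follow the mapping (like a redirect chain) until reaching a URL that maps nowhere."""
    seen = set()
    while url in mapping and url not in seen and len(seen) < max_hops:
        seen.add(url)
        url = mapping[url]
    return url


def consolidate(inbound, mapping):
    """Credit every variant's links to its resolved preferred URL."""
    totals = {}
    for url, count in inbound.items():
        target = resolve(url, mapping)
        totals[target] = totals.get(target, 0) + count
    return totals


print(consolidate(inbound_links, canonical_of))  # {'https://example.com/shoes': 60}
```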
There are still aspects that are not fully understood.
When pages are designated as “noindex,” the precise manner in which Google manages the associated links remains ambiguous, and there is conflicting information, even among Google’s own representatives.
As per John Mueller’s guidance, pages bearing the “noindex” tag will eventually be considered as “noindex, nofollow,” leading to the eventual halt of value transfer through their links.
Gary Illyes, by contrast, has indicated that Google may keep crawling such pages and counting their links for a considerable time, potentially indefinitely. The two statements are not in direct opposition, but they leave it unclear how long those links continue to pass value.
Is there still a way to check your PageRank status?
Access to Google’s PageRank data is not currently an option.
URL Rating (UR) emerges as a suitable replacement metric for PageRank due to its resemblance to the PageRank algorithm. UR assesses the potency of a webpage’s link profile using a 100-point scale, where a higher numerical value indicates a more robust link profile.
PageRank and UR are metrics that incorporate both internal and external links when determining a webpage’s strength. In contrast, many other widely-used industry metrics ignore internal links entirely. My assertion is that link builders should place greater importance on UR as opposed to metrics such as DR, which exclusively take into account links from external sources.
Nonetheless, UR does not replicate PageRank exactly. It ignores certain links and excludes those marked as nofollow. Exactly which links Google ignores, and how user-submitted disavowals affect PageRank calculations, remain unknown. The way UR handles canonicalization signals, such as canonical link elements and redirects, may also lead to different outcomes.
It’s advisable to incorporate UR into your analysis, with the awareness that it might not replicate Google’s system in every detail.
What steps can you take to boost your PageRank?
Since PageRank’s foundation lies in links, the key to elevating your PageRank lies in acquiring improved links. Let’s delve into the available possibilities.
1. Implement redirects for pages that are broken
Redirecting obsolete pages on your website to relevant new pages can help recapture and consolidate important signals like PageRank. Websites change over time, and many site owners neglect to set up proper redirects. This may be the most straightforward improvement available, since these links already point to your site but currently contribute nothing to it.
Here’s a method to identify these prospects:
- Insert your domain into Site Explorer (also accessible at no cost within Ahrefs Webmaster Tools).
- Proceed to the Best by links report.
- Implement a filter for HTTP responses that indicate “404 not found.”
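Given the pages from that report, choosing which broken pages to redirect first is a small filtering job. In this Python sketch, the rows, their field names, and the mapping to replacement pages are hypothetical (they are not the report’s actual export format). It keeps 404 pages that still have referring domains, sorts them by that count, and prints 301 redirect pairs.

```python
# Hypothetical rows describing pages on your site.
pages = [
    {"url": "/old-guide", "status": 404, "referring_domains": 37},
    {"url": "/blog/post", "status": 200, "referring_domains": 12},
    {"url": "/2019-promo", "status": 404, "referring_domains": 5},
    {"url": "/deleted-page", "status": 404, "referring_domains": 0},
]

# Hypothetical choice of the most relevant live page for each broken URL.
replacements = {"/old-guide": "/guide", "/2019-promo": "/offers"}


def redirect_plan(pages, replacements):
    """Broken pages that still have links, strongest first, paired with their replacement."""
    broken = [p for p in pages if p["status"] == 404 and p["referring_domains"] > 0]
    broken.sort(key=lambda p: p["referring_domains"], reverse=True)
    return [(p["url"], replacements[p["url"]]) for p in broken if p["url"] in replacements]


for source, target in redirect_plan(pages, replacements):
    print(f"301 {source} -> {target}")
```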
2. Internal Links
Backlinks are not always within your control. Other site owners can link to any page on your site and use whatever anchor text they prefer.
When it comes to internal links, you wield absolute control.
Apply internal linking where it serves a purpose. For example, consider increasing the number of links to pages that carry higher importance for your goals.
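One simple way to act on this is to count the internal links pointing to each page and check whether your most important pages are under-linked. The Python sketch below uses a hypothetical site structure and list of priority pages. Raw inlink counts are only a rough proxy, since PageRank also depends on how strong the linking pages are and how many other links they carry.

```python
from collections import Counter

# Hypothetical internal link graph: source page -> pages it links to.
internal_links = {
    "/": ["/pricing", "/blog", "/about"],
    "/blog": ["/blog/post-1", "/blog/post-2", "/"],
    "/blog/post-1": ["/blog/post-2", "/"],
    "/blog/post-2": ["/blog/post-1", "/"],
    "/about": ["/"],
    "/pricing": ["/"],
}

# Pages that matter most for the site's goals (an assumption for this example).
priority_pages = ["/pricing", "/blog/post-1"]


def inlink_counts(graph):
    """Count how many distinct other pages link to each page."""
    counts = Counter()
    for source, targets in graph.items():
        for target in set(targets):  # ignore duplicate links from the same page
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts


counts = inlink_counts(internal_links)
for page in sorted(priority_pages, key=lambda p: counts[p]):
    print(page, counts[page])  # fewest internal links first
```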
3. External Links
You also have the option of obtaining additional links from other websites to boost your PageRank.
Google’s ongoing use of links and PageRank is unmistakable, even though PageRank itself has changed. Although we may not know all the underlying details, the impact of links remains readily observable. Google also once experimented with excluding links from its algorithm but chose not to pursue that approach.
About the Author
My name’s Semil Shah, and I pride myself on being the last digital marketer that you’ll ever need. Having worked internationally with agile and disruptive teams from San Francisco to London, I can help you take your digital efforts to the next level.