The Yandex Leak: How Their Ranking Factors Affect You
In late January 2023, a former employee at Yandex leaked part of their source code repository. Amongst other things, this revealed over 1900 factors used for search rankings.
This was the biggest leak in the search community since AOL’s search logs leaked in 2006, which gave some hilarious, dramatic, and quite disturbing insight into searcher behaviour. For observant SEOs, the Yandex leak provided a first of its kind: an under-the-hood exposé of how a search engine works.
True, Yandex isn’t Google, but the insights gained from studying the leaked ranking criteria can help any SEO understand how search engines truly interact with the websites and content they produce.
Read on for our key takeaways from the Yandex leak and how they can help make you a better SEO.
What Is Yandex?
Yandex is the biggest tech company in Russia and is best known for its search engine of the same name. Much as with Google, the company has many related interests and investments alongside search, from advertising and email to self-driving cars and smart-home devices.
Yandex has a global market share between 0.85% and 1% as of the end of 2022.
This, of course, pales in comparison to Google’s 93% market share, but it’s not too far off Bing’s 3% and Yahoo’s 1.5% share.
Within Russia, however, the numbers tell a different story, with Yandex accounting for 62.6% of all searches between October and December 2022.
Importantly for those who are still unsure about the relevance of this leak in light of Google’s dominance, Yandex has close ties to Google’s talent pool.
It hires ex-Googlers all the time, and engineers from both companies attend the same conferences, so cross-pollination of ideas and methods is inevitable.
What Secrets Did The Yandex Leak Reveal?
Well, it revealed a lot and not a lot. Typical SEO answer, eh?
Amongst the 1,922 leaked search ranking factors, there were some surprising and not-so-surprising inclusions.
Want to review all 1,922 ranking factors? Check out this spreadsheet, including a breakdown of each factor here.
Yandex Scraped Google
The first key takeaway is that Yandex scraped Google and other large sites.
This is big, and might cause some SEOs to dismiss the results. After all, if it’s just a copy of Google, why not just focus on Google?
The thing is, Yandex isn’t Google, and there’s no indication that they directly used the data within their own results. Instead, it’s much more likely that they used the data to compare their own results to Google’s.
Yandex shares structural similarities with Google (again, unsurprising given the ex-Googlers involved), meaning these revelations can still be applicable even when optimizing for engines other than Yandex.
The Specific Ranking Factors Were Revealed
Amongst the least surprising factors were things like:
- PageRank
- Link Age
- Content Age
- Link Relevancy
- Textual Relevancy
- Website Reliability
- User Experience
- Dwell Time
- Preferential Sites (e.g. Wikipedia)
Some of the more surprising and useful results were those that revealed just how much emphasis Yandex places upon quality-related metrics.
Malte Landwehr handily broke it down in this thread:
In the leaked Yandex source code, there were 40+ quality-related ranking factors.
There are mainly divided into quality of:
Let’s dive in 🧵👇#YandexLeak pic.twitter.com/sfrATl0ROo
— Malte Landwehr (@MalteLandwehr) January 29, 2023
‘Quality’ was primarily focused upon three key pillars:
- Host Quality
- Page Quality
- Text Quality
Host quality refers to the determined quality of the site generating the link. This is assessed across a number of different sub-points, including the freshness and quality of the linking content.
Page quality includes the 404 status of pages, broken links, and broken embedded material.
Text quality includes the grammatical correctness of the copy, contextual relevance, and whether or not it has been generated by AI.
Ranking Factors Have Defined Weighting
Not only did the leak reveal the ranking factors, it revealed the defined weighting for them too.
Louder for the people in the back:
WEIGHTS FOR YANDEX’S RANKING FACTORS ARE IN THIS THING! https://t.co/HLrQ2z5UeJ
— Mic King (@iPullRank) January 29, 2023
We’ve long known that not all ranking factors are created equal but it’s fascinating to be able to see this so definitively laid out.
We’ve created a handy takeaway graphic of Yandex’s Top 5 positive and negative search factors based on Michael King’s excellent summary here.
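To make the idea of weighted factors concrete, here is a minimal sketch that assumes a simple linear combination of factor values and weights. That is a simplification for illustration, not a description of Yandex’s actual model. The only real value is the FI_IS_NOT_RU weight quoted further down in this article; the other factor names and weights are hypothetical.

```python
# Illustrative only: FI_IS_NOT_RU and its weight are quoted in this article;
# the other two factors and their weights are hypothetical.
WEIGHTS = {
    "FI_IS_NOT_RU": 0.08128946612,
    "example_link_quality": 0.05,    # hypothetical positive factor
    "example_broken_links": -0.03,   # hypothetical negative factor
}


def weighted_score(factors: dict[str, float]) -> float:
    """Sum each known factor's value multiplied by its weight."""
    return sum(
        WEIGHTS[name] * value
        for name, value in factors.items()
        if name in WEIGHTS  # factors without a weight are ignored
    )


page = {
    "FI_IS_NOT_RU": 1.0,
    "example_link_quality": 0.8,
    "example_broken_links": 0.5,
}
print(round(weighted_score(page), 4))
```

The point of the sketch is that a negative weight drags the total down no matter how well the page does elsewhere, which is why knowing the sign and size of each weight matters.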
What Should SEOs Learn From The Yandex Leak?
Search Is Even More Dynamic Than You Think
We’ve focused on the 1,900 leaked factors, but there is far more to the Yandex database than just this. Taken together, the total number of available factors is beyond 18,000 and may even pass 20,000.
The thing is, not all of these are in play at the same time. Sure, some have been deprecated, but many have not, and they are used dynamically to best relate to the user’s query.
Different factors can be given different prominence based on the structure and inferred intent of the user’s query.
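As a rough sketch of what query-dependent weighting could look like, the example below picks a different weight profile depending on a toy guess at the query’s intent. Everything in it (the profile names, factor names, weights, and trigger words) is a hypothetical assumption for illustration; none of it is taken from the leak.

```python
# Hypothetical weight profiles keyed by inferred intent.
PROFILES = {
    "local": {"geo_match": 0.6, "text_relevance": 0.3, "freshness": 0.1},
    "news": {"geo_match": 0.1, "text_relevance": 0.3, "freshness": 0.6},
    "default": {"geo_match": 0.1, "text_relevance": 0.8, "freshness": 0.1},
}


def infer_intent(query: str) -> str:
    """Toy intent detection based on a few trigger words."""
    words = set(query.lower().split())
    if words & {"near", "nearby"}:
        return "local"
    if words & {"latest", "news", "today"}:
        return "news"
    return "default"


def score(query: str, factors: dict[str, float]) -> float:
    """Score a page with the weight profile chosen for this query."""
    weights = PROFILES[infer_intent(query)]
    return sum(w * factors.get(name, 0.0) for name, w in weights.items())


page = {"geo_match": 1.0, "text_relevance": 0.5, "freshness": 0.0}
print(score("plumber near me", page))   # scored with the "local" profile
print(score("plumber history", page))   # scored with the "default" profile
```

The same page gets a different score for each query because the weights change, not the page. That is the practical meaning of factors being applied dynamically.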
This shouldn’t really be surprising, but it’s worth internalizing. With this many factors, it’s simply impossible for anyone to fully understand how they all interact. It’s functionally a black box at this stage.
This shouldn’t get SEOs downhearted, though. With this leak, you can see where they start from, and you can see the effort they put in to ensure quality is prioritized above all else.
There are metrics for site quality, link quality, and content quality.
It’s reassuring to know that, in time, good-quality content should rise to the top and be given due prominence, both in search results and in the authority passed on by links from that content.
Over Optimization Is An Active Detriment To Your Domain
This one is big.
The aforementioned weightings not only have a preferential scale, they also have defined upper bounds.
It’s actually possible to score too highly in certain factors, causing them to loop around to being a negative influence on your site.
The factors with upper bounds are classic ones targeted by SEOs for optimization. Anchor texts, CTR, keyword density – all of these have upper bounds at which point they become a net negative for your domain.
This should actually be reassuring to SEOs who do not engage in black hat or spammy techniques. Optimization to this extreme extent can only occur if you are actively manipulating your inbound links and over-stuffing your site. Smart SEOs know not to do this and will be relieved to see other sites will be actively punished for engaging in this behavior.
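One way to picture an upper bound is a contribution that grows until the bound is reached and then starts to shrink. The sketch below assumes a symmetric linear penalty past the bound; the leak tells us bounds exist, but the shape of the penalty, the function name, and the numbers here are all hypothetical.

```python
def bounded_contribution(value: float, weight: float, upper_bound: float) -> float:
    """Reward a factor up to upper_bound, then penalize the excess."""
    if value <= upper_bound:
        return weight * value
    # Assumption: each unit past the bound costs as much as a unit below it earned.
    return weight * upper_bound - weight * (value - upper_bound)


# Hypothetical example: keyword density with a bound of 0.03 and a weight of 10.
for density in (0.01, 0.03, 0.05, 0.08):
    print(density, round(bounded_contribution(density, 10.0, 0.03), 2))
```

With these made-up numbers, the contribution peaks at the bound, drops back as density climbs, and ends up below zero at the highest density. That mirrors the "loop around" behavior described above.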
I know Yandex isn’t Google, and I know I’m a week late to the party, but the thing that’s stuck with me the most from the Yandex algorithm leak is the ‘percentage of site views from search’ ranking factor
Must. Diversify. Traffic Sources.
— Luke Jordan (@lr_jordan) February 6, 2023
Over-optimization should be avoided, while a diverse backlink profile is well rewarded.
In your link building campaigns, for example, you should always look to mimic natural link acquisition.
Local Results Hit Close To Home
Of the 1,922 leaked factors, 319 can be directly attributed to local search or localization services.
Not all of these factors will be relevant with every search, of course, but it demonstrates the great pains the engine goes to when ensuring geographic relevance for searchers’ results.
SEOs need to bear this in mind when optimizing their clients’ sites and links. Google Business Profile listings, local citation building, and on-page optimization are all vitally important to ensure you’re serving locally relevant results to local searchers.
Google Isn’t The Be-All-End-All Of Search
Between this timely reminder and the ChatGPT-powered resurgence of Bing, it’s important to remember that Google isn’t the only search engine. America Online, Yahoo, even Ask Jeeves: all of these were once major players in search that have since faded. There’s no such thing as ‘too big to fail’, and that applies even to Google.
Every few months we hear about potentially monumental changes to Silicon Valley, such as the ongoing Section 230 hearings. Any one of these Supreme Court challenges has the potential to fundamentally change search, and even the very essence of the internet, at any time.
SEOs should always stay on their toes for potential changes, both within Google and without, and it’s important to keep your finger on the pulse of what is happening elsewhere too.
This potential for political intervention leads on to the next point…
The Politics Of Search
One perhaps underappreciated aspect of the Yandex leak is the insight into the politics of search.
Notably, Yandex considers a user to be Russian if they use the Russian language version within Ukraine, Belarus, or Kazakhstan.
More interesting stuff from #YandexLeak . Yandex considers you ruzzia if you’re in Ukraine, Belarus or Kazakhstan and you use rus interface and rus language pic.twitter.com/S040gjEbMI
— RayzRazko13 🇺🇦 🇨🇦 (@RayzRazko13) February 6, 2023
In the context of the current geopolitical situation, that is a big statement to make on behalf of their users.
This comes alongside reports that Yandex actively shuts down queries that involve Putin in a derogatory or obscene manner.
Perhaps confusingly, one of the positive search factors is FI_IS_NOT_RU: +0.08128946612. This factor actually treats it as a positive if the domain isn’t Russian. This seems odd, but it certainly shows that rankings are a confusing confluence of factors.
Google has been at pains to state that it does not allow political sentiment to unduly sway its results, but the Yandex revelations are an important reminder of the power search companies hold. They act as vital highways to information, and many users trust them blindly to provide the most relevant results.
The point here isn’t to get bogged down on the politics of any specific matter but rather to remember the sway these engines have over our lives and how interactions with them should be approached accordingly.
We’re seeing how AI programs embraced not only by Google but also by Bing can be incorrect, manipulated, and even actively turned towards harmful or misleading content. We must bear this in mind, whether it comes from accidental or intentional behavior from those behind the engines.
Should The Yandex Leak Affect Your SEO Strategy?
Updates in the search world should never prompt knee-jerk reactions, and this one is no different.
Even if Yandex is smaller than Google, the insight offered here is still useful to take forwards, particularly seeing how ranking factors influence each other and act dynamically, not statically.
Yandex’s ranking factors alone are unlikely to indicate a grave error in your plan to optimise for Google, but they certainly are food for thought. The only reason this should cause you to adjust your strategy is if these leaks have suddenly made you realise you’ve made a specific mistake in your project.
Instead, use this as an opportunity to learn what search engines are looking for. If you’re concerned about how to navigate SEO while avoiding negative repercussions, you can look to outsource your SEO deliverables to experts for peace of mind.