Nearly 45GB of source code files, allegedly stolen by a former employee, have revealed the underpinnings of Russian tech giant Yandex’s many apps and services. The leak also exposed key ranking factors for Yandex’s search engine, the kind that are almost never revealed in public.
The “Yandex git sources” archive was posted as a torrent file on January 25 and contains files apparently taken in July 2022 and dating back to February 2022. Software engineer Arseniy Shestakov claims he verified with current and former Yandex employees that some files “certainly contain modern source code for the company’s services.” Yandex told security blog BleepingComputer that “Yandex was not hacked” and that the leak came from a former employee. Yandex stated that it did not “see any threat to user data or platform performance.”
The files notably date to February 2022, when Russia began its full-scale invasion of Ukraine. A former Yandex executive told BleepingComputer that the leak was “political,” noting that the former employee had not tried to sell the code to Yandex’s competitors. The anti-spam code was not leaked, either.
While it’s unclear whether the disclosure of Yandex’s source code has structural or security implications, the leak of 1,922 ranking factors in Yandex’s search algorithm is certainly causing a sensation. SEO consultant Martin MacDonald described the leak on Twitter as “probably the most interesting thing to happen in SEO in years” (as noted by Search Engine Land). In a thread detailing some of the more notable factors, researcher Alex Buraks suggests that “there is also a lot of useful information for Google SEO.”
Yandex, the fourth-ranked search engine by volume, reportedly employs several former Google employees. Yandex tracks many of Google’s ranking factors, identifiable in its code, and competes heavily with Google. Google’s Russian division recently filed for bankruptcy after losing its bank accounts and payment services. Buraks notes that the first factor on Yandex’s list of ranking factors is “PAGE_RANK,” which is apparently tied to the fundamental algorithm created by Google’s co-founders.
As detailed by Buraks (in two threads), the Yandex engine favors pages that:
- Aren’t too old
- Have a lot of organic traffic (unique visitors) and less search-based traffic
- Have fewer numbers and slashes in their URL
- Have optimized code rather than “hard pessimization,” with a “PR=0”
- Are hosted on trusted servers
- Happen to be Wikipedia pages or are linked from Wikipedia
- Are hosted on or linked from top-level pages on a domain
- Have keywords in their URL (up to three)
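
To make the URL-related items in that list concrete, here is a minimal Python sketch of how such signals could be measured for a single page. It is not taken from the leaked code: the function name `url_features`, the feature names, and the choice to cap keyword matches at three are illustrative assumptions based only on Buraks's summary.

```python
from urllib.parse import urlparse


def url_features(url: str, query_keywords: list[str]) -> dict[str, int]:
    """Compute illustrative URL signals (hypothetical, not Yandex's code)."""
    path = urlparse(url).path
    lowered = path.lower()

    # Count digits and slashes in the path; the summary suggests fewer is better.
    digit_count = sum(ch.isdigit() for ch in path)
    slash_count = path.count("/")

    # Count query keywords that appear in the path, capped at three
    # (assumption: matches beyond the third add nothing).
    matches = sum(1 for kw in query_keywords if kw.lower() in lowered)

    return {
        "digit_count": digit_count,
        "slash_count": slash_count,
        "keyword_matches": min(matches, 3),
    }


features = url_features(
    "https://example.com/blog/2022/01/seo-ranking-factors",
    ["seo", "ranking", "factors", "leak"],
)
print(features)
```

How a real ranking system would weight or combine such counts is not something the summary above reveals, so the sketch stops at measuring them.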
You can search and browse all the factors in Rob Ousbey’s compiled search tool. You may notice that almost 1,000 of the ranking factors are labeled “TG_DEPRECATED” and more than 200 are listed as “TG_UNUSED.” Because the code dates from February 2022 and was obtained in July 2022, Yandex search has certainly changed since then. But the leak provides an unusual look at how search rankings are built at a company that serves one of the world’s largest countries.
Yandex previously saw its search engine code walk out the door in 2015, when a former employee tried to sell it on the black market for $28,000 to fund his own startup. The surprisingly low figure for the core code of Yandex’s central product suggested that the employee was unaware of its real value. He received a two-year suspended prison sentence, and the code was never seen publicly.