GitHub is struggling to contain an ongoing attack that’s flooding the site with millions of code repositories. These repositories contain obfuscated malware that steals passwords and cryptocurrency from developer devices, researchers said.

The malicious repositories are clones of legitimate ones, making them hard to distinguish to the casual eye. An unknown party has automated a process that forks legitimate repositories, meaning the source code is copied so developers can use it in an independent project that builds on the original one. The result is millions of forks with names identical to the original one that add a payload wrapped in seven layers of obfuscation. To make matters worse, some people, unaware of the malice of these imitators, are forking the forks, which adds to the flood.


“The majority of the forked repos are swiftly eradicated by GitHub, which detects the automation,” Matan Giladi and Gil David, researchers at security firm Apiiro, wrote in a post published Wednesday. “Nevertheless, the automation detection appears to overlook numerous repos, and those that were uploaded manually endure. Because the entire attack chain appears to be mainly automated on a large scale, the 1% that persist still amount to thousands of deceitful repos.”

Given the constant churn of new repos being uploaded and GitHub’s removal of them, it’s hard to estimate precisely how many of each there are. The researchers said the number of repos uploaded or forked before GitHub removes them is likely in the millions. They said the attack “impacts over 100,000 GitHub repositories.”

GitHub officials didn’t dispute Apiiro’s estimates and didn’t answer other questions sent by email. Instead, they issued the following statement:

GitHub boasts over 100M developers constructing across more than 420M repositories and is devoted to supplying a secure and reliable platform for developers. We have teams specifically dedicated to identifying, examining, and eliminating content and accounts that breach our Acceptable Use Policies. We utilize manual assessments and large-scale detections that leverage machine learning and continually evolve and adjust to adversarial strategies. Furthermore, we urge customers and community members to report instances of abuse and spam.

Supply-chain attacks that target users of developer platforms have existed since at least 2016, when a university student uploaded custom scripts to RubyGems, PyPI, and NPM. The scripts bore names similar to widely used legitimate packages but otherwise had no connection to them. A phone-home feature in the student’s scripts showed that the imposter code was executed more than 45,000 times on more than 17,000 separate domains, and more than half the time the code was given extensive administrative rights. Two of the affected domains ended in .mil, an indication that people inside the US military had run the script. This form of supply-chain attack is often referred to as typosquatting because it relies on users making small errors when choosing the name of a package they want to use.
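To make the typosquatting idea concrete, here is a minimal Python sketch of one way a developer might flag a package name that closely resembles, but does not exactly match, a package they already trust. The allowlist `KNOWN_PACKAGES`, the function name `likely_typosquats`, and the 0.8 similarity threshold are all assumptions made for illustration. None of them comes from the researchers or from any package registry.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist: packages this project actually intends to install.
KNOWN_PACKAGES = {"requests", "numpy", "pandas", "urllib3"}


def likely_typosquats(name, known=KNOWN_PACKAGES, threshold=0.8):
    """Return trusted names that `name` closely resembles without matching exactly.

    The threshold is an arbitrary illustrative value, not a vetted setting.
    """
    candidate = name.lower()
    if candidate in known:
        # An exact match is the trusted package itself, so nothing to flag.
        return []
    return sorted(
        pkg
        for pkg in known
        if SequenceMatcher(None, candidate, pkg).ratio() >= threshold
    )


if __name__ == "__main__":
    for requested in ("requests", "reqeusts", "flask"):
        print(requested, "->", likely_typosquats(requested))
```

With this allowlist, the transposed name `reqeusts` is flagged as resembling `requests`, while an exact match or an unrelated name such as `flask` returns an empty list. A check like this only catches near-miss spellings. It says nothing about whether a correctly spelled package is trustworthy.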

In 2021, a researcher used a similar technique to successfully execute counterfeit code on networks belonging to Apple, Microsoft, Tesla, and numerous other companies. The technique, known as a dependency confusion or namespace confusion attack, started by placing malicious code packages in an official public repository and giving them the same names as dependency packages that Apple and the other targeted companies use in their products. Automated scripts inside the package managers used by the companies then automatically downloaded and installed the counterfeit dependency code.
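The following Python sketch is a deliberately simplified model of how a name collision like this can go wrong. It assumes a resolver that consults both a private and a public index and simply takes the highest version number, which is one commonly described way such confusion arises. It is not a description of how any particular package manager works. The package name `acme-auth`, the version numbers, and both function names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '1.2.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def naive_resolve(name, private_index, public_index):
    """Pick the highest version found in either index, ignoring its origin."""
    candidates = []
    if name in private_index:
        version = private_index[name]
        candidates.append((parse_version(version), "private", version))
    if name in public_index:
        version = public_index[name]
        candidates.append((parse_version(version), "public", version))
    if not candidates:
        raise LookupError(f"no package named {name!r}")
    _, source, version = max(candidates)
    return source, version


def pinned_resolve(name, private_index, public_index, internal_names):
    """Resolve names declared internal from the private index only."""
    if name in internal_names:
        if name not in private_index:
            raise LookupError(f"internal package {name!r} is missing")
        return "private", private_index[name]
    return naive_resolve(name, private_index, public_index)


if __name__ == "__main__":
    private = {"acme-auth": "1.2.0"}                         # hypothetical internal package
    public = {"acme-auth": "99.0.0", "requests": "2.31.0"}   # attacker reuses the name
    print(naive_resolve("acme-auth", private, public))
    print(pinned_resolve("acme-auth", private, public, {"acme-auth"}))
```

In this toy model, the naive resolver selects the public `acme-auth` because its version number is higher, while the pinned resolver never looks at the public index for a name that has been declared internal.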

The technique observed by Apiiro is known as repo confusion.

“Analogous to dependency confusion attacks, malicious actors entice their target to retrieve their malevolent version instead of the authentic one,” Wednesday’s post explained. “However, dependency confusion attacks capitalize on the functioning of package managers, whereas repo confusion attacks depend merely on humans mistakenly selecting the pernicious version over the genuine one, sometimes employing social engineering tactics as well.”