Boosting software security with a binary approach
The discovery earlier this year of a leaked access token, one that could have opened the door to malicious code being injected into one of the world’s most widespread programming languages, has shone a light on why the method we use to search for security issues in software matters.
The access token, discovered by the JFrog Security Research team in July, had administrator access to the GitHub repositories of Python, PyPI and the Python Software Foundation. It was leaked in a public Docker container hosted on Docker Hub.
The research team, which scans public software repositories for malicious packages and leaked secrets as a service to the coding community, reported it immediately. PyPI’s security team revoked the token in just 17 minutes. Fortunately, no malicious usage of the token was detected. But the incident showcased, once again, just how easily leaked secrets and vulnerabilities can make their way into widely-used software. Such exposures have the potential to lead to extensive software supply chain security issues that cybercriminals can exploit for financial gain.
Where’s the token?
The token discovery also served as a timely reminder that scanning source code, or text files in general, for secrets isn’t always enough to spot security issues. Often, you need to scan the binary code associated with the software to fully unpack potential security risks and vulnerabilities.
You see, the authentication token in question wasn’t found in a source code file. Instead, it was found inside a Docker container, in a compiled Python file: the binary, machine-readable version of the code, made up of zeros and ones rather than human-readable text.
What seems to have happened is that the author briefly added the access token to their source code. That Python source code was then compiled into a binary file with the token still embedded. The author then removed the token from the source code but didn’t clean it from the binary.
That meant that when it came time to push everything to the Docker image, the source code was clean and no longer contained the token, but the stale binary file still had the authorisation token embedded in it.
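The sequence described above can be reproduced in miniature. The following is a minimal sketch, not a reconstruction of the actual incident: the file names, the `stale_bytecode_demo` helper and the placeholder token are all hypothetical. It compiles a script that contains a hard-coded string, rewrites the script without it, and then checks both files.

```python
import py_compile
import tempfile
from pathlib import Path

# Placeholder value for illustration only; not a real credential.
FAKE_TOKEN = "ghp_" + "A" * 36


def stale_bytecode_demo() -> tuple[bool, bool]:
    """Return (token_in_source, token_in_compiled_file) after the cleanup."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "build.py"    # hypothetical script name
        pyc = Path(tmp) / "build.pyc"   # compiled output written next to it

        # Step 1: the token is briefly hard-coded in the source.
        src.write_text(f'TOKEN = "{FAKE_TOKEN}"\n')

        # Step 2: the source is compiled while the token is still present.
        py_compile.compile(str(src), cfile=str(pyc), doraise=True)

        # Step 3: the source is cleaned up, but the compiled file is not rebuilt.
        src.write_text('import os\nTOKEN = os.environ.get("TOKEN", "")\n')

        in_source = FAKE_TOKEN in src.read_text()
        in_compiled = FAKE_TOKEN.encode("ascii") in pyc.read_bytes()
        return in_source, in_compiled


if __name__ == "__main__":
    print(stale_bytecode_demo())
```

In this sketch the cleaned source no longer contains the string, while the compiled file written in step 2 still does, so a scan limited to the text file would report nothing.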
Scanning for secrets in zeros and ones
Clearly, the best way to have avoided such a potentially dangerous authorisation token ending up in a publicly-available binary file would have been to audit both the source code and the binary data inside the published Docker image. This is why binary scanning is so important.
Even though searching for leaked secrets in binary files is more difficult than in text-based files, the critical data may reside only in the binary data. But when it comes to scanning software for security issues as it is being developed and deployed, not everybody scans binaries. Part of the reason is that not all of the platforms used in modern software development and operations processes scan code at the binary level, sticking to the text-based source code instead. While source code scanning is an essential step, it is useful only up to a certain point.
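As a rough illustration of scanning at the byte level rather than the text level, the sketch below reads every file as raw bytes and looks for token-shaped strings, so compiled files are treated the same as source files. It is an assumption-laden example, not a description of any particular product: the `find_tokens` and `scan_tree` helpers are hypothetical, and the single pattern used (the `ghp_` prefix followed by 36 alphanumeric characters) stands in for the much larger rule sets a real scanner would need.

```python
import re
from pathlib import Path

# Assumed pattern for illustration: "ghp_" followed by 36 alphanumeric characters.
TOKEN_PATTERN = re.compile(rb"ghp_[A-Za-z0-9]{36}")


def find_tokens(data: bytes) -> list[str]:
    """Return every token-shaped string found in a blob of raw bytes."""
    return [match.group().decode("ascii") for match in TOKEN_PATTERN.finditer(data)]


def scan_tree(root: str) -> dict[str, list[str]]:
    """Scan every regular file under root as bytes, whatever its extension."""
    hits: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            found = find_tokens(path.read_bytes())
            if found:
                hits[str(path)] = found
    return hits


if __name__ == "__main__":
    # Scan the current directory and report which files contain matches.
    for filename, tokens in scan_tree(".").items():
        print(f"{filename}: {len(tokens)} possible token(s)")
```

Matching on bytes means the check does not depend on a file being valid text. It would still miss secrets that are compressed, encoded or split across values inside a binary, which is part of why binary scanning is harder than text scanning.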
Businesses using the software that arrives at the end of the development process typically don’t use the source code underpinning that software. Rather, they run applications which are usually a composite of many binaries from multiple sources. If those binaries have security issues, they can impact the businesses using the software.
A binary approach to software supply chain security
This is one of the key reasons why businesses should integrate binary scanning into a broader software supply chain security approach. In fact, the research team was able to spot the troublesome access token that put the Python language in jeopardy because it was scanning both text files and binary files for leaked secrets.
It’s also important to scan binaries in both upstream environments, such as a developer’s integrated development environment, and downstream, in places like Docker containers. A platform that does this spreads the opportunities to spot problematic software artefacts — such as the access token in question — across more points along the development lifecycle.
Additionally, integrating the binary scanning process into widely-used software development platforms such as GitHub extends the visibility even further, to catch security issues in more places again and minimise potential risks.
When it comes to securing the software supply chain in its entirety, making sure binaries are part of your scanning regime, from one end of the software development process to the other, can make a world of difference. And maybe, just maybe, it can put a stop to a potentially global software threat before it becomes a major problem.