The Java Virtual Machine (JVM) is a seasoned and very popular system for running applications, especially in the cloud and also on Android devices (technically, Android devices run a clean-room reimplementation of the JVM).
The JVM executes an abstract stack-based machine code (byte code) the same way on all platforms. Thus, one can author and compile Java and JVM Compatible language programs on a Windows machine running an ARM processor and run that code, unmodified, on a Linux machine with an x86 processor. Java’s ongoing claim to fame is “Write Once, Run Anywhere”.
JVM byte-code and general operations are very well defined and specified.
For the purposes of this discussion, we will focus on how JVM programs are distributed and how JVM programs use open source libraries.
Each Java and JVM language source code file is compiled into a series of JVM “Class” files. There is one Class file for each JVM object-oriented class.
Even the most trivial JVM program will have dozens of classes and typical business application will have thousands of classes.
In order to make distribution of the collection of class files that make up a Java program easier, classes are typically packaged together in a “JAR” (Java ARchive) file. A JAR file is simply a Zip file with particular extra files in it.
Most JVM programs also use open source libraries to perform functions in the program using common code and common patterns.
For example, most programs write status information in text format. For example, “Started processing financial transaction XYZ for customer PDQ.” The writing of status information is termed “logging” and there are many libraries, including Apache’s log4j library.
In order to manage libraries, the Apache Maven project was created. Maven allows each artifact, wether that be a runnable program or a library, to define the libraries that the the artifact depends on. This is called a “dependency.” It’s also possible for one artifact’s dependency to have other dependencies. Dependencies that are defined by direct dependencies are called “transitive dependencies.” Maven resolves all the dependencies and fetches the artifacts from corporate artifact repositories as well as public open source artifact repositories including Maven Central.
For example, if your program relies on “Log version 1.2” and that
library relies on “Common version 2.7” Maven (and other Maven compatible
build/package systems) will fetch the appropriate artifacts and make
those artifacts available to the compiler and packaging system. The
compiler translates human readable source code into binary code that the
machine (e.g, the JVM) can execute efficiently. The compiler can also
identify certain errors in code (e.g., treating a number as a string of
characters). The packaging system takes artifacts from the compiler
(e.g., .class
files) and bundles or packages them together
(e.g., into a JAR file).
Software does not always run the way the programmer intended it to run. Failure to run as intended is called a “bug”.
Some bugs could allow an actor to perform unwanted activities on the machine where the software is running. These bugs are called “vulnerabilities”.
NIST and other organizations have created a common way of describing and communicating information about vulnerabilities: “Common Vulnerabilities and Exposures” or “CVEs”.
Security researchers will identify vulnerabilities in software and assign a CVE number to the vulnerabilities. There are databases that allow anyone to learn about the vulnerabilities in software.
Vulnerabilities that allow an adversary to take control of a machine running software are called “Remote Code Execution” or RCE vulnerabilities and they are typically graded as “Critical.” Thus, anyone using libraries that contain a CVE that describes an RCE should immediately upgrade to a version of the library that does not contain the critical vulnerability.
Because package managers, e.g., Maven, know the specific version of all the dependencies and transitive dependencies, it’s easy to determine if a particular library has a Critical or High CVE.
In the above diagram, the Log Class 2
class has a
Critical vulnerability. A developer can update the library version in
Maven to “Log v1.2.1” which remediates the vulnerability:
Sometimes, for valid technical reasons, developers may choose not to use Maven or another build tool to define some of a program’s dependencies. Instead the developer will copy the source code from a particular library into their project. This will result in code being part of an executable artifact, but hidden from common tool like Dep-Scan or Snyk as well as other security vulnerability scanning and management tools (e.g., Checkmarx, GitHub, Veracode, etc.)
For example, if the vulnerable version of “Log V1.2” was compiled into an application, the JAR would look like:
And this would mean the commonly used security tools would miss the dependency… it would be a hidden vulnerability… a Hidden Reaper.
Rather than relying on the build tool to generate a list of dependent packages, building a set of connections among every part of every artifact allows for the use of math to irrefutably demonstrate that many artifacts share common files and components.
We can calculate when an artifact shares part of its dependency graph with another artifact.
This is similar to DNA tests for pets. The breeder may have told you you have a labradoodle, but somehow there’s a lot of chihuahua in your dog.
What a breeder tells you is in your dog is what the build tool tells you. What a DNA test tells you is what the Artifact Dependency Graph tells you. And sometimes a DNA test allows you to uncover that your pet may be genetically predisposed to a health issue or risk.
Spice Labs has generated an Artifact Dependency Graph, the SaLAD, that has DNA markers for more than 2B items and the relationship among those items.
With the SaLAD, Spice Labs can identify hidden dependencies… Hidden Reapers… in open source and proprietary code.
This allows Spice Labs to use an open source tool to build an Artifact Dependency Graph of proprietary artifacts and then compare the DNA of the corporate artifacts against a "Grim List" of markers from JVM artifacts (packages + version) with critical CVEs.
Thus, Spice Labs can find the Hidden Reapers in a company’s proprietary code… Vulnerabilities that leading security tools (Checkmarx, GitHub, Snyk, Veracode, and many others) cannot find.
Spice Labs’ SaLAD contains the gitoids for artifacts in many different open source ecosystems including the JVM/Java ecosystem.
An artifact is an array of bytes. An artifact may be a Java
.class
file, it may be an image, it may be a Java JAR
file, it may be a .deb
package, etc.
Spice Labs has created a graph of more than 2B vertices describing the relationship among many open source artifacts.
The most prevelant mechanism for for building and packaging Java (and Scala/Kotlin/etc.) programs involed using Maven or a compatible tool (e.g., sbt, Gradle).
These tools allow engineers to define the packages their programs depend on. These packages are fetched from an artifact repository (e.g., Maven Central) and made available as part of the build and packaging process.
When an engineer declares dependencies in the build tool, those dependencies are listed and can be used to check if a particular artifact depends on any other artifacts with known vulnerabilities.
However, sometimes engineers copy/paste source code from projects into their project. When projects with copied/pasted code is built, the dependencies are hidden.
When these hidden dependencies have critical vulnerabilities, we call this “Hidden Reaper”.
“Markers” are the artifacts unique to a particular artifact. What?
Let’s say we have a package, Foo that has three versions, 1.0, 1.1, and 1.2. Package Foo contains three artifacts: Foo:1.0, Foo:1.1, Foo:1.2.
The artifacts are made up of artifacts… class files, license files, etc. all rolled together in a JAR file (we’re in JVM-land).
The three artifacts will share many artifacts: class files derived from source files that are stable across all three versions, etc.
However, there will be artifacts (e.g., .class
files)
that are unique to each version of Foo. These are “markers.”
If these “marker” artifacts are contained by other artifacts, those containing artifacts will very likely contain a particular version of Foo, even if that version of Foo was not declared as a dependency.
Spice Labs has taken GitHub’s Advisory Database and found all Java/JVM packages that have critical vulnerabilities.
For each artifact (package + version), the vulnerabilities for that artifact have been pulled from OSV.
Artifacts with critical vulnerabilities are Reapers.
For each artifact in each package with critical vulnerabilities, Spice Labs has computed the “markers” for that artifact.
Spice Labs then uses these markers to locate other artifacts that Hide the Reapers.
Goat Rodeo is published as a Docker container that you can use to unmask Hidden Reapers in your code.
Create a directory (e.g., test_data
) and copy the
artifacts you want to test for Hidden Reapers (JAR files, tar files, .deb packages, Docker image layers (in tar
format),
etc) into that directory. Then run:
docker run --platform linux/amd64 --quiet --rm --network none -v $(pwd)/test_data:/jars_to_test:ro spicelabs/goatrodeo:latest --unmask --out stderr 2> hidden_reapers.json
The above command:
docker
--platform linux/amd64
uses the "amd64" container which ensures compatibility with Apple Silicon--quiet
Ensures that Docker will not emit status messages to stderr
--rm
Removes the container at the end of execution--network none
Ensures there is no network access during the container's execution. For security oriented folks
this is a double-check (beyond being able to inspect the Goat Rodeo source) that running the container will not contact any external systems.-v $(pwd)/test_data:/jars_to_test:ro
Mounts, as "read only" the directory where Goat Rodeo will inspect artifacts (files)spicelabs/goatrodeo:latest
The Docker image to run--unmask --out stderr
The Goat Rodeo commands to run the "unmasking" operation and output to stderr
2> hidden_reapers.json
Send the output of stderr
to the hidden_reapers.json
file
The above command uses Docker to download and run the
Apache 2.0 licensed
open source Goat Rodeo container to build
inherent identifiers for each file
in the tested file. For example, a tar file
contains other files. Some of those files (e.g., a JAR)
may contain other files. Goat Rodeo recursively opens all the containers files
and builds inherent identifiers for every file encountered. Goat Rodeo then compares the inherent identifiers with the
"Grim List" of markers from packages with known Critical vulnerabilities. If any markers are encountered,
the information will be output to the hidden_reapers.json
file.
The contents of the hidden_reapers.json
file will look something like:
"data":{
"/jars_to_test/core-0.7.0.RELEASE.jar":{
"mainArtifactGitOID":"gitoid:blob:sha256:e12a8c51d6ae770b34dc87851745b2f743552144284d5f8f2fd46a3803085b37",
"foundMarkers":{
"gitoid:blob:sha256:357c06bfdbfcc2b90130494387dcc349af3951f859239d806cf361e7a86a07ef":{
"containingArtifact":"gitoid:blob:sha256:dbe7dd1bcb617833ca36ddd76c40f855d41d2358f0cd23bc92ed89e4a1d4b794",
"markerArtifact":"gitoid:blob:sha256:357c06bfdbfcc2b90130494387dcc349af3951f859239d806cf361e7a86a07ef",
"overlapWithContaining":1.0,
"path":[{
"file":"/jars_to_test/core-0.7.0.RELEASE.jar",
"gitoid":"gitoid:blob:sha256:e12a8c51d6ae770b34dc87851745b2f743552144284d5f8f2fd46a3803085b37"
},{
"file":"org/jerkar/api/depmanagement/ivy-2.4.0.jar",
"gitoid":"gitoid:blob:sha256:dbe7dd1bcb617833ca36ddd76c40f855d41d2358f0cd23bc92ed89e4a1d4b794"
},{
"file":"org/apache/ivy/osgi/repo/AggregatedRepoDescriptor.class",
"gitoid":"gitoid:blob:sha256:357c06bfdbfcc2b90130494387dcc349af3951f859239d806cf361e7a86a07ef"
}]
},
The meaning of the fields is as follows:
data
The field that contains the items that match contents of the "Grim List"
/jars_to_test/core-0.7.0.RELEASE.jar
The name of the file on the filesystem that has contents that
match the "Grim List."
mainArtifactGitOID
The GitOID SHA256 of the file.
foundMarkers
The markers that have been found that match the "Grim List". The
keys in this object are the gitoids of the marker that were found.
containingArtifact
The artifact that contains the marker.
markerArtifact
The artifact that was found in the tested file and also in the "Grim List"
overlapWithContaining
Markers are the inherent identifiers (gitoids) that are unique to a particular
version of an artifact (packege + version) that has a Critical CVE. For various reasons, sometimes not all markers
are copied into another artifact (e.g., an Uber JAR
tool may trim classes that are not accessed during execution). The overlap is the number of markers identified
in the tested artifact divided by the number of markers
belonging to the Grim artifact. This can assist in determining the confidence that the found file is an
indication of a Hidden Reaper.
path
The path and associate inherent identifier (gitoid) to the found marker. In the above example, the "core...jar" file
contains the "ivy-2.4.0.jar" file which contains the "AggregatedRepoDescriptor.class" file, which is a marker file
for the vulnerable Ivy 2.4.0 file.
It's possible for you to double-check the results by Verifying Hidden Reapers. Spice Labs is currently working on an online tool to convert the output of the "Unmasking" tool into addition information that identifies the specific vulnerable packages and, if available, lists a version of the package that no longer has the vulnerabilities.