Code Monkey home page Code Monkey logo

log4j-sniffer's Introduction

Autorelease

log4j-sniffer

log4j-sniffer crawls for all instances of log4j on disk within a specified directory. It can be used to determine whether there are any vulnerable instances of log4j within a directory tree and report or delete them depending on the mode used.

Scanning for versions affected by CVE-2021-44228, CVE-2021-45046, CVE-2021-45105 and CVE-2021-44832 is currently supported.

What this does

log4j-sniffer will scan a filesystem looking for all files of the following types based upon suffix:

  • Zips: .zip
  • Java archives: .jar, .war, .ear
  • Tar: .tar, .tar.gz, .tgz, .tar.bz2, .tbz2

Archives containing other archives will be recursively inspected up to a configurable maximum depth. See the log4j-sniffer crawl --help output for options on nested archive inspection.

It will look for the following:

  • Jar files matching log4j-core-<version>.jar, including those nested within another archive
  • Class files named org.apache.logging.log4j.core.net.JndiManager within Jar files or other archives and check against md5 hashes of known versions
  • Class files named JndiManager in other package hierarchies and check against md5 hashes of known versions
  • Matching of the bytecode of classes named JndiManager against known classes (see below for more details)
  • Matching of bytecode within obfuscated or shaded jars for partial matches against known classes (see below)

Installing

If Go is available on the host system, the following command can be used to install this program:

go install github.com/palantir/log4j-sniffer@latest

This repository also publishes binaries that can be downloaded and executed.

Downloads

log4j-sniffer executables compiled for linux-amd64, darwin-amd64, darwin-arm64 and windows-amd64 architectures are available on the releases page.

Running

This tool is intensive and is recommended to be run with low priority settings.

On Linux:

ionice -c 3 nice -n 19 log4j-sniffer crawl /path/to/a/directory

Output for vulnerable files looks as follows:

[INFO] Found archive with name matching vulnerable log4j-core format at examples/single_bad_version/log4j-core-2.14.1.jar
[INFO] Found JndiManager class that was an exact md5 match for a known version at org/apache/logging/log4j/core/net/JndiManager.class
[INFO] Found JndiLookup class in the log4j package at org/apache/logging/log4j/core/lookup/JndiLookup.class
[MATCH] CVE-2021-44228, CVE-2021-45046, CVE-2021-45105, CVE-2021-44832 detected in file examples/single_bad_version/log4j-core-2.14.1.jar. log4j versions: 2.14.0-2.14.1, 2.14.1. Reasons: JndiLookup class and package name matched, jar name matched, JndiManager class and package name matched, class file MD5 matched
Files affected by CVE-2021-44228 or CVE-2021-45046 or CVE-2021-45105 or CVE-2021-44832 detected: 1 file(s)
1 total files scanned, skipped identifying 0 files due to config, skipped 0 paths due to permission denied errors, encountered 0 errors processing paths

Getting started

Mac

Download the latest version
  1. Locate releases.
  2. You will need a different asset depending on the generation of your Mac.
    1. Select the asset with “macos-amd” in the file name for older Intel Macs.
    2. Select “macos-arm” for newer m1 Macs
  3. Confirm that the file is downloading to your “Downloads” folder.
Install log4j-sniffer on your machine
  1. Once the download is complete, click on the file to open.
  2. Drag and drop the “log4j-sniffer” icon into your Downloads through the Finder.
    1. Open a Finder window by searching for “Finder” using the magnifying glass on the top right of your screen, or selecting the icon in your Dock.
    2. Drag and drop the “log4j-sniffer” icon into Downloads. dragging and dropping
Use log4j-sniffer
  1. Open the Terminal by searching for “Terminal” using the magnifying glass in the top right corner of the screen.
  2. Run ~/Downloads/log4j-sniffer crawl / --ignore-dir="^/dev" to crawl the entire system
    1. Run ~/Downloads/log4j-sniffer crawl /PATH/TO/YOUR/FOLDER to crawl specific folders
    2. If your computer is unable to locate log4j-sniffer, you may have to make it executable before using it. In your terminal, run the following:
      1. chmod +x ~/Downloads/log4j-sniffer
      2. ./log4j-sniffer crawl /PATH/TO/YOUR/FOLDER

Windows

Download the latest version
  1. Locate [releases] (https://github.com/palantir/log4j-sniffer/releases).
  2. Select the Windows asset.
  3. Confirm that the file is downloading to your “Downloads” folder.
Use log4j-sniffer
  1. Type "Command Prompt" into the search bar at the bottom and in the right pane click "Run as administrator".
  2. Navigate to your Downloads folder, e.g. cd C:\Users\yourname\Downloads
  3. Run log4j-sniffer-1.1.0-windows-amd64.exe crawl C:\ to crawl the entire system, substituting the drive of your choice, e.g. C:\, D:\
    1. Run log4j-sniffer-1.1.0-windows-amd64.exe crawl C:\PATH\TO\YOUR\FOLDER to crawl specific folders

Primary Usage

The primary command for the tool is crawl which takes an argument that is the path to the directory to be crawled. Thus, the standard usage is:

log4j-sniffer crawl [pathToDirectory]

The standard mode prints output in a human-readable format and prints a summary that states the number of vulnerabilities found after running.

Specifying the --json flag makes it such that the output of the program is all in JSON: each line of output is JSON that describes the vulnerability that is found, and if summary mode is enabled then the final summary is also output as a line of JSON.

Here is an example of the output with --json:

{"message":"CVE-2021-44228, CVE-2021-44832, CVE-2021-45046, CVE-2021-45105 detected","filePath":"examples/inside_a_dist/wrapped_log4j.tar.gz","detailedPath":"examples/inside_a_dist/wrapped_log4j.tar.gz!log4j-core-2.14.1.jar","cvesDetected":["CVE-2021-44228","CVE-2021-44832","CVE-2021-45046","CVE-2021-45105"],"findings":["jndiLookupClassPackageAndName","jarNameInsideArchive","jndiManagerClassPackageAndName","classFileMd5"],"log4jVersions":["2.14.0-2.14.1","2.14.1"]}
{"filesScanned":1,"permissionDeniedErrors":0,"pathErrors":0,"pathsSkipped":0,"numImpactedFiles":1}

The JSON fields have the following meaning:

  • message: information about the output
  • filePath: the path to the file on disk where the vulnerability was detected
  • detailedPath: the path to the vulnerable contents, which may be an archived file multiple levels nested. Nesting levels are separated by a '!'
  • cvesDetected: CVEs matched against the version found
  • log4jVersions: the versions detected at this location based on all applied detections, note that some detections are more accurate than others and so a range of versions might be reported

The findings array reports the following possible values:

  • jndiLookupClassName: there was a .class file called JndiLookup
  • jndiLookupClassPackageAndName: there was a .class file called JndiLookup with a package of org.apache.logging.log4j.core.lookup
  • jndiManagerClassName: there was a .class file called JndiManager
  • jndiManagerClassPackageAndName: there was a .class file called JndiManager with a package of org.apache.logging.log4j.core.net
  • jarNameMatched: the file scanned was a .jar file called log4j-core-<version>.jar
  • jarNameInsideArchiveMatched: there was a .jar file called log4j-core-<version>.jar inside the archive
  • classFileMd5Matched: there was a class file called JndiManager that matched the md5 hash of a known version
  • classBytecodeInstructionMd5: the bytecode of a class file called JndiManager exactly matched a known version, see the Bytecode matching section for more details
  • jarFileObfuscated: the jar the match was found in appeared to be obfuscated
  • classBytecodePartialMatch: the bytecode of a class file called JndiManager, or a class within an obfuscated jar, partially matched the bytecode of a known version, see the Bytecode partial matching section for more details

The summary outlines:

  • filesScanned: the total number of files crawled
  • permissionDeniedErrors: the number of directories or files that could not be read due to permissions
  • pathErrors: the number of paths where an unexpected error occurred while trying to identify bad log4j versions
  • pathsSkipped: the numbers of paths skipped from full identification of bad log4j versions due to the config options set
  • numImpactedFiles: the total number of files impacted
  • findings: the total number of findings previously output. For file path only mode, this will equal the number of impacted files.

Specifying --summary=false makes it such that the program does not output a summary line at the end. In this case, the program will only print output if vulnerabilities are found.

Detections quick match table

The following tables shows when each finding is reported based on our testing:

Unmodified log4j-core-2.14.1.jar JndiLookup removed log4j-core-2.14.1.jar Inside a fat jar (no renaming)
jndiLookupClassName
jndiLookupClassPackageAndName
jndiManagerClassName
jndiManagerClassPackageAndName
jarNameMatched
jarNameInsideArchiveMatched
classFileMd5Matched
bytecodeInstructionMd5Matched
jarFileObfuscated
classBytecodePartialMatch
Detected
Shaded (packages renamed only) Shaded (all renamed) Shaded/obfuscated (bytecode optimised)
jndiLookupClassName
jndiLookupClassPackageAndName
jndiManagerClassName
jndiManagerClassPackageAndName
jarNameMatched
jarNameInsideArchiveMatched
classFileMd5Matched
bytecodeInstructionMd5Matched
jarFileObfuscated
classBytecodePartialMatch
Detected
log4j-core-2.14.1.jar inside a .tgz file Heavily obfuscated (additional bytecode instructions inserted)
jndiLookupClassName
jndiLookupClassPackageAndName
jndiManagerClassName
jndiManagerClassPackageAndName
jarNameMatched
jarNameInsideArchiveMatched
classFileMd5Matched
bytecodeInstructionMd5Matched
jarFileObfuscated
classBytecodePartialMatch
Detected

Some of these detections may require non-default settings depending on the file scanned. Some detections may not identify all cases, common shading and obfuscation has been tested but it is not possible to cover every possible change that could be made by such a tool.

Bytecode matching

If a class is shaded (for example, to build a fat jar), then the bytecode is rewritten to update the package. This means the hash of the class will no longer match against known versions, nor will the class appear where expected within a jar.

To account for this, we perform a less accurate hash of a class file: we only hash the fixed parts of the bytecode defining each method, ignoring all parts that might vary upon shading. We take an md5 hash of the resulting bytecode and compare against known versions.

Testing against shaded jars shows this matches when the package version has been changed but the class otherwise left intact. Shading which further modifies classes, such as by removing methods, will not be found with this approach.

Bytecode partial matching

If the event of more aggressive shading, which deletes unused methods, or obfuscated being applied in order to compress the Jar size bytecode matching will not produce matches. The class of interest may also no longer be called JndiManager or appear under a package hierarchy that indicates the presence of log4j.

For these cases we have implemented partial bytecode matching based on signatures generated from obfuscated versions of the log4j jar compared against the version as shipped.

By default we apply this detection to classes called JndiManager which have not been matched by md5 or full bytecode matching, or to all classes in Jar files with an average package and class name length both under 3.

A match here is likely but not guaranteed to be log4j within the reported version range. Obfuscation aimed to preventing reverse engineering rather than simply compressing the Jar size may also cause false negatives.

Testing against know shaded jars in open source products, such as the Hive jdbc driver, has shown this approach to produce matches.

Crawl command

Crawl filesystem to scan for jars vulnerable to CVE-2021-45046.
Root must be provided and can be a single file or directory.
If a directory is provided, it is traversed and all files are scanned.
Use the ignore-dir flag to provide directories of which to ignore all nested files.

Usage:
  log4j-sniffer crawl <root> [flags]

Flags:
      --archive-open-mode string                             Supported values:
                                                               standard - standard file opening will be used. This may cause the filesystem cache to be populated with reads from the archive opens.
                                                               directio - direct I/O will be used when opening archives that require sequential reading of their content without being able to skip to file tables at known locations within the file.
                                                                          For example, "directio" can have an effect on the way that tar-based archives are read but will have no effect on zip-based archives.
                                                                          Using "directio" will cause the filesystem cache to be skipped where possible. "directio" is not supported on tmpfs filesystems and will cause tmpfs archive files to report an error. (default "standard")
      --archives-per-second-rate-limit int                   The maximum number of archives to scan per second. 0 for unlimited.
      --directories-per-second-rate-limit int                The maximum number of directories to crawl per second. 0 for unlimited.
      --disable-cve-2021-44832-detection                     Disable detection of CVE-2021-44832 in versions up to 2.17.0
      --disable-cve-2021-45105-detection                     Disable detection of CVE-2021-45105 in versions up to 2.16.0
      --disable-detailed-findings                            Do not print out detailed finding information when not outputting in JSON.
      --disable-flagging-jndi-lookup                         Do not report results that only match on the presence of a JndiLookup class.
                                                             Even when disabled results which match other criteria will still report the presence of JndiLookup if relevant.
      --disable-unknown-versions                             Only output issues if the version of log4j can be determined (note that this will cause certain detection mechanisms to be skipped)
      --enable-obfuscation-detection                         Enable applying partial bytecode matching to Jars that appear to be obfuscated. (default true)
      --enable-partial-matching-on-all-classes               Enable partial bytecode matching to all class files found.
      --enable-trace-logging                                 Enables trace logging whilst crawling. disable-detailed-findings must be set to false (the default value) for this flag to have an effect.
      --file-path-only                                       If true, output will consist of only paths to the files in which CVEs are detected
  -h, --help                                                 help for crawl
      --ignore-dir strings                                   Specify directory pattern to ignore. Use multiple times to supply multiple patterns.
                                                             Patterns should be relative to the provided root.
                                                             e.g. ignore "^/proc" to ignore "/proc" when using a crawl root of "/"
      --json                                                 If true, output will be in JSON format
      --maximum-average-obfuscated-class-name-length int     The maximum class name length for a class to be considered obfuscated. (default 3)
      --maximum-average-obfuscated-package-name-length int   The maximum average package name length a class to be considered obfuscated. (default 3)
      --nested-archive-disk-swap-dir string                  When nested-archive-disk-swap-max-size is non-zero, this is the directory in which temporary files will be created for writing temporary large nested archives to disk. (default "/tmp")
      --nested-archive-disk-swap-max-size uint               The maximum size in bytes of disk space allowed to use for inspecting nest archives that are over the nested-archive-max-size.
                                                             By default no disk swap is to be allowed, nested archives will only be inspected if they fit into the configured nested-archive-max-size.
                                                             When an archive is encountered that is over the nested-archive-max-size, an the archive may be written out to a temporary file so that it can be inspected without a large memory penalty.
                                                             If large archives are nested within each other, an archive will be opened only if the accumulated space used for archives on disk would not exceed the configured nested-archive-disk-swap-max-size.
      --nested-archive-max-depth uint                        The maximum depth to recurse into nested archives.
                                                             A max depth of 0 will open up an archive on the filesystem but not any nested archives.
      --nested-archive-max-size uint                         The maximum compressed size in bytes of any nested archive that will be unarchived for inspection.
                                                             This limit is made a per-depth level.
                                                             The overall limit to nested archive size unarchived should be controlled
                                                             by both the nested-archive-max-size and nested-archive-max-depth. (default 5242880)
      --per-archive-timeout duration                         If this duration is exceeded when inspecting an archive,
                                                             an error will be logged and the crawler will move onto the next file. (default 15m0s)
      --summary                                              If true, outputs a summary of all operations once program completes (default true)

Archives

Archives can both contain many files listed and have other archives nested within them. If either is followed without limit then it would be possible to cause scans to take either an unacceptable amount of time or consume an acceptable amount of memory.

--per-archive-timeout allows for control of the maximum time a single archive will be examined. --nested-archive-max-size and --nested-archive-max-depth control how many nested archives are opened, and how large each one may be. The total memory required will be on the order to the product of these two values.

Rate limiting

While nice and ionice on Linux and other platform specific tooling can be used to limit the priority of processes it may be desired to further limit resource use. One example is when running on AWS instances with a burst balance for IO, which might be consumed by running this tool leading to a negative impact on other processes running on that instance.

--directories-per-second-rate-limit and --archives-per-second-rate-limit limit how quickly directories will be traversed and how quickly archives will be processed.

Obfuscation detection

For partial bytecode matching we apply a heuristic for whether a Jar is obfuscated. This is because the partial matching is expensive to apply and as such running it on every matching Jar on a system may not be feasible.

The heuristic used is that both the average package name length and class name length need to be below a certain value for a Jar to be considered obfuscated. By default this is 3. You can tune these values using --maximum-average-obfuscated-class-name-length and --maximum-average-obfuscated-package-name-length.

If you wish to turn off obfuscation detection entirely then --enable-obfuscation-detection can be used. If instead you wish to apply partial matching to all Jars, regardless of whether they appear obfuscated, then you can use --enable-partial-matching-on-all-classes.

CVE-2021-45105 and CVE-2021-44832

If you do not wish to report results for CVE-2021-45105 or CVE-2021-44832 then pass the --disable-cve-2021-45105-detection or --disable-cve-2021-44832-detection flags to the crawl command.

By default, both CVE-2021-45046 and CVE-2021-45105 will be reported.

Examples

Running on Linux against the entire filesystem at a low priority:

ionice -c 3 nice -n 19 log4j-sniffer crawl / --ignore-dir "^/dev" --ignore-dir "^/proc"

Running against a specific Jar with all limits removed:

log4j-sniffer crawl jar-under-suspicion.jar --enable-partial-matching-on-all-classes --nested-archive-max-depth 255 --nested-archive-max-size 5242880000

Delete command

Most of the flags are shared with the crawl command and so it is recommended that delete be run using configuration that is known to not negatively impact performance of a host.

Delete files containing log4j vulnerabilities.

Crawl the file system from root, detecting files containing log4j-vulnerabilities and deleting them if they meet certain requirements determined by the command flags.
Root must be provided and can be a single file or directory.

Dry-run mode is enabled by default, where a line will be output to state where a file would be deleted when running not in dry run mode.
It is recommended to run using dry-run mode enabled, checking the logged output and then running with dry-run disabled using the same configuration flags.
Use --dry-run=false to turn off dry-run mode, enabling deletes.

When used on windows, deleting based on file ownership is unsupported and skip-owner-check should be used instead of filepath-owner.

Usage:
  log4j-sniffer delete <root> [flags]

Examples:
Delete all findings nested beneath /path/to/dir that are owned by foo and contain findings that match both classFileMd5 and jarFileObfuscated.

log4j-sniffer delete /path/to/dir --dry-run=false --filepath-owner ^/path/to/dir/.*:foo --finding-match classFileMd5 --finding-match jarFileObfuscated

Flags:
      --archive-open-mode string                             Supported values:
                                                               standard - standard file opening will be used. This may cause the filesystem cache to be populated with reads from the archive opens.
                                                               directio - direct I/O will be used when opening archives that require sequential reading of their content without being able to skip to file tables at known locations within the file.
                                                                          For example, "directio" can have an effect on the way that tar-based archives are read but will have no effect on zip-based archives.
                                                                          Using "directio" will cause the filesystem cache to be skipped where possible. "directio" is not supported on tmpfs filesystems and will cause tmpfs archive files to report an error. (default "standard")
      --archives-per-second-rate-limit int                   The maximum number of archives to scan per second. 0 for unlimited.
      --directories-per-second-rate-limit int                The maximum number of directories to crawl per second. 0 for unlimited.
      --disable-cve-2021-44832-detection                     Disable detection of CVE-2021-44832 in versions up to 2.17.0
      --disable-cve-2021-45105-detection                     Disable detection of CVE-2021-45105 in versions up to 2.16.0
      --dry-run                                              When true, a line with be output instead of deleting a file. Use --dry-run=false to enable deletion. (default true)
      --enable-obfuscation-detection                         Enable applying partial bytecode matching to Jars that appear to be obfuscated. (default true)
      --enable-partial-matching-on-all-classes               Enable partial bytecode matching to all class files found.
      --enable-trace-logging                                 Enables trace logging whilst crawling. disable-detailed-findings must be set to false (the default value) for this flag to have an effect.
      --filepath-owner strings                               Provide a filepath pattern and owner template that will be used to check whether a file should be deleted or not when it is deemed to be vulnerable.
                                                             Multiple values can be provided and values must be provided in the form filepath_pattern:owner_template, where a filepath pattern and owner template are colon separated.

                                                             When a file is deemed to be vulnerable, the path of the file containing the vulnerability will be matched against all filepath patterns.
                                                             For all filepath matches, the owner template will be expanded against the filepath pattern match to resolve to a file owner value that the actual file owner will then be compared against.
                                                             Owner templates may use template variables, e.g. $1, $2, $name, that correspond to capture groups in the filepath pattern. Please refer to the standard go regexp package documentation at https://pkg.go.dev/regexp#Regexp.Expand for more detailed expanding behaviour.

                                                             If no filepaths match, the file will not be deleted. If any filepaths match, all matching filepath patterns' corresponding expanded templated owner values must match against the actual file owner for the file to be deleted.

                                                             Architecture-specific behaviour:
                                                             - linux-amd64: libc-backed code is used, which is able to infer the owner of a file over a broad range of account setups.
                                                             - linux-arm64: only the user running or users from /etc/passwd will be available to infer file ownership.
                                                             - darwin-*: only the user running or users from /etc/passwd will be available to infer file ownership.
                                                             - windows-*: file ownership is unsupported and --skip-owner-check should be used instead.

                                                             Examples:
                                                             --filepath-owner ^/foo/bar/.+:qux would consider /foo/bar/baz for deletion only if it is owned by qux.
                                                             --filepath-owner ^/foo/bar/.+:qux and --filepath-owner ^/foo/bar/baz/.+:quuz would not consider /foo/bar/baz/corge for deletion if owned by either qux or quuz because both would need to match.
                                                             --filepath-owner ^/foo/(\w+)/.*:$1 would consider /foo/bar/baz for deletion only if it is owned by bar.

      --finding-match strings                                When supplied, any vulnerable finding must contain all values that are provided to finding-match for it to be considered for deletion.
                                                             These values are considered on a finding-by-finding basis, i.e. an archive containing two separate vulnerable jars will only be deleted if either of the contained jars matches all finding-match values.

                                                             Supported values are as follows, but can be provided case-insensitively:
                                                             - ClassBytecodeInstructionMd5
                                                             - ClassBytecodePartialMatch
                                                             - ClassFileMd5
                                                             - JarFileObfuscated
                                                             - JarName
                                                             - JarNameInsideArchive
                                                             - JndiLookupClassName
                                                             - JndiLookupClassPackageAndName
                                                             - JndiManagerClassName
                                                             - JndiManagerClassPackageAndName

                                                             Example:
                                                             --finding-match classFileMd5 and --finding-match jarFileObfuscated would only delete a file containing a vulnerability if the vulnerability contains a class file hash match and an obfuscated jar name.
                                                             If a vulnerable finding contained only one of these finding-match values then the file would not be considered for deletion.

  -h, --help                                                 help for delete
      --ignore-dir strings                                   Specify directory pattern to ignore. Use multiple times to supply multiple patterns.
                                                             Patterns should be relative to the provided root.
                                                             e.g. ignore "^/proc" to ignore "/proc" when using a crawl root of "/"
      --maximum-average-obfuscated-class-name-length int     The maximum class name length for a class to be considered obfuscated. (default 3)
      --maximum-average-obfuscated-package-name-length int   The maximum average package name length a class to be considered obfuscated. (default 3)
      --nested-archive-disk-swap-dir string                  When nested-archive-disk-swap-max-size is non-zero, this is the directory in which temporary files will be created for writing temporary large nested archives to disk. (default "/tmp")
      --nested-archive-disk-swap-max-size uint               The maximum size in bytes of disk space allowed to use for inspecting nest archives that are over the nested-archive-max-size.
                                                             By default no disk swap is to be allowed, nested archives will only be inspected if they fit into the configured nested-archive-max-size.
                                                             When an archive is encountered that is over the nested-archive-max-size, an the archive may be written out to a temporary file so that it can be inspected without a large memory penalty.
                                                             If large archives are nested within each other, an archive will be opened only if the accumulated space used for archives on disk would not exceed the configured nested-archive-disk-swap-max-size.
      --nested-archive-max-depth uint                        The maximum depth to recurse into nested archives.
                                                             A max depth of 0 will open up an archive on the filesystem but not any nested archives.
      --nested-archive-max-size uint                         The maximum compressed size in bytes of any nested archive that will be unarchived for inspection.
                                                             This limit is made a per-depth level.
                                                             The overall limit to nested archive size unarchived should be controlled
                                                             by both the nested-archive-max-size and nested-archive-max-depth. (default 5242880)
      --per-archive-timeout duration                         If this duration is exceeded when inspecting an archive,
                                                             an error will be logged and the crawler will move onto the next file. (default 15m0s)
      --skip-owner-check                                     When provided, the owner of a file will not be checked before attempting a delete.

Identify command

Produces hashes to identify a class file within a JAR.
The entire class is hashed to allow for matching against the exact version.
The bytecode opcodes making up the methods are hashed, for matching versions
with modifications.
Use the class-name option to change which class is analysed within the JAR.

Usage:
  log4j-sniffer identify <jar> [flags]

Flags:
      --class-name string   Specify the full class name and package to scan.
                            Defaults to the log4j JdniManager class. (default "org.apache.logging.log4j.core.net.JndiManager")
  -h, --help                help for identify


Identify runs against a Jar and produces both an md5 hash and a bytecode instruction hash. The primary purpose for this command is generating signatures for the crawl command.

Example output:

$ log4j-sniffer identify examples/single_bad_version/log4j-core-2.14.1.jar 
Size of class: 5029
Hash of complete class: f1d630c48928096a484e4b95ccb162a0
Hash of all bytecode instructions: 8139e14cd3955ef709139c3f23d38057-v0

Compare command

Compares the classes specified within source_jar and target_jar.
Outputs the parts the jars have in common in order to build signatures for matching.
The class names must be fully qualified and not end with .class.

Usage:
  log4j-sniffer compare <source_jar> <class> <target_jar> <class> [flags]

Flags:
  -h, --help   help for compare

Compare two classes and output similar bytecode in hex encoding. The primary purpose of this command is to produce signatures for partial bytecode matching. It is not intended to provide useful output if run against Jar files of unknown construction.

log4j-sniffer's People

Contributors

arminaaki avatar ashrayjain avatar bmoylan avatar gcampbell12 avatar glynternet avatar hpryce avatar jared-gs avatar nkeoleian avatar nmiyake avatar rootulp avatar svc-autorelease avatar svc-excavator-bot avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

log4j-sniffer's Issues

Add a way to suppress flagging good log4j files as potentially vulnerable if renamed

It would help if we had a way to suppress these matches (produced by the crawl command) when the application owner has upgraded to log4j-core-2.17.1.jar but renamed the file.

In this example, I scanned examples/good_version/log4j-core-2.17.1.jar but removed "core-" from the name.

[MATCH] invalid version - unknown CVE status detected in file log4j-2.17.1.jar. log4j versions: unknown. Reasons: JndiLookup class and package name matched

Ignoring directories on Windows

The tool doesn't ignore subdirectories on Windows hosts, using the --ignore-dir argument as currently documented:

      --ignore-dir strings                 Specify directory pattern to ignore. Use multiple times to supply multiple patterns.
                                           Patterns should be relative to the provided root.
                                           e.g. ignore "^/proc" to ignore "/proc" when using a crawl root of "/"

Test setup

two subdirectories containing vulnerable jars:

PS > tree /f
Folder PATH listing
C:.
│   log4j-sniffer-0.8.0-windows-amd64.exe
│
├───test1
│       apache-log4j-2.14.0-bin.zip
│
└───test2
    └───subdir
            apache-log4j-2.14.0-bin.zip

Supplied syntax doesn't ignore test1 (with either \ or /)

PS > .\log4j-sniffer-0.8.0-windows-amd64.exe crawl .\ --ignore-dir '^/test1'
CVE-2021-45046 and CVE-2021-45105 detected in file test1\apache-log4j-2.14.0-bin.zip. log4j versions: 2.14.0. Reasons: jar name inside archive matched
CVE-2021-45046 and CVE-2021-45105 detected in file test2\subdir\apache-log4j-2.14.0-bin.zip. log4j versions: 2.14.0. Reasons: jar name inside archive matched
Files affected by CVE-2021-45046 or CVE-2021-45105 detected: 2 file(s) impacted by CVE-2021-45046 or CVE-2021-45105
3 total files scanned, skipped 0 paths due to permission denied errors, encountered 0 errors processing paths
PS > .\log4j-sniffer-0.8.0-windows-amd64.exe crawl .\ --ignore-dir '^\test1'
CVE-2021-45046 and CVE-2021-45105 detected in file test1\apache-log4j-2.14.0-bin.zip. log4j versions: 2.14.0. Reasons: jar name inside archive matched
CVE-2021-45046 and CVE-2021-45105 detected in file test2\subdir\apache-log4j-2.14.0-bin.zip. log4j versions: 2.14.0. Reasons: jar name inside archive matched
Files affected by CVE-2021-45046 or CVE-2021-45105 detected: 2 file(s) impacted by CVE-2021-45046 or CVE-2021-45105
3 total files scanned, skipped 0 paths due to permission denied errors, encountered 0 errors processing paths

A single-level subdirectory can be ignored by just specifying its name with no ^ / or \:

PS> .\log4j-sniffer-0.8.0-windows-amd64.exe crawl .\ --ignore-dir 'test1'
CVE-2021-45046 and CVE-2021-45105 detected in file test2\subdir\apache-log4j-2.14.0-bin.zip. log4j versions: 2.14.0. Reasons: jar name inside archive matched
Files affected by CVE-2021-45046 or CVE-2021-45105 detected: 1 file(s) impacted by CVE-2021-45046 or CVE-2021-45105
2 total files scanned, skipped 0 paths due to permission denied errors, encountered 0 errors processing paths

This doesn't work for nested subdirectories:

PS > .\log4j-sniffer-0.8.0-windows-amd64.exe crawl .\ --ignore-dir 'test2\subdir'
CVE-2021-45046 and CVE-2021-45105 detected in file test1\apache-log4j-2.14.0-bin.zip. log4j versions: 2.14.0. Reasons: jar name inside archive matched
CVE-2021-45046 and CVE-2021-45105 detected in file test2\subdir\apache-log4j-2.14.0-bin.zip. log4j versions: 2.14.0. Reasons: jar name inside archive matched
Files affected by CVE-2021-45046 or CVE-2021-45105 detected: 2 file(s) impacted by CVE-2021-45046 or CVE-2021-45105
3 total files scanned, skipped 0 paths due to permission denied errors, encountered 0 errors processing paths
PS > .\log4j-sniffer-0.8.0-windows-amd64.exe crawl .\ --ignore-dir 'test2/subdir'
CVE-2021-45046 and CVE-2021-45105 detected in file test1\apache-log4j-2.14.0-bin.zip. log4j versions: 2.14.0. Reasons: jar name inside archive matched
CVE-2021-45046 and CVE-2021-45105 detected in file test2\subdir\apache-log4j-2.14.0-bin.zip. log4j versions: 2.14.0. Reasons: jar name inside archive matched
Files affected by CVE-2021-45046 or CVE-2021-45105 detected: 2 file(s) impacted by CVE-2021-45046 or CVE-2021-45105
3 total files scanned, skipped 0 paths due to permission denied errors, encountered 0 errors processing paths

Windows download is in .tgz format not .exe

Thought I'd comment on this as a common/nontechnical windows user may run into issues:

image

Workaround I utilized:

Ran this command in powershell to convert the linux tgz to exe:
tar -xvzf C:\Users\USERNAMEHERE\Downloads\log4j-sniffer-1.0.0-windows-amd64.tgz -C C:\Users\USERNAMEHERE\Downloads
Then ran this (the .\ at the beginning to allow running in powershell, it's designed for the command prompt):
.\log4j-sniffer.exe crawl C:\

Support inspection of large archives in memory-constrained scenarios

In the current implementation, the detection logic for detecting vulnerabilities in nested ZIP archives requires the entire nested ZIP to be loaded into memory. If this program is run in low-memory scenarios with large nested archives, this could pose an issue.

In order to mitigate this, the implementation should either be updated such that entire nested archives are not loaded into memory or so that the nested archives (or the relevant files within them) are written out to disk and examined there so that disk is used rather than memory.

Add support for detection in nested archives

Currently, the detection mechanism only looks into outer archives or nested ZIP archives. However, there are many normal/valid instances where a vulnerability may be in a nested archive (for example, a JAR within a TGZ). The crawling/detection logic should be updated such that all nested archive types are examined recursively.

Improve detection mechanism

Currently, the program identifies vulnerabilities by matching based on file names and classes. However, this could be improved by analyzing the content to detect cases in which JARs are shaded and/or slightly modified by frameworks such as ProGuard.

Modify output format

Proposal is roughly as follows:

  • In default mode, only output vulnerabilities and summary
    • If there are no vulnerabilities, output is either empty or prints summary that states that there is no issue
  • Add support for --json flag
    • If provided, then vulnerabilities are outputted in single-line JSON that is easily consumable/parseable
    • TBD: should this mode also output a final summary line in JSON? If so, should this be configurable?

As part of this feature, we would likely remove metric logging as well.

The intent of this is to improve the user experience when a human is consuming the output (make it more readable) and to make it easier for a machine to consume the output (consume JSON and transform to a report/etc.).

The initial version used our logging and metric framework for ease of iteration, but the primary value that these frameworks provide are for services for which logger and metric output are ingested and processed -- given that this project is meant to be run as a CLI/program directly on machines, it doesn't make as much sense to use the logging and metric frameworks.

Don't flag the Gradle shadow plugin 7.1.2 (shadow-7.1.2.jar) as vulnerable

The Gradle Shadow plugin is a fairly common development tool. Versions before 7.1.2 included unsafe versions of log4j, but 7.1.2 uses log4j 2.17.1.

As of 1.9.0, log4j-sniffer flags shadow-7.1.2.jar for deletion. This is because it is itself a fat jar that contains a JndiLookup class at a non-standard location (prefixed with "shadow/"). The detection can be avoided by running with --disable-flagging-jndi-lookup, but I don't know how safe that is, and I don't control the environment where this is affecting me and other users.

This is particularly annoying in that Gradle doesn't recover from the situation in which the jar is deleted from ~/.gradle/caches/jars-9/. In that situation, it assumes the jar is still present (due to a .receipt file) and puts its missing location on the classpath; the plugin then fails to get applied on subsequent runs, with Gradle never redownloading the jar.

Here's the workaround I've been suggesting for people running into this, in the meantime:

try {
    apply plugin: 'com.github.johnrengelman.shadow'
} catch (Exception e) {
    throw new GradleException("The shadow plugin failed to apply properly. There's a known issue with the log4j sniffer " +
    "that can cause this to happen; try running this as a workaround: rm ~/.gradle/caches/jars-*/*/shadow-*.jar.receipt", e);
}

Detection should consider 2.16 also as vulnerable

The description of V0.5.0 says:
"log4j-sniffer crawls for all instances of log4j that are earlier than version 2.16 on disk"

But the recommended version of Log4J is 2.17 so 2.16 should also be considered unsafe to some degree as it is vulnerable to a DoS attack (CVE-2021-45105).

Thanks for a great tool!!!

Report on individual files rather than only resultant aggregation at top level

Currently we are combining all nested findings as a single aggregated result for a top-level file on disk.

To provide transparency into exactly why a given top-level file is being flagged as vulnerable, we should report on individual files, even when nested, along with the full nesting path at which they can be found.

Filesystem cache can be put under pressure

Under certain circumstances, the filesystem cache can be put under pressure due to cache evictions cause by log4j-sniffer opening and reading many files.

To alleviate this, we can support the use of O_DIRECT on linux systems, and equivalent features of other operating systems that allow for reading of files without the use of the filesystem cache.

Add support for plain list of vulnerable file paths

It would be helpful to add a mode that outputs vulnerable file paths in a plain format without any JSON wrappers, line labels, or other descriptor text. This could then be used as piped arguments to simple scripted followup actions, e.g. foo | xargs rm.

Support deleting of vulnerable files

We need to run a limited for of deleting some vulnerable matches.
Not all finding will be safe for immediate deletion so we will need to make this feature optional and provide a robust filtering of finding to match only a certain subset.

Exact filtering behaviour TBD and I'll update here once we have had a chance to discuss what we deem to be acceptable.

Add more details to human-readable output

The current human-readable output prints the path to the file or archive in which an issue is detected and a short summary of the issues that were found.

However, in some instances this information is not sufficient -- for example, if there is partial bytecode match in a class file in an archive, then having the information about the path/name of the class file within the archive is important (rather than just the path to the archive itself). It could also be helpful to explicitly list the issues that were not found rather than just those that were. The overall goal is for the human-readable output to provide enough output for an end user to fully understand what caused the vulnerability/issue to be flagged and to understand the causes to the most specific degree possible.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.