container-inspector

container-inspector is a suite of analysis utilities and command line tools for Docker images, containers, root filesystems and virtual machine images.

For Docker images, it can process layers and how these relate to each other as well as Dockerfiles.

container-inspector provides utilities to:

  • identify Docker images in a file system, along with their layers and related metadata.
  • given a Docker image, collect and report its metadata.
  • given a Docker image, extract its layers to rebuild what a runtime rootfs would look like.
  • find and parse Dockerfiles.
  • find how Dockerfiles relate to actual images and their layers.
  • given a Docker image, rootfs or virtual machine image, collect inventories of the packages and files installed in an image, layer or rootfs (implemented using a provided callable).
  • detect the "distro" of a rootfs or image using os-release files (with an extensive test suite for these).
  • detect the operating system, architecture and

Quick start

  • Only runs on POSIX OSes
  • Get Python 3.6+
  • Clone or download container-inspector, then run: ./configure --dev
  • Then run env/bin/container-inspector -h for help.

Container image formats

container-inspector handles the Docker image formats created by the docker save command. There are three versions of this Docker image format. The latest, v1.2, is a minor update to v1.1.

  • v1.1 provides improved and richer metadata over v1.0, with a top-level manifest.json file and a Config file for each image with full layer history and ordering. It also uses checksums for enhanced security and traceability of images and layers.
  • v1.0 uses a simple repositories meta file and requires inferring the ordering of the layers in an image from each individual layer json meta file. This format is no longer supported in the latest version of container-inspector.
  • All v1.x formats use the same storage format for layers, i.e. the layer format v1.0, where each layer is stored in a sub-directory named after the layer id. Each of these directories contains a "layer.tar" tarball with the layer payload, a "json" JSON metadata file describing the layer and a "VERSION" file describing the layer format version. Each tarball represents a slice or diff of the image root file system using the AUFS conventions.
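As a rough illustration of the v1.1 layout described above, the sketch below reads the top-level manifest.json of an already extracted docker save archive. It assumes the commonly seen manifest shape (a JSON list of entries with "Config", "RepoTags" and "Layers" keys). The list_images name is hypothetical and this is not container-inspector's own API.

```python
import json
from pathlib import Path


def list_images(extracted_dir):
    """
    Return a list of (config file name, repo tags, layer tarball paths)
    tuples read from the manifest.json of an extracted `docker save` archive.
    Hypothetical helper for illustration only.
    """
    manifest_path = Path(extracted_dir) / "manifest.json"
    with open(manifest_path) as mf:
        manifest = json.load(mf)

    images = []
    for entry in manifest:
        images.append((
            entry.get("Config"),
            entry.get("RepoTags") or [],
            # Layers are assumed to be listed from the bottom layer up
            entry.get("Layers") or [],
        ))
    return images
```

Calling list_images("/path/to/extracted/image") would then give one tuple per image found in the archive.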

At runtime, the root filesystem slices of an image's layers are stacked on top of each other, from the bottom root layer up to the latest layer (or a selected tagged layer), using a union file system (e.g. AUFS). In AUFS, any file or directory whose name is prefixed with .wh. is a "whiteout" that deletes the corresponding file or directory in the underlying layers.
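The layering and whiteout behavior described above can be sketched on plain lists of paths. This is a simplified model, not the extraction code of container-inspector: the visible_paths name is hypothetical, each layer is assumed to be an iterable of POSIX paths relative to the rootfs, and special AUFS markers starting with ".wh..wh." are ignored.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."


def visible_paths(layers):
    """
    Return the set of paths visible after stacking `layers` bottom to top.
    Each layer is an iterable of POSIX paths relative to the rootfs.
    """
    visible = set()
    for layer in layers:
        layer = list(layer)
        # Apply whiteouts first: they only delete entries from lower layers.
        for path in layer:
            dirname, basename = posixpath.split(path)
            if basename.startswith(".wh..wh."):
                # special AUFS markers are not handled in this sketch
                continue
            if basename.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(dirname, basename[len(WHITEOUT_PREFIX):])
                visible = {
                    p for p in visible
                    if p != target and not p.startswith(target + "/")
                }
        # Then add the regular entries of this layer.
        for path in layer:
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                visible.add(path)
    return visible
```

For example, a top layer containing "tmp/.wh.b" hides "tmp/b" and everything under it from the layers below.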

See the image specifications saved in docs/references/

Internal data model

  • Image: this is a runnable image composed of metadata and a sequence of layers.
  • Layer: this is a slice of an image root filesystem with a payload and metadata
  • Resource: this is a file or directory

Plans

  • in progress: support OCI image layout
  • improved support for Windows containers

Related tools

  • Fetching images from a remote registry is available in ScanCode.io
  • Extracting VM image filesystems as archives is available in ExtractCode
  • Scanning for application and system packages is available in ScanCode Toolkit

container-inspector's Issues

AssertionError as returned value2 is None instead of {}

Traceback (most recent call last):
  File "./repos", line 11, in <module>
    load_entry_point('conan', 'console_scripts', 'repos')()
  File "/home/jqbx34/tools/conan-develop/local/lib/python2.7/site-packages/click/core.py", line 664, in __call__
    return self.main(*args, **kwargs)
  File "/home/jqbx34/tools/conan-develop/local/lib/python2.7/site-packages/click/core.py", line 644, in main
    rv = self.invoke(ctx)
  File "/home/jqbx34/tools/conan-develop/local/lib/python2.7/site-packages/click/core.py", line 837, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jqbx34/tools/conan-develop/local/lib/python2.7/site-packages/click/core.py", line 464, in invoke
    return callback(*args, **kwargs)
  File "/home/jqbx34/tools/conan-develop/src/conan/cli.py", line 198, in conanv11
    registry.populate(loc)
  File "/home/jqbx34/tools/conan-develop/src/conan/image_v11.py", line 136, in populate
    repo.load_manifest(rd)
  File "/home/jqbx34/tools/conan-develop/src/conan/image_v11.py", line 247, in load_manifest
    image = Image.load_image_config(config_file)
  File "/home/jqbx34/tools/conan-develop/src/conan/image_v11.py", line 473, in load_image_config
    config, warns = merge_configs(ccnf, cnf)
  File "/home/jqbx34/tools/conan-develop/src/conan/image_v11.py", line 528, in merge_configs
    return merge_update_mappings(container_config, config, mapping=OrderedDict)
  File "/home/jqbx34/tools/conan-develop/src/conan/utils.py", line 198, in merge_update_mappings
    assert isinstance(value2, Mapping)
AssertionError

AssertionError as value2 returned None
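One possible way to avoid this failure is to treat a None value as "nothing to merge" rather than asserting that it is a mapping. The sketch below is an assumption about the intended behavior, not the project's actual merge_update_mappings. The merge_mappings name and its semantics are hypothetical.

```python
from collections.abc import Mapping


def merge_mappings(map1, map2):
    """
    Return a new dict merging map2 into map1 recursively.
    A None on the map2 side never overwrites an existing mapping,
    and a None top-level argument is treated as an empty mapping.
    """
    merged = dict(map1 or {})
    for key, value2 in (map2 or {}).items():
        value1 = merged.get(key)
        if isinstance(value1, Mapping) and value2 is None:
            # keep the existing mapping instead of failing an assertion
            continue
        if isinstance(value1, Mapping) and isinstance(value2, Mapping):
            merged[key] = merge_mappings(value1, value2)
        elif value2 is not None or key not in merged:
            merged[key] = value2
    return merged
```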

Tests failing due to missing data file

When running the tests from the PyPI tarball on openSUSE Tumbleweed, I get two failures looking for ubuntu-artful.txt-distro-expected.json in https://github.com/nexB/container-inspector/tree/main/tests/data/distro/os-release/ubuntu , but it doesn't exist.

This occurs irrespective of whether I add /etc/os-release from https://software.opensuse.org/package/openSUSE-release to the build VM.

[   32s] =================================== FAILURES ===================================
[   32s] _________________ TestDistro.test_distro_from_os_release_file __________________
[   32s] 
[   32s] self = <test_distro.TestDistro testMethod=test_distro_from_os_release_file>
[   32s] 
[   32s]     def test_distro_from_os_release_file(self):
[   32s]         test_dir = self.get_test_loc('distro/os-release')
[   32s]     
[   32s]         for test_file in resource_iter(test_dir, with_dirs=False):
[   32s]             if test_file.endswith('-expected.json'):
[   32s]                 continue
[   32s]             expected = test_file + '-distro-expected.json'
[   32s]             result = Distro.from_os_release_file(test_file).to_dict()
[   32s] >           check_expected(result, expected, regen=False)
[   32s] 
[   32s] tests/test_distro.py:41: 
[   32s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[   32s] 
[   32s] result = {'architecture': None, 'bug_report_url': 'https://bugs.launchpad.net/ubuntu/', 'build_id': None, 'cpe_name': None, ...}
[   32s] expected = '/home/abuild/rpmbuild/BUILD/container-inspector-32.0.1/tests/data/distro/os-release/ubuntu/ubuntu-artful.txt-distro-expected.json'
[   32s] regen = False
[   32s] 
[   32s]     def check_expected(result, expected, regen=False):
[   32s]         """
[   32s]         Check equality between a result collection and an expected JSON file.
[   32s]         Regen the expected file if regen is True.
[   32s]         """
[   32s]         if regen:
[   32s]             with open(expected, 'w') as ex:
[   32s]                 ex.write(json.dumps(result, indent=2))
[   32s]     
[   32s] >       with open(expected) as ex:
[   32s] E       FileNotFoundError: [Errno 2] No such file or directory: '/home/abuild/rpmbuild/BUILD/container-inspector-32.0.1/tests/data/distro/os-release/ubuntu/ubuntu-artful.txt-distro-expected.json'
[   32s] 
[   32s] tests/utilities.py:21: FileNotFoundError
[   32s] _______________________ TestDistro.test_parse_os_release _______________________
[   32s] 
[   32s] self = <test_distro.TestDistro testMethod=test_parse_os_release>
[   32s] 
[   32s]     def test_parse_os_release(self):
[   32s]         test_dir = self.get_test_loc('distro/os-release')
[   32s]     
[   32s]         for test_file in resource_iter(test_dir, with_dirs=False):
[   32s]             if test_file.endswith('expected.json'):
[   32s]                 continue
[   32s]             expected = test_file + '-expected.json'
[   32s]             result = parse_os_release(test_file)
[   32s] >           check_expected(result, expected, regen=False)
[   32s] 
[   32s] tests/test_distro.py:31: 
[   32s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[   32s] 
[   32s] result = {'BUG_REPORT_URL': 'https://bugs.launchpad.net/ubuntu/', 'HOME_URL': 'https://www.ubuntu.com/', 'ID': 'ubuntu', 'ID_LIKE': 'debian', ...}
[   32s] expected = '/home/abuild/rpmbuild/BUILD/container-inspector-32.0.1/tests/data/distro/os-release/ubuntu/ubuntu-artful.txt-expected.json'
[   32s] regen = False
[   32s] 
[   32s]     def check_expected(result, expected, regen=False):
[   32s]         """
[   32s]         Check equality between a result collection and an expected JSON file.
[   32s]         Regen the expected file if regen is True.
[   32s]         """
[   32s]         if regen:
[   32s]             with open(expected, 'w') as ex:
[   32s]                 ex.write(json.dumps(result, indent=2))
[   32s]     
[   32s] >       with open(expected) as ex:
[   32s] E       FileNotFoundError: [Errno 2] No such file or directory: '/home/abuild/rpmbuild/BUILD/container-inspector-32.0.1/tests/data/distro/os-release/ubuntu/ubuntu-artful.txt-expected.json'
[   32s] 
[   32s] tests/utilities.py:21: FileNotFoundError

Add support for Skopeo docker-archive and OCI-formatted container images

Collect rpm checksums

To enhance the reporting capabilities of Conan and the rpm inventory it generates, it would be helpful to collect the SHA1 checksums of these rpm files and output them to the csv.

This may be a separate set of commands that needs to be run on the container, or the rpm utility may have some of this functionality.
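A minimal sketch of the checksum step, assuming the rpm files are available as regular files on disk. The sha1_of_file and write_checksums_csv names and the CSV column names are hypothetical and not part of Conan.

```python
import csv
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums_csv(paths, csv_path):
    """Write one (path, sha1) row per file to a CSV file."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "sha1"])
        for path in paths:
            writer.writerow([path, sha1_of_file(path)])
```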

Collect files installed by packages

For each of the major distro package managers (rpm, deb, apk, etc.), collect not only the installed packages but also the set of files installed by these.

Add support for OCI spec annotations (and legacy LABEL)

These can be added with a LABEL instruction in a Dockerfile, for example https://github.com/Tekki/docker-perl-test/blob/e52c397f9daf04ca89c416b76b610da5ae1cc15a/p16/Dockerfile#L24 . They were initially specified at http://label-schema.org/rc1/ and are now specified at https://github.com/opencontainers/image-spec/blob/master/annotations.md

Some key/values are predefined:
https://github.com/opencontainers/image-spec/blob/master/annotations.md#pre-defined-annotation-keys :

  • org.opencontainers.image.created: date/timestamp
  • org.opencontainers.image.authors: contact details of the people or organization responsible for the image (freeform string)
  • org.opencontainers.image.url: homepage URL
  • org.opencontainers.image.documentation: documentation URL
  • org.opencontainers.image.source: VCS/download URL
  • org.opencontainers.image.version: version ideally semver and mapped to a VCS repo tag
  • org.opencontainers.image.revision: VCS revision e.g. git commit in org.opencontainers.image.source
  • org.opencontainers.image.vendor: author (e.g. created_by)
  • org.opencontainers.image.licenses: Should be an SPDX License Expression
  • org.opencontainers.image.ref.name: unclear... "Name of the reference for a target"
  • org.opencontainers.image.title: brief summary or "name" of the image?
  • org.opencontainers.image.description: long description
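As a sketch of how the predefined keys above could be collected, the function below filters a mapping of image labels by the org.opencontainers.image. prefix. It assumes the labels are already available as a plain dict (for instance from an image config). The oci_annotations name is hypothetical.

```python
OCI_PREFIX = "org.opencontainers.image."


def oci_annotations(labels):
    """
    Return a dict of OCI image annotations found in `labels`, keyed by the
    short name that follows the "org.opencontainers.image." prefix.
    """
    return {
        key[len(OCI_PREFIX):]: value
        for key, value in (labels or {}).items()
        if key.startswith(OCI_PREFIX)
    }
```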

These are some examples:

"_extract_tar: skipping link with missing target:" when extracting image

From nexB/scancode.io#471

While scanning docker://quay.io/wire/alpine-deps I get this :

_extract_tar: skipping link with missing target: <TarInfo 'bin/arch' at 0x7ffa99193d00>: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/container_inspector/utils.py", line 120, in extract_tar
    target = tarball._find_link_target(tarinfo)
  File "/usr/local/lib/python3.9/tarfile.py", line 2418, in _find_link_target
    raise KeyError("linkname %r not found" % linkname)
KeyError: "linkname 'bin//bin/busybox' not found"
_extract_tar: skipping link with missing target: <TarInfo 'bin/ash' at 0x7ffa99193dc0>: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/container_inspector/utils.py", line 120, in extract_tar
    target = tarball._find_link_target(tarinfo)
  File "/usr/local/lib/python3.9/tarfile.py", line 2418, in _find_link_target
    raise KeyError("linkname %r not found" % linkname)
KeyError: "linkname 'bin//bin/busybox' not found"
_extract_tar: skipping link with missing target: <TarInfo 'bin/base64' at 0x7ffa99193b80>: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/container_inspector/utils.py", line 120, in extract_tar
    target = tarball._find_link_target(tarinfo)
  File "/usr/local/lib/python3.9/tarfile.py", line 2418, in _find_link_target
    raise KeyError("linkname %r not found" % linkname)
KeyError: "linkname 'bin//bin/busybox' not found"
_extract_tar: skipping link with missing target: <TarInfo 'bin/bbconfig' at 0x7ffa99193e80>: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/container_inspector/utils.py", line 120, in extract_tar
    target = tarball._find_link_target(tarinfo)
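The linkname 'bin//bin/busybox' in this log looks consistent with an absolute symlink target (/bin/busybox) being joined to the directory of the link itself (bin). The sketch below shows one way such targets could be normalized to archive-relative paths before looking them up. It is an illustration only: resolve_link_target is a hypothetical helper, not the code in container_inspector/utils.py, and it does not guard against targets that escape the archive root.

```python
import posixpath


def resolve_link_target(member_name, linkname, is_symlink):
    """
    Return the archive-relative path that a link member points to.

    `member_name` is the path of the link in the archive, `linkname` is
    its raw target and `is_symlink` tells symlinks from hard links.
    """
    if is_symlink and not linkname.startswith("/"):
        # relative symlink: resolve against the directory of the link
        base = posixpath.dirname(member_name)
        target = posixpath.join(base, linkname)
    else:
        # absolute symlink (taken as relative to the rootfs) or hard link
        # (assumed to be already relative to the archive root)
        target = linkname
    return posixpath.normpath(target).lstrip("/")
```

With this helper, the 'bin/arch' link from the log with a '/bin/busybox' target resolves to 'bin/busybox'.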

Migrate repository to aboutcode-org

Procedures:

  • Announce on Gitter/Element, in a specific channel if available, to notify of the org migration process including the scheduled start time and the expected completion time.
  • Follow these steps for the org migration process:
    • 1. Transfer the nexB repo to the aboutcode-org organization from Settings -> Danger Zone -> Transfer ownership
    • 2. Set up CI for that repo on Azure DevOps
      • Log into the nexB organization on Azure DevOps
      • Go to the project that was moved
      • On the left hand menu, click on Pipelines, and then delete the existing pipeline
      • On the same page, click New pipeline, and then click GitHub, then select the repo that was transferred to aboutcode-org
      • Complete adding the pipeline
  • Announce on Gitter/Element, in a specific channel if available, to notify the completion of the org migration.
  • Update org-level permissions: Check org-level permissions for the source repo and update the repo-level permissions on the migrated repo
  • Check if there are any old references with the previous org (documentation, reference links) that need to be updated.
  • Push a new release about the org change with the updated references

Debian distro not correctly detected with this os-release

This /usr/lib/os-release

PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

was not detected correctly in rootfs pipeline.

  1. we should ensure that we can get to this path too
  2. we should be more flexible with the paths (and leading segments)
  3. in all cases, checking for system-installed packages should not be dependent on the detected distro. We should try Debian, RPM, etc. and all supported types in all FS and/or layers analyzed in a scancode.io pipeline (which impacts the code here, in scancode-toolkit and in scancode.io)
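Points 1 and 2 could be approached by trying several candidate locations for the os-release file. The sketch below is an assumption about how that might look, not the current detection code. The parse_os_release_text and find_os_release names are hypothetical, and the candidate list follows the usual convention that etc/os-release takes precedence over usr/lib/os-release.

```python
import shlex
from pathlib import Path

# Candidate locations, relative to the rootfs, tried in order
OS_RELEASE_PATHS = ("etc/os-release", "usr/lib/os-release")


def parse_os_release_text(text):
    """Return a dict of KEY -> value parsed from os-release content."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex handles the shell-style quoting used in os-release values
        data[key.strip()] = " ".join(shlex.split(value))
    return data


def find_os_release(rootfs):
    """
    Return the parsed content of the first os-release file found under
    `rootfs`, or None if no candidate path exists.
    """
    for rel_path in OS_RELEASE_PATHS:
        candidate = Path(rootfs) / rel_path
        if candidate.is_file():
            return parse_os_release_text(candidate.read_text())
    return None
```

Applied to the file shown above, this yields ID "debian" and VERSION_ID "10" even when only usr/lib/os-release is present.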

Collect installed package metadata

Beyond recording that a package is installed (with its basic name and version), we should also collect the corresponding package metadata.
