boxitory's Introduction

Boxitory

Boxitory is a repository for Vagrant virtual machine boxes. It manages box versions and provides a Vagrant-compatible HTTP interface. Boxes are stored on the local filesystem.

Download Latest release

For more information on how it works and how to configure it, see the Wiki.

Build & run

Java 11 is required

$ ./mvnw install && java -jar target/boxitory-{version}.jar

or

$ ./mvnw spring-boot:run

By default, the HTTP server starts on port 8083.

Docker

$ ./mvnw clean package docker:build docker:start

or

$ ./mvnw clean package docker:build docker:run

By default, the container exposes port 8083 with the running app. Box files need to be stored in the ./boxes directory.

boxitory's Issues

Print date & time to version descriptions

When a timestamp is used as the version, it can be printed in the description as a readable date and time, which is useful information.

  • new boolean configuration property box.version_as_timestamp
  • when true, the version should be converted to a readable date & time and printed in the description
  • when false (default), the current behavior does not change
  • when a description exists, the date and time should be prepended to the description with " - " as the separator
  • use ISO 8601 format (https://en.wikipedia.org/wiki/ISO_8601) in UTC, e.g. 2017-11-02T09:42:55Z

example (version 1509628106 has its own description, version 1507206191 has no description):

...
"versions": [
    {
      "version": "1509628106",
      "description": "2017-11-02T13:08:26Z - This version already has this description",
      "providers": [...]
    }, {
      "version": "1507206191",
      "description": "2017-10-05T12:23:11Z",
      "providers": [...]
    }
  ]
...
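The conversion described above could be sketched roughly as follows. This is only an illustration, assuming the version string is a Unix timestamp in seconds; the class and method names are hypothetical and not taken from Boxitory.

```java
import java.time.Instant;

final class VersionDescriptions {
    /**
     * Prefixes a description with the version rendered as an ISO 8601 UTC timestamp.
     * Assumption: the version is a Unix timestamp in seconds.
     */
    static String describe(String version, String description) {
        // Instant.toString() renders ISO 8601 in UTC
        String dateTime = Instant.ofEpochSecond(Long.parseLong(version)).toString();
        if (description == null || description.isEmpty()) {
            return dateTime;
        }
        return dateTime + " - " + description;
    }
}
```

With box.version_as_timestamp set to false, the description would simply be passed through unchanged.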

Checksum precalculation

Calculating box checksums on demand is too expensive for an HTTP request (an md5 checksum of a ~1.5GB box takes about 5s). Checksums therefore need to be calculated in the background and persisted, so that a request just reads the stored values.

  • checksum calculation must not block any request -> it must run in the background
  • BackgroundChecksumCalculator calculates checksums with the given configuration for all boxes and persists them with HashStore
  • BackgroundChecksumCalculator must keep checksums up to date -> if a new box is added while Boxitory is already running, it must detect it and calculate its checksum
  • once a checksum is calculated and persisted, it can't be replaced
  • new configuration option box.checksum_precalculate
    • boolean value - true|false
    • tells whether checksums for boxes should be precalculated in the background
    • default true
  • there will probably be other (advanced) configuration options for the background calculation
  • probably a BackgroundChecksumCalculator interface with a FilesystemBackgroundChecksumCalculator implementation
  • Don't forget to test and keep it as independent as possible. Again, there must be room to replace storing checksums in files with e.g. storing them in a database.

Add supported java detection

It may be useful to detect whether the application runs on a supported version of Java. When the application starts on an unsupported version, it could print a message in the UI or in the logs.
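One possible way to do the check is to read the java.specification.version system property, which is "1.8" on Java 8 and "11", "17", ... on later releases. The sketch below assumes that "Java 11 is required" means Java 11 or newer; the class and method names are hypothetical.

```java
final class JavaVersionCheck {
    // Assumption: any release from 11 upwards counts as supported
    static final int MINIMUM_FEATURE_VERSION = 11;

    /** Extracts the feature version from a java.specification.version value ("1.8" -> 8, "11" -> 11). */
    static int featureVersion(String specificationVersion) {
        String v = specificationVersion.startsWith("1.")
                ? specificationVersion.substring(2)
                : specificationVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    static boolean isSupported(String specificationVersion) {
        return featureVersion(specificationVersion) >= MINIMUM_FEATURE_VERSION;
    }

    /** Returns a warning for the running JVM, or null when the JVM is supported. */
    static String warningForCurrentJvm() {
        String running = System.getProperty("java.specification.version");
        return isSupported(running)
                ? null
                : "Unsupported Java version " + running + ", expected " + MINIMUM_FEATURE_VERSION + " or newer";
    }
}
```

The returned warning could then be logged at startup or shown on the index page.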

persisting checksums

When a checksum is calculated for a particular box, the result should be persisted on the backend (filesystem).
The format is known from #13.

  • a checksum file is created for each box
  • filename f26_1_virtualbox.box.md5. The content is the checksum with the filename
524eb6427a07896be6ebc0a49a943886  f26_1_virtualbox.box.md5

This is the first step of the checksum performance optimizations.
This functionality will be controlled by the configuration option box.checksum_persist with true|false values and true as the default.
Once a checksum is persisted on the filesystem, it is not recalculated anymore. When users decide to replace a box file, it is their responsibility to clean up the checksum files.
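The "persist once, never recalculate" rule could look roughly like the sketch below. It is an illustration only: the class and method names are hypothetical, the checksum is passed in rather than computed, and the line layout mirrors the example in this issue (checksum, two spaces, name of the checksum file).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChecksumFiles {
    /** Path of the checksum file next to the box, e.g. f26_1_virtualbox.box.md5. */
    static Path checksumFileFor(Path box, String algorithm) {
        return box.resolveSibling(box.getFileName() + "." + algorithm);
    }

    /** Writes the checksum file unless one already exists and returns the persisted checksum. */
    static String persistIfAbsent(Path box, String algorithm, String checksum) throws IOException {
        Path file = checksumFileFor(box, algorithm);
        String line = checksum + "  " + file.getFileName() + System.lineSeparator();
        try {
            // CREATE_NEW fails if the file exists, so a persisted checksum is never overwritten
            Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        } catch (FileAlreadyExistsException alreadyPersisted) {
            // keep the previously persisted value
        }
        return read(file);
    }

    /** Reads the checksum, which is the first whitespace-separated token of the file. */
    static String read(Path checksumFile) throws IOException {
        String content = new String(Files.readAllBytes(checksumFile), StandardCharsets.UTF_8);
        return content.trim().split("\\s+")[0];
    }
}
```

Deleting the .md5 file by hand is then what triggers a fresh calculation after a box file has been replaced.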

Show just valid VMs on Index page

FilesystemBoxRepository#getBoxes currently returns all potential VMs (folders) in the repository's directory. It should return only the valid ones, i.e. those that contain at least one valid box.

http interface to download boxes

Currently the JSON API exposes the path of the box with a configurable prefix and assumes that the path is accessible. It would be nice to have an HTTP interface that allows downloading boxes directly through Boxitory.
Download is already implemented in DownloadController, so it should not be hard to expose it in the interface.
To implement this, it should be enough to expose the relative path to the box instead of the absolute path, as is done now.
A new configuration option box.path_type should be implemented, with options raw (default) and boxitory. Then, when exposing the relative path, it should be enough to set host_prefix to http://example.com, and the resulting path must satisfy the DownloadController API @RequestMapping(value = "/download/{boxName}/{boxProvider}/{boxVersion}", method = RequestMethod.GET).
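As a rough illustration, the URL exposed when box.path_type is boxitory could be assembled like this. The helper class and its method are hypothetical; only the /download/{boxName}/{boxProvider}/{boxVersion} layout comes from the mapping quoted above.

```java
final class DownloadUrls {
    /** Builds hostPrefix + /download/{boxName}/{boxProvider}/{boxVersion}. */
    static String downloadUrl(String hostPrefix, String boxName, String boxProvider, String boxVersion) {
        // Drop a trailing slash so the prefix and the path join with exactly one separator
        String prefix = hostPrefix.endsWith("/")
                ? hostPrefix.substring(0, hostPrefix.length() - 1)
                : hostPrefix;
        return String.join("/", prefix, "download", boxName, boxProvider, boxVersion);
    }
}
```

For example, a host_prefix of http://example.com with box f26-x64, provider virtualbox and version 19 gives http://example.com/download/f26-x64/virtualbox/19.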

wrong warn for wrong box file name

A WARN message for a wrong box name is logged for checksum files and description.csv. Those are valid files, so no warning should be logged for them.

Test coverage

Include a test coverage tool (JaCoCo?) in the build.

Support multiple providers

When files for multiple providers are found, return them in the proper way:

...
  "versions": [
    {
      "version": "19",
      "providers": [
        {
          "url": "sftp://my_box_server:/boxes/f26-x64/f26-x64_19_virtualbox.box",
          "name": "virtualbox"
        },
        {
          "url": "sftp://my_box_server:/boxes/f26-x64/f26-x64_19_libvirt.box",
          "name": "libvirt"
        }
      ]
...

The current behavior returns them as separate version objects with the same version string.
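The grouping could be sketched as follows, assuming box file names follow the pattern {boxName}_{version}_{provider}.box seen in the URLs above. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ProviderGrouping {
    /**
     * Groups provider names by version so that each version appears only once.
     * Assumption: file names look like f26-x64_19_virtualbox.box.
     */
    static Map<Long, List<String>> providersByVersion(List<String> fileNames) {
        Map<Long, List<String>> grouped = new TreeMap<>();
        for (String fileName : fileNames) {
            if (!fileName.endsWith(".box")) {
                continue; // e.g. checksum or description files
            }
            String[] parts = fileName.substring(0, fileName.length() - ".box".length()).split("_");
            if (parts.length < 3) {
                continue; // does not match the assumed pattern
            }
            long version = Long.parseLong(parts[parts.length - 2]);
            String provider = parts[parts.length - 1];
            grouped.computeIfAbsent(version, v -> new ArrayList<>()).add(provider);
        }
        return grouped;
    }
}
```

Each map entry then corresponds to one object in the "versions" array, with one "providers" element per list item.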

http download box size

The current HTTP download does not propagate the box size. It should be enhanced with an HTTP header carrying the file size.

Create github Wiki

README.md is getting quite long, hard to read and hard to navigate.

  • Create Wiki here on github
  • In README.md keep just
    • simple description
    • how to build & run
    • don't forget to add link to the Wiki

get latest available version

Create a method on the HTTP API that returns the latest box version.

  • http://server:port/box_name/latestVersion should respond with just the number of the latest version of the given box
curl localhost:8083/f26-x64/latestVersion
123456789
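Picking the latest version amounts to a numeric maximum over the version strings. A minimal sketch, with a hypothetical class name and assuming all versions are decimal integers:

```java
import java.math.BigInteger;
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

final class LatestVersion {
    /** Returns the numerically largest version, or empty when the box has no versions. */
    static Optional<String> latest(Collection<String> versions) {
        // Compare as numbers: as plain strings, "9" would sort after "10"
        return versions.stream().max(Comparator.comparing((String v) -> new BigInteger(v)));
    }
}
```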

Simple index page

Make a simple index page that contains just links to all available boxes.

Ensure N latest boxes to have checksum

  • new configuration option box.checksum_ensure
    • integer value
    • tells how many boxes must be returned with a checksum, in descending order by version. When checksums for that many boxes are not precalculated, they are calculated on demand. Other versions have a checksum only if it is already precalculated and thus cheap to get.
    • when precalculation is turned off, just N boxes have their checksum calculated on demand
      • 1 means that the latest version of a box always comes with a checksum, other versions only when already precalculated
      • 0 means no on-demand checksum calculations are done
    • default 1
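One reading of the rules above is sketched below: the N latest versions always get a checksum (calculated on demand if missing), and older versions get one only when it is already precalculated. This interpretation, the class name and the method signature are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class EnsuredChecksums {
    /**
     * @param versionsDescending box versions, latest first
     * @param precalculated      checksums that already exist, keyed by version
     * @param ensure             value of box.checksum_ensure
     * @param onDemand           computes a checksum for a version (the expensive path)
     * @return checksum per version; versions without a checksum are absent from the map
     */
    static Map<String, String> resolve(List<String> versionsDescending,
                                       Map<String, String> precalculated,
                                       int ensure,
                                       Function<String, String> onDemand) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < versionsDescending.size(); i++) {
            String version = versionsDescending.get(i);
            String checksum = precalculated.get(version);
            if (checksum == null && i < ensure) {
                // Only the first `ensure` versions may trigger an on-demand calculation
                checksum = onDemand.apply(version);
            }
            if (checksum != null) {
                result.put(version, checksum);
            }
        }
        return result;
    }
}
```

With ensure set to 0 the on-demand function is never called, which matches the "no on demand checksum calculations" case.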

Box version descriptions

  • check whether this format is ok for vagrant
... {
      "version": "19",
      "description": "This is description of version 19",
      "providers": [ ... ]
} ...
  • there should be a file named descriptions.csv next to the box files.
    • The format of this file is as follows
version;;;description
1;;;this is description of version 1
2;;;this is description of version 2
  • ;;; is the separator
  • Must be implemented for FilesystemBoxRepository. Make sure that another implementation of BoxRepository could handle descriptions in a different way (e.g. with a database).
    • Maybe a DescriptionReader interface with a FilesystemDescriptionReader implementation injected into FilesystemBoxRepository?
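Parsing the file could look roughly like this. The sketch is only an illustration: the class name is hypothetical, and it assumes that a first line of version;;;description is a header row to be skipped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DescriptionsCsv {
    private static final String SEPARATOR = ";;;";

    /** Maps version to description for every version;;;description line. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> descriptions = new LinkedHashMap<>();
        for (String line : lines) {
            int at = line.indexOf(SEPARATOR);
            if (at < 0) {
                continue; // not a version;;;description line
            }
            String version = line.substring(0, at).trim();
            String description = line.substring(at + SEPARATOR.length()).trim();
            if (version.equals("version")) {
                continue; // assumed header row
            }
            descriptions.put(version, description);
        }
        return descriptions;
    }
}
```

A filesystem-backed reader could feed it with Files.readAllLines on the descriptions.csv path, while a database-backed one would skip the parsing entirely.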

better logging config

Configure logback so that it logs to a file with reasonable rolling, e.g. one file with a max size of 10MB. Pack the logback.xml config with the release.

Sort output json by version

The versions array is currently returned in an undefined order.
It should be sorted by the version field.
Be aware that version is a String, while the array must be sorted numerically.

  • sort by version, ascending by default
  • add a new configuration option box.sort_desc=true|false, with false as the default. When true, the sort should be descending.
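Because version is a String, a plain string sort would put "10" before "9". A minimal sketch of a numeric sort honouring box.sort_desc, with a hypothetical class name and assuming all versions are decimal integers:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class VersionSorting {
    /** Returns a numerically sorted copy; sortDesc stands in for the box.sort_desc option. */
    static List<String> sort(List<String> versions, boolean sortDesc) {
        Comparator<String> numeric = Comparator.comparing((String v) -> new BigInteger(v));
        List<String> sorted = new ArrayList<>(versions);
        sorted.sort(sortDesc ? numeric.reversed() : numeric);
        return sorted;
    }
}
```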

Checksum calculation optimization

Checksum calculation takes too much CPU and memory for files as big as virtual machines.

  • Change the calculation from byte[] to ByteBuffer, which should have a positive impact on memory consumption and hopefully on performance too.
  • If that is not sufficient, think about precalculating checksums in the background and persisting them somehow. Create another task for that.
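A ByteBuffer-based calculation could be sketched as below: the file is read through a FileChannel in fixed-size chunks, so memory use stays constant regardless of box size. The class name and the 1 MiB buffer size are arbitrary choices for illustration, not measured or taken from Boxitory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamingChecksum {
    /** Computes a hex digest of the file; algorithm is a JCA name such as "MD5" or "SHA-1". */
    static String hexDigest(Path file, String algorithm) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20); // 1 MiB chunk, arbitrary
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip();         // switch from filling to draining
                digest.update(buffer); // consumes the bytes read in this round
                buffer.clear();        // make the whole buffer writable again
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Whether this is actually faster than the byte[] variant would need to be measured on real box files.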

checksum for boxes

The resulting JSON can contain a checksum for each box, so that the integrity of a box downloaded to the local machine can be checked. The format looks like this:

...
"providers": [
        {
          "name": "virtualbox",
          "url": "http://somewhere.com/precise64_010_virtualbox.box",
          "checksum_type": "sha1",
          "checksum": "foo"
        }
      ]
...

source: https://www.vagrantup.com/docs/boxes/format.html

  • Find out which checksum types Vagrant supports and implement them in Boxitory.
    -- md5
    -- sha1
    -- ???
  • Make checksum configurable
    -- box.checksum=disabled|md5|sha1|... to disable or set particular hash function
  • Checksum should be disabled by default.
  • Optional: check the performance cost. If it is low, we may consider enabling checksums by default.
    -- Note that we must account for many (>50) quite big (>20GB) boxes.
  • Dev note: This should probably be implemented as some kind of "service" that is injected into FilesystemBoxRepository when the configuration option is set.
  • Describe property in README.md.
  • unit tests :)

Test race condition???

This test fails randomly, but with very low probability.

  • on slow HW/under heavy system load ???
givenOneBox_whenRequestWithEnsureChecksumInParallel_thenChecksumIsProcessedAsEnsured(cz.sparko.boxitory.test.e2e.EnsureChecksumParallelTest)  Time elapsed: 0.091 sec  <<< FAILURE!
java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
        at cz.sparko.boxitory.test.e2e.EnsureChecksumParallelTest$VmRequest.run(EnsureChecksumParallelTest.java:121)
        at cz.sparko.boxitory.test.e2e.ConcurrentTester.lambda$new$0(ConcurrentTester.java:11)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: null
        at java.lang.System.arraycopy(Native Method)
        at java.util.ArrayList.addAll(ArrayList.java:584)
        at org.springframework.boot.test.autoconfigure.web.servlet.SpringBootMockMvcBuilderCustomizer$DeferredLinesWriter.write(SpringBootMockMvcBuilderCustomizer.java:253)
        at org.springframework.boot.test.autoconfigure.web.servlet.SpringBootMockMvcBuilderCustomizer$LinesWritingResultHandler$LinesPrintingResultHandler.write(SpringBootMockMvcBuilderCustomizer.java:190)
        at org.springframework.boot.test.autoconfigure.web.servlet.SpringBootMockMvcBuilderCustomizer$LinesWritingResultHandler.handle(SpringBootMockMvcBuilderCustomizer.java:180)
        at org.springframework.test.web.servlet.MockMvc.applyDefaultResultActions(MockMvc.java:195)
        at org.springframework.test.web.servlet.MockMvc.perform(MockMvc.java:163)
        at cz.sparko.boxitory.test.e2e.EnsureChecksumParallelTest$VmRequest.run(EnsureChecksumParallelTest.java:115)
        at cz.sparko.boxitory.test.e2e.ConcurrentTester.lambda$new$0(ConcurrentTester.java:11)
        at java.lang.Thread.run(Thread.java:748)
