
hat-backup's Introduction

Hat Backup System

Disclaimer: This is not an official Google product.

Warning: This is an incomplete work-in-progress.

Warning: This project currently does NOT provide security or privacy.

Project

The goal of hat is to provide a backend-agnostic snapshotting backup system, complete with deduplication of file blocks and efficient navigation of backed up files.

A sub-goal is to do so in a safe and fault-tolerant manner, where a process crash is followed by quick and safe recovery.

Further, we aim for readable and maintainable code, partly by splitting the system into a few sub-systems, each with a clear responsibility.

Disclaimer: The above text describes our goal and not the current status.

Status

This software is pre-alpha and should be considered severely unstable.

This software should not be considered ready for any use; the code is currently provided for development and experimentation purposes only.

Roadmap to a first stable release

Cleanup:

I am currently focused on reaching a feature-complete and useful state, and as a result I am skipping quickly over some implementation details. The following items will have to be revisited and cleaned up before a stable release:

  • Properly support non-UTF-8 paths.
  • Store and restore all relevant file metadata.
    • Do the same for symlinks.
  • Use prepared statements when communicating with SQLite.
  • Run rustfmt on the code when it is ready.
  • Reimplement argument handling in main; possibly using docopt. [thanks kbknapp]
  • Replace all uses of JSON with either Protocol Buffers or Cap'n Proto.
  • Go through uses of 'unwrap', 'expect' etc. and remove them where possible; preferably, the caller/initiator should handle errors (see the sketch after this list).
  • Think about parallelism and change the pipeline of threads to make better use of it.
    • Parallel snapshotting
  • Figure out how to battle test the code on supported platforms.
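
As a rough illustration of the unwrap-removal item above, the idiomatic direction is to return a Result and let `?` propagate failures to the caller. This is only a sketch; 'read_snapshot_config' and the 'hat.conf' path are made up for the example and are not part of hat.

    use std::fs::File;
    use std::io::{self, Read};

    // Instead of `File::open(path).unwrap()`, return an io::Result and let `?`
    // hand the error to the caller, which then decides how to react.
    fn read_snapshot_config(path: &str) -> io::Result<String> {
        let mut file = File::open(path)?;
        let mut contents = String::new();
        file.read_to_string(&mut contents)?;
        Ok(contents)
    }

    fn main() {
        match read_snapshot_config("hat.conf") {
            Ok(conf) => println!("loaded {} bytes of config", conf.len()),
            Err(e) => eprintln!("could not read config: {}", e),
        }
    }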

Functionality:

A fair amount of functionality is still missing before a feature-complete release is in sight:

  • Commit hash-tree tops of known snapshots to external storage.
  • Add recovery function to restore local metadata from external hash-tree tops (for when all local state is gone).
    • Basic read-only recovery.
    • Full read-write recovery with GC metadata rebuilding.
    • Need to allow users to opt for read-only.
  • Add book-keeping for metadata needed to identify live hashes (e.g. reference sets in each family's keyindex).
  • Add deletion and garbage-collection.
    • Make 'commit' crash-safe by retrying failed 'register' and 'deregister' runs. Add tests as this is fragile logic.
    • GC should not be able to break the index. This can be avoided by having 'snapshot' check if hashes it wants to reuse still exist (i.e. have not been GC'ed yet).
    • GC should delete hashes top-down to avoid removing a child hash before its parent hash.
  • Have the blobstore talk to external thread(s) to isolate communication with external storage (a rough sketch follows this list).
  • Make the API used for talking to the external storage easy to change (put it in separate put/get/del programs).
  • Add encryption through NaCl/sodiumoxide; preferably as late as possible.
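
For the blobstore item above, one possible shape is a dedicated worker thread that owns all external-storage traffic and receives work over a channel. This is only a sketch under assumed names; 'StoreMsg' and 'upload_blob' are illustrative stand-ins, not hat's actual types.

    use std::sync::mpsc;
    use std::thread;

    enum StoreMsg {
        Put { name: String, data: Vec<u8> },
        Shutdown,
    }

    // Stand-in for the real call out to external storage.
    fn upload_blob(name: &str, data: &[u8]) {
        println!("uploading {} ({} bytes)", name, data.len());
    }

    fn main() {
        let (tx, rx) = mpsc::channel();

        // The worker thread is the only place that talks to external storage.
        let worker = thread::spawn(move || {
            for msg in rx {
                match msg {
                    StoreMsg::Put { name, data } => upload_blob(&name, &data),
                    StoreMsg::Shutdown => break,
                }
            }
        });

        tx.send(StoreMsg::Put { name: "blob-0".into(), data: vec![1, 2, 3] }).unwrap();
        tx.send(StoreMsg::Shutdown).unwrap();
        worker.join().unwrap();
    }

Isolating the slow, fallible calls behind one channel also gives a single place to add retries or batching later.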

Future wishlist: (not blocking first release)

  • Output a dot graph over the current hash trees to show dependencies and reuse (a rough sketch follows this list).
  • FSCK style metadata verification ("check" subcommand?).
  • Commit snapshots while indexing them (possibly through "weak" snapshots that are ignored by GC). The purpose is to allow checking out a partial snapshot.
  • Add "--pretend" to all subcommands and have it give a signal as to what would happen without it.

Building from source

First, make sure you have the required system libraries and tools installed:

  • libsodium
  • libsqlite3
  • capnproto (at least version 0.5.3)
  1. Install Rust (try nightly, or check the commit log for a compatible version)
  2. Check out the newest version of the source:
    • git clone https://github.com/google/hat-backup.git
    • cd hat-backup
  3. Let Cargo build everything needed:
    • cargo build --release

Try the hat executable using Cargo (the binary is in target/release/):

  • cargo run --release snapshot my_snapshot /some/path/to/dir
  • cargo run --release commit my_snapshot
  • cargo run --release checkout my_snapshot output/dir

License and copyright

See the files LICENSE and AUTHORS.

Contributions

We gladly accept contributions/fixes/improvements etc. via GitHub pull requests or any other reasonable means, as long as the author has signed the Google Contributor License.

The Contributor License exists in two versions, one for individuals and one for corporations:

  • https://developers.google.com/open-source/cla/individual
  • https://developers.google.com/open-source/cla/corporate

Please read and sign one of the above versions of the Contributor License before sending your contribution. Thanks!

Authors

See the AUTHORS.txt file.

This project is inspired by a previous version of the system written in Haskell: https://github.com/mortenbp/hindsight

hat-backup's Issues

RPC framework

If the client is going to be coded in Rust, then an RPC framework like tarpc could be a better fit.

Benchmarks with less variance

The current benchmarks have very high variance -- it is almost impossible to tell whether a code change results in better, worse, or unchanged performance.

This could potentially be done as part of #23.

Flexible backend API

Currently, the backend is defined in Rust. We should add a CMD backend that lets users define their backend via small programs in PATH. This would make it easier to reuse existing tooling around storage backends.
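
A sketch of what such a CMD backend could look like, delegating a 'put' to a small external program found in PATH. The program name 'hat-put' and its interface (blob name as the first argument, blob bytes on stdin) are assumptions made up for this example.

    use std::io::Write;
    use std::process::{Command, Stdio};

    fn put_via_cmd(name: &str, data: &[u8]) -> std::io::Result<()> {
        // Spawn the external program with the blob name as argv[1] and feed
        // the blob bytes to it on stdin.
        let mut child = Command::new("hat-put")
            .arg(name)
            .stdin(Stdio::piped())
            .spawn()?;
        child
            .stdin
            .as_mut()
            .expect("stdin was requested above")
            .write_all(data)?;
        let status = child.wait()?;
        if status.success() {
            Ok(())
        } else {
            Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                format!("hat-put exited with {}", status),
            ))
        }
    }

    fn main() {
        if let Err(e) = put_via_cmd("blob-0", b"hello") {
            eprintln!("put failed: {}", e);
        }
    }

Matching 'get' and 'del' programs would complete the triple, and existing storage tooling could then be wrapped in a few lines of shell.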

Handle commit of partial snapshot gracefully

Currently, we abort when a hash from the index is not known, causing resume to auto-retry and abort again.

We should either skip files with unknown hashes (with a warning) or abort gracefully (without resume).
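
A toy sketch of the "skip with a warning" option; the known-hash set and file list are invented stand-ins for the real index lookup.

    use std::collections::HashSet;

    fn main() {
        let known: HashSet<&str> = ["h1", "h3"].iter().cloned().collect();
        let files = [("a.txt", "h1"), ("b.txt", "h2"), ("c.txt", "h3")];

        for (path, hash) in &files {
            if known.contains(hash) {
                println!("committing {} ({})", path, hash);
            } else {
                // Skip instead of aborting, but make the gap visible to the user.
                eprintln!("warning: skipping {}: hash {} not in index", path, hash);
            }
        }
    }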

Test resume functionality

Currently not all threads propagate errors when handling incoming requests. Changing this should make the code easier to follow by removing premature panics and allow us to test the resume functionality.

The idea is to trigger an error, reset the hat system and then resume. At present, just triggering the error causes a panic, which wipes the in memory databases used during tests.

An alternative option: put the test databases in local temporary files and recover from the panic. Error propagation seems cleaner, although a true panic would probably be more authentic.
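
A minimal sketch of the propagation idea: a worker sends its outcome back as a Result instead of panicking, so a test can observe the failure, reset, and resume. 'flaky_store' is a hypothetical stand-in for a request handler.

    use std::sync::mpsc;
    use std::thread;

    // Fails on the first attempt, succeeds on the retry.
    fn flaky_store(attempt: u32) -> Result<(), String> {
        if attempt == 0 {
            Err("injected failure".to_string())
        } else {
            Ok(())
        }
    }

    fn main() {
        let (tx, rx) = mpsc::channel();
        for attempt in 0..2u32 {
            let tx = tx.clone();
            thread::spawn(move || {
                // Report the outcome instead of unwrapping (and panicking) here.
                tx.send(flaky_store(attempt)).unwrap();
            });
        }
        drop(tx); // close the channel so the loop below terminates

        for outcome in rx {
            match outcome {
                Ok(()) => println!("request handled"),
                Err(e) => println!("request failed ({}); reset and resume", e),
            }
        }
    }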

Build instructions should mention the dependency on libsodium.

I couldn't build on Mac without installing libsodium first. I fixed it with brew install libsodium. Before installing libsodium, I received this error:

$ cargo build
   Compiling libsodium-sys v0.0.4
   Compiling rustc-serialize v0.3.12 (https://github.com/rust-lang/rustc-serialize#031333c0)
   Compiling libc v0.1.5
   Compiling gcc v0.3.4
   Compiling threadpool v0.1.3 (https://github.com/rust-lang/threadpool#b095eec8)
   Compiling log v0.3.1
   Compiling sodiumoxide v0.0.4 (https://github.com/dnaq/sodiumoxide#5b162cfe)
   Compiling time v0.1.24 (https://github.com/rust-lang/time#0ed5596e)
   Compiling rand v0.3.7
   Compiling sqlite3 v0.1.0 (https://github.com/linuxfood/rustsqlite#d586703e)
   Compiling quickcheck v0.2.11
   Compiling quickcheck_macros v0.2.11
   Compiling hat-backup v0.0.1-pre (file:///Users/_/Documents/rust/hat-backup)
error: linking with `cc` failed: exit code: 1
note: "cc" "-m64" "-L" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib" "-o" "/Users/_/Documents/rust/hat-backup/target/debug/hat" "/Users/_/Documents/rust/hat-backup/target/debug/hat.o" "-Wl,-force_load,/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libmorestack.a" "-Wl,-dead_strip" "-nodefaultlibs" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libtest-4e7c5e5c.rlib" "/Users/_/Documents/rust/hat-backup/target/debug/deps/librand-7b0a3af7ae4685dc.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libterm-4e7c5e5c.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libserialize-4e7c5e5c.rlib" "/Users/_/Documents/rust/hat-backup/target/debug/deps/libthreadpool-ae7965650183f63e.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libgetopts-4e7c5e5c.rlib" "/Users/_/Documents/rust/hat-backup/target/debug/deps/libtime-f16786cac15c85f3.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/liblog-4e7c5e5c.rlib" "/Users/_/Documents/rust/hat-backup/target/debug/deps/libsqlite3-0e1cfb4ed5b5eebb.rlib" "/Users/_/Documents/rust/hat-backup/target/debug/deps/liblog-54cf393d3c69686f.rlib" "/Users/_/Documents/rust/hat-backup/target/debug/deps/libsodiumoxide-ca7d5de7b525208e.rlib" "/Users/_/Documents/rust/hat-backup/target/debug/deps/liblibsodium_sys-9aa26d59237d8949.rlib" "/Users/_/Documents/rust/hat-backup/target/debug/deps/liblibc-8c77960f0e8d4e86.rlib" "/Users/_/Documents/rust/hat-backup/target/debug/deps/librustc_serialize-9bffcb354c400db9.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libstd-4e7c5e5c.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libcollections-4e7c5e5c.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libunicode-4e7c5e5c.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/librand-4e7c5e5c.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/liballoc-4e7c5e5c.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/liblibc-4e7c5e5c.rlib" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib/libcore-4e7c5e5c.rlib" "-L" "/Users/_/Documents/rust/hat-backup/target/debug" "-L" "/Users/_/Documents/rust/hat-backup/target/debug/deps" "-L" "/Users/_/Documents/rust/hat-backup/target/debug/build/time-f16786cac15c85f3/out" "-L" "/usr/local/lib/rustlib/x86_64-apple-darwin/lib" "-L" "/Users/_/Documents/rust/hat-backup/.rust/lib/x86_64-apple-darwin" "-L" "/Users/_/Documents/rust/hat-backup/lib/x86_64-apple-darwin" "-lsqlite3" "-lsodium" "-lc" "-lm" "-lSystem" "-lpthread" "-lc" "-lm" "-lcompiler-rt"
note: ld: warning: directory not found for option '-L/Users/_/Documents/rust/hat-backup/.rust/lib/x86_64-apple-darwin'
ld: warning: directory not found for option '-L/Users/_/Documents/rust/hat-backup/lib/x86_64-apple-darwin'
ld: library not found for -lsodium
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Make hat build on stable

Remaining issues:

  • Make the Diesel schemas run on stable. (Fixed in #15)
  • Stop depending on the unstable FnBox. (Fixed in #12)
  • Stop depending on unstable features from the test crate. (Fixed in #20)
  • Make Travis test stable as well.

Make benchmarks run on stable

The current benchmarks only run on nightly. To be able to compile on stable, they have been put behind the benchmarks feature gate. This is not ideal, as it would be nice to have benchmarks on stable as well.
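
For reference, the usual shape of such a gate: a Cargo feature (assumed here to be called "benchmarks") guards the nightly-only code, so a plain stable build compiles it out. The snippet below is a generic illustration, not hat's actual layout.

    // Cargo.toml (excerpt, assumed):
    //
    //   [features]
    //   benchmarks = []

    // This module only exists when building with `--features benchmarks`;
    // the nightly-only #[bench] functions would live inside it.
    #[cfg(feature = "benchmarks")]
    mod benches {
        // #[bench] fn snapshot_small_tree(b: &mut test::Bencher) { ... }
    }

    fn main() {
        println!("stable build: benchmarks are compiled out unless the feature is enabled");
    }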

We should not have linear memory usage

Listing all our hashes or keys uses linear memory, and the result could potentially be too large for RAM.

Whenever we list the contents of an index, the result should be streamed out of the underlying SQLite database. There is currently no way to do this with Diesel (except manually with limits), so we will revisit this later when Diesel and our codebase have both matured a bit more.
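
A sketch of the manual-limits workaround, paging through the index so the caller only ever holds one page at a time. 'fetch_page' is a hypothetical stand-in for a SQL query with LIMIT/OFFSET and is stubbed here with fake hash values.

    // Pretend the index holds 10,000 hashes; a real implementation would run
    // one SELECT ... LIMIT ... OFFSET ... per call instead of generating them.
    fn fetch_page(offset: u64, limit: u64) -> Vec<u64> {
        const TOTAL: u64 = 10_000;
        (offset..TOTAL.min(offset + limit)).collect()
    }

    fn main() {
        let page_size = 1024;
        let mut offset = 0;
        loop {
            let page = fetch_page(offset, page_size);
            if page.is_empty() {
                break;
            }
            offset += page.len() as u64;
            for _hash in page {
                // process one hash at a time; at most `page_size` entries are live
            }
        }
        println!("streamed {} hashes in pages of {}", offset, page_size);
    }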

multi-platform?

It would be nice if the README specified which platforms hat should run on.
