kopia / kopia

Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.

Home Page: https://kopia.io

License: Apache License 2.0

Makefile 0.84% Go 97.24% HTML 0.32% Shell 0.63% JavaScript 0.94% SCSS 0.01% Dockerfile 0.02%
deduplication backup google-cloud-storage encryption cloud hacktoberfest

kopia's Issues

implement extract command

I have tested this and it's a great tool, especially because of the SFTP and WebDAV support; it is very fast and modern. Unfortunately there is one major flaw: there is no extract command, which is present in all other backup programs.

I was thinking:

kopia extract [snapshot_id] [PATH] [EXTRACT_PATH] <--test> <--verify> <-a>

Problem with FUSE: I am backing up on a VPS, as I am sure many people do. With OpenVZ there is no FUSE by default, and many providers are unwilling to enable it because it uses too many resources. I also tried to mount remotely, but that does not work, so I am unable to use your program because I cannot restore the files.

Second upload of an unchanged directory seems to hash files again

I ran a snapshot create on a directory with 10 ~1GB files (AWS S3 repository). The first run took:

real	5m38.102s
user	4m49.238s
sys	0m14.174s

I then created another snapshot on the same directory and it looks like it scanned all the files again. CPU usage was quite high. Running time:

real	0m39.730s
user	3m23.857s
sys	0m2.873s

From my understanding of the docs and options for snapshot create, I don't think it should be hashing the files.
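
For reference, incremental backup tools commonly avoid re-hashing by comparing each file's size and modification time with the entry recorded in the previous snapshot. The sketch below illustrates that general technique only. It is not Kopia's implementation, and `PreviousEntry` and `needs_rehash` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PreviousEntry:
    """Metadata recorded for a file in the previous snapshot (hypothetical)."""
    size: int
    mtime_ns: int


def needs_rehash(path, previous):
    """Return True when the file must be read and hashed again.

    `previous` maps a path to its PreviousEntry from the last snapshot.
    A file is skipped only if both size and mtime are unchanged.
    """
    st = os.stat(path)
    old = previous.get(path)
    if old is None:
        return True  # new file, never hashed before
    return (st.st_size, st.st_mtime_ns) != (old.size, old.mtime_ns)
```

With a check like this, a second snapshot of an unchanged directory would only need to stat each file, which is the behavior the report expected.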

SFTP Storage

Are you looking into implementing this?

What would it take to implement it?

[Feature] Define a Kopia CLI output format

This becomes part of the API contract with systems that interact with the Kopia CLI and parse its output.

Some commands, such as blob list, have well-defined output. For others the format is not completely clear, and some of the interesting and relevant information may go to the logs instead.

The first step is to review the output of the different commands, document the format for the ones that we consider well defined, and identify the command outputs that need improvement.

A related feature (that should probably be tracked in a separate ticket) is to add structured output for those commands, for example using JSON.
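
To illustrate why structured output helps consumers, here is a minimal sketch of a parser for a JSON-lines listing. The record shape (one object per line with `id` and `length` keys) is an assumption made for this example, not an existing Kopia format.

```python
import json


def parse_blob_list(output):
    """Parse JSON-lines output into a list of (blob_id, length) tuples.

    Assumes one JSON object per line with hypothetical "id" and "length" keys.
    Blank lines are ignored.
    """
    blobs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        blobs.append((record["id"], int(record["length"])))
    return blobs


sample = '{"id": "p0123", "length": 1024}\n{"id": "q4567", "length": 2048}\n'
print(parse_blob_list(sample))
```

A caller written this way does not depend on column widths or log formatting, which is the kind of contract the issue asks for.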

Robustness testing toolset implementation

Umbrella issue to track progress on implementation of robustness and data verification test framework and tests.

Plan for initial implementation:

  • Wrap fio in a helper library for modifying and manipulating files and file contents
  • Wrap fswalker in a helper library for capturing file system state and checking integrity
  • Build framework to track verification metadata and issue actions against the repository and the data set
  • Implement a test suite making use of the test framework to validate data after kopia restore
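
The "capture file system state and check integrity" step can be sketched in a few lines. This is a simplified stand-in that uses content hashes only; it does not use fswalker, and the function names are hypothetical.

```python
import hashlib
import os


def capture_state(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            state[os.path.relpath(full, root)] = digest.hexdigest()
    return state


def diff_states(before, after):
    """Return (missing, added, changed) relative paths between two captures."""
    missing = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return missing, added, changed
```

A test would capture the state of the source tree before the snapshot, capture the restored tree afterwards, and assert that all three lists are empty.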

Improve test coverage

Right now most tests focus on github.com/kopia/repo, while this repo has just a few e2e tests.

kopia snapshot create should refuse to run when the repository path points to an empty directory

I was playing with Kopia on a Mac laptop, using an SMB-mounted NAS volume for the repository.
I had a script that does:

mount /mnt/smb/kopia-repo
kopia snapshot create ... 
umount /mnt/smb/kopia-repo

[yes it should be:
mount /mnt/smb/kopia-repo && kopia snapshot create ... && umount /mnt/smb/kopia-repo
but that's still not perfect]

It happened that "kopia snapshot create" ran when the volume was not yet mounted,
so it started to write the whole backup to the local disk from scratch.

Expected behavior:
kopia fails with error:
"/mnt/smb/kopia-repo does not contain valid repository. Please initialize it using:
kopia repository create filesystem --path=/mnt/smb/kopia-repo".

Moreover, after such a 'backup' to a 'new repo', I think that kopia's cache got out of sync, as the next
kopia snapshot create was really slow.
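
Until Kopia itself refuses to run in this situation, a wrapper script can guard against it. The sketch below is a script-side workaround under stated assumptions, not Kopia behavior; `repository_path_ready` is a hypothetical helper.

```python
import os


def repository_path_ready(path):
    """Return True only if `path` is a mount point containing at least one entry."""
    if not os.path.ismount(path):
        return False  # volume not mounted; writing here would hit the local disk
    with os.scandir(path) as entries:
        return any(True for _ in entries)
```

The wrapper would call this with /mnt/smb/kopia-repo and exit with an error instead of starting the snapshot when it returns False.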

Test failure: Timeouts in blob/s3 tests

Manifests as:

2020/02/20 02:39:41 got error unable to determine if bucket "kopia-test-618dbaf8f3ffe4e6" exists:
  Access Denied. when New() S3 storage (#0), sleeping for 1s before retrying

All the retries fail, and eventually the test times out.

FAIL github.com/kopia/kopia/repo/blob/s3 90.041s

kopia server: automatic snapshots not created

I have set up kopia server to back up a few directories each night. Most of the time, this succeeds but occasionally kopia will not have sufficient permissions to access one or two sub-directories of a directory to back up.

When running kopia snapshot create /var/data/ manually, the inaccessible files are skipped and the snapshot is created anyway, e.g.

12:11:13.381 [kopia/upload] unable to hash file "./postgres/pg_stat_tmp/db_0.stat": unable to open file: open /var/data/postgres/pg_stat_tmp/db_0.stat: permission denied, ignoring
12:11:13.381 [kopia/upload] unable to hash file "./postgres/pg_stat_tmp/db_1.stat": unable to open file: open /var/data/postgres/pg_stat_tmp/db_1.stat: permission denied, ignoring
12:11:13.381 [kopia/upload] unable to hash file "./postgres/pg_stat_tmp/db_13067.stat": unable to open file: open /var/data/postgres/pg_stat_tmp/db_13067.stat: permission denied, ignoring
12:11:13.381 [kopia/upload] unable to hash file "./postgres/pg_stat_tmp/db_16384.stat": unable to open file: open /var/data/postgres/pg_stat_tmp/db_16384.stat: permission denied, ignoring
12:11:13.381 [kopia/upload] unable to hash file "./postgres/pg_stat_tmp/db_16386.stat": unable to open file: open /var/data/postgres/pg_stat_tmp/db_16386.stat: permission denied, ignoring
12:11:13.381 [kopia/upload] unable to hash file "./postgres/pg_stat_tmp/db_16388.stat": unable to open file: open /var/data/postgres/pg_stat_tmp/db_16388.stat: permission denied, ignoring
12:11:13.381 [kopia/upload] unable to hash file "./postgres/pg_stat_tmp/db_16516.stat": unable to open file: open /var/data/postgres/pg_stat_tmp/db_16516.stat: permission denied, ignoring
12:11:13.381 [kopia/upload] unable to hash file "./postgres/pg_stat_tmp/db_17183.stat": unable to open file: open /var/data/postgres/pg_stat_tmp/db_17183.stat: permission denied, ignoring
12:11:13.381 [kopia/upload] unable to hash file "./postgres/pg_stat_tmp/db_18529.stat": unable to open file: open /var/data/postgres/pg_stat_tmp/db_18529.stat: permission denied, ignoring
12:11:13.381 [kopia/upload] unable to hash file "./postgres/pg_stat_tmp/db_24972.stat": unable to open file: open /var/data/postgres/pg_stat_tmp/db_24972.stat: permission denied, ignoring
12:11:13.381 [kopia/upload] unable to hash file "./postgres/pg_stat_tmp/global.stat": unable to open file: open /var/data/postgres/pg_stat_tmp/global.stat: permission denied, ignoring

When the server creates the automatic snapshot, an upload error occurs instead. From the output it appears that some chunks are uploaded, but no actual snapshot is created (kopia snapshot list shows no entries).

[...]
Jan 26 03:00:19 h2594255 kopia[19861]: 03:00:19.726 [kopia/cli] processing upload 'pb6a88a0c30adeb5cb815e98b7722bf01' 21.3 MB of 22.8 MB (93%)#033[0m
Jan 26 03:00:19 h2594255 kopia[19861]: 03:00:19.847 [kopia/cli] processing upload 'pb6a88a0c30adeb5cb815e98b7722bf01' 22.3 MB of 22.8 MB (97%)#033[0m
Jan 26 03:00:20 h2594255 kopia[19861]: #033[32m03:00:20.612 [kopia/cli] completed upload 'pb6a88a0c30adeb5cb815e98b7722bf01' 22.8 MB#033[0m
Jan 28 03:01:02 h2594255 kopia[19861]: #033[31m03:01:02.364 [kopia/server] upload error: unable to process directory "influxdb": unable to process directory "wal": unable to process directory "_internal":
 unable to process directory "monitor": unable to process directory "15": open /var/data/influxdb/wal/_internal/monitor/15: permission denied#033[0m
[...]

There appears to be a lot of unused or unreferenced objects:

$ kopia snapshot gc
12:18:31.362 [kopia/snapshot/gc] looking for active contents
12:18:31.363 [kopia/upload] processed(0/6) active 1
12:18:32.029 [kopia/snapshot/gc] looking for unreferenced contents
12:18:32.047 [kopia/snapshot/gc] found 123 unused contents (103.5 MiB bytes)
12:18:32.047 [kopia/snapshot/gc] found 31 unused contents that are too recent to delete (12.8 MiB bytes)
12:18:32.047 [kopia/snapshot/gc] found 4310 in-use contents (6.2 GiB bytes)
12:18:32.047 [kopia/snapshot/gc] found 3 in-use system-contents (3.5 KiB bytes)

Is there any chance to force kopia to create automatic snapshots even if some directories are not accessible?
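
The requested behavior (record the error, keep going) can be illustrated with a directory walk that collects failures instead of aborting. This is a generic sketch using Python's standard library, not Kopia's uploader; `walk_skipping_errors` is a hypothetical name.

```python
import os


def walk_skipping_errors(root):
    """Walk `root`, returning (files, errors) instead of aborting on errors.

    os.walk reports directories it cannot list through `onerror`; we record
    them and keep going, so one unreadable directory does not fail the walk.
    """
    errors = []
    files = []
    for dirpath, _dirnames, filenames in os.walk(root, onerror=errors.append):
        for name in filenames:
            files.append(os.path.join(dirpath, name))
    return files, errors
```

The caller can then decide whether a non-empty error list should fail the run or merely be logged alongside a completed snapshot.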

runtime error: invalid memory address or nil pointer dereference

When creating a snapshot of a data directory today, kopia failed with a segmentation fault:

~$ kopia snapshot create /var/data/
11:59:20.142 [kopia/cli] processing upload 'n1f09d82dfeb1faa4f86080ef179983ae' 0 B of 173.8 KB (0%)
11:59:20.453 [kopia/cli] completed upload 'n1f09d82dfeb1faa4f86080ef179983ae' 173.8 KB
11:59:22.130 [kopia/cli] snapshotting user@host:/var/data
11:59:22.144 [kopia/cli] processing upload 'q7e4befa5a4b88243ddf983177d7eac6c' 0 B of 6 KB (0%)
11:59:23.056 [kopia/cli] completed upload 'q7e4befa5a4b88243ddf983177d7eac6c' 6 KB
11:59:23.056 [kopia/cli] processing upload 'n4b66f86549dd18b2511a542e074abe83' 0 B of 1.9 KB (0%)
11:59:23.489 [kopia/cli] completed upload 'n4b66f86549dd18b2511a542e074abe83' 1.9 KB
11:59:23.491 [kopia/cli] uploading user@host:/var/data using 1 previous manifests
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x87ffd7]

goroutine 96 [running]:
github.com/kopia/kopia/fs/localfs.entryFromChildFileInfo(0x0, 0x0, 0xc004644a74, 0x9, 0xf12700, 0xc0044ee5d0, 0x7, 0xc00009eb40)
        /home/travis/gopath/src/github.com/kopia/kopia/fs/localfs/local_fs.go:277 +0x37
github.com/kopia/kopia/fs/localfs.(*filesystemDirectory).Readdir.func2(0xc004644ab0, 0xc0000a3200, 0xc004644a74, 0x9, 0xc0000a3260)
        /home/travis/gopath/src/github.com/kopia/kopia/fs/localfs/local_fs.go:185 +0x1a8
created by github.com/kopia/kopia/fs/localfs.(*filesystemDirectory).Readdir
        /home/travis/gopath/src/github.com/kopia/kopia/fs/localfs/local_fs.go:175 +0x2ba

The directory in question is a data directory including persistent data from redis, influxdb, mariadb, postgres and some other containers. After investigating for a little bit, missing directory permissions (+x in particular) seem to be the cause for this crash.

sudo chmod -R u=rwx,go=rx /var/data fixes the error for now, but I think kopia should not crash at this point.

Create a PMC-only mailing list

We need a PMC-only mailing list for Code of Conduct violations and PMC discussions. The list should be private but have archival options enabled.

High memory usage seen with a large number of small files

I created 5 test directories (on an NVMe device) with 2M files each, with 100 bytes of random data in each of those files.

$ kopia repository create s3 --bucket=<test-bucket> --prefix=kopia/
$ time kopia snapshot create /mnt/ssd/smallfiles
...

real	13m36.700s
user	72m46.181s
sys	5m23.254s

I found the memory usage to be quite high. See pidstat below but I saw it grow to ~21GB on one of these runs.

$ pidstat -r -p 5049 1
Linux 4.15.0-1044-aws (ip-172-31-10-90) 	08/24/19 	_x86_64_	(16 CPU)

04:13:01      UID       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
04:13:02     1000      5049     36.00      0.00 16696284 15266768  47.87  kopia
...

Dataset Generation Script (not my own but found elsewhere):

#!/usr/bin/env python3
import os
import sys
import argparse
from multiprocessing import Pool
from functools import partial


def create_files(args, i):
    path = os.path.join(args.path, 'my-dir-name-%d' % i)
    os.mkdir(path)
    for j in range(args.filenum):
        with open(os.path.join(path, 'my-random-file-%d.data' % j), 'wb') as f:
            f.write(os.urandom(args.filesize))


def main(argv):
    parser = argparse.ArgumentParser(description='Create directories and files for backup test')
    parser.add_argument('-p', '--path', type=str, required=True, help='Path to the directory')
    parser.add_argument('-d', '--dirnum', type=int, required=True, help='Number of subdirectories')
    parser.add_argument('-f', '--filenum', type=int, required=True, help='Number of files per directory')
    parser.add_argument('-s', '--filesize', type=int, required=True, help='File size')
    args = parser.parse_args()

    if os.path.exists(args.path):
        print('ERROR: destination directory %s already exists' % args.path, file=sys.stderr)
        return 1
    elif not os.path.exists(os.path.dirname(args.path)):
        print('ERROR: parent directory for destination %s does not exist' % args.path, file=sys.stderr)
        return 1

    os.mkdir(args.path)
    with Pool(min(args.dirnum, 8)) as p:
        p.map(partial(create_files, args), range(args.dirnum))


if __name__ == '__main__':
    sys.exit(main(sys.argv))

and I then ran

python3 files.py -d 5 -f 2000000 -s 100 -p /mnt/ssd/smallfiles

s3 with specific region

I'm trying to use the S3-compatible object storage from scaleway.com.

$ kopia repository create s3 --access-key ... --secret-access-key ... --bucket test --endpoint=s3.fr-par.scw.cloud
kopia: error: unable to get repository storage: 
The authorization header is malformed; 
the region 'us-east-1' is wrong; expecting 'fr-par'

Is it possible to configure the region?

edit: I have the same problem with AWS if I'm not using us-east-1.

Develop macOS desktop client

Write a desktop client for macOS that lives somewhere in the system notification area and allows starting/scheduling and browsing of snapshots using Kopia's server mode.

Using the parallel option with create snapshot doesn't seem to make a difference

I tried to upload 10 1GB files using snapshot create and I tried to use --parallel 10 to speed things up. However, it didn't seem to make much of a difference. I am pretty sure I am not memory, CPU, or network bottlenecked for this experiment.

time kopia snapshot create /mnt/ssd/bigfiles
...

real	5m53.927s
user	4m54.245s
sys	0m13.971s
time kopia snapshot create --parallel 10 /mnt/ssd/bigfiles
...

real	5m52.159s
user	5m46.167s
sys	0m13.246s

Am I misunderstanding what --parallel is supposed to do?

WebDav: ReadDir /n0a/: 429 Too Many Requests - PROPFIND /n0a

There is a web storage service, Yandex.Disk, with a free tier that can be used for debugging. I am trying to connect to an existing repo, which works fine through davfs2 at '/mnt/yandex/news/kopia'. But when I try to connect directly I see this:

$  kopia repository connect webdav --url=https://webdav.yandex.ru/news/kopia --webdav-username=<secret> --webdav-password=<secret> -p <secret> --log-level=debug
14:06:20.118 [kopia] log file time cut-off: 0001-01-01 00:00:00 +0000 UTC max count: 1000
14:06:20.615 [kopia/repo] Creating cache directory '/root/.cache/kopia/c19a931bfbab8846' with max size 5242880000
14:06:20.880 [kopia/content] finished sweeping directory in 48.11µs and retained 0/5242880000 bytes (0 %)
14:06:20.880 [kopia/content] finished sweeping directory in 28.301µs and retained 0/5242880000 bytes (0 %)
14:06:20.880 [kopia/content] CompactIndexes({MinSmallBlobs:20 MaxSmallBlobs:64 AllIndexes:false SkipDeletedOlderThan:0s})
14:06:25.224 [kopia/content] found 0 index blobs from source
14:06:25.224 [kopia/repo] failed to open repository: unable to open content manager: error initializing content manager: error loading indexes: ReadDir /n0a/: 429 Too Many Requests - PROPFIND /n0a/
kopia: error: unable to open content manager: error initializing content manager: error loading indexes: ReadDir /n0a/: 429 Too Many Requests - PROPFIND /n0a/, try --help
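
A common client-side response to 429 Too Many Requests is to retry with exponential backoff. The sketch below shows that technique in isolation; the `TooManyRequests` exception and `with_backoff` helper are hypothetical and do not correspond to Kopia's WebDAV code.

```python
import time


class TooManyRequests(Exception):
    """Raised by the caller-supplied operation on an HTTP 429 (hypothetical)."""


def with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds, doubling the delay after each 429."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper testable without real delays.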

Add support for parallel object storage PUTs

For any relatively high-bandwidth network and a repository that is object storage based (e.g., in AWS S3), kopia can be a lot faster with parallel object puts. In my testing, kopia was not limited by either CPU or disk bandwidth and other benchmarking tools could easily get to 5X the throughput (similar object sizes) to the same bucket.
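
The idea can be sketched with a thread pool that issues several PUTs at once. `put_blob` stands for whatever function performs a single upload and is an assumption of this example, as is the `upload_all` helper.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(blobs, put_blob, parallelism=8):
    """Upload every (blob_id, data) pair using up to `parallelism` threads.

    `put_blob(blob_id, data)` is a caller-supplied function that performs a
    single PUT. Returns the blob IDs in input order; the first failure is
    re-raised when its result is read.
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [(blob_id, pool.submit(put_blob, blob_id, data))
                   for blob_id, data in blobs]
        for _blob_id, future in futures:
            future.result()  # propagate any upload error
    return [blob_id for blob_id, _ in futures]
```

Threads suit this workload because each PUT spends most of its time waiting on the network rather than using the CPU.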

Investigate memory usage during snapshot of 100GB+ tree

Follow up to #93

The setup is:

  • 100K files (single directory)
  • Each file is 1MB.

Then create a snapshot of this directory.

  • Multi-GB RSS (1-8GB) was observed in a few initial runs.
  • Running with memory profiling enabled results in RSS ~ 0.5GB.

Implement automatic maintenance at repository level

  • drop deleted contents older than 1 month
  • rewrite all contents in short packs (both p and q)
  • garbage-collect all blobs older than 1 day that are not referenced
  • rewrite all contents in packs with 'holes'
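
As an illustration of the blob garbage-collection rule in the list above, the following sketch selects blobs that are both unreferenced and older than one day. The data shapes (`blob_times`, `referenced`) are assumptions made for the example, not Kopia's internal structures.

```python
from datetime import datetime, timedelta


def blobs_to_collect(blob_times, referenced, now, min_age=timedelta(days=1)):
    """Return IDs of blobs that are unreferenced and older than `min_age`.

    `blob_times` maps blob ID -> last-modified datetime; `referenced` is the
    set of blob IDs still in use. Blobs newer than the cutoff are always kept.
    """
    cutoff = now - min_age
    return sorted(
        blob_id
        for blob_id, modified in blob_times.items()
        if blob_id not in referenced and modified < cutoff
    )
```

The age threshold presumably exists so that blobs written by an upload still in progress are not deleted before they are referenced.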

Allow specifying the hostname during snapshot creation

It should be possible to specify/override the OS hostname during snapshot creation.

Scenarios where this is useful or even required include:

  • Cloud-native (containerized) platforms, such as Kubernetes, where the hostname is not stable, and may change across different executions of a Pod / container associated with the same "logical volume" (PVC/PV in k8s terms)
  • Mobile clients, such as laptops, that may get their hostname assigned via DHCP and thus the hostname may change depending on configuration, network connection and so on.

This could be implemented via an optional flag to the snapshot create command.
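
The resolution logic behind such a flag is small. The sketch below assumes an optional override value passed in by the caller; `effective_hostname` is a hypothetical helper and no specific flag name is implied.

```python
import socket


def effective_hostname(override=None):
    """Return the hostname to record for a snapshot source.

    A non-empty `override` wins; otherwise fall back to the OS hostname.
    """
    name = override.strip() if override else ""
    return name or socket.gethostname()
```

In a Kubernetes deployment the override could be derived from a stable identifier such as the volume claim name, so that snapshots from successive Pods land under the same source.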

Allow for an insecure no-verify SSL mode

We need support in Kopia to disable SSL verification for self-signed certificates. This is insecure but very useful for reducing friction in test/dev scenarios, for testing with in-house object stores, and for cases where an internal CA is used but correctly obtaining the root CA might be complex for users.
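
For illustration, this is how the two options (trusting an internal CA versus skipping verification) look with Python's standard `ssl` module. It is a sketch of the concept only; Kopia is written in Go and `make_tls_context` is a hypothetical helper.

```python
import ssl


def make_tls_context(insecure_skip_verify=False, ca_file=None):
    """Build a client TLS context.

    By default certificates are verified against the system trust store.
    `ca_file` trusts the given CA bundle instead; `insecure_skip_verify`
    disables verification entirely and should be limited to test/dev use.
    """
    context = ssl.create_default_context(cafile=ca_file)
    if insecure_skip_verify:
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

Where the internal root CA is available, passing it as `ca_file` keeps verification on and is the safer of the two options.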

Add HTML-based UI for `kopia server`

kopia server can optionally serve HTML files in addition to the JSON API.

It would be great to have some HTML-based UI for browsing and triggering snapshots & policies, so that people can go to http://localhost:51515.

We can then develop very thin Windows/Mac clients (in Swift/C#) that run in the background in the system tray and use the native OS browser (WebKit, MSHTML) to quickly present the UI.

We can use Angular (https://angular.io/start) to build a modern single-page HTML5 app.

Compatibility tests

Starting with v0.3, Kopia will offer a compatibility guarantee, which needs to be tested.

  • Each released version of Kopia will support reading some number of previous and current repository formats while writing using the current format (also possibly a previous format).
  • Data written using repository format N and kopia version V can be read using any previous version v<V that supports reading format N.

We should write a test that will run for each release of Kopia:

  1. Create filesystem repositories using all possible combinations of crypto and hashes
  2. Create a few interesting snapshots (perhaps of the source code itself)
  3. Store the entire repo in a zip file in some long-term storage bucket (in GCS)
  4. Load repositories from previously written zip files and verify that all snapshots can be read using the current version.
  5. Use previous versions of Kopia that should be able to read the repository created using the current version and verify this is the case

We can package each version of Kopia in docker for long-term storage and compatibility.
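
Step 1 of the test plan amounts to a Cartesian product over the supported algorithms. The sketch below uses placeholder algorithm names and a hypothetical archive naming scheme; the real lists would come from Kopia itself.

```python
import itertools

# Placeholder algorithm names for illustration; substitute the real lists.
HASHES = ["hash-a", "hash-b"]
ENCRYPTIONS = ["none", "cipher-a", "cipher-b"]


def repository_variants(hashes, encryptions):
    """Yield one (hash, encryption, zip_name) tuple per combination."""
    for hash_name, enc_name in itertools.product(hashes, encryptions):
        yield hash_name, enc_name, f"repo-{hash_name}-{enc_name}.zip"


for variant in repository_variants(HASHES, ENCRYPTIONS):
    print(variant)
```

Each tuple identifies one repository to create, snapshot, and archive, so the number of archives per release is the product of the two list lengths.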

Compression

This backup tool looks a lot like borg and restic. Borg cannot back up directly to object storage and restic doesn't allow compression.
Will kopia support compression and become the ultimate backup tool?

Thanks

Snapshot verify fails for "invalid object length" - off by 32

If I snapshot a directory and run snapshot verify --all-sources, I'm seeing the command fail when the expected size is 32 B less than the actual value.

To reproduce:

mkdir testdir
cat >> ./testdir/testfile.txt <<EOL
some test file
EOL
./kopia repo create filesystem --path ./testrepo -p test
./kopia snapshot create ./testdir
./kopia snapshot verify --all-sources

Example output:

redgoat@DESKTOP-DOLV2TM:~/repo/kopia$ ./kopia snapshot verify --all-sources
Found 1 objects, verifying 1, completed 0 objects.
06:17:49.653 [kopia/cli] failed on redgoat@desktop-dolv2tm:/home/redgoat/repo/kopia/testdir@2019-12-11 06:17:47 PST/testfile.txt: invalid object length "082493011c49fb96d689d32cd18819b6", 47, expected 15
kopia: error: encountered 1 errors, try --help

Tested with larger directory structures as well. Output is always off by 32.
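
The numbers in the error message can be checked against the reproduction steps: the heredoc writes 15 bytes, and the reported length of 47 exceeds that by exactly 32. The snippet only verifies this arithmetic and makes no claim about where the extra bytes come from.

```python
content = b"some test file\n"  # written by the heredoc in the reproduction steps
expected = len(content)
reported = 47  # length printed in the error message

print(expected, reported - expected)
```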

Kopia commit tested: 39a2cad

testing compression

I'm trying to test compression but I don't see any compression.

$ kopia repository create filesystem --path /ocean/kz
$ kopia policy set --compression=gzip-best-compression /ocean/kz
$ kopia policy show /ocean/kz
Policy for wilk@thinkpad:/ocean/kz:
...
Compression:
  Compressor: "gzip-best-compression" (defined for this target)
  Compress files regardless of extensions.
  Compress files of all sizes.

$ kopia snapshot ~/projets/flibuste

The repository has exactly the same size without setting compression (580M for 617M of source, 259M with targz).
Did I miss some configuration?

I also tried setting compression with --global.

edit: I built kopia from a clone made today.

DESIGN: Implement auto-update mechanism or update notification mechanism

This requires discussion to make sure it's not privacy-invasive.

Options in no particular order:

a) Kopia (official builds only) automatically fetches latest github release info every N days and if newer version is available, displays a message to the user prompting to manually upgrade.

b) Every N days, Kopia checks for updates and downloads new release automatically, and next time it runs, it will pick up the later build.

c) Every N days, Kopia only reminds the user to check for updates (manually) but does not do anything automatically.
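
Option (a) could look roughly like the sketch below: the check is rate-limited to once every N days and only produces a message for the user. Fetching the latest release tag is left to the caller, and the function names and version-tag format are assumptions of this example.

```python
from datetime import datetime, timedelta


def parse_version(tag):
    """Turn a tag such as 'v0.5.2' into a comparable tuple (0, 5, 2)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def update_message(current, latest, last_check, now, interval_days=7):
    """Return a prompt for the user, or None if nothing should be shown.

    Nothing is shown until `interval_days` have passed since `last_check`;
    the caller is expected to fetch `latest` only when a check is due.
    """
    if now - last_check < timedelta(days=interval_days):
        return None
    if parse_version(latest) > parse_version(current):
        return f"Kopia {latest} is available (you have {current}); please upgrade manually."
    return None
```

Because nothing is downloaded or installed automatically, the only data leaving the machine is the release lookup itself, which is the privacy question the discussion needs to settle.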
