Thank you very much, @fin-ger, that's an interesting test.
I ran your test script and saw it stop at the 7th round, but I could not reproduce the error you posted. The crash I saw happened while writing a file, not while opening the repo. From your error log, it looks like the error happened when reading the super block, which occurs when opening an existing repo. Could you share how you built the executable on Alpine Linux? I tried building one on Ubuntu, but it cannot run on Alpine.
Also, the published zbox v0.6.1 is quite old. The latest code on the master branch has been refactored heavily, with many bug fixes and performance improvements. Could you test with the latest code instead? Just use the dependency line below in your Cargo.toml:
zbox = { git = "https://github.com/zboxfs/zbox.git", features = ["storage-file"] }
Another tip: you can turn on zbox debug output by setting an environment variable in the file run-test.exp:
RUST_LOG=zbox=trace
Looking forward to seeing more results, thanks.
from zbox.
The error happens when running zbox-fail-test --file data check in the previously forcefully stopped VM (run-check.exp).
The executable was built automatically by the Travis CI configuration. I am using the official alpine:edge Docker container:
docker run --rm -v $(pwd):/volume alpine:edge /bin/sh -c 'cd /volume && apk add rust cargo libsodium-dev && export SODIUM_LIB_DIR=/usr/lib && export SODIUM_STATIC=true && cargo build --target x86_64-alpine-linux-musl'
The executable can be found in ./target/x86_64-alpine-linux-musl/debug/zbox-fail-test.
I will now create a new version of my test that uses the latest zbox master and the RUST_LOG configuration.
The error only happens when the test runs inside a VM that gets forcefully stopped. Afterwards (once the VM has booted again), the repository is checked for differences against the previously generated data file (the check command).
from zbox.
I built a new version (0.4.0) that uses zbox from the current git master and added RUST_LOG=zbox=trace to the run and check actions of the test (run-test.exp, run-check.exp).
from zbox.
Thanks, @fin-ger. What I found is that QEMU apparently didn't flush the written data to its driver. After the test crashed on the 7th round, the repo folder looked like this:
zbox:~# ls -l zbox-fail-test-repo
total 8
drwxr-xr-x 2 root root 4096 Apr 4 15:15 data
drwxr-xr-x 4 root root 4096 Apr 4 15:15 index
-rw-r--r-- 1 root root 0 Apr 4 15:15 super_blk.1
So you can see there is only one super block, and it is empty. The wal folder was not created at all. The super block and the wal must be guaranteed to be persisted to disk. A correct repo should look like this:
/vol # ls -l zbox-fail-test-repo
total 16
drwxr-xr-x 5 root root 160 Apr 4 11:26 data
drwxr-xr-x 5 root root 160 Apr 4 11:26 index
-rw-r--r-- 1 root root 8192 Apr 4 11:26 super_blk.0
-rw-r--r-- 1 root root 8192 Apr 4 11:26 super_blk.1
drwxr-xr-x 8 root root 256 Apr 4 11:26 wal
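The comparison between the two listings can be automated. Below is a minimal sketch in Rust that flags the symptoms seen here (a missing or empty super block, a missing wal folder). The function name `check_repo_layout` is hypothetical and not part of zbox. The expected file names are taken only from the healthy listing above, not from any zbox specification.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Collect human-readable problems found in a file-storage repo folder.
/// Hypothetical helper: the expected layout (super_blk.0, super_blk.1,
/// data/, index/, wal/) is assumed from the healthy `ls -l` output above.
pub fn check_repo_layout(repo: &Path) -> io::Result<Vec<String>> {
    let mut problems = Vec::new();

    // Both super block copies should exist and be non-empty.
    for name in ["super_blk.0", "super_blk.1"] {
        match fs::metadata(repo.join(name)) {
            Ok(meta) if meta.len() == 0 => problems.push(format!("{} is empty", name)),
            Ok(_) => {}
            Err(e) if e.kind() == io::ErrorKind::NotFound => {
                problems.push(format!("{} is missing", name));
            }
            Err(e) => return Err(e),
        }
    }

    // The three folders seen in the healthy listing should be present.
    for name in ["data", "index", "wal"] {
        if !repo.join(name).is_dir() {
            problems.push(format!("{} directory is missing", name));
        }
    }

    Ok(problems)
}
```

Run against the crashed repo shown earlier, this would be expected to report the missing super_blk.0, the empty super_blk.1, and the missing wal folder.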
So QEMU tells zbox that write() and flush() have completed when they actually have not. A possible reason is that no cache mode was specified when starting the QEMU VM. You can try adding one in run-test.exp, line 10:
-drive file=qemu/zbox.img,format=raw,cache=directsync
An explanation of the different cache modes can be found here. I've tried several, but I still cannot see the files being reliably written to disk.
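For context, this is roughly what "guaranteed persistent" asks of the guest side. The sketch below uses only the Rust standard library and assumes a Unix-like guest. It is not zbox's actual storage code, and `write_durably` is a hypothetical name. Even when every call here succeeds, the data only survives a host-side kill if the hypervisor honours the flush, which is the behaviour in question.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Write `data` to `path` and ask the OS to push it to stable storage.
/// Assumes a Unix-like system, where a directory can be opened and synced.
pub fn write_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(data)?;
    // Flush file contents and metadata to the device (fsync on Unix).
    file.sync_all()?;

    // A newly created file is only reachable after a crash if its
    // directory entry is durable too, so sync the parent directory.
    let parent = match path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    File::open(parent)?.sync_all()?;
    Ok(())
}
```

A zero-byte super_blk.1 after a crash would be consistent with the file having been created while its contents never reached the disk image.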
from zbox.
Okay, so if this is a QEMU issue, then it is not relevant for zbox. Have you tested failures of real machines with zbox?
from zbox.
Honestly, I haven't tested real machine failures because I can't find a good, reproducible way to do that. But I did run some random I/O error fuzz tests using a special faulty storage. That storage generates I/O errors, and the fuzzer reopens the repo randomly but deterministically.
Your test makes me think I could use QEMU for fuzz crash testing, just like this guy did for OS testing, but I still need to figure out how to make writes persistent in QEMU first.
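To illustrate the "random but deterministic" idea, here is a minimal sketch of a fault-injecting writer driven by a seeded PRNG, so that a failing run can be replayed from its seed. This is not zbox's actual faulty storage. `FaultyWriter`, `XorShift64`, and the `fail_one_in` parameter are hypothetical names chosen for this example.

```rust
use std::io::{self, Write};

/// Tiny deterministic PRNG (xorshift64): the same seed gives the same sequence.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Wraps any writer and fails roughly one in `fail_one_in` write calls.
pub struct FaultyWriter<W: Write> {
    inner: W,
    rng: XorShift64,
    fail_one_in: u64,
}

impl<W: Write> FaultyWriter<W> {
    pub fn new(inner: W, seed: u64, fail_one_in: u64) -> Self {
        FaultyWriter {
            inner,
            rng: XorShift64::new(seed),
            // Guard against division by zero; 1 means "always fail".
            fail_one_in: fail_one_in.max(1),
        }
    }

    pub fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> Write for FaultyWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.rng.next_u64() % self.fail_one_in == 0 {
            // Injected failure: nothing is passed to the inner writer.
            return Err(io::Error::new(io::ErrorKind::Other, "injected I/O error"));
        }
        self.inner.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

Because the failure pattern depends only on the seed and the order of calls, logging the seed is enough to reproduce a failing sequence.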
from zbox.
I tested whether a dd if=dd-test-src of=dd-test-dst status=progress would also produce a dd-test-dst of 0 bytes. And indeed, no matter which -drive ...,cache=something I provided, it was always 0 bytes in size. Then I tried
dd if=dd-test-src of=dd-test-dst status=progress iflag=direct oflag=direct
without any cache flag provided for QEMU, and then dd-test-dst had roughly the size reported by the dd progress. I also tried oflag=dsync,nocache and it also worked. I am currently trying to set up an expect script for the dd command. The VM needs coreutils to run the above command, as the dd provided by Alpine does not support iflag and oflag. I am also looking into comparing the dst and src files but have not come up with a good solution yet.
from zbox.
I have added run-dd-test.exp and run-dd-check.exp. The test writes a generated string file (can be diffed 😅) with dd and oflag=direct to dd-test-dst, and the check looks for any lines in dd-test-dst that are not in dd-test-src. The expected result is only one additional line (the one not completed during the write) in the dd-test-dst file. So it looks like, with dd, QEMU is handling the I/O correctly, or maybe just "better". I will look into the dd source code later!
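The check described above can be sketched in Rust as follows. This is not the code in run-dd-check.exp, and the function name `lines_not_in_source` is hypothetical. It assumes both files are valid UTF-8 text with one generated string per line.

```rust
use std::collections::HashSet;

/// Return the lines of `dst` that do not occur anywhere in `src`.
/// After an interrupted write, the expectation stated above is at most one
/// such line: the one that was cut off mid-write.
pub fn lines_not_in_source<'a>(src: &str, dst: &'a str) -> Vec<&'a str> {
    let known: HashSet<&str> = src.lines().collect();
    dst.lines().filter(|line| !known.contains(*line)).collect()
}
```

In practice the two strings would come from something like `std::fs::read_to_string("dd-test-src")` and `std::fs::read_to_string("dd-test-dst")`. A result with more than one entry would suggest corruption beyond the final, incomplete line.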
from zbox.
QEMU file I/O looks so tricky. I might test dd with different images later on.
from zbox.