Comments (25)
Maybe instead of splitting the files into a lot of small chunks, split them in half, or into two random pieces based on the IV or encryption key etc.
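A key-derived split point, as suggested, could look like this sketch (all names here are hypothetical, and securefs does not implement this; the point is only that the offset is derived deterministically from secret per-file material, so no extra metadata is needed to reassemble the file):

```python
import hashlib

def split_point(file_iv: bytes, key: bytes, size: int) -> int:
    """Derive a pseudorandom split offset from the per-file IV and key.
    The two stored pieces' sizes then leak little about the original size,
    and the same inputs always yield the same offset for reassembly."""
    digest = hashlib.blake2b(file_iv, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % (size + 1)

# Split one logical file into two stored pieces, then reassemble.
data = b"example plaintext" * 100
offset = split_point(b"per-file-iv", b"secret-key", len(data))
piece_a, piece_b = data[:offset], data[offset:]
assert piece_a + piece_b == data
```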
from securefs.
Are there plans to also support this in the full format? For the lite format the padding is certainly an advantage and may provide metadata hiding close to the full format; the full format plus padding, however, would be practically unbreakable by metadata attacks.
For instance, to somewhat effectively hide CD track lengths by up to 5 seconds (which should be enough that directories can no longer be identified by querying CDDB a few times), one needs padding of up to 882 kB per file.
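As a sanity check on the 882 kB figure (CD audio is 44.1 kHz, 16-bit PCM, stereo):

```python
# CD audio: 44,100 samples/s, 2 bytes per sample, 2 channels
bytes_per_second = 44_100 * 2 * 2     # 176,400 B/s
padding_for_5s = 5 * bytes_per_second # padding needed to hide up to 5 seconds
print(padding_for_5s)  # 882000 bytes, i.e. 882 kB
```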
In the full format, this attack is infeasible to begin with, but a different attack is very feasible: e.g. identifying large ISO files by their exact length alone. Having a padding scheme there too would be ideal.
Of course, with this change, one can already get that by layering securefs full on top of securefs lite; that, however, carries quite a performance penalty.
Implemented at 9f42ecb.
What kind of obfuscation? Padding to integer multiples of the block size (default 4 KiB)? Appending random-sized garbage data?
how about storing them in chunks?
Storing in chunks is padding. Unless you mean something different.
Yes, I think that's it. It would take more space with lots of small files, but would be great for situations with a few big files you want to hide. Maybe add a command-line option to enable it only where you need it.
The feature you ask for is small and useful, but I have added optional features so many times that the codebase is becoming unmaintainable. I need to refactor it before I can add this feature, and that won't happen until my internship ends in late September.
I would be VERY careful with doing this. It is the cause of many scaling issues with CryFS, which uses fixed 32 KB file chunks (padded if smaller) for storing files.
With small files this means you get a LOT of extra padding, so a file system with lots of small files becomes extremely large in disk usage. Storing large files (like videos) yields a large number of chunk files, which even with the 'pseudo-directory' structure causes a lot of overhead and, depending on the file system, long file search times.
Your B-tree directory structure may also have file-search-time problems on large encrypted file systems, especially if you just place all the files into one 'flat' directory. It does, however, seem a better compromise than going all out.
If you were to implement file size obfuscation, I would suggest looking at variable chunk sizes, i.e. chunks that can be larger or smaller as needed for what is being stored: small chunks for small files, larger chunks for larger files, perhaps with merging so that 4 consecutive 16 KB chunks become one 64 KB chunk. Alternatively, allocate larger storage files (1 MB?) and pack multiple smaller encrypted files into one larger storage file (though occasional defragmentation and removal of empty storage files may be needed).
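A size-tiered chunking policy along these lines could be sketched as follows; the tier boundaries below are made up for illustration, and neither securefs nor CryFS uses exactly this:

```python
def chunk_sizes(file_size: int) -> list[int]:
    """Pick a chunk size tier based on total file size, then split the
    file into chunks of that size (the last chunk is padded to a full
    chunk). Hypothetical tiers: files up to 64 KiB use 16 KiB chunks,
    up to 16 MiB use 256 KiB, and everything larger uses 4 MiB."""
    KiB, MiB = 1024, 1024 * 1024
    if file_size <= 64 * KiB:
        chunk = 16 * KiB
    elif file_size <= 16 * MiB:
        chunk = 256 * KiB
    else:
        chunk = 4 * MiB
    n_chunks = max(1, -(-file_size // chunk))  # ceiling; at least one chunk
    return [chunk] * n_chunks

print(chunk_sizes(10_000))     # [16384] -> a single 16 KiB chunk
print(chunk_sizes(1_000_000))  # four 256 KiB chunks
```

The stored size only reveals which tier a file is in and its rough chunk count, trading some padding overhead for fewer, larger storage files on big data.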
I have not found a solution (except CryFS) for hiding file sizes when the storage provider is not trusted. Think of it in a legal case. An investigation has found evidence in a file that needs to be connected to someone, and the prosecutor is doing search and seizure at a bunch of locations, trying to get lucky. One person has everything encrypted with securefs, has that file, and the prosecutor finds this repository. As I understand it, a security expert can then sit in court saying "yes, this securefs-encrypted file is the same as the unencrypted file the prosecutor has given to the court." I then need to explain to the court why I have this file in my possession, without anyone having to break the encryption. For me, this is a failure of privacy that is as important as the encryption itself.
@blinkiz But is the file size evidence for anything other than the file size itself? Because its length is 234, you are guilty?
@netheril96 Yes. I do think the court will see it as strong evidence if a security expert testifies that the encrypted file will be (for example) 22909943808 bytes long when decrypted, the same length as the unencrypted file the investigator has found.
If the encryption tools I use mean the expert can only tell the court that the file is probably the same as the evidence, that would in my opinion be a significant privacy advantage. Best, of course, is not being able to see file sizes at all, but I understand that this is difficult to implement while keeping good performance.
Deathtrip, that is brilliant! If it is not possible to identify which individual files belong together without breaking the encryption, the privacy issue is solved!
It sounds so easy... netheril96, is this simple splitting possible while still keeping good performance?
Is the splitting simple? securefs cannot know in advance how large a file will be, and each file may grow dynamically at any time. Splitting early may lead to fragmentation, and splitting late may result in no obfuscation at all.
Maybe you could take some cues from how CryFS handles such things. Also, if you split files anyway, I don't think fragmentation is a huge problem.
CryFS just splits files into same-size chunks.
What I meant was not how CryFS splits files, but how it handles changing file sizes.
It seems to me the only way of truly dealing with metadata (including file sizes) is to implement a file system that uses plain files as 'chunks' of encrypted storage. The 'chunk' files should be large, but, like modern file systems, should perhaps allow multiple small files to fit inside a single 'chunk'.
The whole package would in many cases be extremely similar to a modern file system, just using plain files as storage instead of blocks of disk space, so that it can be stored on networked storage systems.
I will implement a padding-based scheme when I find the time. That is, the underlying file size will always be a multiple of the block size, hiding some of the size information from onlookers.
The split and chunk designs are better in their outcomes. However, properly designing and implementing them without a drastic performance hit is a daunting task, and I am not willing to undertake that now.
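The block-multiple padding described here amounts to rounding every stored size up to the next block boundary; a minimal sketch, assuming the 4 KiB default block size mentioned earlier in the thread:

```python
BLOCK_SIZE = 4096  # illustrative; the 4 KiB default mentioned above

def padded_size(logical_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Round the stored size up to the next multiple of block_size, so an
    onlooker learns the size only to block granularity."""
    blocks = -(-logical_size // block_size)  # ceiling division
    return max(blocks, 1) * block_size       # even an empty file takes one block

print(padded_size(1))     # 4096
print(padded_size(4096))  # 4096
print(padded_size(4097))  # 8192
```

An onlooker then sees only one of ~4096 possible sizes per block, rather than the exact byte count.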
FWIW, though CryFS has been mentioned in various lights so far on this issue, I wanted to set a few things straight:
- CryFS allows for multiple chunk sizes. The default is 32K, but I'm using 4M
- With large chunk sizes, performance is quite good for large repositories. When I started this repo I copied 32 GB of data into it. Over time a lot of data has been added; it is now 94.7 GB and growing, and I'm still able to maintain good read and write speeds.
Another option for obfuscation is deduplication. This solves the repository size issue and obfuscates everything; Borgbackup can be looked at for an example. Its performance issues are related to the fact that it keeps multiple versions of data, which securefs doesn't have to do. If reading/writing only a handful of files, the approach Borgbackup uses would be decent. They are still working on multi-threading, but even single-threaded performance would be more than enough to watch a video or listen to (or record) audio on any modern system (last 3 years) with good throughput (MB/s, not IOPS) to storage. If one uses hdparm to test random read and write and then divides that number by 10, you should get your worst-case read/write. HD (Blu-ray 1080p24) needs up to 8 MB/s. Most multimedia applications don't need more than 40 MB/s, which is more than the worst case for spinning disks but still very achievable in typical cases.
When restoring data from Borgbackup, typical throughput is around 70 MB/s for a file with multiple versions and in excess of 110 MB/s for files without additional versions, so it's worth a look.
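The deduplication idea rests on content-defined chunking: chunk boundaries depend on the data itself, so identical content produces identical chunks wherever it appears, and storing chunks keyed by their hash deduplicates them. A toy sketch follows; the rolling value and parameters are illustrative only, not Borgbackup's actual buzhash chunker:

```python
import hashlib

MASK = (1 << 13) - 1  # boundary when low 13 bits are all ones -> ~8 KiB avg chunk
MIN_CHUNK = 2048      # avoid pathologically small chunks

def chunks(data: bytes):
    """Yield content-defined chunks: a boundary is declared wherever the
    rolling value's low 13 bits are all ones (toy hash, not a real CDC hash)."""
    h, start = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF
        if (h & MASK) == MASK and i - start + 1 >= MIN_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]  # final partial chunk

def dedup_store(data: bytes) -> dict[str, bytes]:
    """Store chunks keyed by content hash; repeated chunks are stored once."""
    return {hashlib.sha256(c).hexdigest(): c for c in chunks(data)}

data = bytes(range(256)) * 1000
assert b"".join(chunks(data)) == data  # chunking is lossless
store = dedup_store(data)
print(len(store), "unique chunks for", len(data), "bytes")
```

Because boundaries move with the content rather than with fixed offsets, an insertion near the start of a file only changes the chunks it touches, which is what keeps repository growth modest.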
Another idea to obfuscate file sizes: store the true length in the meta file and simply append a random amount of random bytes to the end (the amount ideally configurable). Then if you store your MP3 collection, 0..128 kB of random padding may be sufficient to prevent figuring out which CDs you ripped using CDDB, even in format 4.
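That scheme can be sketched as follows. For simplicity the true length is prepended inline here, whereas the suggestion is to keep it in the (encrypted) meta file; MAX_PAD stands in for the configurable cap:

```python
import os
import struct

MAX_PAD = 128 * 1024  # hypothetical configurable cap, per the 0..128 kB suggestion

def pad(plaintext: bytes) -> bytes:
    """Record the true length, then append a random number of random bytes."""
    n_pad = int.from_bytes(os.urandom(4), "big") % (MAX_PAD + 1)
    return struct.pack(">Q", len(plaintext)) + plaintext + os.urandom(n_pad)

def unpad(stored: bytes) -> bytes:
    """Recover the original bytes using the recorded length."""
    (true_len,) = struct.unpack(">Q", stored[:8])
    return stored[8:8 + true_len]

track = os.urandom(3_000_000)      # stand-in for a ripped audio track
assert unpad(pad(track)) == track  # round-trips regardless of padding drawn
```

Since the padding amount is drawn fresh per file, two identical plaintexts generally produce differently sized ciphertexts, which is exactly what defeats the CDDB-style length lookup.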
What if the entire directory were stored in something like an encrypted equivalent of LevelDB's SST files? The file sizes would then only be available after decryption.
Implemented in 2e6b099.
Run securefs create data-dir --max-padding 4096 to create a padded repo. All files will then have a cryptographically random number of padding bytes in the range [0, 4096].
It has a significant performance impact due to how frequently FUSE queries file sizes.
Only available in the lite format.
Nicely done. Thank you for building securefs 🙂
Also using this now, thanks! It appears to work well. I'm not noticing any performance degradation when used with my flash drives, BTW (statting files is slow either way).