Comments (25)

deathtrip commented on May 30, 2024

Maybe instead of splitting the files into a lot of small chunks, split them in half, or into two randomly sized pieces derived from the IV or encryption key, etc.
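
To make the idea concrete, here is a minimal sketch of deriving a split point from the key or IV. Everything here is an assumption for illustration: the FNV-1a hash merely stands in for the keyed PRF (e.g. HMAC) a real implementation would need, and `split_point` is a hypothetical helper, not part of securefs.

```cpp
#include <cstdint>
#include <vector>

// Purely illustrative: derive a split point for a file from its encryption
// key or IV, so the two pieces have sizes an attacker cannot predict.
// FNV-1a is a stand-in here; a real implementation would use a keyed PRF
// such as HMAC over the key/IV.
std::uint64_t split_point(const std::vector<std::uint8_t>& key_or_iv,
                          std::uint64_t file_size)
{
    if (file_size < 2)
        return file_size;  // too small to split
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
    for (std::uint8_t b : key_or_iv)
    {
        h ^= b;
        h *= 1099511628211ULL;  // FNV-1a prime
    }
    return 1 + h % (file_size - 1);  // strictly inside the file
}
```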

divVerent commented on May 30, 2024

Are there plans to also support this in the full format? Yes, for the lite format the padding is certainly an advantage and may provide metadata hiding close to the full format; full format + padding, however, would be practically unbreakable by metadata attacks.

Like, to somewhat effectively hide CD track lengths by up to 5 seconds (which should be enough to no longer be able to identify directories by querying CDDB a few times), one needs up to 882 kB of padding per file (CD audio is 44,100 samples/s × 2 channels × 2 bytes = 176,400 bytes/s, so 5 s ≈ 882 kB).

In the full format, this attack is infeasible to begin with; however, a different attack is very feasible: identifying large ISO files by their exact length alone, for example. Having a padding scheme there too would be ideal.

Of course, with this change one can already get that by layering securefs full on top of securefs lite; that, however, carries quite a performance penalty.

netheril96 commented on May 30, 2024

Implemented at 9f42ecb.

netheril96 commented on May 30, 2024

What kind of obfuscation? Pad to integer multiples of the block size (default 4 KiB)? Append randomly sized garbage data?

deathtrip commented on May 30, 2024

How about storing them in chunks?

netheril96 commented on May 30, 2024

Storing in chunks is padding. Unless you mean something different.

deathtrip commented on May 30, 2024

Yes, I think that's it. It would take more space with lots of small files, but it would be great for situations with a few big files you want to hide. Maybe add a command-line option to enable it only where you need it.

netheril96 commented on May 30, 2024

The feature you ask for is small and useful, but I have been adding optional features so many times that the codebase is becoming increasingly unmaintainable. I need to refactor it before I can add this feature, but that won't happen until my internship ends in late September.

antofthy commented on May 30, 2024

I would be VERY careful with doing this. It is the cause of many scaling issues in CryFS, which uses fixed 32 KiB file chunks (padded if smaller) to store files.

With small files this means you get a LOT of extra padding, so a file system with lots of small files becomes extremely large in disk usage. When storing large files (like videos) you end up with a large number of chunk files, which, even with the 'pseudo-directory' structure, causes a lot of overhead and, depending on the file system, long file search times.

Your B-tree directory structure may also have file search time problems on large encrypted file systems, especially if you place all the files into one 'flat' directory. However, it does seem to be a better compromise than going all out.

If you were to implement file size obfuscation, I would suggest you look at something that lets you use variable chunk sizes, i.e. chunks that can be larger or smaller as needed for what is being stored: small chunks for small files, larger chunks for larger files, perhaps merging so that 4 consecutive 16 KiB chunks become a 64 KiB chunk. Or perhaps allocate larger storage files (1 MiB?) and pack multiple smaller encrypted files into the one larger storage file (though occasional defragmentation and removal of empty storage files may be needed).
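
A rough sketch of that variable-chunk idea (the `choose_chunk_size` helper and the 16 KiB / 4 MiB bounds are assumptions of this sketch, not anything securefs or CryFS actually uses):

```cpp
#include <cstdint>

// Illustrative only: pick a chunk size appropriate to the file size, so
// small files waste little padding and large files need few storage
// objects.
std::uint64_t choose_chunk_size(std::uint64_t file_size)
{
    constexpr std::uint64_t kMinChunk = 16 * 1024;        // 16 KiB
    constexpr std::uint64_t kMaxChunk = 4 * 1024 * 1024;  // 4 MiB
    std::uint64_t chunk = kMinChunk;
    // Double until the chunk covers the file or hits the cap, mirroring
    // the "merge 4 consecutive 16 KiB chunks into one 64 KiB chunk" idea.
    while (chunk < file_size && chunk < kMaxChunk)
        chunk *= 2;
    return chunk;
}
```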

blinkiz commented on May 30, 2024

I have not found a solution (except CryFS) for hiding file sizes when the storage provider is not trusted. Think of it in a legal case: the investigation has found evidence in a file that needs to be connected to someone, and the prosecutor is doing search and seizure at a bunch of locations, trying to get lucky. One person has everything encrypted with securefs but has that file, and the prosecutor finds this repository. From what I have come to understand, a security expert can then sit in court saying "yes, this securefs-encrypted file is the same as the unencrypted file the prosecutor has given to the court." I then need to explain to the court why I have this file in my possession, without anyone having to break the encryption. For me, this is a privacy failing that is as important as encryption.

netheril96 commented on May 30, 2024

@blinkiz But is the file size evidence for anything other than the file size itself? Because its length is 234, you are guilty?

blinkiz commented on May 30, 2024

@netheril96 Yes. I do think the court will see it as strong evidence if a security expert says that the encrypted file will be (for example) 22909943808 bytes in length when decrypted: the same length as the unencrypted file the investigator has found.
If the encryption tools I use could only let the security expert tell the court that the file is probably the same as the evidence, that would in my opinion be a significant privacy advantage. Best is of course not being able to see file sizes at all, but I have understood that this is difficult to implement while keeping good performance.

blinkiz commented on May 30, 2024

Deathtrip, that is brilliant! If it is not possible to identify which individual files belong together without breaking encryption, the privacy issue is solved!
It sounds so easy... netheril96, is this simple splitting possible while still keeping good performance?

netheril96 commented on May 30, 2024

Is the splitting simple? securefs cannot know in advance how large a file will be, and each file may grow dynamically at any time. Splitting early may lead to fragmentation, and splitting late may result in no obfuscation at all.

deathtrip commented on May 30, 2024

Maybe you could take some cues from how cryfs handles such things. Also, if you split files anyway, I don't think fragmentation is a huge problem.

netheril96 commented on May 30, 2024

cryfs just splits files into same-sized chunks.

deathtrip commented on May 30, 2024

What I meant was not how cryfs splits files, but how it handles changing file sizes.

antofthy commented on May 30, 2024

It seems to me the only way of truly dealing with metadata (including file sizes) is to implement a file system that uses plain files as 'chunks' of encrypted storage. The 'chunk' files should be large, but should perhaps, like modern file systems, allow multiple small files to fit inside a single 'chunk'.

The whole package would in many cases be extremely similar to a modern file system, just using plain files as storage instead of blocks of disk space, so that it can be stored on networked storage systems.
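
As a purely hypothetical illustration of that layout (none of these names or sizes come from securefs), a chunk file might carry a small slot table so several small files can share one chunk:

```cpp
#include <cstdint>

// Hypothetical on-disk layout for the "plain files as chunks" idea: each
// storage object is a fixed-size encrypted chunk file, and small logical
// files are packed into slots inside it.
constexpr std::uint64_t kChunkFileSize = 1024 * 1024;  // 1 MiB, as suggested

struct SlotEntry
{
    std::uint64_t file_id;  // which logical file owns this slot
    std::uint32_t offset;   // byte offset of the slot inside the chunk
    std::uint32_t length;   // bytes of that file stored in this slot
};

struct ChunkHeader
{
    std::uint32_t slot_count;  // number of SlotEntry records that follow
    // The SlotEntry table and the packed data are encrypted together, so
    // an onlooker sees only uniformly sized chunk files.
};
```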

netheril96 commented on May 30, 2024

I will implement a padding-based scheme when I find the time. That is, the underlying file size will always be a multiple of the block size, hiding some of the size information from onlookers.
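
The size calculation for such a scheme is a simple round-up; a minimal sketch, assuming the 4 KiB block size mentioned earlier (how empty files are handled is a guess here, not documented behavior):

```cpp
#include <cstdint>

// Minimal sketch of the padding scheme described above: round every
// logical size up to the next multiple of the block size, so onlookers
// learn sizes only to block granularity.
std::uint64_t padded_size(std::uint64_t logical_size,
                          std::uint64_t block_size = 4096)
{
    if (logical_size == 0)
        return block_size;  // assumption: even empty files occupy a block
    return (logical_size + block_size - 1) / block_size * block_size;
}
```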

The split and chunk designs have better outcomes. However, properly designing and implementing them without a drastic performance hit is a daunting task, and I am not willing to undertake that now.

MagnetoOptical commented on May 30, 2024

FWIW, though CryFS has been mentioned in various lights so far on this issue, I wanted to set a few things straight:

  1. CryFS allows for multiple chunk sizes. The default is 32K, but I'm using 4M.
  2. With large chunk sizes, performance is quite good for large repositories. When I started this repository I copied 32 GB into it; over time a lot of data has been added, it is now 94.7 GB and growing, and I'm still able to maintain good read and write speeds.

Another option for obfuscation is deduplication. This solves the repository size issue and obfuscates everything. Borgbackup can be looked at for an example. Its performance issues are related to the fact that it keeps multiple versions of data, which securefs doesn't have to do. If reading/writing only a handful of files, the approach Borgbackup uses would be decent. They are still working on multi-threading, but even single-threaded performance would be more than enough to watch a video or listen to (or record) audio on any modern system (last 3 years) with good throughput (MB/s, not IOPS) to storage. If you use hdparm to test random read and write and then divide that number by 10, you should get your worst-case read/write rate. HD video (Blu-ray 1080p24) needs up to 8 MB/s. Most multimedia applications don't need more than 40 MB/s, which is more than the worst case for spinning disks but still very achievable in typical cases.
When restoring data from Borgbackup, typical throughput is around 70 MB/s for a file with multiple versions and in excess of 110 MB/s for files without additional versions, so it's worth a look.

divVerent commented on May 30, 2024

Another idea to obfuscate file sizes: store the length in the meta file and simply append a random number of random bytes to the end (ideally with a configurable maximum). Then, if you store your mp3 collection, 0–128 kB of random padding may be sufficient to prevent figuring out which CDs you ripped by querying CDDB, even in format 4.
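
A minimal sketch of that suggestion (not what securefs implements; `std::random_device` stands in for a proper CSPRNG, and the 128 KiB ceiling just mirrors the mp3 example above):

```cpp
#include <cstdint>
#include <random>

// Sketch only: append a random number of garbage bytes and record the true
// length in the (encrypted) metadata so reads can strip the padding again.
struct PaddedLength
{
    std::uint64_t true_length;    // stored in the encrypted meta file
    std::uint64_t padding_bytes;  // random garbage appended on disk
};

PaddedLength pad_randomly(std::uint64_t true_length,
                          std::uint64_t max_padding = 128 * 1024)
{
    std::random_device rd;  // stand-in for a cryptographically secure RNG
    std::uniform_int_distribution<std::uint64_t> dist(0, max_padding);
    return PaddedLength{true_length, dist(rd)};
}
```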

AGenchev commented on May 30, 2024

What if the entire directory were stored in something like an encrypted equivalent of LevelDB .SST files? The file sizes would then only be available when decrypted.

netheril96 commented on May 30, 2024

Implemented in 2e6b099.

Run `securefs create data-dir --max-padding 4096` to create a padded repo. In this case, every file will have a cryptographically random amount of padding in the range [0, 4096] bytes.

It has a significant performance impact, due to how frequently FUSE queries file sizes.

Only available in the lite format.

blinkiz commented on May 30, 2024

Nicely done. Thank you for building securefs 🙂

divVerent commented on May 30, 2024

Also using this now - thanks! It appears to work well. I'm not noticing any performance degradation when used with my flash drives, BTW (stat'ing files is slow either way).
