eikek commented on June 13, 2024

Hi, I'm not quite sure, tbh. For postgres there is already a virus check possibility via snakeoil (https://eikek.github.io/sharry/doc/configure#example-for-snakeoil). Running an external command on each uploaded file sounds a bit expensive, doesn't it? Especially when the file is stored somewhere else (like S3 or the database), it would first have to be transferred to the machine running sharry, no?

ShaunMaher commented on June 13, 2024

Thanks for taking the time to consider my suggestion.

Yes, I'm aware that the snakeoil extension for postgres exists and is supported, but it doesn't provide protection for those using the filesystem or S3 backends.

I haven't looked at all the Sharry code, and I don't know Scala, but isn't there already code that generates a checksum of the uploaded file? I assume it does this by streaming each upload chunk, in order, until it has all chunks? If so, could we "tee" that same stream to an external command? It would be up to the external command to run whatever process it likes on the data stream and report back an exit code (0 = accept the upload, otherwise deny/delete the upload) plus a text string to return to the user. For the ClamAV use case, the stream could be scanned without needing to write all the chunks to disk (I think).
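A minimal Python sketch of the "tee" idea (Sharry itself is written in Scala, so this only illustrates the shape of it). It assumes the chunks are already available in order and that the command prints only a short verdict. `scan_while_hashing` and `DEMO_SCANNER` are hypothetical names; `DEMO_SCANNER` is a stand-in that flags the bytes `EVIL`, where a real setup might use something like ClamAV's `clamscan -` (scan standard input) instead, untested here.

```python
import hashlib
import subprocess
import sys
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a real scanner: reads all of stdin, prints a
# verdict, and exits 1 if it sees the bytes b"EVIL", 0 otherwise.
DEMO_SCANNER = [
    sys.executable,
    "-c",
    "import sys; d = sys.stdin.buffer.read(); bad = b'EVIL' in d; "
    "print('infected' if bad else 'clean'); sys.exit(1 if bad else 0)",
]


def scan_while_hashing(chunks: Iterable[bytes], command: List[str]) -> Tuple[bool, str, str]:
    """Tee in-order chunks into a SHA-256 digest and the stdin of `command`.

    Returns (accepted, message, sha256_hex): exit code 0 means accept, anything
    else means reject, and stdout becomes the message for the uploader.
    Assumes the command prints only a short message; a chatty command would
    need its stdout drained on a separate thread to avoid a pipe deadlock.
    """
    digest = hashlib.sha256()
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    stdin_open = True
    for chunk in chunks:
        digest.update(chunk)  # the checksum sees every byte ...
        if stdin_open:
            try:
                proc.stdin.write(chunk)  # ... and so does the external command
            except BrokenPipeError:
                stdin_open = False  # the command stopped reading early
    try:
        proc.stdin.close()  # signals end-of-file to the command
    except BrokenPipeError:
        pass
    output = proc.stdout.read()
    exit_code = proc.wait()
    message = output.decode("utf-8", errors="replace").strip()
    return exit_code == 0, message, digest.hexdigest()
```

The checksum and the scan share one pass over the data, which is the point of the suggestion: nothing is written to disk a second time.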

My other thought, which is far less efficient, would be to just provide an "upload complete" hook with some environment variables or command-line arguments (file_id, a URL to download the file, a one-time token to download the file without auth, etc.). It would then be up to the external process to fetch the file from Sharry, scan it, and issue an API call back to Sharry telling it to delete the upload if it's bad. This does mean double-handling the file, and it doesn't give the uploader real-time feedback, so it's not ideal.
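A minimal sketch of what such a hook's inputs could look like, assuming they are passed as environment variables. The `SHARRY_*` variable names, the helper functions, and the example URL are all hypothetical; Sharry defines nothing like this today.

```python
import os
import secrets
from typing import Dict, Mapping, Optional


def new_download_token() -> str:
    """A random, URL-safe token intended to be accepted for a single download."""
    return secrets.token_urlsafe(32)


def build_hook_env(
    file_id: str,
    download_url: str,
    token: str,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Environment for a hypothetical "upload complete" hook.

    Starts from `base` (or the server's own environment) and adds the
    upload-specific values. The SHARRY_* names are made up for this sketch.
    """
    env = dict(os.environ if base is None else base)
    env.update(
        {
            "SHARRY_FILE_ID": file_id,
            "SHARRY_DOWNLOAD_URL": download_url,
            "SHARRY_DOWNLOAD_TOKEN": token,
        }
    )
    return env
```

The server would then start the hook with something like `subprocess.Popen(command, env=env)` and not wait for it; the verdict arrives later through the API call back to Sharry, which is why the uploader gets no real-time feedback.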

Thanks again.
Shaun.

eikek commented on June 13, 2024

Thanks for the explanation! Yes, it works roughly as you described. But one detail makes it a bit difficult: the files are uploaded in chunks, and there is no guarantee that the chunks arrive in the correct order (only how many chunks there are is known). Chunk "10" could arrive first, then "8", etc. But you are right, sharry creates a checksum. This is optional, though, because it can be very expensive for large files. The checksum is created in the background once the last chunk has been received, so the user doesn't wait for it when uploading.
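To feed a scanner an in-order stream despite out-of-order arrival, the server would have to hold back early chunks until the gap before them is filled. A minimal sketch, assuming zero-based chunk indexes and a known total chunk count; `InOrderChunks` is a hypothetical class, not Sharry code.

```python
from typing import Dict, List


class InOrderChunks:
    """Accepts chunks in any order and releases them strictly in index order."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.next_index = 0  # first chunk not yet released
        self.pending: Dict[int, bytes] = {}

    def add(self, index: int, data: bytes) -> List[bytes]:
        """Store one chunk; return every chunk that can now be released in order."""
        if not 0 <= index < self.total:
            raise ValueError(f"chunk index {index} out of range")
        if index < self.next_index or index in self.pending:
            return []  # duplicate, e.g. a client retry; ignore it
        self.pending[index] = data
        ready = []
        while self.next_index in self.pending:
            ready.append(self.pending.pop(self.next_index))
            self.next_index += 1
        return ready

    @property
    def complete(self) -> bool:
        return self.next_index == self.total
```

This shows why the ordering matters: in the worst case, when chunk 0 arrives last, every other chunk is held back before the scanner can see a single byte.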

I like the idea, but I'm not sure how to implement it nicely. Would it make sense to hand each chunk to the command? I'm not sure that is enough for virus scanning, but I think that is what happens when using the snakeoil extension. On the other hand, it might then be too tedious to write scripts that deal with the complete file, should someone want to do something different.
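A toy illustration of one way per-chunk scanning can fall short: if a "virus" were just a byte pattern, scanning each chunk on its own misses a pattern that straddles a chunk boundary, while carrying the last few bytes of the stream forward catches it. Real scanners such as ClamAV do far more than substring matching, so this says nothing definitive about snakeoil; both functions are hypothetical and assume chunks are fed in order.

```python
from typing import Iterable


def naive_per_chunk_scan(chunks: Iterable[bytes], signature: bytes) -> bool:
    """Look at every chunk in isolation, with no context from its neighbours."""
    return any(signature in chunk for chunk in chunks)


def overlapping_scan(chunks: Iterable[bytes], signature: bytes) -> bool:
    """Prepend the last len(signature) - 1 bytes seen so far to each chunk."""
    keep = len(signature) - 1
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        if signature in window:
            return True
        tail = window[-keep:] if keep else b""  # context for the next chunk
    return False
```

The overlapping version still needs the chunks in order, which ties back to the ordering problem above.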

The "upload complete" hook could be done, perhaps, and sharry could wait for the result to take some action. But this could be very inefficient if the file needs to be transferred (especially for very large files), and it could lead to timeouts. OTOH, that is the problem of the script author :).
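A minimal sketch of the blocking variant, where the server waits for the hook with a hard timeout. `run_blocking_hook` and `HookVerdict` are hypothetical, and rejecting the upload on timeout ("fail closed") is an assumed policy; accepting it instead would be an equally valid choice.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class HookVerdict:
    accepted: bool
    message: str


def run_blocking_hook(command: List[str], timeout_seconds: float) -> HookVerdict:
    """Run the hook, wait at most `timeout_seconds`, and map its exit code to a verdict."""
    try:
        # subprocess.run kills the child if the timeout expires.
        result = subprocess.run(command, capture_output=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Assumed policy: a hook that does not answer in time rejects the upload.
        return HookVerdict(False, "scan timed out")
    message = result.stdout.decode("utf-8", errors="replace").strip()
    return HookVerdict(result.returncode == 0, message)
```

The timeout bounds how long an upload can hang on a slow script, which shifts the "very large file" problem onto whoever picks the timeout value.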

Another direction could be to use features of other systems: snakeoil is part of postgres, and sharry only translates specific error messages. Perhaps something similar can be done for S3? Then the virus scan would need to be done on the S3 side (why not :)). It is less generic, of course.
