
Comments (6)

zelon88 commented on May 20, 2024

Sounds fun! I'll give it a shot.

from hrconvert2.

zelon88 commented on May 20, 2024

Ok, I have an update on this.

I have scoured GitHub for recent POCs of this being done. Most of the results seem to be from several years ago. The most promising method seems to be:
http://pdfcrack.sourceforge.net/

You can install it with sudo apt-get install pdfcrack.
It uses a CPU-based brute force method to try to crack the USER password. One fault I noticed is that a pure brute force run takes an insanely long time to guess a password. When I tested it with a PDF whose password was 123456 and a wordlist containing 123456, it was fast. I think if we were to implement a feature like this we would need to do some heuristics beforehand: try using only numbers, or wordlists of common passwords, before either moving on to a full brute force attempt or simply giving up. One option would be to start the scan and then instruct the user to come back later to check on the status. We would give them a unique 32 digit code they can enter to check the status of their file later, and then defer automatic file deletion until they come back (or a minimum threshold has elapsed). Either that or the user keeps the page open and we keep refreshing the status.
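As a rough illustration of the "cheap guesses first" idea and the status code described above, here is a minimal Python sketch. The function names, the length bounds, and the choice of a hex string for the 32 character code are all assumptions made for the example; none of this is HRConvert2 code.

```python
import itertools
import secrets
import string


def candidate_passwords(common, min_len=4, max_len=6):
    """Yield cheap guesses first: common passwords, then digit-only strings."""
    seen = set()
    for word in common:
        if word not in seen:
            seen.add(word)
            yield word
    # Digit-only guesses, shortest first (hypothetical length bounds).
    for length in range(min_len, max_len + 1):
        for digits in itertools.product(string.digits, repeat=length):
            guess = "".join(digits)
            if guess not in seen:
                yield guess


def new_status_token():
    """Return a random 32-character hex code the user can enter to check back later."""
    return secrets.token_hex(16)
```

The generated guesses could be written to a file, one per line, and handed to pdfcrack through its wordlist option (-w) before deciding whether a full brute force attempt is worth starting.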

However, this opens the server up to a potential DDoS attack. Cracking eats up a ton of CPU and might realistically still never find a password. One user could keep submitting these requests until the server has no more resources left. It looks like there's no way to set an execution cap in pdfcrack itself; a cap would buy us a little more time. Alternatively we could create a queue with a limited number of workers. Either way, we would have to cap execution for each request at some point even if we haven't found the password.
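A minimal sketch of both mitigations, assuming the cracking tool is launched as an external process: the cap is enforced from the outside with a timeout, and a small worker pool acts as the queue. The pool size, the default time limit, and the function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_capped(cmd, max_seconds):
    """Run cmd and kill it once max_seconds elapse. Returns (finished, stdout)."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=max_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child process at this point.
        return False, ""
    return True, done.stdout


# Hypothetical pool size: at most two jobs run at once, the rest wait in the queue.
POOL = ThreadPoolExecutor(max_workers=2)


def submit_job(cmd, max_seconds=60):
    """Queue a capped job and return a Future that can be polled for status."""
    return POOL.submit(run_capped, cmd, max_seconds)
```

The returned Future's done() method is one way to answer a "check the status" request without blocking, and a request that hits the cap simply reports that no password was found in time.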

Ultimately this is a TON of programming and debugging for a feature that is ill-placed in HRConvert2. If it were a fast process that the user didn't have to wait for then I would say let's go for it, but there is no guarantee the operation will succeed (in fact most requests would probably fail or time out) and there's no good core mechanism for making the user wait. HRConvert2 was meant to create temporary scratch space for anonymized users. This feature would be better suited to HRCloud3, which is in progress. In fact, the recent refactor of HRConvert2 will probably end up serving as the basis for the HRCloud3 cloudCore. When this happens I will experiment with PDF cracking some more, because in that environment it makes more sense to ask the user to wait for the operation to complete.

Then I tried:
https://github.com/machine1337/pdfcrack

This was really flashy and promising looking. It was obviously made to work on Kali Linux, as the pdfid and pdf-parser packages are on Kali and this script tries to install them using aptitude. No worries, we just remove the dependency installer code, download the pdfid.py and pdf-parser.py scripts from https://blog.didierstevens.com/programs/pdf-tools/ and hard code the paths. Now we get to see that this is just another CPU-based brute force approach. It actually just supplies its own wordlist to regular PDFCrack. It can also generate passwords with Hashcat and PDF2John (both of which utilize the GPU), but then it just supplies those back to PDFCrack to see if they are valid. It basically combines several of the methods one would use to brute force a PDF password into one script. I like the methodology, but if this is just calling a bunch of dependencies we can do the job better in PHP and cut out the middle man. At this point I stopped testing this program because I know what the results will be. On its best day this program will be able to crack a PDF password somewhat faster than pure PDFCrack, and probably comparably to whatever heuristics we apply using PHP.

https://github.com/philpem/cuda_pdfcrack
I reviewed the code for cuda_pdfcrack first and found that it obviously requires NVIDIA CUDA support with a full graphics stack on the server. This would be problematic to add to HRConvert2 as a dependency because many home servers do not have the hardware support for something like this. Even I am developing this stuff in virtual machines where there is no VGA passthrough. But I kept going a little bit because I was curious. Usage is somewhat hacky. You need to first run vanilla PDFCrack to get the password hashes. Then you submit those to cuda_pdfcrack and it uses the GPU to brute force the password. If we used this method we would get that information using the pdfid and pdf-parser tools instead of PDFCrack.

Some other notable mentions, both of which specifically state that they cannot bypass a Document Open password:
https://github.com/SeppPenner/PdfPasswordRemover
https://github.com/jakepetroules/littlebirdy

An important note about ALL of these tools is their age. These tools all assume a 4 digit minimum password length, which was changed to 6 digits in Acrobat DC version 21.005.20048. This seems to have been a client-side change, meaning it's impossible to tell by version number which files have a 4 digit minimum password length and which have a 6 digit minimum.

In conclusion, it is possible to crack the passwords in a PDF, but it is extremely time and resource consuming. Supporting it would require me to develop at least 500 lines of additional code just to perform heuristics on the PDF, then another 500 to try to crack it using whatever hardware means are available, then another 500 to handle the user either waiting for the operation to complete or leaving and coming back. Even then the success rate might be 15% in cases where the server has no GPU and maybe... MAYBE 50% on servers that have GPU capabilities. I suspect the only passwords we would ever discover would be generic ones: zip codes, common words, etc. If the PDF is using modern 128 or 256 bit AES encryption you can just forget about opening it.
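To put rough numbers on "time and resource consuming", here is a small back-of-the-envelope sketch in Python. The guess rate passed in is a made-up figure purely for illustration, not a measurement of pdfcrack or of any GPU tool.

```python
def keyspace(charset_size, min_len, max_len):
    """Number of candidate passwords with lengths from min_len to max_len."""
    return sum(charset_size ** n for n in range(min_len, max_len + 1))


def worst_case_seconds(candidates, guesses_per_second):
    """Time to try every candidate at a given (assumed) guess rate."""
    return candidates / guesses_per_second


digits_only = keyspace(10, 4, 6)        # 4 to 6 digits
lower_and_digits = keyspace(36, 6, 6)   # exactly 6 chars of a-z and 0-9
print(digits_only)       # 1110000
print(lower_and_digits)  # 2176782336
```

Digit-only passwords stay small enough to exhaust quickly, which fits the guess that only generic passwords would ever be found, while even a short mixed-character password is roughly two thousand times more work.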

This research was very valuable to the HonestRepair product line, even if it isn't a good fit for HRConvert2 at the moment. Thank you for your suggestion!

zelon88 commented on May 20, 2024

More reading...
https://blog.didierstevens.com/2017/12/26/cracking-encrypted-pdfs-part-1/
https://blog.didierstevens.com/2017/12/27/cracking-encrypted-pdfs-part-2/
https://blog.didierstevens.com/2017/12/28/cracking-encrypted-pdfs-part-3/
https://blog.didierstevens.com/2017/12/29/cracking-encrypted-pdfs-conclusion/

Oredna commented on May 20, 2024

@zelon88 I don't know how they do it, but https://smallpdf.com/unlock-pdf unlocks the file much faster. Also, if you have a locked PDF and open it in Firefox, you can bypass the printing restriction and then just save it as a PDF. Is this something that can be replicated?

zelon88 commented on May 20, 2024

Thanks for supporting the project with your suggestion.

I will give this a test and report back.

Oredna commented on May 20, 2024

Any update? Can we expect it to be included in the 3.2 version? Is there any way we can support the development, through money or other means?
