pdfrip's Introduction

i make. i break. computer witchcraft at asterisk.so.


reach out:

mufeed [at] asterisk [dot] so

pgp fingerprint:

49B7 4F49 C33A 02A9 7536 257F 45BE E76A 9562 CB5E

pdfrip's People

Contributors

jorpic, lets-go-worker, limitedatonement, mkvo-pts, mufeedvh, zasekle

pdfrip's Issues

Add Regex expressions to Custom Queries

Expand the capabilities of the custom query builder in PDFRip by integrating full regular expression (regex) support. This enhancement will allow for more complex and versatile pattern matching capabilities in password format specification, making the tool more effective for advanced password cracking scenarios.

Proposed Enhancement
Integrate full regex support within the custom query builder, allowing users to specify intricate password patterns using the powerful syntax of regular expressions. This would enable:

  • Complex pattern matching with optional elements, alternations, and nested conditions.
  • Conditional expressions and assertions (lookahead, lookbehind) for dynamic pattern specification.
  • Enhanced flexibility with character classes and quantifiers to address a broader variety of password formats.

Example Use Cases:
Complex Date Formats: Match passwords that include full or abbreviated month names, optional separators, and variable year formats (e.g., 01-Jan-2020, 1 January 20).
--regex "(0?[1-9]|[12][0-9]|3[01])(-| )?(Jan|Feb|Mar|...|Dec)(-| )?(19[8-9][0-9]|20[0-2][0-9]|202[0-4])"

Optional Prefixes and Postfixes: Handle passwords with optional beginning or ending phrases, with multiple format variations (e.g., prefix12345, 12345postfix, prefix12345postfix).
--regex "(prefix)?\d{5}(postfix)?"

Advanced Alphanumeric Combinations: Efficiently tackle passwords composed of a mix of letters, numbers, and special characters in unpredictable configurations.
--regex "[A-Za-z]{3}\d{2,4}[!@#\$]"
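
For a password generator, a pattern like the optional prefix/postfix one boils down to enumerating every string the pattern can match. A minimal sketch of that idea, assuming the pattern has already been split by hand into per-position alternatives (the function names are hypothetical, no regex parsing is done, and the digit run is shortened to two digits to keep the output small):

```rust
/// Expand a sequence of segments into every possible concatenation
/// (a Cartesian product). Each segment lists the alternatives allowed at
/// that position; an optional group such as `(prefix)?` becomes
/// `["", "prefix"]`.
fn expand_segments(segments: &[Vec<String>]) -> Vec<String> {
    let mut results = vec![String::new()];
    for alternatives in segments {
        let mut next = Vec::with_capacity(results.len() * alternatives.len());
        for head in &results {
            for alt in alternatives {
                next.push(format!("{}{}", head, alt));
            }
        }
        results = next;
    }
    results
}

/// All zero-padded decimal strings of the given width,
/// e.g. width 2 gives "00" through "99" (the strings matched by `\d{2}`).
fn digits(width: usize) -> Vec<String> {
    let limit = 10usize.pow(width as u32);
    (0..limit)
        .map(|n| format!("{:0width$}", n, width = width))
        .collect()
}

/// Hand-translated equivalent of `(prefix)?\d{2}(postfix)?`.
fn example_candidates() -> Vec<String> {
    let optional = |s: &str| vec![String::new(), s.to_string()];
    expand_segments(&[optional("prefix"), digits(2), optional("postfix")])
}
```

The candidate count is the product of the segment sizes (2 × 100 × 2 here), which is also what a progress bar would need as its total. Lookahead and lookbehind assertions do not fit this model directly and would need filtering after generation.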

Split PDFRip into multiple crates

I think this would be useful in reference to #17 and #18 as well as to improve maintainability.

PDFRip is basically a big monolithic crate right now, which is fine for smaller software, but it will probably not age well as the number of features (and therefore probably the number of dependencies we have) increases, potentially causing issues with incompatible versions of crate dependencies.

Additionally, splitting it into multiple crates allows us to enforce a stricter dependency chain between our different parts.

E.g. we can ensure Clap (argument parsing) is only accessible from our main crate, guaranteeing engine.rs is unable to somehow depend on it.

I propose to split PDFRip into the following crates with the following responsibilities:

  • Main binary - Essentially becoming a presentation layer responsible for interactions with the user, i.e. argument parsing and/or API-related things as well as selecting a logging frontend and, if possible with the proposed dependencies, the progress bar.
  • Engine crate, responsible for the business logic of cracking passwords. It should depend on our Producer and Cracker crates.
  • Producer crate. Defines the "Producer" trait and contains the different cracking methods, i.e. "custom-query", "default-query" and such.
  • Cracker crate, implements the PDFCracker struct. Allowing us to decouple the rest of the crate from depending on the PDF crate we're currently using for our decryption logic.
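
To make the proposed layering concrete, here is a rough sketch of how the boundaries could look if everything lived in one file. The trait shapes, `RangeProducer`, `FixedPassword` and `crack` are all hypothetical illustrations, not PDFRip's actual definitions; the point is only that the engine function sees two traits and nothing else.

```rust
use std::ops::RangeInclusive;

/// Hypothetical shape of the trait the Producer crate would export.
pub trait Producer {
    /// Next candidate password, or `None` once the producer is exhausted.
    fn next(&mut self) -> Option<Vec<u8>>;
    /// Total number of candidates, e.g. for a progress bar.
    fn size(&self) -> usize;
}

/// Hypothetical abstraction the Cracker crate would export, hiding
/// whichever PDF library performs the decryption attempt.
pub trait Cracker {
    fn attempt(&self, password: &[u8]) -> bool;
}

/// Example producer: every number in an inclusive range, unpadded.
pub struct RangeProducer {
    inner: RangeInclusive<u32>,
    total: usize,
}

impl RangeProducer {
    pub fn new(start: u32, end: u32) -> Self {
        let total = if start > end { 0 } else { (end - start) as usize + 1 };
        RangeProducer { inner: start..=end, total }
    }
}

impl Producer for RangeProducer {
    fn next(&mut self) -> Option<Vec<u8>> {
        self.inner.next().map(|n| n.to_string().into_bytes())
    }

    fn size(&self) -> usize {
        self.total
    }
}

/// Stand-in cracker that accepts exactly one password (for demonstration).
pub struct FixedPassword(pub Vec<u8>);

impl Cracker for FixedPassword {
    fn attempt(&self, password: &[u8]) -> bool {
        password == self.0.as_slice()
    }
}

/// Engine logic: depends only on the two traits, so it cannot reach
/// argument parsing or the PDF library directly.
pub fn crack<P: Producer, C: Cracker>(producer: &mut P, cracker: &C) -> Option<Vec<u8>> {
    while let Some(candidate) = producer.next() {
        if cracker.attempt(&candidate) {
            return Some(candidate);
        }
    }
    None
}
```

In a workspace, each trait and its implementations would move into its own crate, and the compiler would then enforce that the engine only imports what its Cargo.toml lists.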

We can probably do all this inside this one repository by utilizing Cargo's workspace feature.

The proposed structure can be represented in the following way:

[Image: diagram of the proposed crate structure]

The image was generated using https://structurizr.com/dsl with the code from the attached file.
structurizr.txt

Missing Documentation for Passwords with Letters

It looks like the modes of operation are:

  • Provide a dictionary of all the passwords I want checked
  • Provide a number range to use as passwords to check (either using range or custom-query STRING{start-end})
  • Provide a year from which to create eight-digit numbers as passwords to check

Other operational modes are not listed. I would like to check all possible passwords with characters from a provided character set. I tried

for i in {a..z} {A..Z} {0..9}; do
    echo "$i" >> dictionary;
done;
./pdfrip -n 15 --filename ~/my.pdf wordlist dictionary;

but that only checked the 62 passwords provided in my word list. I'm guessing custom-query may be able to do what I want, but I don't see any documentation on it.

It would be nice if the default experience was just like pdfcrack, but utilizing multiple cores. At least documentation explaining how to do such a thing would be greatly appreciated.
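
What the request describes is an exhaustive walk over a character set. A minimal sketch of such a generator, assuming nothing about PDFRip's internals (`CharsetProducer` and `alphanumeric` are hypothetical names), could look like this:

```rust
/// Iterates over every string of length `min_len..=max_len` drawn from
/// `charset`, shortest first, advancing like an odometer.
pub struct CharsetProducer {
    charset: Vec<char>,
    max_len: usize,
    /// One index into `charset` per position of the current candidate.
    indices: Vec<usize>,
    done: bool,
}

impl CharsetProducer {
    pub fn new(charset: &str, min_len: usize, max_len: usize) -> Self {
        let charset: Vec<char> = charset.chars().collect();
        let done = charset.is_empty() || min_len > max_len;
        CharsetProducer { charset, max_len, indices: vec![0; min_len], done }
    }
}

impl Iterator for CharsetProducer {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        if self.done {
            return None;
        }
        let candidate: String = self.indices.iter().map(|&i| self.charset[i]).collect();

        // Advance the odometer: bump the rightmost position, carrying left.
        let mut pos = self.indices.len();
        loop {
            if pos == 0 {
                // Every position overflowed: move on to the next length.
                if self.indices.len() == self.max_len {
                    self.done = true;
                } else {
                    self.indices = vec![0; self.indices.len() + 1];
                }
                break;
            }
            pos -= 1;
            self.indices[pos] += 1;
            if self.indices[pos] < self.charset.len() {
                break;
            }
            self.indices[pos] = 0;
        }
        Some(candidate)
    }
}

/// The 62-character set built by the shell loop above: a-z, A-Z, 0-9.
pub fn alphanumeric() -> String {
    ('a'..='z').chain('A'..='Z').chain('0'..='9').collect()
}
```

With the 62-character set, length 4 alone is 62^4 = 14,776,336 candidates, so streaming candidates like this is preferable to writing them all into a word list first.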

Allow dd.mm.yyyy date format and others

It would be cool to be able to specify the date format.

This small patch changes it to dd.mm.yyyy, but of course it needs to be made configurable:

diff --git a/crates/producer/src/dates.rs b/crates/producer/src/dates.rs
index ed11325..2ce0ec7 100644
--- a/crates/producer/src/dates.rs
+++ b/crates/producer/src/dates.rs
@@ -25,7 +25,7 @@ fn pregenerate_dates() -> Vec<String> {
                 month.to_string()
             };
 
-            results.push(format!("{}{}", date, month))
+            results.push(format!("{}.{}", date, month))
         }
     }
 
@@ -63,7 +63,7 @@ impl Producer for DateProducer {
 
             let next = self.inner.next().unwrap();
 
-            let password = format!("{:04}{:04}", next, self.current).into_bytes();
+            let password = format!("{:04}.{:04}", next, self.current).into_bytes();
             debug!(
                 "Sending {} from DateProducer",
                 String::from_utf8_lossy(&password)
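
One way to make this configurable without touching the generation loop is to pass the layout in as a small pattern string. The sketch below is only an illustration: `format_date` and the `DD`/`MM`/`YYYY` placeholders are hypothetical and not part of the DateProducer in the patch.

```rust
/// Render a date according to a pattern in which `DD`, `MM` and `YYYY`
/// are replaced by zero-padded numbers; every other character is copied
/// through unchanged, so any separator works.
pub fn format_date(pattern: &str, day: u32, month: u32, year: u32) -> String {
    // The replacements contain only digits, so a later replace can never
    // match inside text inserted by an earlier one.
    pattern
        .replace("YYYY", &format!("{:04}", year))
        .replace("MM", &format!("{:02}", month))
        .replace("DD", &format!("{:02}", day))
}
```

A producer could then take the pattern as a command-line argument and call something like `format_date("DD.MM.YYYY", 15, 1, 2000).into_bytes()` where the patch hard-codes the dot.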

wanted new feature

Need a mask attack where we can define the position of digits, special characters, uppercase letters and lowercase letters.
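
A rough sketch of what such a mask could translate to. The placeholder syntax (`?d`, `?l`, `?u`, `?s`) is borrowed from hashcat-style masks as an assumption, the special-character set is an arbitrary example, and none of these functions exist in pdfrip:

```rust
/// Translate a mask into one character set per password position.
/// Assumed placeholders: ?d digit, ?l lowercase, ?u uppercase, ?s special,
/// ?? a literal question mark. Any other character is taken literally.
pub fn parse_mask(mask: &str) -> Result<Vec<Vec<char>>, String> {
    let mut positions = Vec::new();
    let mut chars = mask.chars();
    while let Some(c) = chars.next() {
        if c != '?' {
            positions.push(vec![c]);
            continue;
        }
        let set: Vec<char> = match chars.next() {
            Some('d') => ('0'..='9').collect(),
            Some('l') => ('a'..='z').collect(),
            Some('u') => ('A'..='Z').collect(),
            Some('s') => "!@#$%^&*".chars().collect(),
            Some('?') => vec!['?'],
            Some(other) => return Err(format!("unknown placeholder ?{}", other)),
            None => return Err("mask ends with a dangling '?'".to_string()),
        };
        positions.push(set);
    }
    Ok(positions)
}

/// Number of candidates the mask describes.
pub fn keyspace(positions: &[Vec<char>]) -> u64 {
    positions.iter().map(|set| set.len() as u64).product()
}

/// The `index`-th candidate in mixed-radix order, with the rightmost
/// position varying fastest. Returns `None` past the end of the keyspace.
pub fn candidate_at(positions: &[Vec<char>], mut index: u64) -> Option<String> {
    if index >= keyspace(positions) {
        return None;
    }
    let mut out = vec![' '; positions.len()];
    for (slot, set) in out.iter_mut().zip(positions).rev() {
        let radix = set.len() as u64;
        *slot = set[(index % radix) as usize];
        index /= radix;
    }
    Some(out.into_iter().collect())
}
```

Addressing candidates by index makes it easy to split the keyspace across threads: each worker takes a contiguous index range and calls `candidate_at` for it.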

Cannot run custom-query without any integer ranges.

Found this while working on tests for #23.

thread 'main' panicked at crates/producer/src/custom_query.rs:54:70:
called `Result::unwrap()` on an `Err` value: ParseIntError { kind: Empty }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The prompt I used was
cargo run -- -f examples/ALICE_BANK_STATEMENT.pdf custom-query "I'M BATMAN"

Removing the single-tick (') still caused the same error.

Running
cargo run -- -f examples/ALICE_BANK_STATEMENT.pdf custom-query "I'M BATMAN{1337}"
produces

thread 'main' panicked at crates/producer/src/custom_query.rs:56:33:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I suggest either resolving this or ensuring we report a cleaner error message.
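
As an illustration of the "cleaner error message" option, here is a sketch of a parser that returns a `Result` for both failing inputs above instead of unwrapping. `Query`, `QueryError` and `parse_query` are hypothetical and do not mirror the real custom_query.rs; in particular, emitting numbers without zero padding is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub enum QueryError {
    /// No `{...}` group was found at all, e.g. "I'M BATMAN".
    MissingRange,
    /// A group was found but has no `-`, e.g. "I'M BATMAN{1337}".
    MissingDash,
    /// One side of the range is not a non-negative integer.
    InvalidNumber(String),
    /// The start of the range is larger than its end.
    ReversedRange,
}

#[derive(Debug, PartialEq)]
pub struct Query {
    pub prefix: String,
    pub start: u32,
    pub end: u32,
    pub suffix: String,
}

/// Parse `PREFIX{start-end}SUFFIX` without panicking on malformed input.
pub fn parse_query(query: &str) -> Result<Query, QueryError> {
    let open = query.find('{').ok_or(QueryError::MissingRange)?;
    let close = query[open..]
        .find('}')
        .map(|i| open + i)
        .ok_or(QueryError::MissingRange)?;

    let body = &query[open + 1..close];
    let mut parts = body.splitn(2, '-');
    let start_text = parts.next().unwrap_or("");
    let end_text = parts.next().ok_or(QueryError::MissingDash)?;

    let parse = |text: &str| -> Result<u32, QueryError> {
        text.trim()
            .parse::<u32>()
            .map_err(|_| QueryError::InvalidNumber(text.to_string()))
    };
    let start = parse(start_text)?;
    let end = parse(end_text)?;
    if start > end {
        return Err(QueryError::ReversedRange);
    }

    Ok(Query {
        prefix: query[..open].to_string(),
        start,
        end,
        suffix: query[close + 1..].to_string(),
    })
}

impl Query {
    /// Every candidate the query describes, numbers emitted without padding.
    pub fn candidates(&self) -> Vec<String> {
        (self.start..=self.end)
            .map(|n| format!("{}{}{}", self.prefix, n, self.suffix))
            .collect()
    }
}
```

The main binary could then match on the error and print a usage hint instead of a panic backtrace.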

Feature request, password contains word

Hi,
I forgot my password. I am sure that I used one of 5 words, maybe a combination of them, and that it is 8-10 characters long. As far as I can see, there is no way to brute-force while requiring that at least one word is included.

I would love something like this:
pdfrip -f encrypted.pdf contains words.txt --max-length 10 --min-length 8

Where it fills the words from words.txt into the string at each possible position and fills the rest with arbitrary chars. Optionally it combines multiple words of the file as well.
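
To pin down what "fills the word in at each possible position" would mean, here is a small sketch for a single word and a single target length. The function names are hypothetical, and it builds the whole list in memory, which only works for tiny inputs; a real implementation would have to stream candidates.

```rust
/// Every string of exactly `len` characters over `charset`.
fn fill_strings(charset: &str, len: usize) -> Vec<String> {
    let mut results = vec![String::new()];
    for _ in 0..len {
        let mut next = Vec::new();
        for head in &results {
            for c in charset.chars() {
                let mut s = head.clone();
                s.push(c);
                next.push(s);
            }
        }
        results = next;
    }
    results
}

/// Every string of exactly `length` characters that contains `word` at
/// some position, with the remaining positions filled from `charset`.
/// The result may contain duplicates when the filler overlaps the word's
/// own characters (e.g. word "aa" with filler "a").
pub fn candidates_containing(word: &str, length: usize, charset: &str) -> Vec<String> {
    let word_len = word.chars().count();
    if word_len > length {
        return Vec::new();
    }
    let filler_len = length - word_len;
    let mut out = Vec::new();
    for filler in fill_strings(charset, filler_len) {
        let filler_chars: Vec<char> = filler.chars().collect();
        // Split the filler at each position and put the word in the gap.
        for pos in 0..=filler_len {
            let mut s: String = filler_chars[..pos].iter().collect();
            s.push_str(word);
            s.extend(filler_chars[pos..].iter());
            out.push(s);
        }
    }
    out
}
```

The count per word and length is (filler length + 1) × charset size ^ filler length, which is far smaller than brute-forcing the full length but still grows quickly: a 5-letter word in a 10-character password over 62 symbols gives 6 × 62^5, about 5.5 billion candidates.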

Parse command for Year Range in ddmmyyyy format

Let's say I know the password is in DDMMYYYY format but I don't know the year.
So how do I parse the command for the year range 1900 to 2000 so that pdfrip checks all dates between 1900 & 2000 in DDMMYYYY format?
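
For reference, the search space in question can be written out as follows. Another report in this tracker invokes a date subcommand with two years (date 1900 2000); whether its generator emits exactly these strings is not something this sketch claims. It only enumerates valid calendar dates as DDMMYYYY, and the function names are hypothetical.

```rust
fn is_leap_year(year: u32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(month: u32, year: u32) -> u32 {
    match month {
        4 | 6 | 9 | 11 => 30,
        2 => {
            if is_leap_year(year) {
                29
            } else {
                28
            }
        }
        _ => 31,
    }
}

/// Every valid calendar date from 1 January `first_year` through
/// 31 December `last_year`, formatted as DDMMYYYY.
pub fn ddmmyyyy_candidates(first_year: u32, last_year: u32) -> Vec<String> {
    let mut out = Vec::new();
    for year in first_year..=last_year {
        for month in 1..=12 {
            for day in 1..=days_in_month(month, year) {
                out.push(format!("{:02}{:02}{:04}", day, month, year));
            }
        }
    }
    out
}
```

For 1900 through 2000 this gives 36,890 candidates (101 years × 365 days plus 25 leap days), a tiny search space compared with a general eight-digit brute force of 100,000,000.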

Package Versions Wrong

I cloned the repository, ran cargo build, and got the following:

     Updating crates.io index
error: failed to select a version for the requirement `clap = "^4.3.21"`
candidate versions found which didn't match: 3.2.25, 3.2.24, 3.2.23, ...
location searched: crates.io index
required by package `pdfrip v2.0.0 (/home/lawsa/source/pdfrip)`

Using Numbers in custom query

Hello, it seems the custom query only allows a fixed string value. For example, if I want to search with a query like
-q {0-990}1234 then the program searches for the literal 0-9901234 rather than each number in the range. Can you please take a look? Thanks

Password of '123B' was not cracked by default query.

Hi - I have a PDF secured with the password '123B'. Using the query:
pdfrip -f file.pdf -n 12 default-query --max-length 4 --min-length 4

I'd expect the tool to find the password no problem - but it doesn't. If I reverse the password to B123, it finds it. I assumed the default query would run all permutations of a-zA-Z0-9 but it seems not?

Thanks!

performance across PDF versions

What's the expected performance across PDF versions / types? (Edit: maybe have a table added to the README?)

For comparison, GPU performance under hashcat on a single GTX 1080, across supported PDF versions, is as follows:

$ for hashtype in 10400 10420 10500 25400 10600 10700; do \
    hashcat -b -w 4 -O -m $hashtype --quiet; done

-------------------------------------------------
* Hash-Mode 10400 (PDF 1.1 - 1.3 (Acrobat 2 - 4))
-------------------------------------------------

Speed.#1.........:   414.1 MH/s (403.66ms) @ Accel:1024 Loops:256 Thr:32 Vec:1

--------------------------------------------------------------
* Hash-Mode 10420 (PDF 1.1 - 1.3 (Acrobat 2 - 4), collider #2)
--------------------------------------------------------------

Speed.#1.........:  6510.0 MH/s (102.36ms) @ Accel:1024 Loops:1024 Thr:32 Vec:1

------------------------------------------------------------------
* Hash-Mode 10500 (PDF 1.4 - 1.6 (Acrobat 5 - 8)) [Iterations: 70]
------------------------------------------------------------------

Speed.#1.........: 14044.6 kH/s (30.67ms) @ Accel:1024 Loops:70 Thr:32 Vec:1

----------------------------------------------------------------------------------------
* Hash-Mode 25400 (PDF 1.4 - 1.6 (Acrobat 5 - 8) - user and owner pass) [Iterations: 70]
----------------------------------------------------------------------------------------

Speed.#1.........: 14294.5 kH/s (30.56ms) @ Accel:1024 Loops:70 Thr:32 Vec:1

-----------------------------------------------
* Hash-Mode 10600 (PDF 1.7 Level 3 (Acrobat 9))
-----------------------------------------------

Speed.#1.........:  3128.5 MH/s (423.99ms) @ Accel:128 Loops:512 Thr:1024 Vec:1

----------------------------------------------------------------------
* Hash-Mode 10700 (PDF 1.7 Level 8 (Acrobat 10 - 11)) [Iterations: 64]
----------------------------------------------------------------------

Speed.#1.........:    34875 H/s (586.47ms) @ Accel:32 Loops:8 Thr:256 Vec:1

Race condition in src/core/engine.rs

Developing #13 revealed there is a race condition in the engine where the engine exits before receiving a correct password from a producer, despite it eventually sending one.

Running
env LOG_LEVEL=info cargo run --release -- --filename examples/datetime-15012000.pdf date 1900 2000
does not successfully crack the PDF, while
env LOG_LEVEL=info cargo run --release -- --filename examples/datetime-15012000.pdf date 1900 2001
succeeds despite this generator sending passwords inclusively. I suspect there is a bug somewhere else that will need to be investigated.

Adding debug logging to the DateProducer next() function shows the correct password is being sent but not recognized.

2023-12-03T21:34:13.151Z DEBUG pdfrip::core::production::dates > Sending 15012000 from DateProducer

This means there is a race condition somewhere. Likely in engine.rs.

I imagine the bruteforcer should be simplified since it is currently complex, clunky, inefficient and annoying.
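
I have not pinned down the actual cause, so the following is only a sketch of a shutdown order that avoids this class of bug, not a description of engine.rs. It uses std channels instead of whatever the engine uses today, and `crack_all` and `check` are hypothetical names. The key point is that the producer finishing is not treated as "search failed": the engine closes the work channel, lets workers drain it, joins them, and only then looks at the result.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Try every candidate on `workers` threads and return a matching one.
/// `check` stands in for a PDF decryption attempt.
pub fn crack_all(
    candidates: Vec<String>,
    workers: usize,
    check: fn(&str) -> bool,
) -> Option<String> {
    let (work_tx, work_rx) = mpsc::channel::<String>();
    let (found_tx, found_rx) = mpsc::channel::<String>();
    let work_rx = Arc::new(Mutex::new(work_rx));

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let work_rx = Arc::clone(&work_rx);
            let found_tx = found_tx.clone();
            thread::spawn(move || loop {
                // Hold the lock only while receiving, not while checking.
                let next = work_rx.lock().unwrap().recv();
                match next {
                    Ok(candidate) => {
                        if check(&candidate) {
                            let _ = found_tx.send(candidate);
                        }
                    }
                    // Channel closed and fully drained: no work left.
                    Err(_) => break,
                }
            })
        })
        .collect();

    for candidate in candidates {
        work_tx.send(candidate).expect("work receiver is still alive");
    }
    // Closing the sender lets workers finish the queue and then stop.
    drop(work_tx);
    drop(found_tx);

    // Join before deciding the outcome, so candidates that were still in
    // flight when the producer finished are not reported as a failure.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    found_rx.try_recv().ok()
}
```

This version keeps checking after a match is found, which is wasteful; an early-exit flag can be added on top once the shutdown order itself is correct.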

Crash resulting from malloc error

$ pdfrip -f pw.pdf -n 128 default-query --min-length 7 --max-length 7
           .___ _____       .__        
______   __| _// ____\______|__|_____  
\____ \ / __ |\   __\\_  __ \  \____ \ 
|  |_> > /_/ | |  |   |  | \/  |  |_> >
|   __/\____ | |__|   |__|  |__|   __/ 
|__|        \/                 |__|    2.0.1

 2024-01-17T01:29:12.322Z INFO  pdfrip::core::engine > Starting password cracking job...
⠚ [2d 15:01:46] [███████████████░░░░░░░░░░░░░░░░░░░░░░░░░] 30932726000/78364164096 39% 136332/s ETA: 4d
pdfrip(21967,0x16c9a7000) malloc: *** error for object 0x600001910f50: pointer being freed was not allocated
pdfrip(21967,0x16c9a7000) malloc: *** set a breakpoint in malloc_error_break to debug
pdfrip(21967,0x1725c3000) malloc: Heap corruption detected, free list is damaged at 0x60000190c040
*** Incorrect guard value: 10107014426694143842
pdfrip(21967,0x1725c3000) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Rewrite PDFRip to take an Async approach

PDFRip currently performs its tasks with traditional threads, which makes certain logic more cumbersome than I think it would be in an async environment.
This is supported by how engine.rs has seen bugs such as #14, suggesting it's a tad too complicated.

I propose reimplementing PDFRip to be async instead by utilizing the Tokio ecosystem.

The benefits are:

  • We have access to Tokio's async runtime, potentially improving performance.
  • We have access to tokio-util's CancellationTokens and TaskTrackers as well as Tokio's select! macro and ctrl-c signal handling.
    • This could be useful when implementing #17.
  • We can remove our dependency on Crossbeam and utilize the channels in Tokio.

The current problems that I think need to be resolved are

  • We will probably need to make the Producer traits async, which has been supported in Rust since 1.75. This is to allow cancelling production of passwords when implementing #17.
  • We must lock our minimum Rust version to 1.75 if we use async traits.

100%, ETA 0s, but it continues going higher

Trying to crack the password of a PDF file, and this is the current output:

username@apps:~$ pdfrip -f file.pdf default-query --max-length 4
           .___ _____       .__
______   __| _// ____\______|__|_____
\____ \ / __ |\   __\\_  __ \  \____ \
|  |_> > /_/ | |  |   |  | \/  |  |_> >
|   __/\____ | |__|   |__|  |__|   __/
|__|        \/                 |__|    2.0.1

 2024-08-06T23:22:45.945Z INFO  engine > Starting password cracking job...
⠈ [02:09:30] [████████████████████████████████████████] 175570000/78074896 100% 22594/s ETA: 0s

The "175570000" number keeps getting higher, so I'm not sure if it's stuck in some weird loop, or if it's actually checking more characters than what it reported that it's checking. Here is my OS:

root@apps:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

And here is my rust info:

root@apps:~# rustup --version
rustup 1.25.2 (17db695f1 2023-02-01)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.68.2 (9eb3afe9e 2023-03-27)`
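
A quick arithmetic check on the numbers in the progress bar: the reported total, 78074896, is exactly 94^4, i.e. the number of strings of length exactly 4 over a 94-symbol alphabet. That the default query uses 94 symbols is an inference from this number, not something confirmed here. The helper below is a hypothetical sketch for reproducing the figures.

```rust
/// Number of strings with a length in `min_len..=max_len` over an
/// alphabet of `charset_size` symbols.
pub fn keyspace(charset_size: u64, min_len: u32, max_len: u32) -> u64 {
    (min_len..=max_len).map(|len| charset_size.pow(len)).sum()
}
```

Even counting every length from 1 to 4, `keyspace(94, 1, 4)` is 78,914,410, so a counter at 175,570,000 is more than twice the whole space. Either candidates are being counted more than once, or generation is continuing past the requested maximum length.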

custom-query help

My PDF file has a password in the format XXXX0000, i.e. 4 characters followed by 4 digits. How do I write a custom query for this?

Colors not working on Windows 10

Some colors work, others don't. Here is a screenshot:

[Image: screenshot of the terminal output]

I've tested with both PowerShell and cmd on Windows 10 Pro 21H1.

User name hidden for privacy reasons.

can't crack the aes256 ones

I can't crack the PDF I created, even though the password is 123456.
It can only crack Acrobat 6 and 7 versions, not Acrobat X versions.
