emails-html-to-pdf's Introduction

Email to PDF to email

This script checks an IMAP folder for unread emails. Any unread email that does not have an attachment is converted to a PDF and then emailed to the address you specify. The script runs at a configurable interval.

This was built to integrate with paperless-ng, which works with PDF attachments. However, I receive many documents as HTML-only emails, so I wanted them converted to PDF for storage in paperless-ng.
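To make the selection rule above concrete, here is a minimal, stdlib-only sketch of how a single raw message could be checked: it is skipped if it carries attachments; otherwise its HTML body (or plain text as a fallback) is returned for conversion. The function name `extract_convertible_body` is hypothetical. The actual script uses imap-tools and may define "attachment" differently.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Optional


def extract_convertible_body(raw: bytes) -> Optional[str]:
    """Return the HTML (or plain-text) body if the message has no attachments.

    Returns None when the message carries attachments, mirroring the
    "only emails without attachments are converted" rule.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    # iter_attachments() yields the non-body sub-parts of a multipart message
    # and nothing at all for a single-part message.
    if any(True for _ in msg.iter_attachments()):
        return None
    body = msg.get_body(preferencelist=("html", "plain"))
    return body.get_content() if body is not None else None


# Usage: a message with an HTML alternative and no attachments qualifies.
example = EmailMessage()
example["Subject"] = "Receipt"
example.set_content("Plain-text fallback")
example.add_alternative("<p>Thanks for your order</p>", subtype="html")
print(extract_convertible_body(example.as_bytes()))
```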

Usage

The following parameters are used (defaults in parentheses):

  • IMAP_URL
  • IMAP_USERNAME
  • IMAP_PASSWORD
  • SMTP_USERNAME (optional): uses IMAP_USERNAME if not provided
  • SMTP_PASSWORD (optional): uses IMAP_PASSWORD if not provided
  • IMAP_FOLDER: Which folder to watch for unread emails
  • SMTP_URL
  • MAIL_SENDER: Address the mail with the PDF should be sent from
  • MAIL_DESTINATION: Where to send the resulting PDF
  • SMTP_PORT: (587)
  • SMTP_TLS: (True)
  • INTER_RUN_INTERVAL: Time in seconds that the system should wait between runs of the script
  • PRINT_FAILED_MSG: Flag to control printing of error messages
  • HOSTS: Semicolon-separated list of hosts that should be added to /etc/hosts to prevent DNS lookup failures
  • WKHTMLTOPDF_OPTIONS: Python dict (JSON) representation of the wkhtmltopdf options that are passed to the pdfkit library
  • MAIL_MESSAGE_FLAG: Flag to apply to the email after processing.
    Must be one of: SEEN (default), ANSWERED, FLAGGED, UNFLAGGED, DELETED
  • IMAP_FILTER: Criteria to use when searching for mail to be processed. If no value is provided, a suitable value is determined based on MAIL_MESSAGE_FLAG. See the imap-tools search criteria documentation for how to specify the filter. It should be in the text format (e.g. (TEXT "hello" NEW) rather than AND(text="hello", new=True)).
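The fallback and default rules in the list above can be summarised as code. This is a sketch, not the project's implementation: `load_config` and its result keys are hypothetical, and treating "true", "1" and "yes" (any case) as an enabled SMTP_TLS is an assumption.

```python
import json
from typing import Mapping

VALID_FLAGS = {"SEEN", "ANSWERED", "FLAGGED", "UNFLAGGED", "DELETED"}


def load_config(env: Mapping[str, str]) -> dict:
    """Resolve the documented parameters from a mapping such as os.environ."""
    flag = env.get("MAIL_MESSAGE_FLAG", "SEEN").upper()
    if flag not in VALID_FLAGS:
        raise ValueError(f"MAIL_MESSAGE_FLAG must be one of {sorted(VALID_FLAGS)}, got {flag!r}")
    return {
        "imap_url": env["IMAP_URL"],
        "imap_username": env["IMAP_USERNAME"],
        "imap_password": env["IMAP_PASSWORD"],
        # SMTP credentials fall back to the IMAP ones when not provided.
        "smtp_username": env.get("SMTP_USERNAME") or env["IMAP_USERNAME"],
        "smtp_password": env.get("SMTP_PASSWORD") or env["IMAP_PASSWORD"],
        "smtp_port": int(env.get("SMTP_PORT", "587")),
        # Assumption: "true", "1" or "yes" (any case) enable TLS.
        "smtp_tls": env.get("SMTP_TLS", "True").strip().lower() in {"true", "1", "yes"},
        # A JSON object, e.g. {"load-media-error-handling":"ignore"}.
        "wkhtmltopdf_options": json.loads(env.get("WKHTMLTOPDF_OPTIONS", "{}")),
        # Semicolon-separated "IP hostname" entries.
        "hosts": [h.strip() for h in env.get("HOSTS", "").split(";") if h.strip()],
        "mail_message_flag": flag,
    }
```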

Docker-Compose

1. Use prebuilt image

This image is published to the GitHub Container Registry, so you can use it without downloading this repository. The image address is ghcr.io/rob-luke/emails-html-to-pdf:latest. To use it with docker-compose, the configuration would look something like this:

version: "3.8"

services:

  email2pdf:
    image: ghcr.io/rob-luke/emails-html-to-pdf:latest
    container_name: email2pdf
    environment:
      - IMAP_URL=imap.provider.com
      - IMAP_USERNAME=user@provider.com
      - IMAP_PASSWORD=randompassword
      - IMAP_FOLDER=Paperless
      - SMTP_URL=smtp.provider.com
      - MAIL_SENDER=user@provider.com
      - MAIL_DESTINATION=paperless@provider.com
      - INTER_RUN_INTERVAL=600
      - HOSTS=127.0.0.1 tracking.paypal.com
      - WKHTMLTOPDF_OPTIONS={"load-media-error-handling":"ignore"}

2. Build image yourself

Open the docker-compose file, enter your details in the environment section, and start the container. This will run the script every minute.

docker-compose up -d

Python

Alternatively, you can run the script manually with the following commands:

poetry install
poetry run src/main.py

Hints

Possible Errors

PayPal Mail with HostNotFoundErrors

  • Try adding 127.0.0.1 tracking.paypal.com to the HOSTS env (check the error log for the missing domain)
  • Add {"load-media-error-handling":"ignore"} as the WKHTMLTOPDF_OPTIONS option (the tracking pixel may be what fails to load)
  • Append "enable-local-file-access":true or "load-error-handling":"ignore" to WKHTMLTOPDF_OPTIONS if you get a file://... error
  • Add 127.0.0.1 true to the HOSTS env if you get an http:///true/... error
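Several of these hints can be combined into a single value per variable: HOSTS entries are joined with semicolons, and the wkhtmltopdf options go into one JSON object. The small script below is a hypothetical helper that prints ready-to-paste lines. Combining every option is only for illustration, so keep just the ones your error calls for.

```python
import json

# Hypothetical example values taken from the hints above.
hosts = ["127.0.0.1 tracking.paypal.com", "127.0.0.1 true"]
wkhtmltopdf_options = {
    "load-media-error-handling": "ignore",
    "load-error-handling": "ignore",
    "enable-local-file-access": True,
}

env = {
    # HOSTS is documented as a semicolon-separated list.
    "HOSTS": ";".join(hosts),
    # json.dumps turns Python True into JSON true, as the hint expects.
    "WKHTMLTOPDF_OPTIONS": json.dumps(wkhtmltopdf_options, separators=(",", ":")),
}

for key, value in env.items():
    print(f"{key}={value}")
```

Each printed line can be used as a "- KEY=value" entry in the environment section of the compose file.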

Development

The recommended editor for development is either IntelliJ or Visual Studio Code.

Visual Studio Code

For Visual Studio Code, it is recommended to use the devcontainer included in the repository. With the Remote - Containers extension installed, you should be prompted to open the devcontainer when opening the folder.

For debugging, copy the env.example file and name the copy env. Then set the variables inside it to the values required for testing. These are loaded automatically when launching via the debug menu or by pressing F5. The env file is listed in .gitignore.

Formatting issues will cause the GitHub build to fail. To fix formatting issues in a file, open it and run the "Format Document" command.

emails-html-to-pdf's People

Contributors

chirmstream, deosrc, rob-luke, smseidl

emails-html-to-pdf's Issues

Handling Inline Image?

I thought I had a failing installation because my first test email was not being converted.

The email had an inline image that was attached to the email. I guess I didn't know that in some cases (Gmail in this instance) an HTML image in the body of the email is also attached to the email itself.

Could there be a way to handle attached images in this manner? Perhaps it allows attachments, but only if the attachment is also used in the body of the message? That way it skips legitimately attached files but does act when the attachment is part of the email body?

My Paperless-ng(x) filters out all non-PDF attachments already, so it wouldn't import the email with the inline images anyways.
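One way the suggested rule could work is to accept a message when every non-body part has a Content-ID that the HTML references through a cid: URL, which is how inline images are usually embedded. The sketch below assumes that definition; `attachments_are_inline_only` is a hypothetical helper, not part of the project.

```python
import re
from email.message import EmailMessage

# Finds cid: references such as <img src="cid:logo@example">.
_CID_RE = re.compile(r"cid:([^\"'\s>)]+)", re.IGNORECASE)


def attachments_are_inline_only(msg: EmailMessage) -> bool:
    """Return True if every non-body part is referenced from the HTML via cid:."""
    html_part = msg.get_body(preferencelist=("html",))
    html = html_part.get_content() if html_part is not None else ""
    referenced = {cid.lower() for cid in _CID_RE.findall(html)}
    for part in msg.walk():
        if part.is_multipart():
            continue
        # Text parts that are not explicit attachments are the body itself.
        if part.get_content_maintype() == "text" and not part.is_attachment():
            continue
        cid = str(part.get("Content-ID", "")).strip().strip("<>").lower()
        if cid not in referenced:
            return False  # not shown inline, so treat it as a real attachment
    return True
```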

Use Alpine baseimage

Currently, the base image (python 3.9) is based on a default Debian install and is therefore about 1.5 GB in size.
We should create an image based on Alpine Linux to reduce the image size.

I will probably create a PR tomorrow.

Moved from Windows/WSL/Ubuntu to RPi 4b Ubuntu 64 and container fails on startup

It's not producing much error detail, only:
email2pdf | standard_init_linux.go:228: exec user process caused: exec format error

My compose section:

  email2pdf:
    image: ghcr.io/rob-luke/emails-html-to-pdf:latest
    container_name: email2pdf
    environment:
      - IMAP_URL=xxxx
      - IMAP_USERNAME=xxxx
      - IMAP_PASSWORD=xxxx
      - IMAP_FOLDER=paperless
      - SMTP_URL=xxxx
      - MAIL_SENDER=xxxx
      - MAIL_DESTINATION=xxxx
      - INTER_RUN_INTERVAL=300
      - PRINT_FAILED_MSG='false'

Store Documents locally

Hi,

Would it be possible to add an option to store the converted PDFs locally instead of sending them via SMTP again?
I deployed this container in my Paperless Compose stack and would like to store the documents directly in the import folder (shared folder) for paperless.

Alternatively, we could post the document directly to the paperless API.

Save PDF to a folder

Hi, Thanks for a great project.
I wonder if it would be possible to simply save the new PDF to a folder I specify, instead of sending a new mail.
Best regards
Benjamin
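A local-output option along these lines might write the PDF into a folder such as paperless' consume directory. In the hypothetical `save_pdf` helper below, the file is first written under a temporary .part name in the target folder and then renamed, so a folder watcher should not pick up a half-written file. This assumes the watcher ignores .part files and that the temporary file and the target share a filesystem.

```python
import os
import tempfile
from pathlib import Path


def save_pdf(pdf_bytes: bytes, filename: str, output_dir: str) -> Path:
    """Write the PDF into output_dir and return its final path."""
    target_dir = Path(output_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    final_path = target_dir / filename
    # Temporary file in the same directory, so the rename below is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(pdf_bytes)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the partial file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```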

Possible to add the email header to the PDF?

This looks perfect in combination with paperless, thanks for spending your time creating this!

Sometimes I don't only get HTML emails but also plain-text emails where I need the actual headers (From, Date, Subject, etc.). Is it possible to include these in the generated PDF? The WKHTMLTOPDF options appear to control the PDF's layout, not its content.

Use Docker and Github Tags for releases

Hi,

This project has received some good improvements over the last few days, so the image should be shipped with a versioning system.
Because only the latest tag is currently used, clients that already have the image will not pull a newer one: the tag name does not change.

My recommendation:

  • Create a GitHub Release (tag) for every new feature or group of features
  • Modify the build process so it is triggered when a tag is pushed on GitHub, and use the tag name as the Docker tag
  • Build the latest tag from the current master branch (independently of releases)
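The proposed scheme boils down to a mapping from the Git ref that triggered the build to Docker tags. GitHub Actions exposes that ref in the GITHUB_REF variable (for example refs/tags/v1.2.0). The function below is a hypothetical sketch of the mapping and assumes master is the default branch.

```python
from typing import List


def docker_tags_for_ref(ref: str, default_branch: str = "master") -> List[str]:
    """Map a Git ref (as in GITHUB_REF) to the Docker tags to push."""
    if ref.startswith("refs/tags/"):
        # Release build: reuse the Git tag name as the Docker tag.
        return [ref[len("refs/tags/"):]]
    if ref == f"refs/heads/{default_branch}":
        # "latest" always tracks the default branch, independent of releases.
        return ["latest"]
    # Other branches: build, but push no image.
    return []
```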

generated PDF is missing the mailheader

Coming from paperless-ng and hoped to find a solution to convert mails to PDF and feed them into paperless.

A PDF gets generated, but it's only the body of the mail, the headers are missing. To handle Mails in a good way in paperless, there should be an option to include all the header data - to be able to filter for subject, sender, recipient, ...
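A possible approach is to prepend the headers to the HTML before it is handed to wkhtmltopdf. The hypothetical `html_with_headers` function below renders From, To, Cc, Date and Subject as a small table above the body, escaping header values and wrapping plain-text bodies in a pre element. It is a sketch and ignores details such as existing html/body wrappers in the message.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from html import escape

HEADER_FIELDS = ("From", "To", "Cc", "Date", "Subject")


def html_with_headers(raw: bytes) -> str:
    """Return HTML for conversion: a header table, a rule, then the body."""
    msg = message_from_bytes(raw, policy=policy.default)
    rows = "".join(
        f"<tr><th align='left'>{name}:</th><td>{escape(str(msg[name]))}</td></tr>"
        for name in HEADER_FIELDS
        if msg[name] is not None
    )
    body = msg.get_body(preferencelist=("html", "plain"))
    content = body.get_content() if body is not None else ""
    if body is not None and body.get_content_subtype() == "plain":
        # Plain-text bodies are escaped and wrapped so line breaks survive.
        content = f"<pre>{escape(content)}</pre>"
    return f"<table>{rows}</table><hr>{content}"
```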

Non-standard characters break file naming

I set up a forward to send my HTML-only emails to an address that this script processes. The colon in the subject line (Fwd:) breaks the file naming because files cannot contain non-standard characters; the file is saved simply as "Fwd".
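A common remedy is to sanitise the subject before using it as a file name. The hypothetical `safe_filename` below keeps only ASCII letters, digits, ".", "_" and "-", which would also cover emoji in subjects. Dropping non-ASCII characters is a deliberate, lossy choice: a subject written entirely in a non-Latin script falls back to email.pdf.

```python
import re
import unicodedata


def safe_filename(subject: str, max_length: int = 100) -> str:
    """Turn an email subject into a portable PDF file name."""
    # NFKD splits accented letters into base letter plus combining mark;
    # the ASCII encode then drops the marks, emoji and other non-ASCII text.
    ascii_text = unicodedata.normalize("NFKD", subject).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of disallowed characters (":", spaces, ...) into "-".
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "-", ascii_text).strip("-._")
    return (cleaned[:max_length] or "email") + ".pdf"
```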

Implement Delete option for Mails

It would be cool to have an option to delete the mails instead of marking them as read.
Mails I want to consume will be moved into a "Processing" folder where this container picks them up. Afterwards, they are sent to the paperless instance and can therefore be deleted to keep the mail inbox empty.

Errorhandling of Tracking Pixels

I've found another edge case:

I started converting my PayPal emails to PDFs to import them into paperless.
For most of them I get the HostNotFoundError.
This relates to the tracking pixel in their mails.

How I fixed it:

  1. append 127.0.0.1 tracking.paypal.com to /etc/hosts
  2. pass pdfkit options to pdfkit.from_string:
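import pdfkit

# Ignore media (e.g. tracking pixels) that fails to load; pdftext and filename come from the script.
options = {
    'load-media-error-handling': 'ignore'
}
pdfkit.from_string(pdftext, filename, options=options)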
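Step 1 is the kind of change the HOSTS parameter is meant to automate. A hypothetical Python sketch of that step, assuming the semicolon-separated "IP hostname" format from the README, skips entries that already exist so it is safe to run on every container start; it is not the project's code. Docker Compose's own extra_hosts option is an alternative way to add such entries.

```python
from pathlib import Path
from typing import List


def add_host_entries(hosts_value: str, hosts_file: str = "/etc/hosts") -> List[str]:
    """Append semicolon-separated "IP hostname" entries to a hosts file.

    Entries already present (compared with whitespace collapsed) are skipped.
    Returns the entries that were actually appended.
    """
    path = Path(hosts_file)
    text = path.read_text()
    existing = {" ".join(line.split()) for line in text.splitlines()}
    added: List[str] = []
    for raw_entry in hosts_value.split(";"):
        entry = " ".join(raw_entry.split())
        if entry and entry not in existing:
            existing.add(entry)
            added.append(entry)
    if added:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if not text or text.endswith("\n") else "\n"
        with path.open("a") as fh:
            fh.write(prefix + "".join(f"{entry}\n" for entry in added))
    return added
```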

Login Fails

My account has 2FA enabled on it, and I don't have the option to disable 2FA. I'm assuming that is why I'm getting this error, but I don't even get a prompt on my Microsoft Authenticator app to approve the login attempt, so it doesn't seem to be getting that far.

600
Skipping virtualenv creation, as specified in config file.
Running emails-html-to-pdf
Starting mail processing run
Traceback (most recent call last):
  File "/app/main.py", line 257, in <module>
    process_mail(
  File "/app/main.py", line 102, in process_mail
    with MailBox(imap_url).login(imap_username, imap_password, imap_folder) as mailbox:
  File "/usr/local/lib/python3.9/site-packages/imap_tools/mailbox.py", line 44, in login
    login_result = self.box.login(username, password)
  File "/usr/local/lib/python3.9/imaplib.py", line 612, in login
    raise self.error(dat[-1])
imaplib.IMAP4.error: b'LOGIN failed.'

emails are not read

I have set up the container and set all variables correctly, however no emails are read or processed and there are no log entries. Where can I troubleshoot this? I have no idea anymore. I set the IMAP port to 465, no difference. I set the IMAP folder to inbox.
Please help, I really need this working.
Thank you

exec /app/runner.sh: no such file or directory

Trying to understand what's going wrong here: even with all variables disabled and just the image starting, I get this:

exec /app/runner.sh: no such file or directory

Is this an arm64 error or am I missing something? :(

Emoji characters break file naming

Continuation of #1 ... but today it was failing on emojis in the filenames (thanks eBay and Walmart):
PDF: ✅-ORDER-CONFIRMED:-___.pdf
PDF: 👍-Your-order-was-delivered.pdf

BUG: Attachments sent via Outlook.com (Microsoft Exchange) are converted to ATT00001.bin

Hi,

I tried converting and sending documents, which worked fine.
BUT when I use a Microsoft Exchange email server, the PDF attachment gets renamed to ATT00001.bin.

It seems someone else had the same problem and fixed it this way:

https://stackoverflow.com/questions/52323022/pythons-email-message-library-output-not-getting-accepted-by-outlook-365-when-i

or

https://stackoverflow.com/questions/59989806/email-attachment-file-names-are-removed-with-at00001
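Whether this fixes the renaming has not been verified here. One approach, assuming the problem comes from the PDF part lacking an explicit type or file name, is to build the message with the standard library's EmailMessage API, which gives the attachment a Content-Type of application/pdf and a Content-Disposition with the filename, next to a short text body. `build_pdf_message` is a hypothetical helper, not the project's code.

```python
from email.message import EmailMessage


def build_pdf_message(sender: str, recipient: str, subject: str,
                      pdf_bytes: bytes, filename: str) -> EmailMessage:
    """Build a multipart/mixed message whose PDF part carries an explicit type and name."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # A short text body, so the attachment is not the only part.
    msg.set_content(f"Converted PDF attached: {filename}")
    # Sets Content-Type: application/pdf and
    # Content-Disposition: attachment; filename="<filename>".
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                       filename=filename)
    return msg
```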

SMTP authentication fails, Need to use STARTTLS

So from my experience the SMTP server that I use only works when I use STARTTLS. I tried the parameter SMTP_STARTTLS set to true but it doesn't work.

raise SMTPAuthenticationError(code, resp)
smtplib.SMTPAuthenticationError: (535, b'5.7.8 Error: authentication failed: UGFzc3dvcmQ6')

I have tried SMTP_TLS true and false but haven't been able to get it to connect.
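Two details may help here. First, the string in the 535 reply, UGFzc3dvcmQ6, is base64 for "Password:", the prompt used in the AUTH LOGIN exchange, so decoding it gives a hint of where authentication stopped. Second, a STARTTLS session with smtplib upgrades a plain connection (usually on port 587) before logging in. The sketch below shows both; `send_with_starttls` is hypothetical and is not known to be how the project connects.

```python
import base64
import smtplib
import ssl
from email.message import EmailMessage


def decode_smtp_challenge(challenge: bytes) -> str:
    """AUTH LOGIN prompts are base64-encoded; decode one for debugging."""
    return base64.b64decode(challenge).decode("utf-8", "replace")


def send_with_starttls(msg: EmailMessage, host: str, port: int,
                       username: str, password: str) -> None:
    """Connect in plain text, upgrade with STARTTLS, then log in and send."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.ehlo()
        smtp.starttls(context=context)
        smtp.ehlo()  # identify again over the encrypted channel
        smtp.login(username, password)
        smtp.send_message(msg)
```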

Add continuous integration tests

I would like to offer a big thanks to @mirisbowring @smseidl and @deosrc for contributing code.

One thing that I think would make it easier for people to contribute, and for me to evaluate PRs, would be some continuous integration testing. But I am not sure how to achieve this on GitHub Actions. I have experience setting up scientific software tests on GHA, but not email-related things. So this will remain open until I find a way to implement it.
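One way to run such tests on GitHub Actions without a mail server is to replace smtplib.SMTP with a mock from the standard library's unittest.mock and assert on the calls made. In this sketch `send_message` is a hypothetical stand-in for the project's sending code.

```python
import smtplib
import unittest
from email.message import EmailMessage
from unittest import mock


def send_message(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Hypothetical stand-in for the code under test."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)


class SendMessageTest(unittest.TestCase):
    @mock.patch("smtplib.SMTP")
    def test_logs_in_and_sends(self, smtp_cls):
        msg = EmailMessage()
        msg["Subject"] = "test"
        send_message(msg, "smtp.example.com", 587, "user", "secret")
        smtp_cls.assert_called_once_with("smtp.example.com", 587)
        # The object bound by "with ... as smtp" is __enter__'s return value.
        server = smtp_cls.return_value.__enter__.return_value
        server.starttls.assert_called_once_with()
        server.login.assert_called_once_with("user", "secret")
        server.send_message.assert_called_once_with(msg)
```

Saved as test_send.py, this could be run in a workflow step with python -m unittest test_send.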

Response status "OK" expected, but "NO" received. (Mailbox name should probably be prefixed with: INBOX.)

Hi all

Do you know this error?

600
Skipping virtualenv creation, as specified in config file.
Running emails-html-to-pdf
Starting mail processing run
Traceback (most recent call last):
  File "/app/main.py", line 257, in <module>
    process_mail(
  File "/app/main.py", line 102, in process_mail
    with MailBox(imap_url).login(imap_username, imap_password, imap_folder) as mailbox:
  File "/usr/local/lib/python3.9/site-packages/imap_tools/mailbox.py", line 48, in login
    self.folder.set(initial_folder)
  File "/usr/local/lib/python3.9/site-packages/imap_tools/folder.py", line 20, in set
    check_command_status(result, MailboxFolderSelectError)
  File "/usr/local/lib/python3.9/site-packages/imap_tools/utils.py", line 54, in check_command_status
    raise exception(command_result=command_result, expected=expected)
imap_tools.errors.MailboxFolderSelectError: Response status "OK" expected, but "NO" received. Data: [b'Client tried to access nonexistent namespace. (Mailbox name should probably be prefixed with: INBOX.) (0.001 + 0.000 secs).']

This is the important part, I think:
imap_tools.errors.MailboxFolderSelectError: Response status "OK" expected, but "NO" received. Data: [b'Client tried to access nonexistent namespace. (Mailbox name should probably be prefixed with: INBOX.) (0.001 + 0.000 secs)

But I don't use the folder name INBOX; I use Paperless instead. And INBOX must exist, since I cannot create it.

My compose is:

 email2pdf:
    image: ghcr.io/rob-luke/emails-html-to-pdf:latest
    hostname: email2pdf
    environment:
     IMAP_URL: "${EMAIL_SERVER}"
     IMAP_USERNAME: "${EMAIL_TO}"
     IMAP_PASSWORD: "${EMAIL_PWD}"
     IMAP_FOLDER: "Paperless"
     SMTP_URL: "${EMAIL_SERVER}"
     MAIL_SENDER: "${my_user}+paperless@${domain_WAN}"
     MAIL_DESTINATION: "${EMAIL_TO}"
     INTER_RUN_INTERVAL: 600
     HOSTS: "127.0.0.1 tracking.paypal.com"
     WKHTMLTOPDF_OPTIONS: '{"load-media-error-handling":"ignore"}'
     SMTP_PORT: 587
     SMTP_TLS: 'true'
     TZ: ${TZ}
    deploy:
      restart_policy:
        condition: on-failure
      mode: replicated
      replicas: 1
      placement:
        constraints:
        - node.labels.netreachable == false
    networks:
     cloud-edge: 

Thank you very much!

br
Stephan
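The server's hint suggests it keeps personal folders under the INBOX namespace, in which case the folder's full name would presumably be INBOX.Paperless rather than Paperless. To check the exact names and hierarchy delimiter, the folder list can be fetched with imaplib and each LIST line parsed. `parse_list_line` is a hypothetical helper; names containing non-ASCII characters would additionally need modified UTF-7 decoding, which it does not do.

```python
import re
from typing import Optional, Tuple

# One LIST response line, e.g. b'(\\HasNoChildren) "." "INBOX.Paperless"'
_LIST_RE = re.compile(rb'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')


def parse_list_line(line: bytes) -> Optional[Tuple[str, str]]:
    """Return (delimiter, mailbox name) from one imaplib LIST response line."""
    m = _LIST_RE.match(line)
    if m is None:
        return None
    delim = m.group("delim").decode()
    name = m.group("name").decode()
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        name = name[1:-1]
    return ("" if delim == "NIL" else delim.strip('"'), name)

# Usage against a real server (not run here):
#   with imaplib.IMAP4_SSL(host) as imap:
#       imap.login(user, password)
#       typ, lines = imap.list()
#       for line in lines:
#           if isinstance(line, bytes):
#               print(parse_list_line(line))
```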

Error while loading/opening URL

I tried to send a PayPal receipt email to Paperless via Email to PDF and got the error below. I don't understand why it's trying to load this page instead of the actual email... thoughts?

email2pdf    |
email2pdf    | No attachments in: You have authorized a payment to XXXXXXXX Inc.
email2pdf    |
email2pdf    | PDF: You-have-authorized-a-payment-to-XXXXXXXX_.pdf
email2pdf    |
email2pdf    | PDF: You-have-authorized-a-payment-to-XXXXXXXX_.pdf
email2pdf    | Traceback (most recent call last):
email2pdf    |   File "/app/main.py", line 98, in <module>
email2pdf    |     process_mail(imap_url=server_imap,
email2pdf    |   File "/app/main.py", line 72, in process_mail
email2pdf    |     pdfkit.from_string(html, filename)
email2pdf    |   File "/usr/local/lib/python3.9/site-packages/pdfkit/api.py", line 72, in from_string
email2pdf    |     return r.to_pdf(output_path)
email2pdf    |   File "/usr/local/lib/python3.9/site-packages/pdfkit/pdfkit.py", line 156, in to_pdf
email2pdf    |     raise IOError('wkhtmltopdf reported an error:\n' + stderr)
email2pdf    | OSError: wkhtmltopdf reported an error:
email2pdf    | QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
email2pdf    | Loading page (1/2)
Error: Failed to load https://t.paypal.com/ts?xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, with network status code 1 and http status code 0 - Connection refused
Printing pages (2/2)                                                        ] 25%
Done                                                                        ]
email2pdf    | Exit with code 1 due to network error: ConnectionRefusedError
email2pdf    |

Email Creds in Plain Text

Is there any way to protect the email credentials better, instead of having them in plain text in the docker-compose file? Could this inherit the paperless email setup?
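A common pattern in Docker images (the official postgres image's POSTGRES_PASSWORD_FILE, for example) is to accept a variable with a _FILE suffix that points to a file, such as a Docker secret mounted under /run/secrets/. This project does not document such support, so the `read_secret` helper below is only a sketch of how it could look.

```python
import os
from typing import Mapping, Optional


def read_secret(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return NAME from the file named by NAME_FILE, else NAME itself, else None."""
    file_path = env.get(f"{name}_FILE")
    if file_path:
        with open(file_path, encoding="utf-8") as fh:
            # Strip the trailing newline that most editors add.
            return fh.read().strip()
    return env.get(name)
```

With Compose, IMAP_PASSWORD_FILE=/run/secrets/imap_password would then point at a secret declared under the top-level secrets key, keeping the password itself out of the compose file.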

Supported email providers ?

Hi

I’m looking to use something just like this; however, it seems providers like Gmail and Outlook (Hotmail, Live) no longer support basic authentication (username and password). Is that your understanding too? If so, which providers can be used with your setup?

Many thanks
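Providers that have dropped basic authentication for IMAP generally expect OAuth 2.0, which imaplib can use through the XOAUTH2 SASL mechanism. The sketch below only builds the XOAUTH2 client response; obtaining the access token through the provider's OAuth flow is not shown, and whether this project supports OAuth at all is not stated in this document.

```python
def xoauth2_string(username: str, access_token: str) -> bytes:
    """Build the SASL XOAUTH2 initial client response (before base64 encoding)."""
    # Format: "user=" user ^A "auth=Bearer " token ^A ^A, where ^A is \x01.
    return f"user={username}\x01auth=Bearer {access_token}\x01\x01".encode()

# imaplib base64-encodes whatever the callback returns (not run here):
#   imap = imaplib.IMAP4_SSL("imap.gmail.com")
#   imap.authenticate("XOAUTH2", lambda _: xoauth2_string(user, token))
```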
